
Executable Code and Diagrams in Technical Books: A Practical Setup Guide

How to handle snippet extraction pitfalls, integrate Mermaid and Draw.io diagrams, and build a CI pipeline for technical books with infrastructure-dependent code examples.

3 April 2026 · 10 min read

This is a follow-up to Writing Technical Books in 2026: Tools, Workflows, and the Case for Executable Code, where I compared Quarto, Jupyter Book, Pandoc, and other tools for technical book authoring. That post ended with a recommendation: use Quarto with a hybrid approach — inline executable code for simple examples, snippet extraction from a tested repo for infrastructure-heavy ones.

This post gets into the practical details. How do you actually set up snippet extraction without it becoming a maintenance nightmare? How do you handle diagrams across PDF and HTML output? And what does the CI pipeline look like for a book that needs Docker infrastructure to validate its code examples?

I’ve published a working template repository that implements everything discussed here — a Quarto book with Postgres via Docker Compose, executable code, snippet extraction, pre-commit hooks, supply chain protections, and a full CI pipeline. Fork it and start writing.

The Hybrid Architecture

Most technical books fall into a pattern: early chapters set up infrastructure (databases, message queues, cloud services), and later chapters build application logic on top. The infrastructure chapters need real, tested code but can’t execute during a book render. The application chapters can often run inline.

flowchart TB
    subgraph "Chapters 1-2: Infrastructure"
        A[Tested repo] -->|snippet extraction| B[eval: false<br/>code blocks]
    end
    subgraph "Chapters 3+: Application"
        C[Inline code in .qmd] -->|Quarto executes| D[Output captured<br/>in document]
    end
    E[Docker Compose] -->|pre-render script| C
    A -->|CI tests| F[Validated independently]

The project structure:

book/
  _quarto.yml                        # Book config with pre/post render hooks
  chapters/
    01-infrastructure-setup.qmd      # Snippets from tested repo
    02-platform-basics.qmd           # Snippets + some inline
    03-data-pipeline.qmd             # Inline executable code
    04-monitoring.qmd                # Inline + diagrams
  code/                              # Tested source code repo
    ch01/
    ch02/
    tests/
  diagrams/                          # Draw.io/Excalidraw sources
  scripts/
    start-infra.sh                   # docker compose up -d
    stop-infra.sh                    # docker compose down
    lint-snippets.py                 # Validate snippet references
    export-diagrams.sh               # Draw.io → SVG
  .github/workflows/
    book.yml                         # Full CI pipeline

Snippet Extraction Done Right

The basic idea is simple: mark regions in your source code with named tags, reference them from the manuscript, and a build script resolves the includes. The pitfalls are also well-known — chapter reordering breaks narrative context, refactoring cascades into the manuscript, dead snippets accumulate. Here’s how to handle each one.

Snippet markers

Use a consistent format that works as valid comments in any language:

# code/ch01/kafka_setup.py

def create_producer():
    # <<< snippet: kafka-producer-setup >>>
    producer = KafkaProducer(
        bootstrap_servers=['localhost:9092'],
        value_serializer=lambda v: json.dumps(v).encode('utf-8')
    )
    return producer
    # <<< /snippet: kafka-producer-setup >>>


def send_event(producer, topic, event):
    # <<< snippet: kafka-send-message >>>
    producer.send(topic, event)
    producer.flush()
    # <<< /snippet: kafka-send-message >>>

Reference them in the manuscript with eval: false so Quarto renders the code but doesn’t try to execute it:

## Setting Up the Producer

```{python}
#| eval: false
{{< include ../code/ch01/kafka_setup.py#kafka-producer-setup >}}
```

The producer serializes each event as JSON before sending.

Quarto’s include shortcode pulls files in natively; resolving the `#snippet-name` fragment down to the marked region is the job of a lightweight pre-render step, so no heavy custom build tooling is needed for basic cases.

Pitfall 1: Chapter reordering breaks narrative references

The rule: never reference chapters by number in prose. Use Quarto cross-references instead:

<!-- BAD — breaks when you reorder -->
As we saw in Chapter 3, the producer connects to the broker.

<!-- GOOD — Quarto resolves automatically -->
As we saw in @sec-kafka-setup, the producer connects to the broker.

Every section that might be referenced gets an explicit label:

## Setting Up Kafka {#sec-kafka-setup}

Quarto resolves @sec-kafka-setup to the correct chapter and section number regardless of ordering. If you move the section, every reference updates automatically. This applies to figures (@fig-architecture), tables (@tbl-metrics), and code listings (@lst-producer) too.

Pitfall 2: Refactoring code cascades into the manuscript

When you rename a function or restructure a module, three things can break: snippet markers, include references, and prose descriptions.

Snippet markers and includes — a CI linter catches these:

#!/usr/bin/env python3
"""Validate that all snippet references in chapters resolve to defined snippets."""

import re
import sys
from pathlib import Path

SNIPPET_DEF = re.compile(r"<<<\s*snippet:\s*([\w-]+)")
SNIPPET_REF = re.compile(r"include\s+\S+#([\w-]+)")

def main():
    # Scan code/ for defined snippets
    defined = set()
    for path in Path("code").rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for match in SNIPPET_DEF.finditer(text):
            defined.add(match.group(1))

    # Scan chapters/ for referenced snippets
    used = set()
    for path in Path("chapters").rglob("*.qmd"):
        for match in SNIPPET_REF.finditer(path.read_text()):
            used.add(match.group(1))

    errors = False

    # Broken references (used but not defined)
    broken = used - defined
    if broken:
        for name in sorted(broken):
            print(f"::error::Broken snippet reference: '{name}'")
        errors = True

    # Dead snippets (defined but not used)
    dead = defined - used
    if dead:
        for name in sorted(dead):
            print(f"::warning::Dead snippet (defined but never referenced): '{name}'")

    sys.exit(1 if errors else 0)

if __name__ == "__main__":
    main()

Run this in CI alongside your tests. Broken references fail the build. Dead snippets generate warnings.

Prose descriptions — no tool catches “we call send_event()” when you renamed it to publish_event(). The best defence is to reference behaviour, not implementation details:

<!-- BAD — breaks on rename -->
The `send_event()` function handles serialization and delivery.

<!-- BETTER — describes behaviour -->
The event publishing function handles serialization and delivery.

<!-- BEST — reference the snippet directly -->
The function shown in @lst-kafka-send handles serialization and delivery.

Pitfall 3: Cross-snippet dependencies are invisible

Chapter 5’s snippet uses a producer variable that was created in Chapter 1’s snippet. The extraction tool doesn’t know this — it just pulls text. If you reorder or remove the Chapter 1 snippet, Chapter 5’s code still extracts fine but won’t make sense to the reader.

Solution: make dependencies explicit in tests.

# tests/test_ch05.py

from code.ch01.kafka_setup import create_producer
from code.ch05.consumer import process_events

def test_end_to_end():
    """Validates that ch05 examples work with ch01's infrastructure setup."""
    producer = create_producer()
    # Test ch05 snippets with ch01's setup
    result = process_events(producer, topic="test-events")
    assert result.processed_count > 0

The import chain makes the dependency graph explicit. If Chapter 1 refactors and breaks the interface, Chapter 5’s test fails with a clear import error — not a mysterious runtime problem weeks later when a reader tries the code.
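These end-to-end tests assume the Docker infrastructure is already up. A session-scoped guard can fail fast with a clear message instead of a wall of connection errors. Here’s a sketch, assuming Kafka on its default port 9092 and the start-infra.sh script from the layout above (the file and fixture names are illustrative):

```python
# tests/conftest.py — hypothetical guard; assumes Kafka listens on localhost:9092
import socket

import pytest


def _port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


@pytest.fixture(scope="session", autouse=True)
def require_kafka():
    """Skip the whole test session when the broker isn't reachable."""
    if not _port_open("localhost", 9092):
        pytest.skip("Kafka not running; start it with scripts/start-infra.sh")
```

A skipped session with an actionable message beats twenty tests each timing out on their own connection attempt.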

For additional safety, declare dependencies in the snippet markers themselves:

# <<< snippet: consume-events depends: kafka-producer-setup >>>

Your linter can parse these and validate that every dependency is included earlier in the chapter ordering.
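A minimal sketch of that check, assuming a single `depends:` attribute per marker and chapter files passed in book order (the function and file names are ours, not part of the template):

```python
# scripts/check_snippet_deps.py — illustrative sketch of dependency-order checking
import re
from pathlib import Path

DEF = re.compile(r"<<<\s*snippet:\s*([\w-]+)(?:\s+depends:\s*([\w-]+))?")
REF = re.compile(r"include\s+\S+#([\w-]+)")


def check_dependency_order(chapters, code_dir="code"):
    """Return (chapter, snippet, dependency) triples where a snippet is
    referenced before the snippet it depends on."""
    # Collect declared dependencies from the snippet markers
    deps = {}
    for path in Path(code_dir).rglob("*.py"):
        for m in DEF.finditer(path.read_text()):
            if m.group(2):
                deps[m.group(1)] = m.group(2)

    # Walk chapters in book order; every dependency must already be seen
    seen, violations = set(), []
    for chapter in chapters:
        for m in REF.finditer(Path(chapter).read_text()):
            name = m.group(1)
            dep = deps.get(name)
            if dep and dep not in seen:
                violations.append((str(chapter), name, dep))
            seen.add(name)
    return violations
```

Wire it into the same CI step as the reference linter: any violation means a reader will hit a snippet whose prerequisite hasn’t appeared yet.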

Pitfall 4: Dead snippets accumulate

The linter script above catches this — defined but unreferenced snippets generate warnings. Run it in CI so they surface on every PR.

Go further: add a pre-commit hook that runs the linter on staged .qmd and source files. Catch dead snippets before they’re committed, not in CI.
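A local hook for the pre-commit framework is enough; here’s a sketch, assuming the linter lives at scripts/lint-snippets.py as in the layout above:

```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: lint-snippets
        name: Validate snippet references
        entry: python scripts/lint-snippets.py
        language: system
        files: '\.(qmd|py)$'
        pass_filenames: false
```

The hook fires whenever a staged change touches a chapter or source file, so the linter runs exactly when a snippet could have gone stale.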

Pitfall 5: Context loss in extracted snippets

The reader sees 8 lines pulled from a 200-line file. Where do the imports come from? What class is this method in?

Solution: snippet groups with optional context.

# <<< snippet-context: kafka-imports >>>
from kafka import KafkaProducer
import json
# <<< /snippet-context: kafka-imports >>>

# <<< snippet: kafka-producer-setup context: kafka-imports >>>
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)
# <<< /snippet: kafka-producer-setup >>>

Your extraction script can render the context as a collapsible block above the snippet:

::: {.callout-note collapse="true" title="Full imports for this example"}
```python
from kafka import KafkaProducer
import json
```
:::

```{python}
#| eval: false
{{< include ../code/ch01/kafka_setup.py#kafka-producer-setup >}}
```

Readers who need the full picture expand the callout. Readers who don’t skip past it. The context is always accurate because it’s extracted from the same source file.
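A sketch of such an extraction helper, handling both the snippet and its declared context (the function and file names are illustrative; the marker syntax is the one shown above):

```python
# scripts/extract_snippet.py — illustrative sketch of context-aware extraction
import re
from pathlib import Path

OPEN = re.compile(r"<<<\s*(snippet|snippet-context):\s*([\w-]+)(?:\s+context:\s*([\w-]+))?")
CLOSE = re.compile(r"<<<\s*/(snippet|snippet-context):\s*([\w-]+)")


def extract(path, name, kind="snippet"):
    """Return (body, context_name) for the region marked `name` in `path`."""
    body, capturing, context = [], False, None
    for line in Path(path).read_text().splitlines():
        if capturing:
            m = CLOSE.search(line)
            if m and m.group(2) == name:
                return "\n".join(body), context
            body.append(line)
        else:
            m = OPEN.search(line)
            if m and m.group(1) == kind and m.group(2) == name:
                capturing, context = True, m.group(3)
    raise ValueError(f"snippet '{name}' not found or unterminated in {path}")


def render(path, name):
    """Emit the snippet as a fenced block, with its context (if any) in a
    collapsible callout above it."""
    code, ctx = extract(path, name)
    parts = []
    if ctx:
        ctx_code, _ = extract(path, ctx, kind="snippet-context")
        parts.append(
            '::: {.callout-note collapse="true" title="Context for this example"}\n'
            f"```python\n{ctx_code}\n```\n:::\n"
        )
    parts.append(f"```python\n{code}\n```")
    return "\n".join(parts)
```

Because both the snippet and its context come from the same tested source file, the collapsed block can never drift out of sync with the code it explains.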

Diagrams in Multi-Format Books

Technical books need diagrams, and those diagrams have to work across PDF (vector, static), HTML (potentially interactive), and EPUB (static, constrained). Here’s where each diagramming tool fits.

Mermaid — your default choice

Quarto renders Mermaid natively. Write it inline in the chapter:

```{mermaid}
%%| label: fig-data-flow
%%| fig-cap: "Event processing pipeline"
flowchart LR
    A[Producer] --> B[Kafka]
    B --> C[Stream Processor]
    C --> D[(Database)]
    C --> E[Monitoring]
```

This gives you:

  • PDF: auto-rendered to SVG, embedded as a vector image
  • HTML: rendered client-side, interactive (hover, zoom)
  • EPUB: rendered to static SVG
  • Version control: the diagram source is text in the .qmd file, clean diffs

Use Mermaid for flowcharts, sequence diagrams, ERDs, state diagrams, Gantt charts, and decision trees. It handles 80% of technical book diagrams.

Tip: keep diagrams under 12 nodes. If a diagram is getting complex, split it into two diagrams with a connecting narrative paragraph.

Draw.io — for complex architecture diagrams

When Mermaid can’t handle the layout — 15+ nodes, custom positioning, overlapping layers, network topology — use Draw.io. The .drawio XML format is version-controllable (diffable, mergeable).

Store sources and exports together:

diagrams/
  ch03-platform-architecture.drawio      # Source
  ch03-platform-architecture.svg         # Exported for book
  ch07-data-flow.drawio
  ch07-data-flow.svg

Reference in the chapter:

![Platform Architecture](../diagrams/ch03-platform-architecture.svg){#fig-platform}

Automate the export in CI so you never forget to re-export after editing:

#!/bin/bash
# scripts/export-diagrams.sh

for f in diagrams/*.drawio; do
    svg="${f%.drawio}.svg"
    drawio --export --format svg --border 10 --output "$svg" "$f"
done

The Draw.io CLI (the drawio desktop binary) is Electron-based; on headless Linux CI runners, wrap it in xvfb-run to give it a virtual display.
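If the export step grows slow as diagrams accumulate, an incremental variant that only re-exports stale sources is easy to sketch in Python, shelling out to the same drawio CLI (the helper name is ours):

```python
# scripts/export_diagrams.py — incremental sketch; assumes the drawio CLI is on PATH
import subprocess
from pathlib import Path


def export_stale_diagrams(diagram_dir="diagrams", cmd="drawio"):
    """Re-export a .drawio source only when it is newer than its SVG."""
    exported = []
    for src in sorted(Path(diagram_dir).glob("*.drawio")):
        svg = src.with_suffix(".svg")
        if not svg.exists() or src.stat().st_mtime > svg.stat().st_mtime:
            subprocess.run(
                [cmd, "--export", "--format", "svg", "--border", "10",
                 "--output", str(svg), str(src)],
                check=True,
            )
            exported.append(svg.name)
    return exported
```

The mtime comparison mirrors what make would do for you; in CI it mostly matters when you cache the diagrams/ directory between runs.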

Excalidraw — for hand-drawn style diagrams

Excalidraw produces a distinctive hand-drawn aesthetic that works well for conceptual diagrams, whiteboard-style explanations, and informal system overviews. The .excalidraw source is JSON — version-controllable.

The workflow is the same as Draw.io: store source + exported SVG, automate export in CI. Excalidraw’s CLI export is less mature than Draw.io’s, so you may need to export manually or use the excalidraw-export tool.

D2 — for layout-critical diagrams

D2 is a text-based diagramming language; its optional TALA layout engine produces noticeably better automatic layouts than Mermaid on complex graphs. The source is a .d2 text file.

Producer -> Kafka: events
Kafka -> "Stream\nProcessor": consume
"Stream\nProcessor" -> Database: write
"Stream\nProcessor" -> Monitoring: metrics

Export to SVG:

d2 --theme=0 diagram.d2 diagram.svg

Use D2 when you have complex 15+ node diagrams where automatic layout quality matters and Mermaid’s output looks messy.

PlantUML — for UML compliance

If your book needs formal UML diagrams (class diagrams with visibility modifiers, sequence diagrams with activation bars and alt/else fragments, component diagrams), PlantUML is the right tool. Quarto has a PlantUML filter available.

Choosing the right tool per diagram

flowchart TD
    A[Need a diagram] --> B{How complex?}
    B -->|Under 12 nodes| C{Need UML<br/>compliance?}
    B -->|12+ nodes| D{Need custom<br/>positioning?}
    C -->|No| E[Mermaid<br/>inline in .qmd]
    C -->|Yes| F[PlantUML]
    D -->|Yes| G[Draw.io or<br/>Excalidraw]
    D -->|No| H[D2<br/>text-based]

For most technical books: Mermaid for 80% of diagrams, Draw.io for the complex architecture diagrams, and PlantUML only if you need formal UML.

Theme handling across output formats

PDF books are printed — light theme only. HTML versions may support dark mode. EPUB readers vary.

The pragmatic approach:

  • Mermaid: Quarto handles theming per output format automatically
  • Draw.io / Excalidraw / D2: export with light theme, set a white background on the SVG

Don’t generate two versions of every diagram unless your HTML version genuinely needs dark mode. For most technical books, light-theme diagrams on a white or transparent background work everywhere.
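Some exporters emit SVGs with a transparent background, which renders dark in dark-mode HTML readers. A post-export sketch that forces a white background by prepending a full-size rect (stdlib only; the script name is ours):

```python
# scripts/whiten_svg.py — sketch; gives an exported diagram a white background
from xml.etree import ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def add_white_background(svg_path):
    """Prepend a white full-size <rect> so the SVG never renders transparent."""
    ET.register_namespace("", SVG_NS)  # keep output free of ns0: prefixes
    tree = ET.parse(svg_path)
    root = tree.getroot()
    rect = ET.Element(f"{{{SVG_NS}}}rect",
                      {"width": "100%", "height": "100%", "fill": "white"})
    root.insert(0, rect)  # first child paints behind everything else
    tree.write(svg_path, encoding="utf-8", xml_declaration=True)
```

Run it over the exported SVGs as a final step in the export script and the background question is settled for every output format at once.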

The CI Pipeline

Everything comes together in a GitHub Action:

name: Build Book

on:
  push:
    branches: [main]
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Lint snippet references
        run: python scripts/lint-snippets.py

      - name: Export Draw.io diagrams
        run: |
          # Install Draw.io CLI
          scripts/export-diagrams.sh

      - name: Start infrastructure
        run: docker compose up -d
        working-directory: code

      - name: Run code tests
        run: pytest code/tests/ -v

      - name: Install Quarto
        uses: quarto-dev/quarto-actions/setup@v2

      - name: Render book
        run: quarto render

      - name: Stop infrastructure
        if: always()
        run: docker compose down
        working-directory: code

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: book
          path: |
            _book/*.pdf
            _book/*.epub

The pipeline:

  1. Lint snippets — catch broken references and dead snippets before anything else
  2. Export diagrams — regenerate SVGs from Draw.io sources
  3. Start infrastructure — Docker Compose brings up Kafka, Postgres, etc.
  4. Run code tests — validate all snippets against real infrastructure
  5. Render book — Quarto executes inline code (connecting to the running infrastructure) and resolves snippet includes
  6. Upload artifacts — PDF and EPUB available as build artifacts

flowchart LR
    A[Push / PR] --> B[Lint snippets]
    B --> C[Export diagrams]
    C --> D[Start Docker infra]
    D --> E[Run code tests]
    E --> F[quarto render]
    F --> G[PDF + HTML + EPUB]
    D --> H[Stop infra]
    F --> H

    I[Renovate PR] --> A

When Renovate or Dependabot bumps a dependency, this entire pipeline runs. If the upgrade breaks a code example — whether it’s an inline Quarto block or an extracted snippet — the PR fails before merge.

Putting It All Together

The complete workflow for a technical book with infrastructure-dependent code and rich diagrams:

  1. Write chapters in .qmd files with Mermaid diagrams inline and snippet includes for infrastructure code
  2. Maintain tested code in code/ with snippet markers and CI tests
  3. Create complex diagrams in Draw.io, store .drawio sources in diagrams/, auto-export SVGs in CI
  4. Pre-render script starts Docker infrastructure so inline code can execute against real services
  5. CI validates everything: snippet linting, code tests, diagram export, full book render
  6. Dependency upgrades trigger full validation — broken examples block the PR

The result is a book where every code example is tested, every diagram is version-controlled, and a library upgrade that breaks something is caught automatically before it reaches readers.

Further Reading