Platform Engineering · Agentic AI · Python

Why the agentic SDLC needs a schema-first approach

Convention-driven PRD-to-HLD-to-LLD pipelines and why schema-gen should be your first commit.

28 March 2026 · 8 min read

Most teams adopting AI-assisted development start by throwing prompts at code generation. There is an undeniable thrill in watching an LLM scaffold an entire service in minutes. But three sprints later, the codebase is a graveyard of inconsistent models, orphaned endpoints, and schema drift that no agent can reconcile.

The problem with prompt-first development

When you ask an AI agent to “build an order service,” it makes assumptions. It invents field names, guesses relationships, and creates a schema that works for the immediate ask but conflicts with everything else in your system.

Multiply this across a team of developers, each with their own agent sessions, and you get:

  • Three different representations of an Order entity
  • Inconsistent naming conventions (order_id vs orderId vs id)
  • Missing foreign keys because the agent didn’t know about related services
  • Type mismatches at API boundaries that only surface in integration testing

Solution: Schema-gen as the first commit

The agentic SDLC framework I’ve been building uses schema-gen as the foundational step — a universal schema converter that lets you define your data model once in Python and generate consistent models across 12+ targets, including Pydantic, SQLAlchemy, TypeScript/Zod, Jackson, Kotlin data classes, JSON Schema, GraphQL, Protocol Buffers, and Apache Avro.

Before any code generation happens, the data model is defined using the @Schema decorator with field-level constraints:

from schema_gen import Schema, Field
from decimal import Decimal
from datetime import datetime
from typing import Literal, Optional

@Schema
class OrderEvent:
    order_id: str = Field(..., description="Unique order identifier")
    product_sku: str = Field(..., description="Product SKU", index=True)
    action: Literal["purchase", "return"]
    quantity: Decimal = Field(..., min_value=1)
    unit_price: Decimal = Field(..., min_value=0)
    timestamp: datetime
    metadata: Optional[dict] = None

This schema becomes the spine. Every subsequent layer — the HLD document, the ticket context manifests, the generated API handlers, the test fixtures — inherits from it rather than inventing its own structure.

What makes this powerful for agentic workflows is schema variants — different views of the same model generated from a single definition. An OrderEvent can produce a create_request variant (excluding auto-generated fields like order_id and timestamp), an update_request variant (only mutable fields), a public_response variant (omitting internal metadata), and a full_response for admin views. The agent never needs to guess which fields are required for which operation — the variants make it explicit.
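To make the variant idea concrete, here is a sketch of roughly what the generated `create_request` and `public_response` variants of `OrderEvent` would expand to. I’m using stdlib dataclasses so the example is dependency-free — the real generated output for the Python target would be Pydantic models, and the exact class names are my own illustration:

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Literal, Optional

# Hypothetical expansion of two schema variants. Names and shapes are
# illustrative, not schema-gen's actual generated code.

@dataclass
class OrderEventCreateRequest:
    # create_request: auto-generated fields (order_id, timestamp) excluded
    product_sku: str
    action: Literal["purchase", "return"]
    quantity: Decimal
    unit_price: Decimal
    metadata: Optional[dict] = None

@dataclass
class OrderEventPublicResponse:
    # public_response: internal metadata omitted
    order_id: str
    product_sku: str
    action: Literal["purchase", "return"]
    quantity: Decimal
    unit_price: Decimal
    timestamp: datetime
```

The point is that the split is mechanical: an agent writing a create handler imports the create variant and physically cannot reference `order_id`, because the type doesn’t have it.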

The workflow is straightforward: schema-gen init scaffolds a project, you define schemas in the schemas/ directory, and schema-gen generate produces models for all configured targets. During development, schema-gen watch auto-regenerates on file changes. A schema-gen validate command verifies that generated code stays in sync with source schemas — useful as a CI gate.
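A minimal CI gate around the validate step might look like the sketch below. It assumes, as described above, that `schema-gen validate` exits non-zero on drift and is available on PATH; the wrapper function name is my own:

```python
import subprocess
import sys
from typing import Sequence

def run_schema_gate(cmd: Sequence[str] = ("schema-gen", "validate")) -> int:
    """Run the validate step and return its exit code; non-zero means drift."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the tool's stderr so the CI log explains the failure
        print("Schema drift detected:", result.stderr, file=sys.stderr)
    return result.returncode
```

Wiring this into CI means a pull request that edits generated models by hand, without touching the source schema, fails before review.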

Convention over configuration

The pipeline follows a strict progression:

  1. PRD — defines what the system does in business terms
  2. Schema-gen — translates the PRD entities into typed Pydantic models
  3. HLD — references the schema to define service boundaries and API contracts
  4. LLD — generates implementation details constrained by the schema
  5. Code generation — agents operate within the schema’s type system

Each layer includes a context manifest — a structured reference back to the decisions made in previous layers. When an agent generates code at the LLD level, it has the full chain of reasoning from PRD to schema to HLD available in its context.
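One plausible shape for such a manifest, sketched as a dataclass — the field names and the `prompt_context` helper are hypothetical, not the framework’s actual format:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical context-manifest structure: a record of upstream artifacts
# and decisions that gets rendered into an agent's prompt context.

@dataclass
class ContextManifest:
    layer: str                      # e.g. "LLD"
    upstream_artifacts: List[str]   # paths to the PRD, schema, HLD documents
    decisions: List[str]            # summarized decisions from earlier layers

    def prompt_context(self) -> str:
        """Render the manifest as plain text an agent can consume."""
        lines = [f"Layer: {self.layer}"]
        lines += [f"Artifact: {path}" for path in self.upstream_artifacts]
        lines += [f"Decision: {note}" for note in self.decisions]
        return "\n".join(lines)
```

The design choice worth noting is that the manifest is structured data first and prompt text second — the same record can drive both the agent’s context window and an audit trail of why each layer looks the way it does.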

Human checkpoints

Between each execution layer, there’s a mandatory human review. The agent proposes, the human validates. This isn’t about distrust — it’s about catching the compound errors that accumulate when agents make reasonable-but-wrong assumptions across layers.

The checkpoint between schema-gen and HLD is the most critical. Get the data model right, and everything downstream flows naturally. Get it wrong, and you’re refactoring the spine while the whole body is in motion.

What this looks like in practice

I demonstrated this with a Todo app as a reference implementation — deliberately simple so the pipeline mechanics are visible without domain complexity getting in the way.

The schema defined TodoItem, TodoList, and User models. The HLD split these across two services (task service, auth service) with clearly typed API contracts. The LLD generated FastAPI handlers that import the schema directly — no re-definition, no drift.

The result: any agent, at any layer, produces code that’s type-compatible with every other layer. That’s the power of the spine.

Applying this to a real platform

A production e-commerce platform’s architecture — order management, inventory, fulfilment, payments — is significantly more complex than a Todo app. But the principle scales. The schema defines Order, Product, Inventory, Payment, and Shipment entities once, and schema-gen produces Pydantic models for the Python services, TypeScript/Zod schemas for the storefront, Avro schemas for Kafka events, and SQLAlchemy models for persistence — all from the same source definitions.
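To illustrate the single-source, multi-target principle without reproducing schema-gen’s internals, here is a toy emitter that turns a shared field specification into a JSON Schema fragment, one of the listed targets. This is not schema-gen’s implementation — the spec format and function are invented for the example:

```python
from typing import Dict, Tuple

# Toy single-source spec: field name -> (JSON Schema type, required).
# A real tool would emit every configured target from one such source.
ORDER_FIELDS: Dict[str, Tuple[str, bool]] = {
    "order_id": ("string", True),
    "total": ("number", True),
    "note": ("string", False),
}

def to_json_schema(name: str, fields: Dict[str, Tuple[str, bool]]) -> dict:
    """Emit a JSON Schema object for one entity from the shared spec."""
    return {
        "title": name,
        "type": "object",
        "properties": {f: {"type": t} for f, (t, _) in fields.items()},
        "required": [f for f, (_, req) in fields.items() if req],
    }
```

A second emitter over the same `ORDER_FIELDS` dict could produce the Zod or Avro flavour — which is exactly why drift becomes impossible by construction: there is nothing per-target to edit by hand.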

When I extend the system — say, adding a returns service — the agent receives the full schema context and generates code that slots into the existing type system without friction. The schema-gen validate step in CI catches any drift before it reaches production.

The data model spine isn’t just an architectural pattern. It’s the difference between AI-assisted development that compounds value and AI-assisted development that compounds technical debt. The schema-gen source and examples are available on GitHub.