MCP servers in production — lessons from the trenches
What happens when your agentic AI framework meets real-world Terraform pipelines and enterprise auth.
The Model Context Protocol has gone from an interesting concept to a production concern faster than most of us expected. I’ve been running MCP servers alongside Terraform pipelines and LangChain agents for the past few months. Here’s what I’ve learned about the gap between the demo and reality.
The promise vs the practice
MCP promises a standard protocol for connecting AI models to external tools and data sources. In the demo, it looks magical — an LLM seamlessly querying databases, executing workflows, and managing infrastructure.
In production, you discover:
- Authentication flows are more complex than the examples suggest
- Token refresh cycles interact badly with long-running agent sessions
- Error handling needs to be defensive at every protocol boundary
- Observability is an afterthought in most MCP implementations
Auth: the first wall you hit
Most enterprise environments use Entra ID (formerly Azure AD) with OIDC federation. Getting an MCP server to authenticate through this chain requires careful setup:
```python
# Simplified — real implementation needs token caching and refresh
from authlib.integrations.httpx_client import AsyncOAuth2Client

async def get_mcp_token(config: AuthConfig) -> str:
    client = AsyncOAuth2Client(
        client_id=config.client_id,
        client_secret=config.client_secret,
    )
    token = await client.fetch_token(
        url=config.token_endpoint,
        grant_type="client_credentials",
        scope=config.scope,
    )
    return token["access_token"]
```
The challenge isn’t the code — it’s the lifecycle. MCP server connections are long-lived. Tokens expire. The agent is mid-workflow when the token dies. You need graceful re-auth that doesn’t lose the agent’s state.
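One way to tame that lifecycle is a small token cache that refreshes ahead of expiry instead of waiting for a mid-workflow 401. A minimal sketch, assuming a fetch coroutine that returns the token plus its TTL in seconds (the exact fetch shape here is my assumption, not part of any MCP API):

```python
import asyncio
import time

class TokenCache:
    """Caches an access token and refreshes it shortly before expiry.

    `fetch` is any coroutine returning (token, ttl_seconds) — hypothetical
    here; in practice you'd wire it to an OAuth2 client like the one above.
    """

    def __init__(self, fetch, skew: float = 60.0):
        self._fetch = fetch
        self._skew = skew              # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0
        self._lock = asyncio.Lock()    # one refresh at a time

    async def get(self) -> str:
        async with self._lock:
            if self._token is None or time.monotonic() >= self._expires_at - self._skew:
                self._token, ttl = await self._fetch()
                self._expires_at = time.monotonic() + ttl
            return self._token
```

Every MCP call then asks the cache for a token rather than holding one for the session's lifetime, so a long-running agent never carries a stale credential into a request.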
Terraform integration: the happy path and the traps
Using MCP to let an AI agent manage Terraform plans sounds powerful. And it is — when it works. The agent reads the current state, proposes a change, generates a plan, and presents it for human approval.
The traps:
- State locking: Terraform locks state during operations. If the agent’s MCP connection drops mid-plan, you get a stuck lock that requires manual intervention.
- Plan drift: The time between “agent generates plan” and “human approves” can be long enough for the underlying infrastructure to change. The plan is stale before it’s applied.
- Secret management: The agent needs access to provider credentials but should never see them directly. MCP tool definitions need to abstract over secret injection.
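The plan-drift trap in particular can be guarded against cheaply: fingerprint the plan at proposal time, re-plan just before apply, and refuse to proceed if the fingerprints differ. A sketch, assuming you feed it the JSON from `terraform show -json` (the function names are mine, not Terraform's):

```python
import hashlib
import json

def plan_fingerprint(plan_json: dict) -> str:
    """Stable fingerprint of a Terraform plan's resource changes.

    Hashes only the resource_changes section of `terraform show -json`
    output, so cosmetic fields don't invalidate an identical plan.
    """
    changes = plan_json.get("resource_changes", [])
    blob = json.dumps(changes, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def safe_to_apply(approved_fp: str, fresh_plan_json: dict) -> bool:
    """Re-plan right before apply; refuse if the plan has drifted."""
    return plan_fingerprint(fresh_plan_json) == approved_fp
```

If `safe_to_apply` returns False, the agent re-proposes instead of applying a stale plan, which turns silent drift into an explicit re-approval loop.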
Observability: what you can’t see will hurt you
The biggest gap in the current MCP ecosystem is observability. When an agent makes 15 tool calls across 3 MCP servers to accomplish a task, you need to know:
- Which calls succeeded and which failed?
- What was the latency of each call?
- Did the agent retry failed calls?
- What was the total token consumption?
- Was the result correct?
I’ve been wiring OpenTelemetry traces through the MCP layer. Each tool call becomes a span with structured attributes — tool name, parameters (redacted), response status, latency. This feeds into the same Grafana dashboards that monitor the rest of the infrastructure.
The human checkpoint pattern
Every MCP-driven action that modifies state goes through a human checkpoint. The pattern:
- Agent proposes an action via MCP tool call
- The action is queued, not executed
- Human reviews the proposed action with full context
- Human approves or rejects
- If approved, the action executes with the original parameters
This adds latency but eliminates the “the AI deleted my production database” class of incidents. For read-only operations, the agent acts autonomously. For writes, the human is always in the loop.
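The queue-then-execute steps above can be sketched as a small approval queue; the class and method names are mine, not part of MCP:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class ProposedAction:
    tool: str
    params: dict
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

class ApprovalQueue:
    """Queue-then-execute checkpoint: nothing runs until a human approves."""

    def __init__(self, executor):
        self._executor = executor   # function that performs the actual write
        self._pending: dict[str, ProposedAction] = {}

    def propose(self, tool: str, params: dict) -> str:
        action = ProposedAction(tool, params)
        self._pending[action.id] = action
        return action.id            # surfaced to the human with full context

    def approve(self, action_id: str):
        action = self._pending.pop(action_id)
        # Execute with the ORIGINAL parameters captured at proposal time,
        # so the agent can't swap them between review and execution
        return self._executor(action.tool, action.params)

    def reject(self, action_id: str) -> None:
        self._pending.pop(action_id)
```

The key property is that `propose` never touches the executor: the agent's MCP tool call returns an action ID, not a result, and only `approve` crosses the write boundary.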
What I’d do differently
If I were starting today:
- Build the observability layer first, before any MCP tools. You can’t debug what you can’t see.
- Design for token refresh from day one. Don’t bolt it on after your first mysterious auth failure at 3am.
- Keep MCP tool definitions narrow. A tool that “manages infrastructure” is too broad. A tool that “lists EC2 instances in a specific VPC” is right-sized.
- Test failure modes explicitly. What happens when the MCP server is unreachable? What happens when it returns malformed JSON? What happens when the tool call times out?
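To make the "narrow tools" point concrete, here is what a right-sized tool definition might look like, following the name/description/inputSchema shape MCP tools use (the tool itself and its schema details are a hypothetical example):

```python
# Hypothetical narrow MCP tool definition: one VPC, read-only, strict input
LIST_EC2_TOOL = {
    "name": "list_ec2_instances",
    "description": "List EC2 instances in one VPC (read-only).",
    "inputSchema": {
        "type": "object",
        "properties": {
            "vpc_id": {"type": "string", "pattern": "^vpc-[0-9a-f]+$"},
        },
        "required": ["vpc_id"],
        # Reject any parameter the tool wasn't designed for
        "additionalProperties": False,
    },
}
```

A schema this tight also helps the failure-mode testing above: malformed input is rejected at the protocol boundary instead of reaching the tool's implementation.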
MCP is a solid protocol. But like any protocol, the value is in the implementation. The gap between a working demo and a production system is filled with auth flows, error handling, observability, and the hard-won understanding that AI agents need the same operational discipline as any other production service.