There’s an open secret in the world of DevOps: Nobody trusts the CMDB. The Configuration Management Database (CMDB) is supposed to be the “source of truth”—the central map of every server, service, and application in your enterprise. In theory, it’s the foundation for security audits, cost analysis, and incident response. In practice, it’s a work of fiction. The moment you populate a CMDB, it begins to rot. Engineers deploy a new microservice but forget to register it. An autoscaling group spins up 20 new nodes, but the database only records the original three…
We call this configuration drift, and for decades, our industry’s solution has been to throw more scripts at the problem. We write massive, brittle ETL (Extract-Transform-Load) pipelines that attempt to scrape the world and shove it into a relational database. It never works. The “world”—especially the modern cloud native world—moves too fast.
We realized we couldn’t solve this problem by writing better scripts. We had to change the fundamental architecture of how we sync data. We stopped trying to boil the ocean and fix the entire enterprise at once. Instead, we focused on one notoriously difficult environment: Kubernetes. If we could build an autonomous agent capable of reasoning about the complex, ephemeral state of a Kubernetes cluster, we could prove a pattern that works everywhere else. This article explores how we used the newly open-sourced Codex CLI and the Model Context Protocol (MCP) to build that agent. In the process, we moved from passive code generation to active infrastructure operation, transforming the “stale CMDB” problem from a data entry task into a logic puzzle.
The Shift: From Code Generation to Infrastructure Operation with Codex CLI and MCP
The reason most CMDB initiatives fail is ambition. They try to track every switch port, virtual machine, and SaaS license simultaneously. The result is a data swamp—too much noise, not enough signal. We took a different approach. We drew a small circle around a specific domain: Kubernetes workloads. Kubernetes is the perfect testing ground for AI agents because it’s high-velocity and declarative. Things change constantly. Pods die; deployments roll over; services change selectors. A static script struggles to distinguish between a CrashLoopBackOff (a temporary error state) and a purposeful scale-down. We hypothesized that a large language model (LLM), acting as an operator, could understand this nuance. It wouldn’t just copy data; it would interpret it.
The Codex CLI turned this hypothesis into a tangible architecture by enabling a shift from “code generation” to “infrastructure operation.” Instead of treating the LLM as a junior programmer that writes scripts for humans to review and run, Codex empowers the model to execute code itself. We provide it with tools—executable functions that act as its hands and eyes—via the Model Context Protocol. MCP defines a clear interface between the AI model and the outside world, allowing us to expose high-level capabilities like cmdb_stage_transaction without teaching the model the complex internal API of our CMDB. The model learns to use the tool, not the underlying API.
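To make the interface concrete, here is roughly what exposing one of these capabilities looks like. This is a minimal sketch using the FastMCP helper from the Python MCP SDK (API details can vary between SDK versions); the staged-changes directory and the tool body are illustrative stand-ins, not our production implementation.

```python
# mcp_server.py -- sketch: exposing a CMDB capability to the agent as an MCP tool.
# Assumes the official Python MCP SDK ("pip install mcp"); the staging directory
# below is a placeholder for whatever wraps your CMDB's real API.
import json
import uuid
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("k8s-agent-tools")
STAGING_DIR = Path("staged-changes")  # hypothetical location for staged diffs

@mcp.tool()
def cmdb_stage_transaction(resource_type: str, name: str, change: dict) -> str:
    """Stage a proposed CMDB change for review. Nothing is committed here."""
    STAGING_DIR.mkdir(exist_ok=True)
    txn_id = uuid.uuid4().hex[:8]
    payload = {"resource_type": resource_type, "name": name, "change": change}
    (STAGING_DIR / f"{txn_id}.json").write_text(json.dumps(payload, indent=2))
    return f"Staged transaction {txn_id}; awaiting review."

if __name__ == "__main__":
    mcp.run()  # serve the tool over stdio so the Codex CLI can call it
```

The model only ever sees the tool’s name, signature, and docstring; the CMDB’s internal API never enters its context.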
The Architecture of Agency
Our system, which we call k8s-agent, consists of three distinct layers. This isn’t a single script running top to bottom; it’s a cognitive architecture.
The cognitive layer (Codex + contextual instructions): This is the Codex CLI running a specific system prompt. We don’t fine-tune the model weights. Infrastructure moves too fast for fine-tuning: A model trained on Kubernetes v1.25 would be hallucinating by v1.30. Instead, we use context engineering—the art of designing the environment in which the AI operates. This involves tool design (creating atomic, deterministic functions), prompt architecture (structuring the system prompt), and information architecture (deciding what information to hide or expose). We feed the model a persistent context file (AGENTS.md) that defines its persona: “You are a meticulous infrastructure auditor. Your goal is to ensure the CMDB accurately reflects the state of the Kubernetes cluster. You must prioritize safety: Do not delete records unless you have positive confirmation that they are orphans.”
The tool layer: Using MCP, we expose deterministic Python functions to the agent.
- Sensors: k8s_list_workloads, cmdb_query_service, k8s_get_deployment_spec
- Actuators: cmdb_stage_create, cmdb_stage_update, cmdb_stage_delete
Note that we track workloads (Deployments, StatefulSets), not Pods. Pods are ephemeral; tracking them in a CMDB is an antipattern that creates noise. The agent understands this distinction—a semantic rule that is hard to enforce in a rigid script.
The state layer (the safety net): LLMs are probabilistic; infrastructure must be deterministic. We bridge this gap with a staging pattern. The agent never writes directly to the production database. It writes to a staged diff. This allows a human (or a policy engine) to review the proposed changes before they are committed.
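The review side of the staging pattern can stay very small. The sketch below assumes staged changes land as JSON files in a staged-changes/ directory, matching the layout in the earlier sketch; commit_to_cmdb is a placeholder for the real write path.

```python
# review_staged.py -- sketch of the state layer: a human (or a policy engine)
# reviews staged diffs before anything touches the production CMDB.
import json
from pathlib import Path

STAGING_DIR = Path("staged-changes")

def commit_to_cmdb(change: dict) -> None:
    # Placeholder for the real CMDB write (REST call, SQL, etc.).
    print(f"COMMIT {change['resource_type']} {change['name']}: {change['change']}")

def review() -> None:
    for staged in sorted(STAGING_DIR.glob("*.json")):
        change = json.loads(staged.read_text())
        answer = input(f"Apply {staged.name} ({change['name']})? [y/N] ")
        if answer.strip().lower() == "y":
            commit_to_cmdb(change)
        staged.unlink()  # clear the staged diff whether applied or rejected

if __name__ == "__main__":
    review()
```

In practice the gate can be a CLI prompt, a pull request, or an automated policy check; the important property is that the probabilistic agent only proposes and a deterministic step commits.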
The OODA Loop in Action
How does this differ from a standard sync script? A script follows a linear path: Connect → Fetch → Write. If any step fails or returns unexpected data, the script crashes or corrupts data. Our agent follows the Observe-Orient-Decide-Act (OODA) loop, developed by the military strategist John Boyd. Unlike a linear script that executes blindly, the OODA loop forces the agent to pause and synthesize information before taking action. This cycle allows it to handle incomplete data, verify assumptions, and adapt to changing conditions—traits essential for operating in a distributed system.
Let’s walk through a real scenario we encountered during our pilot, the Ghost Deployment, to explore the benefits of using an OODA loop. A developer had deleted a deployment named payment-processor-v1 from the cluster but forgot to remove the record from the CMDB. A standard script might pull the list of deployments, see payment-processor-v1 is missing, and immediately issue a DELETE to the database. The risk is obvious: What if the API server was just timing out? What if the script had a bug in its pagination logic? The script blindly destroys data based on the absence of evidence.
The agent approach is fundamentally different. First, it observes, calling k8s_list_workloads and cmdb_query_service and noticing the discrepancy. Second, it orients, checking its context instructions to “verify orphans before deletion” and deciding to call k8s_get_event_history. Third, it decides: The event history contains an explicit delete, so it concludes the workload was intentionally removed rather than merely unreachable. Finally, it acts, calling cmdb_stage_delete with a comment confirming the deletion. The agent didn’t just sync data; it investigated. It handled the ambiguity that usually breaks automation.
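The agent works through this in natural language, but the rule it converges on looks roughly like the sketch below. It is self-contained for illustration: In the real system the inputs come from the k8s_list_workloads and k8s_get_event_history tools, and the event shape shown here is simplified.

```python
# Sketch of the "verify before delete" rule from the Ghost Deployment scenario.
# The inputs stand in for data fetched via the agent's sensor tools.
def is_confirmed_orphan(name: str, live_workloads: set[str], events: list[dict]) -> bool:
    """True only when there is positive evidence the workload was deleted,
    not merely an absence from the listing (which could be an API timeout)."""
    if name in live_workloads:
        return False  # still running, so the CMDB record is valid
    # Absence of evidence is not enough: require an explicit deletion event.
    return any(event.get("type") == "DELETED" for event in events)

# Example inputs in the shape the sensor tools might return.
live = {"checkout-api", "inventory-service"}
history = [{"type": "DELETED", "object": "payment-processor-v1"}]
assert is_confirmed_orphan("payment-processor-v1", live, history)  # stage the delete
assert not is_confirmed_orphan("payment-processor-v1", live, [])   # do nothing; investigate
```

Crucially, the agent isn’t executing this exact function; it derives the equivalent check from its instructions each time, which is what lets it adapt when the evidence looks different.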
Solving the “Semantic Gap”
This specific Kubernetes use case highlights a broader problem in IT operations: the “semantic gap.” The data in our infrastructure (JSON, YAML, logs) is full of implicit meaning. A label “env: production” changes the criticality of a resource. A status of CrashLoopBackOff means “broken,” but Completed means “finished successfully.” Traditional scripts require us to hardcode every permutation of this logic, resulting in thousands of lines of unmaintainable if/else statements. With the Codex CLI, we replace those thousands of lines of code with a few sentences of English in the system prompt: “Ignore Jobs that have completed successfully. Sync failing Jobs so we can track instability.” The LLM bridges the semantic gap. It understands what “instability” implies in the context of a Job status. We’re describing our intent, and the agent is handling the implementation.
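For contrast, here is the kind of rule a traditional sync script has to encode by hand for every resource type and status permutation. The field names follow the shape of a Kubernetes Job status, but the rule set is deliberately abbreviated.

```python
# Sketch of the hardcoded status interpretation the agent's prompt replaces.
def should_sync_job(job: dict) -> bool:
    status = job.get("status", {})
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    if conditions.get("Complete") == "True":
        return False  # finished successfully -> ignore
    if conditions.get("Failed") == "True":
        return True   # failing -> sync so we can track instability
    if status.get("active", 0) > 0:
        return True   # still running -> keep the record current
    # ...and so on, for Deployments, StatefulSets, CronJobs, DaemonSets...
    return False

completed = {"status": {"conditions": [{"type": "Complete", "status": "True"}]}}
failing = {"status": {"conditions": [{"type": "Failed", "status": "True"}]}}
assert not should_sync_job(completed)
assert should_sync_job(failing)
```

With the agent, the two prompt sentences above stand in for this function and the many siblings it would otherwise need.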
Scaling Beyond Kubernetes
We started with Kubernetes because it’s the “hard mode” of configuration management. In a production environment with thousands of workloads, things change constantly. A standard script sees a snapshot and often gets it wrong. An agent, however, can work through the complexity. It might run its OODA loop multiple times to solve a single issue—by checking logs, verifying dependencies, and confirming rules before it ever makes a change. This ability to connect reasoning steps allows it to handle the scale and uncertainty that break traditional automation.
But the pattern we established, agentic OODA loops via MCP, is universal. Once we proved the model worked for Pods and Services, we realized we could extend it. For legacy infrastructure, we can give the agent tools to SSH into Linux VMs. For SaaS management, we can give it access to Salesforce or GitHub APIs. For cloud governance, we can ask it to audit AWS Security Groups. The beauty of this architecture is that the “brain” (the Codex CLI) stays the same. To support a new environment, we don’t need to rewrite the engine; we just hand it a new set of tools.

However, shifting to an agentic model forces us to confront new trade-offs. The most immediate is cost versus context. We learned the hard way that you shouldn’t give the AI the raw YAML of a Kubernetes deployment—it consumes too many tokens and distracts the model with irrelevant details. Instead, you create a tool that returns a digest—a simplified JSON object with only the fields that matter. This is context optimization, and it is the key to running agents cost-effectively.
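Here is what a digest-style sensor can look like in practice, sketched with the official kubernetes Python client. Which fields are worth keeping is a judgment call; this selection is illustrative, not our exact schema.

```python
# deployment_digest.py -- sketch of a digest sensor: return only the fields that
# matter instead of raw YAML, keeping the agent's context small and focused.
from kubernetes import client, config

def k8s_list_workload_digests(namespace: str = "default") -> list[dict]:
    config.load_kube_config()  # use config.load_incluster_config() inside the cluster
    apps = client.AppsV1Api()
    digests = []
    for d in apps.list_namespaced_deployment(namespace).items:
        digests.append({
            "name": d.metadata.name,
            "namespace": d.metadata.namespace,
            "labels": d.metadata.labels or {},
            "replicas": d.spec.replicas,
            "ready_replicas": d.status.ready_replicas or 0,
            "images": [c.image for c in d.spec.template.spec.containers],
        })
    return digests
```

A full Deployment manifest can run to hundreds of lines of YAML; a digest like this is a handful of fields, which keeps each OODA iteration cheap and keeps the model focused on the data that actually drives a CMDB decision.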
Conclusion: The Human in the Cockpit
There’s a fear that AI will replace the DevOps engineer. Our experience with the Codex CLI suggests the opposite. This technology does not remove the human; it elevates them. It promotes the engineer from a “script writer” to a “mission commander.” The stale CMDB was never really a data problem; it was a labor problem. It was simply too much work for humans to manually track and too complex for simple scripts to automate. By introducing an agent that can reason, we finally have a mechanism capable of keeping up with the cloud.
We started with a small Kubernetes cluster. But the destination is an infrastructure that is self-documenting, self-healing, and fundamentally intelligible. The era of the brittle sync script is over. The era of infrastructure as intent has begun!
