When people talk about AI agents getting better, the conversation usually focuses on the model — GPT-4.5, Gemini 2.5, Claude 4.7. But a smaller model wrapped in a well-designed harness can outperform a larger model with none.
This is the concept of Agent Harness: the infrastructure and rules that surround an AI model, governing how it uses tools, manages context, enforces permissions, and handles loops. It’s the layer that turns a text generator into a reliable autonomous agent.
After watching a thorough breakdown by mikelopster — the same creator whose Deep Agent deep dive I covered previously — I wanted to distill the key concepts here, since they directly apply to how I’m building my own agentic systems.
The Core Equation
Agent = Model + Harness
The model is a text probability engine. The harness is everything else: the tool definitions, the context window management, the error handling, the permission gates, the lifecycle hooks. A good harness makes a mediocre model punch above its weight. A bad harness makes a great model unreliable.
The 9 Components of an Agent Harness
The video breaks down a harness into 9 essential components. These aren’t a strict standard — they’re the recurring patterns that appear across every major agent implementation:
1. Loop Condition Management
Agents will fail. The harness must decide: how many retries? What counts as terminal error? When does it escalate to the user? Without this, agents either loop forever or give up too early.
2. Context Management
Context windows fill up. The harness needs strategies for compression (summarizing older turns), sliding windows, or moving intermediate state to files. Claude Code’s “auto-compact” feature is a good example — it transparently summarizes earlier parts of the conversation to keep the active context fresh.
3. Skill & Tool Management
What tools are available? How does the agent discover them? Some tools are built-in (read file, write file, run command), others are custom skills loaded on demand. The harness defines the inventory and the discovery mechanism.
4. Tools Layer
At the lowest level, tools are deterministic operations: read a file, search code, run a shell command, fetch a URL. Each tool must have a clear contract — inputs, outputs, side effects. The harness enforces that the agent stays within these contracts.
5. Sub-Agent Management
When a task grows complex, the harness may spawn child agents for isolated subtasks. This is the scaling mechanism — instead of stuffing everything into one context, you delegate. Each sub-agent gets its own context, tool set, and scope.
6. Built-in Skills
Beyond basic tools, some harnesses ship specialized capabilities: language server integration (error checking, go-to-definition), edit formatting, test running. These are higher-level operations built on top of the basic tool layer.
7. Session Persistence
Can you resume an agent session after closing it? This requires storing session state — conversation history, file modifications, checkpoint snapshots. Claude Code’s session management (with /revert and session branching) is one of the cleanest implementations.
8. Lifecycle Hooks (Pre/Post Hooks)
Before and after every tool call, the harness fires hooks. Pre-hooks can validate inputs, check permissions, or log intent. Post-hooks can verify results, trigger tests, or roll back on failure. This is where custom logic plugs into the agent pipeline.
9. Permission & Safety
The harness defines boundaries. Default: the agent operates within a workspace and can’t touch files outside it without explicit approval. Operations like git push, network access, or deleting files may require escalation. The permission model is what makes an agent safe to run unattended.
Three Core Pillars
The video distills these 9 components into 3 pillars:
| Pillar | What it covers |
|---|---|
| Condition Logic | Loops, retries, error handling, hooks, tool routing |
| Scaling | Context management, sub-agent spawning, compression |
| Persistence + Safety | Session state, permissions, workspace isolation |
Building Your Own Harness: Three Levels
The video maps out three levels of harness customization:
Level 1 — Use an existing agent product
Tools like Claude Code, Cursor, or Codex come with a built-in harness. You control it through prompts, CLAUDE.md files, and skills. You can’t modify the harness internals, but you can guide the agent’s behavior at the orchestration layer.
Level 2 — Use an Agent SDK Frameworks like OpenAI Agents SDK, Deep Agent, or Claude Agent SDK give you a pre-built harness with customization points. You configure tools, hooks, and permissions without writing the infrastructure from scratch.
Level 3 — Build from scratch Using LangGraph, LangChain, or Vercel AI SDK, you construct your own harness. This gives full control but requires implementing all 9 components yourself. The video recommends starting at Level 1 or 2 and only going to Level 3 when you’ve outgrown the existing options.
Case Study: Codex
Codex’s harness operates on three levels:
- Session — A persistent thread between user and agent, spanning multiple turns with history and resume capability
- Turn — One user input → agent response cycle. Each turn starts with a user request and ends with an output
- Item — Individual tool invocations within a turn (read file, edit code, run command)
The flow: user input → JSON tool routing → agent processes tools → tools dispatched via threads → results aggregated → response delivered. All Codex surfaces (desktop, TUI, CLI, web) share the same harness server internally.
Case Study: Claude Code
Claude Code’s harness is built around a simple loop:
Get Context → Take Action → Verify Results → (repeat)
- Get Context: Read files, search codebase, gather relevant information
- Take Action: Edit files, run commands, write new code
- Verify Results: Run tests, check output, confirm the fix works
If verification fails, the loop repeats — the agent gathers new context and tries again. This loop is the harness’s core condition logic.
Key features: workspace isolation (files outside the workspace require approval), session branching (revert to any previous checkpoint), auto-compact (context compression), and LSP integration for real-time error detection.
Why This Matters
Understanding Agent Harness changed how I think about building agents. My doppelganger agent has its own small harness — memory files, tool definitions, permission rules — but watching this breakdown showed me how much more structured it could be.
The most practical takeaway: start at Level 1 or Level 2. Don’t build your own harness until you understand what the existing ones already handle. Claude Code’s harness, controlled through CLAUDE.md, skills, and MCP servers, covers most of the 9 components out of the box. The same goes for Deep Agent (covered in my previous post) which wraps LangChain + LangGraph with a pre-built harness layer.
If you’re building agentic systems, I’d recommend watching the full video (Thai language with visual slides) — the framework it provides for thinking about harness design is invaluable regardless of which tools you use.