Back to blog
AI Trending May 31, 2026 · 6 min read

Agent Harness — What It Is and Why It's the Most Important Layer in AI Agent Development

A deep dive into Agent Harness — the control infrastructure around AI models that governs tools, context, permissions, and safety. With case studies from Codex and Claude Code.

#agent harness #AI agents #Claude Code #Codex #tooling #architecture #safety

When people talk about AI agents getting better, the conversation usually focuses on the model — GPT-4.5, Gemini 2.5, Claude 4.7. But a smaller model wrapped in a well-designed harness can outperform a larger model with none.

This is the concept of Agent Harness: the infrastructure and rules that surround an AI model, governing how it uses tools, manages context, enforces permissions, and handles loops. It’s the layer that turns a text generator into a reliable autonomous agent.

After watching a thorough breakdown by mikelopster — the same creator whose Deep Agent deep dive I covered previously — I wanted to distill the key concepts here, since they directly apply to how I’m building my own agentic systems.

The Core Equation

Agent = Model + Harness

The model is a text probability engine. The harness is everything else: the tool definitions, the context window management, the error handling, the permission gates, the lifecycle hooks. A good harness makes a mediocre model punch above its weight. A bad harness makes a great model unreliable.

The 9 Components of an Agent Harness

The video breaks down a harness into 9 essential components. These aren’t a strict standard — they’re the recurring patterns that appear across every major agent implementation:

1. Loop Condition Management

Agents will fail. The harness must decide: how many retries? What counts as terminal error? When does it escalate to the user? Without this, agents either loop forever or give up too early.

2. Context Management

Context windows fill up. The harness needs strategies for compression (summarizing older turns), sliding windows, or moving intermediate state to files. Claude Code’s “auto-compact” feature is a good example — it transparently summarizes earlier parts of the conversation to keep the active context fresh.

3. Skill & Tool Management

What tools are available? How does the agent discover them? Some tools are built-in (read file, write file, run command), others are custom skills loaded on demand. The harness defines the inventory and the discovery mechanism.

4. Tools Layer

At the lowest level, tools are deterministic operations: read a file, search code, run a shell command, fetch a URL. Each tool must have a clear contract — inputs, outputs, side effects. The harness enforces that the agent stays within these contracts.

5. Sub-Agent Management

When a task grows complex, the harness may spawn child agents for isolated subtasks. This is the scaling mechanism — instead of stuffing everything into one context, you delegate. Each sub-agent gets its own context, tool set, and scope.

6. Built-in Skills

Beyond basic tools, some harnesses ship specialized capabilities: language server integration (error checking, go-to-definition), edit formatting, test running. These are higher-level operations built on top of the basic tool layer.

7. Session Persistence

Can you resume an agent session after closing it? This requires storing session state — conversation history, file modifications, checkpoint snapshots. Claude Code’s session management (with /revert and session branching) is one of the cleanest implementations.

8. Lifecycle Hooks (Pre/Post Hooks)

Before and after every tool call, the harness fires hooks. Pre-hooks can validate inputs, check permissions, or log intent. Post-hooks can verify results, trigger tests, or roll back on failure. This is where custom logic plugs into the agent pipeline.

9. Permission & Safety

The harness defines boundaries. Default: the agent operates within a workspace and can’t touch files outside it without explicit approval. Operations like git push, network access, or deleting files may require escalation. The permission model is what makes an agent safe to run unattended.

Three Core Pillars

The video distills these 9 components into 3 pillars:

PillarWhat it covers
Condition LogicLoops, retries, error handling, hooks, tool routing
ScalingContext management, sub-agent spawning, compression
Persistence + SafetySession state, permissions, workspace isolation

Building Your Own Harness: Three Levels

The video maps out three levels of harness customization:

Level 1 — Use an existing agent product Tools like Claude Code, Cursor, or Codex come with a built-in harness. You control it through prompts, CLAUDE.md files, and skills. You can’t modify the harness internals, but you can guide the agent’s behavior at the orchestration layer.

Level 2 — Use an Agent SDK Frameworks like OpenAI Agents SDK, Deep Agent, or Claude Agent SDK give you a pre-built harness with customization points. You configure tools, hooks, and permissions without writing the infrastructure from scratch.

Level 3 — Build from scratch Using LangGraph, LangChain, or Vercel AI SDK, you construct your own harness. This gives full control but requires implementing all 9 components yourself. The video recommends starting at Level 1 or 2 and only going to Level 3 when you’ve outgrown the existing options.

Case Study: Codex

Codex’s harness operates on three levels:

  1. Session — A persistent thread between user and agent, spanning multiple turns with history and resume capability
  2. Turn — One user input → agent response cycle. Each turn starts with a user request and ends with an output
  3. Item — Individual tool invocations within a turn (read file, edit code, run command)

The flow: user input → JSON tool routing → agent processes tools → tools dispatched via threads → results aggregated → response delivered. All Codex surfaces (desktop, TUI, CLI, web) share the same harness server internally.

Case Study: Claude Code

Claude Code’s harness is built around a simple loop:

Get Context → Take Action → Verify Results → (repeat)

If verification fails, the loop repeats — the agent gathers new context and tries again. This loop is the harness’s core condition logic.

Key features: workspace isolation (files outside the workspace require approval), session branching (revert to any previous checkpoint), auto-compact (context compression), and LSP integration for real-time error detection.

Why This Matters

Understanding Agent Harness changed how I think about building agents. My doppelganger agent has its own small harness — memory files, tool definitions, permission rules — but watching this breakdown showed me how much more structured it could be.

The most practical takeaway: start at Level 1 or Level 2. Don’t build your own harness until you understand what the existing ones already handle. Claude Code’s harness, controlled through CLAUDE.md, skills, and MCP servers, covers most of the 9 components out of the box. The same goes for Deep Agent (covered in my previous post) which wraps LangChain + LangGraph with a pre-built harness layer.

If you’re building agentic systems, I’d recommend watching the full video (Thai language with visual slides) — the framework it provides for thinking about harness design is invaluable regardless of which tools you use.


← All posts dot8pixels.dev