Nick Hauenstein

I don't think that word means what you think it means

Despite the date on the post, I intend to update it regularly, so don’t be surprised to see anachronistic references.

In an evolving space it is common for similar concepts to receive multiple labels or, worse, for the same label to be applied to multiple conflicting concepts. To avoid confusion, I’m including a glossary of the terms I use across this blog so that my intent for each is clear. For some (e.g., agent operating model) I’ve coined a new term for a concept on which the industry does not yet appear to have converged.

What is an Agent Harness?

  • Agent harness - An agent harness (also called a scaffold) is the complete system that wraps around an LLM to turn it into a functioning agent. The model alone generates text; the harness makes it do things.

    Note: “Harness” in this context does not mean test harness; it refers to the agent-wrapping system defined above.

What does a harness typically include?

A harness typically includes:

| Component | Purpose |
| --- | --- |
| System prompts | Instructions shaping agent behavior, personality, constraints |
| Tool definitions | The tools the agent can call (bash, file edit, browser, APIs), including their schemas and descriptions |
| Orchestration logic | The loop that calls the model, parses tool calls, executes them, and feeds results back |
| Context management | Compaction (summarizing to reclaim token space), structured note-taking, context resets between windows |
| Artifact management | Progress files, feature lists, git commits that persist state across sessions |
| Guard rails | Constraints that prevent the agent from taking unauthorized or destructive actions |
| Lifecycle hooks | Deterministic scripts that run at agent lifecycle events (pre/post tool use, session start/stop) and enforce validation, approvals, and audit trails with guaranteed outcomes |
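To make a few of these components concrete, here is a minimal sketch of an orchestration loop with one guard rail and one lifecycle hook. Everything in it is an assumption for illustration: `Harness`, `ToolCall`, `scripted_model`, and the `read_file` tool are hypothetical names, and the "model" is a scripted stand-in rather than a real LLM call. Real harnesses are far more involved.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    """Hypothetical shape of a parsed tool call from the model."""
    name: str
    args: dict


@dataclass
class Harness:
    system_prompt: str
    tools: dict[str, Callable[..., str]]                    # tool definitions
    blocked_tools: set[str] = field(default_factory=set)    # guard rail
    audit_log: list[str] = field(default_factory=list)      # filled by the hook

    def pre_tool_hook(self, call: ToolCall) -> None:
        # Lifecycle hook: deterministic code that runs before every tool call.
        self.audit_log.append(f"{call.name}({call.args})")

    def run(self, model, task: str, max_turns: int = 5) -> str:
        messages = [("system", self.system_prompt), ("user", task)]
        for _ in range(max_turns):
            reply = model(messages)            # call the model
            if isinstance(reply, str):         # plain text means a final answer
                return reply
            self.pre_tool_hook(reply)
            if reply.name in self.blocked_tools or reply.name not in self.tools:
                result = f"error: tool '{reply.name}' is not permitted"
            else:
                result = self.tools[reply.name](**reply.args)  # execute the tool
            messages.append(("tool", result))  # feed the result back
        return "stopped: turn limit reached"


def scripted_model(messages):
    """Stand-in for an LLM: requests one tool call, then answers."""
    last_role, last_text = messages[-1]
    if last_role == "tool":
        return f"Done. Tool said: {last_text}"
    return ToolCall("read_file", {"path": "notes.txt"})


harness = Harness(
    system_prompt="You are a careful coding agent.",
    tools={"read_file": lambda path: f"(contents of {path})"},
    blocked_tools={"delete_file"},
)
answer = harness.run(scripted_model, "Summarize notes.txt")
print(answer)
```

The model function only ever produces text or a tool request; the loop, the permission check, and the audit log all live in the harness.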

A related but distinct concept is the eval harness – infrastructure that measures agent performance (graders, test suites, benchmark runners). The key distinction: an agent harness turns a model into an agent, whereas an eval harness measures how well that agent performs.
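For contrast, an eval harness can be sketched in a few lines. This is a toy under stated assumptions: `run_eval`, `exact_match`, and `toy_agent` are hypothetical names, and the "agent" is a hard-coded function standing in for a model plus its agent harness.

```python
from typing import Callable

Grader = Callable[[str], bool]


def exact_match(expected: str) -> Grader:
    """Build a grader that passes only when the output equals `expected`."""
    return lambda output: output.strip() == expected


def run_eval(agent: Callable[[str], str], cases: list[tuple[str, Grader]]) -> float:
    """Run the agent on every task and return the fraction of graders that pass."""
    passed = sum(1 for task, grader in cases if grader(agent(task)))
    return passed / len(cases)


def toy_agent(task: str) -> str:
    # Stand-in for a full agent; it only knows one answer.
    return "4" if task == "What is 2 + 2?" else "unknown"


cases = [
    ("What is 2 + 2?", exact_match("4")),
    ("What is the capital of France?", exact_match("Paris")),
]
score = run_eval(toy_agent, cases)
print(f"pass rate: {score:.0%}")
```

Note that the eval harness never touches prompts, tools, or the loop. It treats the agent as a black box and only scores what comes out.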

Harness vs. Workflow vs. Methodology

These terms are often conflated but describe different layers. I use the following layered model to reason about agent operating models:

| Layer | What it is | Examples |
| --- | --- | --- |
| Model | The LLM itself – stateless reasoning engine | Claude Opus/Sonnet, GPT-4.1/5, Gemini |
| Harness | System prompt + tools + orchestration loop + context management – turns a model into an agent | Copilot CLI, Claude Code, Copilot Chat (Agent mode), Cursor agent mode |
| Workflow | A declarative plan defining what to do and in what order – executed by a harness | Conductor YAML pipelines, Stripe blueprints (from Minions architecture) |
| Methodology | Governance rules for how work should flow through phases – prescribes which workflows and what quality gates | RPI, SDD, FORGED |
| Platform | The host environment that runs everything | VS Code, terminal, cloud sandboxes |

Harness orchestration is concerned with how an agent thinks and acts (e.g., loops, retries, tool routing). Workflow orchestration is concerned with what should happen in what order.
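One way to picture the split: a workflow is data describing steps, while the harness is the code that carries out each step. The sketch below is illustrative only. The step format, `run_workflow`, and `run_agent` are hypothetical and do not reflect the actual format of Conductor or Stripe blueprints.

```python
# Workflow layer: a declarative plan. It says what happens and in what order,
# but nothing about how an agent thinks or which tools it calls.
workflow = [
    {"step": "plan", "prompt": "Write a plan for: {goal}"},
    {"step": "implement", "prompt": "Implement this plan: {plan}"},
    {"step": "review", "prompt": "Review this change: {implement}"},
]


def run_agent(prompt: str) -> str:
    """Stand-in for one harness invocation (a real one would loop over
    model calls and tool calls before returning)."""
    return f"<result of: {prompt}>"


def run_workflow(steps: list[dict], goal: str) -> dict[str, str]:
    """Execute steps in order, making each step's output available to later prompts."""
    outputs = {"goal": goal}
    for step in steps:
        prompt = step["prompt"].format(**outputs)
        outputs[step["step"]] = run_agent(prompt)
    return outputs


results = run_workflow(workflow, "add a --verbose flag")
print(list(results))
```

Swapping `run_agent` for a different harness would leave the workflow untouched, which is the point of treating them as separate layers.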

A methodology governs which workflows to use and when human judgment is required. Some systems span multiple layers – for example, Copilot CLI is a harness that can orchestrate other harnesses (via task agents) and can be driven by workflow engines like Conductor, which in turn run workflows that implement a methodology chosen by an engineering team.

When a methodology, workflow, and harness are combined into a cohesive product/system, I refer to the combination as an agent operating model. This is the complete picture of how an organization uses agents to get work done, including which harness provides the infrastructure, which workflows define the steps, and which methodology governs quality. For example, Stripe’s agent operating model = Goose-fork harness + blueprint workflows. I’ve not found that the industry has converged on a term for this combination yet – likely because no vendor sells all three layers together. Harness vendors sell infrastructure and workflow engines sell orchestration, but governance methodology is left to practitioners. The term “agent operating model” is thus coined here to name a gap the industry hasn’t yet filled. You are welcome to use this term too, but I have zero delusions of grandeur that anyone other than me ever will.
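As a rough mental model, an agent operating model can be written down as a small record with one field per layer. The class and field names below are my own illustration, not an industry schema. The Stripe instance uses only what is stated above and leaves the methodology unset because none is named.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentOperatingModel:
    """Hypothetical record combining the harness, workflow, and methodology layers."""
    harness: str
    workflows: tuple[str, ...]
    methodology: Optional[str] = None  # often left to practitioners

    def describe(self) -> str:
        parts = [f"{self.harness} harness", f"{' + '.join(self.workflows)} workflows"]
        if self.methodology:
            parts.append(f"{self.methodology} methodology")
        return " + ".join(parts)


# Only the harness and workflow layers are named for Stripe in the text above.
stripe = AgentOperatingModel(harness="Goose-fork", workflows=("blueprint",))
print(stripe.describe())
```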

Orthogonal to these layers is the concept of packaging and distribution. Copilot Agent Plugins bundle harness customizations (tools, skills, agents, hooks, MCP servers) into installable packages that can be distributed via marketplace or Git URL. Plugins don’t add a new layer – they make the harness, workflow, and methodology layers portable and activatable across teams and repos.

Different organizations will arrive at different agent operating models across these layers. It is unlikely that there will be a best or recommended approach that is universally applicable. I encourage you to not only explore, but also to experiment and find what delivers the experience, cost, throughput, output quality, and end outcomes that your team(s) are most comfortable with.

You will find me speaking about and sharing what I’ve found works best, but please know that I care more about you achieving the best outcomes for yourself and your customers than about you simply repeating what I’ve shared.