What is harness engineering for AI coding agents?

Harness engineering is the practice of wrapping AI coding agents with guardrails, structured workflows, and persistent memory so they operate safely and reliably. A harness controls what files the agent can read, what commands it can run, what rules it must obey, and what it remembers across sessions.

What AI coding agents does goat-flow support?

goat-flow supports Claude Code, Codex, Gemini CLI, and Copilot CLI. It provides agent-specific instruction files and scores each agent's harness completeness across five concerns: context, constraints, verification, recovery, and feedback loop.

How do I install goat-flow?

Run npx @blundergoat/goat-flow@latest dashboard to set up goat-flow on any project. Then run npx @blundergoat/goat-flow@latest audit --harness to verify your harness configuration. No global install required.

How does goat-flow prevent AI agents from making dangerous changes?

goat-flow uses pre-action hooks that block dangerous commands before they execute. The default deny-dangerous hook catches destructive filesystem commands like rm -rf, every git push, secret file reads, subshell escapes, destructive SQL such as DROP TABLE, and file truncation. These are hardcoded boundaries the model cannot bypass through prompt manipulation.

What is the goat-flow learning loop?

The learning loop captures every mistake as persistent context so the next session avoids it. It includes footguns (architectural traps with evidence), lessons (behavioural mistakes to avoid), decisions (Architecture Decision Records), and session logs for continuity across agents and sessions.

AI harness engineering

Guardrails and memory for your AI coding agent.

Name: goat-flow
Author: BlunderGOAT

Your AI coding agent will skip the test, leak the secret, and forget yesterday. That's not a prompting issue - it's a harness problem. goat-flow is the opinionated harness for teams shipping with Claude Code, Codex, Gemini CLI, and Copilot CLI - not just demoing them.

$ npx @blundergoat/goat-flow@latest dashboard



~/projects/myapp (main)

$ npx @blundergoat/goat-flow@latest audit --harness

✓ Build checks 17/17 passing

Agent harness

✓ Claude Code A 94%

✓ Codex A 91%

✓ Gemini CLI B 87%

✓ Copilot CLI B 85%

Five concerns

Context 87%

Constraints 100%

Verification 85%

Recovery 82%

Feedback loop 89%

16 harness checks · all critical paths passing

Why harness engineering?

Agents need better control systems.

Files it can read. Commands it can run. Rules it must obey. Memory it keeps across sessions. That's the harness - and it matters more than which model you pick. goat-flow gives you an opinionated one out of the box.

Supports Claude Code, Codex, Gemini CLI, and Copilot CLI

The system

Four pieces. One harness.

Audit tells you what's missing. Skills give the agent workflows. Hooks stop dangerous actions. The learning loop remembers what happened.

01 / Audit

Pass/fail checks, no wiggle room

Validates every file, skill, and hook the agent needs. Either it's installed or it isn't. Scores each agent's harness completeness across the five concerns.

goat-flow audit --harness
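A minimal shell sketch of what a pass/fail presence check looks like. It is hypothetical: the function name check_harness, the directory list (taken from the .goat-flow paths named on this page), and the plain passed/total score are assumptions for illustration, not goat-flow's actual audit rules or scoring.

```shell
# check_harness ROOT: hypothetical pass/fail presence checks.
# The path list is illustrative, not goat-flow's real audit rule set.
check_harness() {
  local root="${1:-.}" pass=0 total=0 path
  for path in .goat-flow/hooks .goat-flow/lessons .goat-flow/footguns \
              .goat-flow/decisions .goat-flow/logs/sessions; do
    total=$((total + 1))
    # Either the directory exists or it doesn't: no partial credit.
    if [ -d "$root/$path" ]; then
      pass=$((pass + 1))
      echo "✓ $path"
    else
      echo "✗ $path missing"
    fi
  done
  # Integer percentage of checks passed.
  echo "score: $pass/$total ($((pass * 100 / total))%)"
  # Exit status is 0 only when every check passed.
  [ "$pass" -eq "$total" ]
}
```

Running check_harness ~/projects/myapp prints one line per path plus a score, and exits non-zero unless every path is present, so it could gate a CI step.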

02 / Skills

Structured slash commands

Seven workflows with defined phases, named artefacts, and stopping points. Debug, plan, review, critique, security, QA - plus a dispatcher that routes your intent to the right skill.

/goat, /goat-debug, /goat-plan...

03 / Hooks

Safety nets that can't be skipped

Pre-action guards block dangerous commands before they run. Post-action guards catch silent breakage afterwards. deny-dangerous ships by default, blocking destructive filesystem commands, every git push, secret exfiltration, and risky subshells.

.goat-flow/hooks/

04 / Learning loop

Persistent memory across sessions

Footguns, lessons, decisions, session logs. Every mistake becomes the next session's context. The compounding bet: every session that hits a problem makes the next one less likely to hit it.

.goat-flow/lessons, /footguns, /decisions

Under the hood

The execution loop

Every agent action follows four steps. Each one prevents a specific failure mode that free-running agents reliably hit.

READ

Load the files first

Pull in the actual code before reasoning about it.

Prevents fabrication - inventing APIs that don't exist.

SCOPE

Declare what changes

List files that will be touched, and files that won't.

Prevents surprise blast radius - changing files nobody agreed to.

ACT

Make the change

Edit only within the declared scope. Nothing else.

Prevents drift - refactoring that seemed related while the agent was in there.

VERIFY

Prove it works

Run linters, re-read changed files, confirm nothing drifted.

Prevents silent breakage - passing the task but breaking the build.
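SCOPE and VERIFY pair naturally: once the file list is declared, drift can be checked mechanically against what actually changed. A minimal sketch, assuming the declared scope lives in a plain text file with one path per line; the function name out_of_scope and that file are hypothetical, not part of goat-flow.

```shell
# out_of_scope SCOPE_FILE: read changed paths on stdin and report any that
# are not listed (one path per line) in SCOPE_FILE. Returns 1 on drift.
out_of_scope() {
  local extra
  # -F fixed strings, -x whole-line match, -v keep lines NOT in the scope file.
  # grep exits 1 when every changed path is in scope, so ignore its status.
  extra=$(grep -Fxv -f "$1") || true
  if [ -n "$extra" ]; then
    printf '%s\n' "$extra" | sed 's/^/out of scope: /'
    return 1
  fi
}
```

For example, git diff --name-only HEAD | out_of_scope scope.txt lists tracked files changed outside the declared scope. Untracked new files are not in that diff; git ls-files --others --exclude-standard can supply them.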

Seven skills

Workflows, not suggestions.

Free-form prompting is how agents get lost. Skills are structured slash commands with defined phases and clear stopping points. Use /goat as the default entry point and it routes to the right one.

/goat (Default): Dispatcher that classifies your intent and routes it to the right skill

/goat-debug (Debug): Diagnose bugs without accidentally rewriting the codebase end to end

/goat-plan (Plan): Plan features, refactors, and milestones - scales from hotfix to system change

/goat-review (Review): Review diffs for what shouldn't be there, not just what should

/goat-critique (Critique): Surface blind spots from multiple angles before shipping

/goat-security (Security): Threat model, dependency audit, and compliance checks

/goat-qa (QA): Generate test plans with automated, AI-verified, and manual steps
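As a rough picture of what routing means, here is a keyword-based stand-in for the /goat dispatcher. It is purely illustrative: how the real dispatcher classifies intent is not described on this page, and both the keywords and the function name route_intent are assumptions.

```shell
# route_intent REQUEST...: hypothetical keyword router, not goat-flow's dispatcher.
route_intent() {
  local req
  # Lower-case the request so the patterns below can stay lower-case.
  req=$(printf '%s' "$*" | tr '[:upper:]' '[:lower:]')
  # First match wins, so the more specific skills are tested first.
  case $req in
    *security*|*threat*|*vulnerab*|*"dependency audit"*) echo /goat-security ;;
    *"test plan"*|*"test cases"*) echo /goat-qa ;;
    *critique*|*"blind spot"*) echo /goat-critique ;;
    *review*|*diff*) echo /goat-review ;;
    *bug*|*error*|*crash*|*broken*|*failing*) echo /goat-debug ;;
    *plan*|*refactor*|*feature*|*milestone*) echo /goat-plan ;;
    *)
      echo "route_intent: no confident match; ask the user" >&2
      return 1
      ;;
  esac
}
```

Order matters: security and QA are tested before review and plan, so "security review" and "test plan" land on the more specific skill. Unmatched requests return non-zero so the caller can ask the user instead of guessing.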

Hooks

Block dangerous actions before they run.

A system prompt is a suggestion. A hardcoded boundary is a rule. Hooks enforce boundaries at a layer the model cannot talk its way past.

Ships with sensible defaults

deny-dangerous catches the patterns agents hit most often when they go off-script: destructive filesystem commands, every git push, secret file reads, piped remote scripts, subshell escapes, destructive SQL, and file truncation.

Extend with your own

Drop linters, format-on-save, custom validators, or project-specific rules into the hooks directory. They register automatically and run in parallel with the defaults.

deny-dangerous · Pre-action

✗ rm -rf · destructive

✗ git push · all pushes blocked

✗ cat .env · secret read

✗ curl | sh · remote code execution

✗ eval, bash -c · subshell escape

✗ DROP TABLE · data loss

✗ > file · truncation

✗ $(...) · command substitution
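The patterns above can be approximated with ordinary extended regular expressions. A minimal sketch, assuming the hook receives the proposed command as a single string; the function name is_dangerous and the regexes themselves are illustrative, not the rules deny-dangerous actually ships, and how goat-flow hands commands to its hooks is not covered here.

```shell
# is_dangerous CMD: print the first matching reason and return 0; return 1 if clean.
# Illustrative deny rules, NOT deny-dangerous's shipped patterns.
# Each heredoc line below is "reason@extended-regex"; the first match wins.
is_dangerous() {
  local cmd=$1 why re
  while IFS=@ read -r why re; do
    # grep -q only reports match / no match for this one rule.
    if printf '%s\n' "$cmd" | grep -Eq -- "$re"; then
      printf 'blocked (%s): %s\n' "$why" "$cmd"
      return 0
    fi
  done <<'EOF'
destructive@(^|[;&|[:space:]])rm[[:space:]]+-[[:alpha:]]*([rR][[:alpha:]]*f|f[[:alpha:]]*[rR])
all pushes blocked@(^|[;&|[:space:]])git[[:space:]]+push
secret read@(cat|less|more|head|tail)[[:space:]][^;&|]*\.env
piped remote script@curl[^|]*[|][[:space:]]*(ba|z)?sh
subshell escape@(^|[;&|[:space:]])(eval|(ba)?sh[[:space:]]+-c)([[:space:]]|$)
data loss@[Dd][Rr][Oo][Pp][[:space:]]+[Tt][Aa][Bb][Ll][Ee]
truncation@(^|[^>0-9&])>[[:space:]]*[[:alnum:]./_~-]
command substitution@[$][(]
EOF
  return 1
}
```

The truncation rule skips appends (>>) and descriptor redirects such as 2>&1. Separate flags such as rm -r -f slip past this sketch, so real rules need to be broader than these.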

Learning loop

The harness gets smarter every session.

Agents repeat mistakes for two reasons: nothing remembers them, and nothing stops them. The learning loop fixes both.

Footguns

Architectural traps captured with semantic-anchor evidence. Stops the agent from hitting the same code landmine twice.

.goat-flow/footguns/

Lessons

Behavioural mistakes the agent made - logged so the same error pattern is recognised and avoided next time.

.goat-flow/lessons/

Decisions

Architecture Decision Records. Captures why a choice was made so future agents don't quietly reverse it.

.goat-flow/decisions/

Session logs

End-of-session summaries provide continuity between work sessions - across agents, across days, across context compactions.

.goat-flow/logs/sessions/
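Feeding the loop back in can be as simple as printing every entry at the start of a session. A minimal sketch, assuming each footgun, lesson, and decision is stored as a Markdown file in the directories named above; the function name load_memory and the file format are assumptions, not goat-flow's documented mechanism.

```shell
# load_memory ROOT: print every footgun, lesson, and decision entry under ROOT
# so it can be prepended to a new session's context. Assumes *.md files.
load_memory() {
  local root="${1:-.}" dir file
  for dir in footguns lessons decisions; do
    for file in "$root/.goat-flow/$dir"/*.md; do
      # An unmatched glob stays literal, so skip anything that doesn't exist.
      [ -e "$file" ] || continue
      printf '## %s/%s\n' "$dir" "$(basename "$file")"
      cat "$file"
      printf '\n'
    done
  done
}
```

Calling load_memory . at session start prints each entry under a ## heading naming its directory and file. Session logs are omitted to keep the sketch short.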

The framework

The five concerns of AI harness engineering.

These five concerns are the common ground across the public harness engineering literature, and goat-flow scores every installed harness against them.

Context: Give the agent a map, not a 1,000-page manual. Concise instructions, the right files, progress notes across sessions.

Constraints: Deterministic rules that steer before the agent acts. Linters, deny-hooks, permissions, required sections.

Verification: Structural checks the agent runs to prove its own work. Tests, typecheck, post-action hooks, back-pressure.

Recovery: Session durability and restart paths. Checkpoint and resume, compaction handlers, milestone checkboxes, loop detection.

Feedback loop: Capture every mistake as persistent context so the next session doesn't repeat it. Footguns, lessons, decisions, logs.

Sources: Mitchell Hashimoto, Birgitta Böckeler (martinfowler.com), Anthropic engineering, and HumanLayer. goat-flow synthesises these into a working system with strong defaults.

Get started

From zero to passing audit in two commands.

Set up on any project, verify the harness, then start running skills through your agent of choice.

1. npx @blundergoat/goat-flow@latest dashboard

2. npx @blundergoat/goat-flow@latest audit --harness

Supports Claude Code, Codex, Gemini CLI, and Copilot CLI. Read the CLI docs →
