AI coding agents can skip verification, "accidentally" run harmful commands, and repeat the same mistakes at the worst possible time. That's not a prompting issue; it's a harness problem. goat-flow is an opinionated harness for Claude Code, Codex, Gemini CLI, and Copilot CLI.
Terminal output showing goat-flow audit results: 12 of 12 setup checks passing, harness scores for Claude Code (94%), Codex (91%), Gemini CLI (87%), and Copilot CLI (85%), plus five-concern coverage for context, constraints, verification, recovery, and feedback loop.
Every serious practitioner has converged on the same insight: the LLM is a commodity, but the scaffolding around it isn't. That scaffolding is the files the agent can read, the commands it can run, the rules it must obey, and the memory it keeps across sessions. Together, that's the harness, and goat-flow gives you an opinionated one out of the box.
Audit tells you what's missing. Skills give the agent workflows. Hooks stop dangerous actions. The learning loop remembers what happened.
Validates every file, skill, and hook the agent needs. Either it's installed or it isn't. Three scopes: goat-flow setup, per-agent configuration, and harness completeness across the five concerns.
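As a rough illustration of the binary "installed or not" model, here is a minimal Python sketch of a presence check. The paths in REQUIRED are hypothetical placeholders, not goat-flow's real checklist, and audit() and summarize() are illustrative names rather than part of goat-flow's API.

```python
from pathlib import Path

# Hypothetical checklist: these relative paths are placeholders,
# not the files goat-flow actually checks.
REQUIRED = [
    "CLAUDE.md",
    ".claude/skills/goat/SKILL.md",
    ".claude/hooks/deny-dangerous.sh",
]


def audit(project_root: Path, required: list[str] = REQUIRED) -> dict[str, bool]:
    """Binary check per file: True if installed, False if missing."""
    return {rel: (project_root / rel).is_file() for rel in required}


def summarize(results: dict[str, bool]) -> str:
    """Render a one-line pass count, e.g. '2 of 3 checks passing'."""
    passed = sum(results.values())
    return f"{passed} of {len(results)} checks passing"
```

There is no partial credit in this sketch: a file either exists at the expected path or the check fails.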
Seven workflows with defined phases, named artifacts, and stopping points: debug, plan, review, critique, security, and QA, plus a dispatcher that routes your intent to the right skill.
Pre- and post-action guards wrap what the agent does, so dangerous actions are stopped before they can hurt anything. deny-dangerous ships by default, blocking rm -rf, force-pushes, secret exfiltration, and six other patterns.
Four kinds of records turn every mistake into context for the next session: footguns, lessons, decisions, and session logs. The compounding bet is that every session that hits a problem makes the next one less likely to trip over it.
Every agent action follows four steps. Each one prevents a specific failure mode that free-running agents reliably hit.
1. Pull in the actual code before reasoning about it.
2. List the files that will be touched, and the files that won't.
3. Edit only within the declared scope. Nothing else.
4. Run linters, re-read changed files, and confirm nothing drifted.
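The declare-then-verify steps can be checked mechanically. Below is a minimal Python sketch, assuming the changed-file list comes from something like `git diff --name-only`; out_of_scope() and the convention that a trailing "/" marks a directory are hypothetical, not goat-flow's implementation.

```python
from pathlib import PurePosixPath


def out_of_scope(declared: set[str], changed: list[str]) -> list[str]:
    """Return changed files not covered by the declared scope.

    Hypothetical convention: a declared entry ending in "/" covers every
    file under that directory; any other entry must match exactly.
    """

    def covered(path: str) -> bool:
        # Normalise "./src/x.py" to "src/x.py" before comparing.
        p = PurePosixPath(path).as_posix()
        return any(
            p.startswith(d) if d.endswith("/") else p == d
            for d in declared
        )

    return [f for f in changed if not covered(f)]
```

An empty result means the edit stayed inside its declared scope; anything returned is drift that the final step should catch.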
Free-form prompting is how agents get lost. Skills are structured slash commands with defined phases and clear stopping points. Use /goat as the default entry point; it routes your request to the right skill.
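To make the routing idea concrete, here is a deliberately simple keyword-based dispatcher in Python. The ROUTES table and route() are hypothetical; the real /goat dispatcher may route on something much richer than keywords.

```python
import re

# Hypothetical keyword table. Order matters: the first skill with a
# matching keyword wins, so more specific skills come first.
ROUTES = {
    "debug": ("bug", "error", "crash", "failing", "stack trace"),
    "security": ("vulnerab", "secret", "injection", "cve"),
    "review": ("review", "pull request", "diff"),
    "critique": ("critique", "trade-off", "tradeoff", "push back"),
    "qa": ("test", "qa", "regression"),
    "plan": ("plan", "design", "implement", "feature"),
}


def route(intent: str, default: str = "plan") -> str:
    """Return the first skill whose keyword starts a word in the intent."""
    text = intent.lower()
    for skill, keywords in ROUTES.items():
        # \b anchors each keyword to a word start, so "bug" does not match "debug".
        if any(re.search(r"\b" + re.escape(k), text) for k in keywords):
            return skill
    return default
```

For example, route("Fix the crash in the login handler") picks debug, and an intent with no matching keyword falls back to plan.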
deny-dangerous catches the patterns agents hit most often when they go off-script: destructive filesystem commands, force-pushes, secret file reads, subshell escapes, and database truncation.
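The sketch below shows the general shape of a pattern-based deny hook in Python, assuming a Claude Code-style PreToolUse payload (JSON with the shell command under tool_input.command) where exit status 2 blocks the call. The regexes are illustrative and deliberately naive (for example, "rm -r -f" slips through); they are not the patterns deny-dangerous actually ships.

```python
import json
import re
import sys
from typing import Optional

# Illustrative patterns only, one per category named above.
DENY_PATTERNS = {
    "destructive filesystem": re.compile(r"\brm\s+-[a-z]*r[a-z]*f|\brm\s+-[a-z]*f[a-z]*r"),
    "force-push": re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"),
    "secret file read": re.compile(r"\b(cat|less|head|tail)\b.*(\.env\b|id_rsa|\.pem\b)"),
    "subshell escape": re.compile(r"\$\(|`"),
    "database truncation": re.compile(r"\b(TRUNCATE|DROP)\s+TABLE\b", re.IGNORECASE),
}


def check_command(cmd: str) -> Optional[str]:
    """Return the name of the first matching pattern, or None if allowed."""
    for name, pattern in DENY_PATTERNS.items():
        if pattern.search(cmd):
            return name
    return None


def hook_exit_code(raw_payload: str) -> int:
    """Map a PreToolUse-style JSON payload to an exit code: 2 blocks, 0 allows."""
    payload = json.loads(raw_payload)
    cmd = payload.get("tool_input", {}).get("command", "")
    reason = check_command(cmd)
    if reason:
        # stderr is where a blocking hook explains itself to the agent.
        print(f"blocked: {reason}", file=sys.stderr)
        return 2
    return 0
```

A hook script built on this would end with sys.exit(hook_exit_code(sys.stdin.read())).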
Drop linters, format-on-save, custom validators, or project-specific rules into the hooks directory. They register automatically and run in parallel with the defaults.
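Here is one way "register automatically, run in parallel" could work, sketched in Python. The register decorator, the HOOKS list, and both example hooks are hypothetical; goat-flow discovers hooks from the hooks directory rather than from Python decorators.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# A hook takes the proposed command and returns a block reason, or None.
Hook = Callable[[str], Optional[str]]

HOOKS: list[Hook] = []


def register(hook: Hook) -> Hook:
    """Add a hook to the registry; stands in for directory discovery."""
    HOOKS.append(hook)
    return hook


@register
def deny_force_push(cmd: str) -> Optional[str]:
    # Stand-in for a default hook.
    return "force-push blocked" if "push --force" in cmd else None


@register
def deny_prod_db(cmd: str) -> Optional[str]:
    # Hypothetical project-specific rule.
    return "touches production database" if "prod-db" in cmd else None


def run_hooks(cmd: str, hooks: list[Hook]) -> list[str]:
    """Run every hook concurrently and collect all block reasons, in registration order."""
    with ThreadPoolExecutor(max_workers=max(1, len(hooks))) as pool:
        results = list(pool.map(lambda h: h(cmd), hooks))
    return [r for r in results if r is not None]
```

Collecting every reason, rather than stopping at the first, gives the agent the full picture of why a command was refused.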
Agents forget everything between runs. Four kinds of persistent records make sure the same mistake doesn't happen twice.
Footguns: architectural traps captured with file:line evidence. They stop the agent from hitting the same code landmine twice.
Lessons: behavioural mistakes the agent made, logged so the same error pattern is recognised and avoided next time.
Decisions: Architecture Decision Records (ADRs) that capture why a choice was made, so future agents don't quietly reverse it.
Session logs: end-of-session summaries that provide continuity between work sessions, across agents, across days, and across context compactions.
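As a sketch of how the four record kinds might be modelled and replayed at the start of a session, here is a small Python data structure. Record, KINDS, and render_context() are hypothetical names, and the plain-text output format is an assumption, not goat-flow's on-disk format.

```python
from dataclasses import dataclass
from typing import Optional

KINDS = ("footgun", "lesson", "decision", "session")


@dataclass
class Record:
    kind: str
    summary: str
    evidence: Optional[str] = None  # e.g. "src/cache.py:88" for a footgun

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown record kind: {self.kind}")


def render_context(records: list[Record]) -> str:
    """Group records by kind into a text block an agent could read at session start."""
    lines = []
    for kind in KINDS:
        items = [r for r in records if r.kind == kind]
        if not items:
            continue
        lines.append(f"## {kind}s")
        for r in items:
            suffix = f" ({r.evidence})" if r.evidence else ""
            lines.append(f"- {r.summary}{suffix}")
    return "\n".join(lines)
```

Keeping the file:line evidence next to each footgun lets the next session jump straight to the trap instead of rediscovering it.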
Five concerns (context, constraints, verification, recovery, and feedback loop) are the common ground across the public harness-engineering literature. goat-flow scores every installed harness against these five.
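One simple way to turn per-concern checks into percentages is sketched below in Python. The equal weighting of checks and concerns is an assumption for illustration; goat-flow's actual scoring formula may differ.

```python
CONCERNS = ("context", "constraints", "verification", "recovery", "feedback loop")


def concern_scores(checks: dict[str, list[bool]]) -> dict[str, int]:
    """Percent of passing checks per concern; a concern with no checks scores 0."""
    scores = {}
    for concern in CONCERNS:
        results = checks.get(concern, [])
        scores[concern] = round(100 * sum(results) / len(results)) if results else 0
    return scores


def harness_score(checks: dict[str, list[bool]]) -> int:
    """Unweighted mean of the five concern scores (hypothetical weighting)."""
    s = concern_scores(checks)
    return round(sum(s.values()) / len(s))
```

Scoring a missing concern as 0 rather than skipping it means a harness cannot look complete by simply leaving a concern out.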
Sources: Mitchell Hashimoto, Birgitta Böckeler (martinfowler.com), Anthropic engineering, and HumanLayer. goat-flow synthesises these into a working system with strong defaults, rather than a framework you have to assemble yourself.
Install globally, set it up on any project, and start running skills through your agent of choice.
npm install -g goat-flow
goat-flow setup . --agent claude
goat-flow audit --harness
Supports Claude Code, Codex, Gemini CLI, and Copilot CLI. Read the CLI docs.