The Navigator · 11 min mission
Test-Driven Loops & Systematic Debugging
Give Claude a check it can run, then let the loop close itself to green.
On this page
How to give Claude Code a pass/fail check it can read, so it verifies its own work instead of stopping at "looks done." Covers the four verification gates, the red→green TDD cycle, reproduce-then-fix debugging, and independent review. After this you can write a test-first task, set an automatic gate that iterates to green, reproduce a bug as a failing test, and review a diff with a model that did not write it.
| Gate | How it works | Best for |
|---|---|---|
| In one prompt | Ask Claude to run the check and iterate in the same message: *"…run the test suite and fix any failures."* | A single focused task you are watching |
/goal condition | A small fast model (Haiku by default) re-checks the condition after every turn; Claude keeps working until it holds, then the goal clears itself. | A multi-turn push to one measurable end state |
Stop hook | A script runs the check and blocks the turn from ending until it passes (exit 2). Deterministic, not advice. | A guarantee that runs in every session |
| Second opinion | A subagent or /code-review refutes the result in a fresh context, so the writer is not the grader. | Catching overfitting and missed edge cases |
The red → green → verify cycle
Write the test — state it is test-first
Tell Claude what behavior to verify and that the implementation does not exist yet, with concrete cases. The docs prefer verification criteria over vague prompts: "write a
validateEmailfunction. example test cases:user@example.comis true,invalidis false,user@.comis false. run the tests after implementing." Claude examines existing test files to match the style, frameworks, and assertion patterns already in use.Confirm it fails — red for the right reason
Run the suite in shell mode (
! npm test) or have Claude run it. You want a clean assertion failure, not a syntax or import error — that is what makes the test a real verification signal.Commit the red test
Commit the failing test before implementing. This pins a version-controlled definition of "done" that the implementation step cannot quietly edit out from under you.
Implement to green — without editing the test
Switch out of plan mode and state the constraint: implement the behavior, run the tests after each change, and do not modify the tests to make them pass.
Verify with fresh eyes
Before treating the task as done, have a subagent (or
/code-review) review the diff in a fresh context — it sees only the diff and your criteria, not the reasoning that produced the change, so it grades the result on its own terms.
Hands-off gates: /goal and Stop hooks
When a task needs several turns — get a suite green, clean the lint, make the build pass — /goal (Claude Code v2.1.139+) sets a completion condition and Claude keeps working toward it without per-turn prompting. After each turn, a small fast model (Haiku by default) checks whether the condition holds; if not, Claude starts another turn. Setting a goal starts a turn immediately, a ◎ /goal active indicator shows, and the goal clears automatically once the condition is met.
The evaluator does not run commands or read files on its own — it only judges what Claude has already surfaced in the transcript. So the condition must be demonstrable from Claude's own output: Claude has to actually run npm test and let the result land in the conversation. Write one measurable end state, a stated check (e.g. npm test exits 0), the constraints that must not change, and a runtime bound (or stop after 20 turns). Max condition length is 4,000 characters.
# set a well-formed condition (one end state + a stated check + a bound)
/goal all tests in test/auth pass and "npm test" exits 0, the lint step is clean,
and the public session API is unchanged — or stop after 20 turns
# inspect: condition, duration, turns evaluated, token spend, evaluator's last reason
/goal
# clear it (aliases: stop, off, reset, none, cancel; /clear also clears it)
/goal clear
# non-interactive: run the loop to completion (needs workspace trust accepted)
claude -p "/goal CHANGELOG.md has an entry for every PR merged this week"When a check must run every time — no exceptions, no relying on Claude to remember — use a Stop hook directly. A hook is a user-defined command that fires when Claude finishes responding and can block the turn from ending to keep it working. Exit 0 lets the turn end; exit 2 blocks it and feeds the stderr message back to Claude as the reason to keep going. Define hooks in ~/.claude/settings.json (all projects), .claude/settings.json (committable per-project), or .claude/settings.local.json (gitignored); browse them read-only with /hooks; disable all with "disableAllHooks": true.
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "command",
"command": "npm test --silent || { echo 'Tests are failing — fix them before stopping.' >&2; exit 2; }"
}
]
}
]
}
}| Event | Fires | Can block? |
|---|---|---|
Stop | When Claude finishes responding — not only at task completion; not on user interrupts. | Yes — exit 2 or {"decision":"block","reason":"…"} keeps Claude working. |
SubagentStop | When a subagent finishes. | Yes — same blocking capability as Stop. |
PreToolUse | Before a tool call executes. | Yes — hookSpecificOutput.permissionDecision: allow / deny / ask / defer. |
PostToolUse | After a tool call succeeds (it already ran). | No — but can {"decision":"block"} the conversation or rewrite output via updatedToolOutput. |
Debugging: reproduce as a failing test, then fix
Debugging is the same loop pointed at a bug. The common-workflows "fix bugs efficiently" recipe: share the error ("I'm seeing an error when I run npm test"), give Claude the command to reproduce it and the stack trace, state the repro steps, and say whether the failure is intermittent or consistent. Make the bug reproducible first so "fixed" has a re-runnable meaning. Two phrasing rules, both from the best-practices prompt tables: describe the symptom, not the fix, and demand the root cause so a patch does not just silence the symptom.
Two ways to report the same bug
Symptom-then-guess
"Login is broken after a while. Add a null check in refresh()."
You hand Claude your hypothesis as the task. If the real cause is elsewhere, it patches the wrong place and the bug survives behind a green-looking change.
Reproduce-then-fix (docs prompt)
"users report that login fails after session timeout. check the auth flow in src/auth/, especially token refresh. write a failing test that reproduces the issue, then fix it. address the root cause, don't suppress the error. Keep the session API unchanged."
Supplies outcome, location, repro, verification, and constraint — Claude locates the real defect instead of patching a guess.
Work a real failure to its root cause
Symptom → fix
Pick what you're seeing. A question or two later you land on the exact command to run — every step pulled straight from the Claude Code troubleshooting docs.
Where does it hurt?
Independent verification
The model that just wrote the code is biased toward it: per the docs, "a fresh context improves code review since Claude won't be biased toward code it just wrote." Two ways to review in a fresh context:
/code-review— a bundled skill that reviews the current diff for bugs in a fresh subagent and returns findings to the session.- A plan-conformance subagent — "review the diff against PLAN.md. Check that every requirement is implemented, the listed edge cases have tests, and nothing outside the task's scope changed. Report gaps, not style preferences."
A reviewer told to find gaps will usually report some even when the work is sound, and chasing every finding leads to over-engineering — extra abstraction layers, defensive code, tests for cases that cannot happen. Tell the reviewer to flag only gaps that affect correctness or the stated requirements and treat the rest as optional.
Compose an unattended TDD-to-green run
Plan first
Enter plan mode (
Shift+Tabto cycledefault → acceptEdits → plan, prefix one prompt with/plan, or start withclaude --permission-mode plan). Claude researches and proposes without editing source. PressCtrl+Gto open the plan in your editor; refine, then approve.Approve into a hands-off mode
Approving exits plan mode and switches the session to the mode the approve option describes. For an unattended run choose Approve and start in auto mode (auto mode requires v2.1.83+) or Approve and accept edits (
acceptEdits).Write and confirm the failing tests
Use
! npm testto land a real red result in context — a clean assertion failure, for the right reason.Set a gate to drive to green
Set a
/goaltied tonpm testexiting 0 (and a turn bound), or install aStophook so Claude iterates without per-turn prompting.Close with a fresh-eyes pass
Run
/code-reviewor a plan-conformance subagent. Each stage leaves evidence — red test, green run, clean review — so you read proof, not Claude's word.
Knowledge check
You ask Claude to "make all auth tests pass" and set a /goal condition: "all tests in test/auth pass." It reports success after a few turns, but when you run the suite yourself, two tests still fail. What most likely went wrong?
Reach the end and this star joins your charted sky.