
Plan Mode, TDD & Engineering Workflows

Ship multi-step changes with plans, tests, and tight feedback loops.

planning · tdd · workflow · Fact-checked 2026-06-13

The single biggest difference between a Claude Code session that ships and one that wastes an hour is sequencing. Let Claude read your prompt and jump straight to editing files, and it will confidently solve a problem — often the wrong one. The fix is not a smarter prompt; it is a better workflow. You separate thinking from doing, you give Claude a way to check its own work, and you keep each task's context clean.

This guide is the engineering-discipline layer of Claude Code. It covers the loop the Anthropic team actually recommends — explore, plan, code, commit — plus the modes and habits that make each phase reliable: plan mode, extended thinking, test-driven sessions, git worktrees for parallel work, and a code-review step with fresh eyes.

Why sequencing beats speed

Claude works inside a finite context window, and its performance degrades as that window fills. Every file it reads, every command it runs, and every dead end it explores costs tokens you will want later. A workflow that front-loads understanding and defers the actions that are costly to undo (edits, commits, pushes) keeps that limited context focused on the decision that matters. The discipline is not bureaucracy; it is how you stop paying for the same mistake twice.

Plan mode: think before you touch disk

Plan mode is a permission mode that tells Claude to research and propose changes without making them. Claude reads files and runs read-only shell commands to explore, then writes a plan — but it does not edit your source until you approve. Permission prompts still apply exactly as they do in default mode, so nothing slips through.

There are three ways into it. Inside a running session, press Shift+Tab to cycle permission modes: default → acceptEdits → plan. You can also start a session in plan mode with the --permission-mode plan flag, or prefix a single prompt with /plan to plan just that one turn. Press Shift+Tab again to leave plan mode without approving anything.

When the plan is ready, Claude presents it and asks how to proceed. You can approve it (and choose to continue in auto mode, accept-edits mode, or review each edit manually), keep planning with feedback, or — the underused move — press Ctrl+G to open the proposed plan in your text editor and rewrite it directly before Claude acts on it. Editing the plan is cheaper than correcting the code.

Extended thinking: spend more reasoning on hard problems

Plan mode controls what Claude is allowed to do; extended thinking controls how hard it reasons before answering. In current Claude Code it is on by default because it measurably improves planning and complex reasoning. It is not free: thinking tokens are billed as output tokens, and the default budget can run to tens of thousands of tokens per request depending on the model.

The supported way to dial reasoning up or down is the effort level: set it with /effort or inside /model. On models with a fixed thinking budget you can instead cap it with the MAX_THINKING_TOKENS environment variable (for example MAX_THINKING_TOKENS=8000), and you can disable thinking entirely in /config for simple work. Note that Fable 5 always uses extended thinking — disabling it is not available there — and adaptive-reasoning models ignore a nonzero token budget, so use effort levels with those.
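If you start Claude Code from a script or task runner, the token cap is just an environment variable on the child process. A minimal Python sketch, assuming the `claude` executable is on your PATH; the helper names `thinking_capped_env` and `launch_claude` are hypothetical, and per the caveats above the cap only has an effect on models with a fixed thinking budget.

```python
import os
import subprocess


def thinking_capped_env(max_tokens: int) -> dict[str, str]:
    # Copy the caller's environment so PATH, HOME, and credentials carry over;
    # os.environ itself is left unchanged.
    env = dict(os.environ)
    # Environment values must be strings.
    env["MAX_THINKING_TOKENS"] = str(max_tokens)
    return env


def launch_claude(max_tokens: int = 8000) -> subprocess.CompletedProcess:
    # Starts an interactive session with the capped budget and waits for it
    # to exit. Assumes `claude` is installed and on PATH.
    return subprocess.run(["claude"], env=thinking_capped_env(max_tokens))
```

Calling `launch_claude(8000)` has the same effect as setting MAX_THINKING_TOKENS=8000 in your shell before running `claude`. On adaptive-reasoning models, reach for /effort instead.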

Claude Code also still recognizes the keyword ultrathink typed anywhere in a prompt — it adds an in-context instruction asking for deeper reasoning on that single turn, without changing your session effort setting. Plain phrases like think, think hard, or think more are not recognized as keywords; they are passed through as ordinary prompt text. Treat effort levels as the deliberate control: they are explicit, repeatable, and do not depend on remembering to sprinkle a magic word into every message.

The explore → plan → code → commit loop

  1. Explore

    Enter plan mode and let Claude read. Point it at the relevant code and ask it to understand the lay of the land — "read /src/auth and explain how we handle sessions and login; also look at how we manage secrets in environment variables." No edits happen, so this phase is always safe. For a sprawling codebase, delegate the reading to a subagent so the file dumps never enter your main context.

  2. Plan

    Still in plan mode, ask for a concrete plan: "I want to add Google OAuth. What files change? What is the session flow? Write a plan." Read it like a senior engineer reviewing a design doc. If a step is wrong, fix it now — press Ctrl+G to edit the plan directly rather than letting Claude build the wrong thing and correcting after.

  3. Code

    Switch out of plan mode and let Claude implement against its own plan: "implement the OAuth flow from your plan. Write tests for the callback handler, run the suite, and fix any failures." Pair the instruction with a verification target so Claude closes the loop itself instead of stopping at "looks done."

  4. Commit

    When the change is real and verified, ask Claude to "commit with a descriptive message and open a PR." With the gh CLI installed, it can create the pull request directly; the session links to the PR so you can return to it later with claude --from-pr <number>.

Test-driven sessions: write the test first

Agentic coding and test-driven development are a natural fit, because a failing test is the cleanest verification signal Claude can read. The official advice is blunt: give Claude a check it can run. Without one, "looks done" is the only signal available and you become the verification loop, catching every mistake by hand. A test suite closes that loop automatically — Claude writes code, runs the tests, reads the result, and iterates until they pass.

The disciplined version inverts the usual order. Instead of "build the feature, then add tests," you ask Claude to write a failing test that captures the desired behavior first, confirm it fails for the right reason, and only then implement until it goes green. For a bug, this is the highest-leverage move there is: "users report login fails after session timeout — check the token refresh in src/auth/, write a failing test that reproduces the issue, then fix it." The reproduction test proves you understood the bug before you touched the fix, and it stays in the suite as a guard against regressions.
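Here is the red-then-green cycle in miniature. A minimal Python sketch; everything in it is hypothetical (`Session`, the two `refresh_*` functions, the one-hour TTL, and the bug itself, an expiry copied over from the old session) and stands in for whatever actually lives in src/auth/. In a real project the check would be a test in your suite that Claude runs; here one plain function runs against both versions so you can watch the same check go red, then green.

```python
import secrets
from dataclasses import dataclass

TTL_SECONDS = 3600.0  # hypothetical lifetime of a refreshed session


@dataclass
class Session:
    # Hypothetical session record: an opaque token and its expiry time.
    token: str
    expires_at: float


def refresh_buggy(session: Session, now: float) -> Session:
    # The "before" code: issues a new token but copies the OLD expiry,
    # so a session that already timed out is still expired after refresh.
    return Session(token=secrets.token_urlsafe(16), expires_at=session.expires_at)


def refresh_fixed(session: Session, now: float) -> Session:
    # The fix: the new token gets a fresh expiry measured from `now`.
    return Session(token=secrets.token_urlsafe(16), expires_at=now + TTL_SECONDS)


def login_works_after_timeout(refresh) -> bool:
    # The reproduction, written first: a timed-out session, once refreshed,
    # must be valid again.
    timed_out = Session(token="old", expires_at=1_000.0)
    renewed = refresh(timed_out, now=2_000.0)
    return renewed.expires_at > 2_000.0


# Red: fails against the current code, for the reason in the bug report.
print("buggy:", "PASS" if login_works_after_timeout(refresh_buggy) else "FAIL")  # buggy: FAIL
# Green: the same, unmodified check passes once the fix lands.
print("fixed:", "PASS" if login_works_after_timeout(refresh_fixed) else "FAIL")  # fixed: PASS
```

The detail that matters is that the check is not edited between the two runs; only the implementation changes.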

A subtle trap: when Claude writes both the test and the implementation in one pass, it can quietly tune the test to match buggy code. A clean separation, where one session (or one subagent) writes the tests and another writes the code to pass them, keeps the test honest: the assertions are fixed before implementation starts, so the implementer has to meet them rather than rewrite them.

a test-first bug fix
Test first, confirm the red, then fix to green. The failing test is the spec — and it stays in the suite as a regression guard.

Parallel work with git worktrees

Eventually one Claude is not enough — you want a feature building in one terminal while a bug gets fixed in another, without the two sets of edits colliding. The mechanism is the git worktree: a separate working directory with its own files and its own branch, sharing the same repository history and remote as your main checkout.

Claude Code makes this a one-liner. claude --worktree feature-auth (or the short -w) creates an isolated worktree — by default under .claude/worktrees/feature-auth/ on a new worktree-feature-auth branch — and starts a session inside it. Run the command again with a different name in a second terminal and you have two fully isolated agents. Omit the name and Claude invents one like bright-running-fox. Add .claude/worktrees/ to your .gitignore so the checkouts do not show up as untracked noise in your main repo.

Because a worktree is a fresh checkout, gitignored files like .env are not copied over. Add a .worktreeinclude file (it uses .gitignore syntax) listing the local-only files each new worktree should receive. On exit, Claude cleans up automatically when the worktree has no uncommitted changes, untracked files, or new commits; otherwise it asks whether to keep or remove it. You can always manage them with raw git — git worktree list and git worktree remove — and remember to install dependencies in each new worktree before Claude starts editing.
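To make the defaults concrete, this Python sketch models them as two small functions: one derives the path and branch that `claude --worktree <name>` creates by default, and one approximates which gitignored files a `.worktreeinclude` would hand to a new worktree. Both function names are hypothetical, and the matcher is an assumption: it uses `fnmatch`, which handles simple patterns like `.env` or `*.local` but not the full .gitignore syntax the real file follows.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def default_worktree(name: str) -> tuple[str, str]:
    # The defaults described above for `claude --worktree <name>`:
    # a checkout under .claude/worktrees/ on a new "worktree-<name>" branch.
    return f".claude/worktrees/{name}/", f"worktree-{name}"


def files_to_copy(include_lines: list[str], ignored_files: list[str]) -> list[str]:
    # Rough reading of a .worktreeinclude: drop blank lines and # comments,
    # then keep each gitignored file whose path or base name matches a pattern.
    # Not full .gitignore semantics: no "!" negation, anchoring, or "**".
    patterns = [
        line.strip()
        for line in include_lines
        if line.strip() and not line.strip().startswith("#")
    ]
    return [
        path
        for path in ignored_files
        if any(fnmatch(path, p) or fnmatch(PurePosixPath(path).name, p) for p in patterns)
    ]


print(default_worktree("feature-auth"))
# ('.claude/worktrees/feature-auth/', 'worktree-feature-auth')
print(files_to_copy([".env", "# local config", "*.local"],
                    [".env", "config/app.local", "node_modules/x.js"]))
# ['.env', 'config/app.local']
```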

One checkout vs. parallel worktrees

Single working directory

Feature work and an urgent bug fix share one checkout. You stash, switch branches, lose your place, and Claude's context whiplashes between two unrelated problems.

Edits step on each other, and /clear between tasks is the only hygiene available.

A worktree per task

claude --worktree feature-auth in one terminal, claude --worktree bugfix-123 in another. Each has its own branch, its own files, and its own clean context.

The two agents never touch the same working tree, so neither can clobber the other's edits.

A code-review step with fresh eyes

The model that just wrote the code is the worst reviewer of it — it is biased toward the reasoning that produced the change. The cure is a fresh context. Anthropic recommends a Writer/Reviewer split: one session implements, a second session (or a subagent) reviews the diff knowing only the task and the criteria, not the chain of thought that got there.

The lowest-friction version is the bundled /code-review skill, which reviews the current diff for bugs in a fresh subagent and returns its findings straight to your session. When you want the review measured against intent rather than general correctness, write the prompt yourself and name three things: the work to check, the plan to check it against, and what counts as a finding — "use a subagent to review the rate-limiter diff against PLAN.md. Check every requirement is implemented, the listed edge cases have tests, and nothing outside scope changed. Report gaps, not style preferences."

One caveat worth internalizing: a reviewer told to find gaps will almost always report some, even when the work is sound — that is what you asked it to do. Chasing every finding leads to over-engineering: needless abstraction, defensive code, and tests for cases that cannot happen. Tell the reviewer to flag only gaps that affect correctness or the stated requirements, and treat the rest as optional.
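Because a measured review prompt always names the same three things, plus the scoping caveat above, it templates well. A minimal Python sketch: `review_prompt` and its parameters are hypothetical, and it only builds text to paste into a session; it does not call Claude Code.

```python
def review_prompt(work: str, plan: str, checks: list[str]) -> str:
    # 1. The work to check and 2. the plan to check it against.
    lines = [f"Use a subagent to review {work} against {plan}.", "Check that:"]
    # 3. What counts as a finding: one concrete, checkable item per line.
    lines += [f"- {check}" for check in checks]
    # Scope the report so "find gaps" does not turn into over-engineering.
    lines.append(
        "Report only gaps that affect correctness or the stated requirements; "
        "treat style preferences as out of scope."
    )
    return "\n".join(lines)


print(review_prompt(
    "the rate-limiter diff",
    "PLAN.md",
    [
        "every requirement is implemented",
        "the listed edge cases have tests",
        "nothing outside scope changed",
    ],
))
```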

Review commands, end to end

The /code-review skill above is one member of a family, and knowing which sibling to reach for saves you from running the wrong pass. They split along two axes: local vs. cloud, and bugs vs. cleanup.

/code-review is the local, fresh-eyes pass over your diff for correctness bugs and cleanups; append an effort level (low/medium/high/xhigh/max), add --fix to apply findings to your working tree, or add --comment to post them as inline GitHub PR comments. (Both /code-review and /simplify are introduced as bundled skills in the Skills guide.) /simplify is the cleanup-only counterpart: four agents run in parallel over reuse, simplification, efficiency, and whether the change sits at the right altitude. It deliberately does not hunt for correctness bugs, so pair it with /code-review rather than substituting one for the other.

For a deeper pass, /code-review ultra runs a multi-agent review in the cloud: a fleet of reviewers explore the change in parallel and each finding is independently reproduced before it is reported, so the signal is higher and it favors real bugs over style. It runs in the background, leaving your terminal free. /ultrareview is an alias of /code-review ultra — and note the direction, since it is commonly stated backwards: /code-review ultra is the preferred current form, not a deprecated one. /review is the lightweight option — a single-pass, read-only PR review run locally, for quick feedback when you don't need the cloud depth. /security-review narrows the lens to vulnerabilities (injection, auth, data exposure) across the branch diff. /autofix-pr is fire-and-forget: it spawns a Claude Code on the web session that watches your PR and pushes fixes when CI fails or reviewers comment. And /diff just opens the interactive viewer so you can see what changed — uncommitted plus per-turn — before any of these run.

| Command | Where it runs | When to reach for it |
|---|---|---|
| /code-review [effort] [--fix] [--comment] | Local *(skill)* | A fresh-eyes bug and cleanup pass on your diff before shipping |
| /simplify | Local *(skill)* | Cleanup-only pass that applies fixes; not for finding bugs |
| /code-review ultra *(alias /ultrareview)* | Cloud, multi-agent | Pre-merge confidence: each finding reproduced; runs in the background |
| /review [PR] | Local, single-pass | Quick read-only PR feedback while iterating |
| /security-review | Local | A vulnerability-focused pass: injection, auth, data exposure |
| /autofix-pr [prompt] | Cloud *(web session)* | Keep an open PR green automatically after you move on |
| /diff | Local viewer | See uncommitted and per-turn changes before reviewing |
The review-and-ship command family. /code-review and /simplify are taught in the Skills guide; the rest complete the set. Verified against the official commands, code-review, and ultrareview docs.

The large-codebase playbook

A big, unfamiliar repository is where these habits pay off most, because the failure mode is specific: Claude reads hundreds of files "investigating," fills its context, and starts forgetting your actual instructions. The playbook is built around protecting that context.

Start by treating Claude like a senior engineer on the team and asking the questions you would ask a human: "Give me an overview of this codebase." "What are the key data models?" "How is authentication handled?" "Trace the login flow from front-end to database." Begin broad, then narrow. Use @ to pull a specific file or directory into context precisely instead of letting Claude hunt for it, and lean on a code-intelligence plugin for your language so "go to definition" replaces a grep-and-read fishing trip.

The keystone move is delegating exploration to subagents. Reading a large subsystem in your main session dumps every file into your context; a subagent reads those files in its own window and reports back only the summary — "use a subagent to investigate how our auth system handles token refresh, and whether we already have OAuth utilities I should reuse." Pair that with a short, router-shaped CLAUDE.md and frequent /clear between unrelated tasks, and you keep the main conversation lean enough that Claude actually follows the instructions that matter.

Ultraplan: hand the planning off to the cloud

The explore-plan loop above runs entirely in your terminal, which means a heavy planning pass ties up your session while Claude reads and drafts. Ultraplan breaks that coupling: it hands the planning task from your local CLI to a Claude Code on the web session running in plan mode, drafts the plan in the cloud, and leaves your terminal free to keep working. It is in research preview and needs Claude Code v2.1.91+, a Claude Code on the web account, and a GitHub repo. Because it runs on Anthropic's cloud, it is not available on Amazon Bedrock, Google Vertex AI, or Microsoft Foundry.

There are three ways in. Run /ultraplan followed by your prompt (/ultraplan migrate the auth service from sessions to JWTs), drop the word ultraplan anywhere in a normal prompt, or, when a local plan finishes and shows its approval dialog, choose "No, refine with Ultraplan on Claude Code on the web" to push the draft to the cloud for a richer review surface. The command and keyword paths ask for confirmation first; the local-plan path skips it because the menu selection already counts as confirmation.

While the cloud session works, your CLI prompt shows a status indicator, and the shape of the diamond is the signal. A hollow ◇ ultraplan means Claude is still researching and drafting; ◇ ultraplan needs your input means it has a clarifying question (open the session link to answer); and a filled ◆ ultraplan ready means the plan is done and waiting in your browser. Run /tasks and select the ultraplan entry for a detail view with the session link, live agent activity, and a Stop ultraplan action that archives the cloud session and clears the indicator.

Review in the browser, then teleport back

Open the session link on ◆ ultraplan ready and the plan loads in a dedicated review view that the terminal can't match: highlight any passage for an inline comment, drop an emoji reaction to signal approval or concern without writing prose, and jump between sections from an outline sidebar. Ask Claude to address your comments and it revises the draft in place; iterate as many rounds as you need before deciding where the work runs.

When the plan is right, you choose from the browser whether Claude implements it in the cloud or sends it home. Approve Claude's plan and start coding keeps execution in the same web session; when it finishes you review the diff and open a PR from the web. Approve plan and teleport back to terminal is the move that closes the loop with your local environment: the web session is archived so it can't keep working in parallel, and your waiting terminal pops an Ultraplan approved dialog with three options.

| Option | What it does | Reach for it when |
|---|---|---|
| Implement here | Injects the plan into your current conversation and continues from where you left off. | Your session context is still relevant and you want to keep going. |
| Start new session | Clears the conversation and begins fresh with only the plan as context. Prints a claude --resume command so you can return to the old thread. | The old context is noisy and the plan should run on a clean window. |
| Cancel | Saves the plan to a file without executing and prints the path. | You want the plan on disk to run later, or to hand to a different tool. |
The "teleport back" dialog: after "Approve plan and teleport back to terminal," your terminal shows the Ultraplan approved dialog with three options. The right choice depends on how clean your current context is.

TaskCreate/TaskUpdate: the structured task list replaces TodoWrite

If you watch Claude track a multi-step job and notice the progress list updating one item at a time instead of being rewritten wholesale, that is the Task tools at work. As of Claude Code v2.1.142 (and TypeScript Agent SDK 0.3.142), sessions use the structured TaskCreate, TaskUpdate, TaskGet, and TaskList tools by default, replacing the single TodoWrite call that previously rewrote the whole todo array on every change.

The shift is more than cosmetic. Where TodoWrite took one call that rewrote the full todos array, TaskCreate adds one item and TaskUpdate patches one item by taskId (status is pending, in_progress, or completed; status: "deleted" removes it), with TaskList and TaskGet letting the model read the current list back. The per-item, ID-keyed model is what lets a task list carry dependencies and survive across the parallel background sessions that agent view and ultraplan rely on; a single rewrite-the-array call could not.
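The difference is easiest to see in a toy model. This Python sketch is not the real tool schema: `TaskStore`, its method names, the string IDs, and the `blocked_by` dependency field are all hypothetical. It only mirrors the behavior described above: create adds one item, update patches one item by ID (with "deleted" removing it), and get/list read state back, where a TodoWrite-style call would resend the whole array for every one of these steps.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

VALID_STATUSES = {"pending", "in_progress", "completed"}


@dataclass
class Task:
    task_id: str
    subject: str
    status: str = "pending"
    # Hypothetical dependency field: IDs of tasks that must finish first.
    blocked_by: list[str] = field(default_factory=list)


class TaskStore:
    """Toy, ID-keyed task list; not the real TaskCreate/TaskUpdate schema."""

    def __init__(self) -> None:
        self._tasks: dict[str, Task] = {}
        self._next_id = itertools.count(1)

    def create(self, subject: str, blocked_by: Optional[list[str]] = None) -> str:
        # Like TaskCreate: adds one item and leaves every other item alone.
        task_id = str(next(self._next_id))
        self._tasks[task_id] = Task(task_id, subject, blocked_by=list(blocked_by or []))
        return task_id

    def update(self, task_id: str, status: str) -> None:
        # Like TaskUpdate: patches one item by ID; "deleted" removes it.
        if status == "deleted":
            self._tasks.pop(task_id, None)
        elif status in VALID_STATUSES:
            self._tasks[task_id].status = status
        else:
            raise ValueError(f"unknown status: {status!r}")

    def get(self, task_id: str) -> Task:
        # Like TaskGet: read one item back.
        return self._tasks[task_id]

    def list_tasks(self) -> list[Task]:
        # Like TaskList: read the whole list back, in creation order.
        return list(self._tasks.values())


store = TaskStore()
test_id = store.create("write failing test for token refresh")
fix_id = store.create("fix token refresh", blocked_by=[test_id])
store.update(test_id, "completed")
store.update(fix_id, "in_progress")
print([(t.task_id, t.status) for t in store.list_tasks()])
# [('1', 'completed'), ('2', 'in_progress')]
```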

For most work you never touch this; the tools are the default and need no configuration. The one knob worth knowing: if a script, monitor, or harness still watches for TodoWrite tool-use blocks, set the environment variable CLAUDE_CODE_ENABLE_TASKS=0 to make that session emit TodoWrite again instead of the Task tools. Treat it as a temporary bridge for un-migrated tooling, not a setting to leave on.
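If you control the harness, the longer-term fix is to make it understand both vocabularies so it no longer depends on that bridge. A minimal Python sketch, assuming tool calls reach your monitor as dictionaries shaped like Messages API `tool_use` content blocks (`type`, `name`, `input`); `progress_events` is a hypothetical name, the `todos`, `taskId`, and `status` fields come from the description above, and the per-item shape inside `todos` is an assumption.

```python
TASK_TOOLS = {"TaskCreate", "TaskUpdate", "TaskGet", "TaskList"}


def progress_events(blocks: list[dict]) -> list[str]:
    """Summarize progress from tool-use blocks in either vocabulary."""
    events = []
    for block in blocks:
        if block.get("type") != "tool_use":
            continue  # skip text and other content blocks
        name, args = block.get("name"), block.get("input", {})
        if name == "TodoWrite":
            # Old model: every call carries the complete todos array.
            todos = args.get("todos", [])
            done = sum(1 for t in todos if t.get("status") == "completed")
            events.append(f"snapshot: {done}/{len(todos)} completed")
        elif name == "TaskUpdate":
            # New model: one item changes, addressed by taskId.
            events.append(f"task {args.get('taskId')}: {args.get('status')}")
        elif name in TASK_TOOLS:
            events.append(f"{name} call")  # creates and read-backs
    return events


print(progress_events([
    {"type": "text", "text": "Starting."},
    {"type": "tool_use", "name": "TodoWrite",
     "input": {"todos": [{"status": "completed"}, {"status": "pending"}]}},
    {"type": "tool_use", "name": "TaskUpdate",
     "input": {"taskId": "2", "status": "in_progress"}},
]))
# ['snapshot: 1/2 completed', 'task 2: in_progress']
```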

Knowledge check

You are about to add OAuth to an auth module you have never touched, in a large unfamiliar codebase. What is the strongest opening move?
