First Principles · 12 min mission

Agentic Design Patterns: The Shapes Every Coding Agent Reuses

Learn the tool-agnostic patterns — the agent loop, chaining, routing, fan-out, orchestrator–workers, reflection — and exactly when each one wins.

Tags: agentic-patterns · architecture · first-principles · agent-loop · orchestration
Fact-checked 2026-06-15

Agentic design patterns are named control structures for arranging LLM calls and tools. This guide gives you the decision rule for picking one, the exact shape of each pattern, and the cost each adds — so you can match a task to the minimum structure that solves it.

| Category | Definition | Control lives in | Use when |
| --- | --- | --- | --- |
| Workflow | LLMs and tools orchestrated through predefined code paths | Your code | You can pre-map the decision tree; want accuracy, control, lower cost |
| Agent | LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks | The model | Open-ended task where you can't predict the number of steps |

Anthropic splits all agentic systems into two categories (verbatim definitions). The split decides every downstream tradeoff.

Explore the patterns

Agentic pattern explorer

Six composable shapes for wiring LLMs and tools, from a single self-directing loop to fixed workflows. In the interactive explorer, edges light up in order, parallel branches glow together, and dashed lines are feedback loops. Each pattern comes with a one-line "use when". Patterns from Anthropic's "Building effective agents".

Single agent loop (flow: LLM ⇄ tools, until done)

One model runs tools in a loop, reading the result of each action back from the environment before choosing the next. It keeps going, gathering ground truth as it goes, until it decides the task is complete.

Properties: dynamic control, tool feedback loop, self-terminating.

The agent loop: gather → act → verify → repeat

For open-ended tasks, every agent runs the same four-beat loop (Anthropic, verbatim): gather context → take action → verify work → repeat.

  • Gather context — read files, run agentic search (grep / find / tail to pull relevant slices instead of whole files), or delegate to subagents with isolated context windows.
  • Take action — execute via tools: bash, code generation, file edits, MCP servers.
  • Verify work — check the result before declaring done, using ground truth from the environment (tool results, test output).
  • Repeat — a failed verification loops back to "take action."

Without ground-truth feedback at each step, the model guesses and compounds errors. Verification is the beat that makes this an agent rather than a script.
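
The four beats can be sketched without any SDK. What follows is a minimal illustration, not Anthropic's implementation: `run_agent_loop`, `LoopState`, and the three callables (`gather`, `propose_action`, `verify`) are hypothetical names, and the "model" is a toy stand-in so the control flow is visible.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopState:
    """Everything the loop has seen so far (hypothetical structure)."""
    task: str
    observations: list = field(default_factory=list)


def run_agent_loop(
    task: str,
    gather: Callable[[LoopState], str],
    propose_action: Callable[[LoopState], Callable[[], str]],
    verify: Callable[[str], tuple],
    max_steps: int = 5,
) -> tuple:
    """Gather -> act -> verify -> repeat, with a step cap as a safety net."""
    state = LoopState(task)
    for _ in range(max_steps):
        state.observations.append(gather(state))   # gather context
        result = propose_action(state)()            # take action
        ok, reason = verify(result)                 # verify against ground truth
        state.observations.append(reason)           # feed the verdict back in
        if ok:
            return True, state
    return False, state                             # cap hit without passing


# Toy stand-ins for the model and the environment (assumptions, not real tools).
attempts = {"n": 0}

def toy_gather(state: LoopState) -> str:
    return f"attempt {attempts['n']}"

def toy_propose(state: LoopState) -> Callable[[], str]:
    def act() -> str:
        attempts["n"] += 1
        return str(attempts["n"])
    return act

def toy_verify(result: str) -> tuple:
    ok = result == "3"
    return ok, "pass" if ok else f"fail: got {result}, want 3"

done, final = run_agent_loop("count to 3", toy_gather, toy_propose, toy_verify)
```

The step cap is a design choice of this sketch: a loop that only stops when the model says so needs an external limit in case verification never passes.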

The loop with the Claude Agent SDK (Python ≥ 3.10)
python
# pip install claude-agent-sdk
# Ships the gather -> act -> verify -> repeat loop that powers Claude Code, with
# built-in tools (Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch),
# Subagents (via the Agent tool, isolated context), and MCP support.
# TS equivalent: npm i @anthropic-ai/claude-agent-sdk
import anyio
from claude_agent_sdk import query, ClaudeAgentOptions
 
async def main() -> None:
    options = ClaudeAgentOptions(
        # Tools the agent may use: Read/Grep to gather, Edit to act, Bash to run the tests.
        allowed_tools=["Read", "Edit", "Bash", "Grep"],
        # Make "verify" deterministic: a rule that either passes or fails.
        system_prompt=(
            "Fix the failing test in tests/. After every edit, run "
            "'pytest -q' and only stop when it passes. Do not edit or delete "
            "tests to make them pass."
        ),
    )
    async for message in query(
        prompt="The auth test is red after the password-reset change. Make it green.",
        options=options,
    ):
        print(message)  # gather -> act -> (pytest = verify) -> repeat until green
 
anyio.run(main)

| Method | How it verifies | When to use it | Cost / caveat |
| --- | --- | --- | --- |
| Rules-based *(linters, types, tests)* | A defined rule passes or fails; the agent is told which rule failed and why | Anything expressible as a deterministic check: "the best form of feedback" | Cheap and fast; needs the rule to exist |
| Visual feedback | Screenshots / renders the model inspects | Layout, styling, responsiveness: things a test cannot assert | Needs a render step and a vision-capable model |
| LLM-as-judge | A separate model scores against fuzzy criteria | Only when no rule or render can capture the criterion | "Heavy latency tradeoffs" for marginal gains; last resort |

Three verification methods Anthropic names, best first. Reach down the list only when the level above cannot express your criterion.
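
A rules-based check can be as small as "run a command, look at the exit code, hand the output back". The sketch below assumes the rule is expressible as a command line; `CheckResult`, `run_rule`, and `first_failure` are hypothetical helpers, and the two demo commands are stand-ins for something like a test runner or linter.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    rule: str      # which rule ran
    passed: bool   # exit code 0 means the rule passed
    detail: str    # combined output, fed back to the agent on failure


def run_rule(rule: str, command: list, timeout_s: float = 60.0) -> CheckResult:
    """Run one deterministic check and capture why it passed or failed."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    detail = (proc.stdout + proc.stderr).strip()
    return CheckResult(rule, proc.returncode == 0, detail)


def first_failure(results: list) -> Optional[CheckResult]:
    """Return the first failing rule so the agent knows what to fix next."""
    return next((r for r in results if not r.passed), None)


# Stand-in commands so the sketch runs anywhere; a real setup might invoke
# a test runner or a linter here instead.
results = [
    run_rule("always-passes", [sys.executable, "-c", "print('ok')"]),
    run_rule("always-fails", [sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"]),
]
failure = first_failure(results)
```

Returning the rule name together with the captured output mirrors the table's point that the agent should be told which rule failed and why.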

| Pattern | Shape | When it wins (verbatim) | Example |
| --- | --- | --- | --- |
| Prompt chaining | Sequence of steps; each LLM call processes the previous output; optional programmatic gates between steps | Task can be "easily and cleanly decomposed into fixed subtasks" | Outline → gate-check outline meets brief → write doc; copy → translate |
| Routing | A classifier (LLM or classical) sorts input, then sends it to a specialized handler | "Distinct categories that are better handled separately, and where classification can be handled accurately" | Support desk: general / refund / tech → different flows; easy → Haiku, hard → Sonnet |
| Parallelization (sectioning) | "Breaking a task into independent subtasks run in parallel" | Subtasks parallelizable for speed | One model answers while another screens for inappropriate content |
| Parallelization (voting) | "Running the same task multiple times to get diverse outputs" | Multiple attempts/perspectives needed for higher-confidence results | Several prompts review code for vulns; vote with a threshold |

The four workflow patterns: shape, the exact "when it wins" wording, and a concrete example. All four are model-agnostic and implementable in a few lines.
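
To make "a few lines" concrete, here is one possible sketch of chaining, routing, and voting. It assumes a hypothetical `LLM` callable (prompt in, text out) instead of any particular vendor client; the function names and the fake models in the demo are illustrative only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, text out


def chain(llm: LLM, brief: str, gate: Callable[[str], bool]) -> str:
    """Prompt chaining: outline -> programmatic gate -> draft."""
    outline = llm(f"Outline a doc for: {brief}")
    if not gate(outline):
        raise ValueError("outline failed the gate; stop before spending more calls")
    return llm(f"Write the doc from this outline:\n{outline}")


def route(classify: Callable[[str], str], handlers: dict, text: str) -> str:
    """Routing: classify once, then dispatch to a specialized handler."""
    label = classify(text)
    handler = handlers.get(label, handlers["general"])  # unknown label -> general
    return handler(text)


def vote(reviewers: list, code: str, threshold: int) -> bool:
    """Parallelization (voting): flag only if enough reviewers agree."""
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(lambda review: review(code), reviewers))
    return Counter(verdicts)["vulnerable"] >= threshold


# Fake models so the sketch runs without network access.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Outline"):
        return "1. Intro\n2. Body"
    return "DOC: " + prompt.splitlines()[-1]

doc = chain(fake_llm, "release notes", gate=lambda outline: "Intro" in outline)
routed = route(
    lambda text: "refund" if "refund" in text else "general",
    {"general": lambda t: "general:" + t, "refund": lambda t: "refund:" + t},
    "I want a refund",
)
flagged = vote(
    [lambda c: "vulnerable", lambda c: "ok", lambda c: "vulnerable"],
    "eval(x)",
    threshold=2,
)
```

In each function the control flow lives in ordinary code and the model only fills in text, which is what makes these workflows rather than agents.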

Parallelization vs orchestrator–workers

Both fan work across multiple LLM calls. The distinction is who draws the subtasks:

  • Parallelization runs pre-defined subtasks — you decided the branches in code before the model ran.
  • Orchestrator–workers is model-driven: "a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results," and "the subtasks aren't pre-defined, but determined by the orchestrator based on the specific input."

Use orchestrator–workers for "complex tasks where you can't predict the subtasks needed" — Anthropic's example is "coding products that make complex changes to multiple files each time." If subtasks are fixed, hardcode and parallelize; if they vary per input, let the orchestrator decide.
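
The difference shows up in where the subtask list comes from. In this sketch (hypothetical names throughout; `fake_plan` and `fake_worker` stand in for LLM calls), `parallelize` receives its branches from the caller's code, while `orchestrate` asks a planner for them at runtime.

```python
import asyncio
from typing import Awaitable, Callable

Planner = Callable[[str], Awaitable[list]]   # hypothetical: request -> subtasks
Worker = Callable[[str], Awaitable[str]]     # hypothetical: subtask -> result


async def parallelize(subtasks: list, worker: Worker) -> list:
    """Parallelization: the branch list is fixed in code before any model runs."""
    return list(await asyncio.gather(*(worker(s) for s in subtasks)))


async def orchestrate(
    request: str,
    plan: Planner,
    worker: Worker,
    synthesize: Callable[[list], str],
) -> str:
    """Orchestrator-workers: the branch list is decided per input at runtime."""
    subtasks = await plan(request)                          # model-driven split
    results = await asyncio.gather(*(worker(s) for s in subtasks))
    return synthesize(list(results))                        # merge in one place


# Stand-ins for LLM calls so the sketch runs offline.
async def fake_plan(request: str) -> list:
    return [f"edit {name}" for name in request.split(",")]

async def fake_worker(subtask: str) -> str:
    return f"done: {subtask}"

fixed = asyncio.run(parallelize(["lint", "test"], fake_worker))
summary = asyncio.run(
    orchestrate("auth.py,billing.py", fake_plan, fake_worker, " | ".join)
)
```

The worker code is identical in both functions; only the origin of `subtasks` changes.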

Orchestrated fan-out, live

One orchestrator drives every worker; nothing flows worker-to-worker. In the interactive demo you pick a substrate per lane, give it a prompt, and hit Run. Each lane fills in parallel at its own pace with a live token and cost meter (each lane capped at $0.500). A synthesis node then waits for every worker, merges the results in the main thread, and surfaces only the cross-module conflicts.

The fan-out is one-directional from the main thread. Orchestrator calls: run, wait, peek, get_result.

The orchestration call

# ai-cli-mcp: fan out, then wait + get_result to synthesize
run(
    prompt="Migrate auth/ to the v2 token API",
    model="sonnet",
    workFolder="./workers/claude-1",
) --max-budget-usd 0.50
run(
    prompt="Migrate billing/ to the v2 token API",
    model="gpt-5.4",
    workFolder="./workers/codex-2",
) --max-budget-usd 0.50
run(
    prompt="Migrate webhooks/ to the v2 token API",
    model="codex-cloud",
    workFolder="./workers/cloud-3",
) --max-budget-usd 0.50
wait()        # block until all 3 workers land
get_result()  # collect summaries; synthesize in the main thread


Watch an orchestrator fan a job across workers and aggregate the results — and watch the token meter, because parallel agents multiply cost.

Evaluator–optimizer (reflection)

"One LLM call generates a response while another provides evaluation and feedback in a loop." The broader literature calls this reflection (Andrew Ng's taxonomy) — the same shape under a different name.

It is "particularly effective when we have clear evaluation criteria, and when iterative refinement provides measurable value." Two signals it fits: a human articulating feedback demonstrably improves the output, and the LLM can produce that critique itself. Anthropic's examples: literary translation; multi-round search where an evaluator decides whether more searching is warranted. With fuzzy criteria you get an expensive loop that polishes nothing — prefer deterministic verification first.
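
A generate/evaluate loop might look like the following sketch. It assumes two hypothetical callables, `generate(task, feedback)` and `evaluate(draft)`, each of which would be an LLM call in practice; the toy versions here use a deliberately crisp criterion (the draft must end with a period) to show the loop terminating.

```python
from typing import Callable, Optional


def refine(
    generate: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str], tuple],
    task: str,
    max_rounds: int = 3,
) -> tuple:
    """Evaluator-optimizer: regenerate with feedback until accepted or capped."""
    feedback: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)        # optimizer call
        accepted, feedback = evaluate(draft)    # evaluator call
        if accepted:
            return draft, round_no
    return draft, max_rounds                    # best effort after the cap


# Toy stand-ins: the "generator" applies the feedback, the "evaluator"
# checks one clear criterion.
def toy_generate(task: str, feedback: Optional[str]) -> str:
    return task if feedback is None else task + "."

def toy_evaluate(draft: str) -> tuple:
    return draft.endswith("."), "end with a period"

final_draft, rounds_used = refine(toy_generate, toy_evaluate, "Hello world")
```

The `max_rounds` cap is this sketch's guard against the failure mode described above: with fuzzy criteria the evaluator may never accept, and every extra round costs two more calls.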

Plan-and-execute vs ReAct: when the model thinks

Plan-and-execute (LangChain)

Planner generates a full multi-step plan up front; executor(s) carry out each step (often smaller, cheaper models); a replanning step decides whether to finish or generate a follow-up plan.

Three stated wins: speed (intermediate steps skip the big model), cost (large model "only called for (re-)planning steps"), quality (planner must "explicitly think through all the steps").

Footgun: without a replanning step the pattern is rigid, so a wrong initial plan executes faithfully to a wrong answer.

ReAct (Yao et al.)

"The LLM only plans for 1 sub-problem at a time" — think → act → observe, one tool call per turn, adapting continuously.

Wins on simple, dynamic tasks solvable in a few tool calls where each next step depends on the last observation.

Anthropic folds planning into the agent category and states a core principle verbatim: "Prioritize transparency by explicitly showing the agent's planning steps."
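
The two control flows can be put side by side. This is a rough sketch under stated assumptions, not LangChain's or the ReAct paper's code: `plan`, `execute`, `replan`, `think`, and `act` are hypothetical callables, and the toy task is simply upper-casing three letters.

```python
from typing import Callable, Optional


def plan_and_execute(
    goal: str,
    plan: Callable[[str], list],
    execute: Callable[[str], str],
    replan: Callable[[str, list], list],
    max_replans: int = 2,
) -> list:
    """Plan everything up front; consult the planner again only between batches."""
    steps = plan(goal)
    results: list = []
    for _ in range(max_replans + 1):
        results.extend(execute(step) for step in steps)  # cheap executor calls
        steps = replan(goal, results)                    # empty list means done
        if not steps:
            break
    return results


def react(
    goal: str,
    think: Callable[[str, list], Optional[str]],
    act: Callable[[str], str],
    max_turns: int = 5,
) -> list:
    """Think -> act -> observe, deciding one action per turn."""
    observations: list = []
    for _ in range(max_turns):
        action = think(goal, observations)   # plans a single sub-problem
        if action is None:                   # the model decides it is finished
            break
        observations.append(act(action))
    return observations


# Toy stand-ins for the planner and reasoner (assumptions for illustration).
planner_calls = {"n": 0}

def toy_plan(goal: str) -> list:
    planner_calls["n"] += 1
    return ["a", "b"]

def toy_replan(goal: str, results: list) -> list:
    planner_calls["n"] += 1
    return ["c"] if len(results) < 3 else []

def toy_think(goal: str, observations: list) -> Optional[str]:
    letters = "abc"
    return letters[len(observations)] if len(observations) < len(letters) else None

pe_results = plan_and_execute("spell abc", toy_plan, str.upper, toy_replan)
react_results = react("spell abc", toy_think, str.upper)
```

Both reach the same result here; the structural difference is that `plan_and_execute` calls the planner once per batch of steps, while `react` calls `think` before every single action.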

Long-running agents: the plan → execute → review structure

Anthropic's long-running-agents harness operationalizes plan/execute/review as a planner / generator / evaluator structure with durable artifacts that survive a context reset:

  • An initializer agent runs once: writes an init.sh script, a claude-progress.txt progress file, and an initial git commit.
  • A coding agent makes incremental progress session-by-session against a feature list (JSON) of 200+ granular, testable features marked passing/failing.
  • The agent verifies as an engineer would: run the dev server via init.sh, do real end-to-end testing (e.g. a Puppeteer MCP server), and mark a feature done only when it actually works.
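
The feature list works as durable state that each fresh session can read. The source only says it is JSON with features marked passing/failing, so the schema below (`id`, `description`, `passes`), the file name, and the helper functions are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def load_features(path: Path) -> list:
    """Read the feature list from disk (survives a context reset)."""
    return json.loads(path.read_text())


def next_failing(features: list) -> Optional[dict]:
    """Pick the first feature still marked failing, or None if all pass."""
    return next((f for f in features if not f["passes"]), None)


def mark_passing(path: Path, feature_id: str) -> None:
    """Flip one feature to passing; call only after real end-to-end verification."""
    features = load_features(path)
    for feature in features:
        if feature["id"] == feature_id:
            feature["passes"] = True
    path.write_text(json.dumps(features, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    feature_file = Path(tmp) / "features.json"  # hypothetical file name
    feature_file.write_text(json.dumps([
        {"id": "login", "description": "User can log in", "passes": False},
        {"id": "reset", "description": "User can reset password", "passes": False},
    ]))
    first = next_failing(load_features(feature_file))    # session 1 picks "login"
    mark_passing(feature_file, first["id"])
    second = next_failing(load_features(feature_file))   # session 2 picks "reset"
```

Because progress lives in the file and not in the model's context, a new session can resume by reloading it and asking for the next failing feature.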

| If the task… | Use | Because |
| --- | --- | --- |
| is solved by one augmented LLM call | No pattern | Simplest solution first; patterns add latency and cost |
| splits into fixed, clean sequential steps | Prompt chaining | Each easier subtask raises accuracy; gates catch drift |
| has distinct input categories handled best separately | Routing | Specialized prompts per class; cheap model for easy inputs |
| splits into fixed independent subtasks, or needs many attempts | Parallelization *(section / vote)* | Run them at once for speed, or vote for confidence |
| has subtasks you cannot predict until you see the input | Orchestrator–workers | The model decides the subtasks at runtime |
| has a clear pass/fail check and improves with iteration | Evaluator–optimizer | A critique loop measurably refines the output |
| is open-ended with no predictable number of steps | Agent *(loop / plan-execute)* | You can't hardcode the path; the model needs ground-truth feedback |

Decision table: match the failure mode to the minimum control mechanism. Read top to bottom; stop at the first row that fits.
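
"Stop at the first row that fits" is a first-match rule, which can be written down directly. In this sketch the `Task` flags are hypothetical yes/no answers you would supply yourself, and treating the last row (Agent) as the fallback when nothing else matches is an assumption of the sketch.

```python
from typing import NamedTuple


class Task(NamedTuple):
    """Hypothetical yes/no answers about a task, one per table row."""
    single_call_suffices: bool = False
    fixed_sequential_steps: bool = False
    distinct_categories: bool = False
    fixed_independent_or_many_attempts: bool = False
    unpredictable_subtasks: bool = False
    clear_check_improves_with_iteration: bool = False


# Rows in table order; the first predicate that holds wins.
RULES = [
    (lambda t: t.single_call_suffices, "No pattern"),
    (lambda t: t.fixed_sequential_steps, "Prompt chaining"),
    (lambda t: t.distinct_categories, "Routing"),
    (lambda t: t.fixed_independent_or_many_attempts, "Parallelization"),
    (lambda t: t.unpredictable_subtasks, "Orchestrator-workers"),
    (lambda t: t.clear_check_improves_with_iteration, "Evaluator-optimizer"),
]


def choose_pattern(task: Task) -> str:
    """Return the first matching row; fall back to the open-ended Agent row."""
    for applies, pattern in RULES:
        if applies(task):
            return pattern
    return "Agent"


support_desk = choose_pattern(Task(distinct_categories=True))
both_fit = choose_pattern(
    Task(fixed_sequential_steps=True, clear_check_improves_with_iteration=True)
)
open_ended = choose_pattern(Task())
```

Ordering the rules from least to most structure is what enforces the "minimum control mechanism" idea: when two rows fit, the earlier one wins.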

Knowledge check

You build a coding feature that changes an unpredictable number of files — sometimes two, sometimes a dozen, depending on the request. Which pattern fits, and why?
