Orchestrated Fan-Out: Many Agents, One Conductor

Drive Claude and Codex agents in parallel from one orchestrator and consolidate the results.

Tags: tandem, orchestration, scale · Fact-checked 2026-06-13

A migration lands on your desk: rename a core API across 24 modules, each with its own tests, its own edge cases, its own owner who will be annoyed if you break it. Done sequentially in one session, that is an afternoon of context bloat — by module 8 the window is a swamp of half-remembered diffs and the agent starts forgetting what it changed in module 2. Done as 24 independent jobs, it is twenty minutes of wall-clock time and one clean summary.

That is the fan-out problem, and this guide is about the cleanest way to solve it across two model families at once: a single orchestrator (your Claude Code session, or a CI runner) that launches many workers — Claude Code and Codex agents alike — lets them run in parallel, waits, collects their results, and synthesizes one answer. The workers do the grunt work in their own context windows. The orchestrator never reads their intermediate logs; it reads only the final, consolidated output. Your context stays clean while 24 agents churn.

The fan-out shape, and what it is not

Fan-out is a star, not a mesh. One conductor at the center; N workers on the spokes. The conductor launches each worker and aggregates what comes back. Crucially, communication is one-directional and the workers are leaves: worker 7 does not talk to worker 12, does not know worker 12 exists, and cannot hand it a subtask. There is no negotiation, no shared blackboard, no gossip. Each worker gets a prompt, runs to completion in isolation, and returns a result up the spoke. The conductor is the only thing that sees the whole picture.
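To make the star concrete, here is a minimal Python sketch of the topology, with threads standing in for agents. The worker and conduct functions and the module names are hypothetical, and nothing here calls a real CLI. The point is the shape: each worker receives only its own task, and only the conductor ever holds the full list of results.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(task: str) -> dict:
    """A leaf: sees only its own task, returns one result, talks to nobody."""
    return {"task": task, "status": "done", "summary": f"renamed API in {task}"}


def conduct(tasks: list[str], max_parallel: int = 8) -> list[dict]:
    """The conductor: launch every worker, join, return results in task order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map fans out across the pool and yields results in input order.
        return list(pool.map(worker, tasks))


modules = [f"module_{i:02d}" for i in range(1, 25)]  # hypothetical module names
results = conduct(modules)
print(len(results), results[0]["task"], results[-1]["task"])
```

Because worker takes nothing but its own task string, there is no channel through which worker 7 could reach worker 12. That is the property the rest of this guide relies on.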

This is deliberately not peer-to-peer delegation, where two agents converse and pass work back and forth as equals — that is the shape of the cross-tool plugin guides, where Claude Code asks Codex to review and Codex answers into the same thread. And it is not Claude's internal agent teams, where multiple sessions share a task list and message each other directly. Fan-out is simpler and, for embarrassingly-parallel work, stronger: because workers are isolated leaves, you can launch 24 of them without 24 chances for two agents to clobber the same file or talk each other into a bad plan. The honest box near the end draws all three apart precisely.

Fan-out vs. peer-to-peer

Orchestrated fan-out (this guide)

  • Topology: star — one conductor, N isolated workers
  • Direction: one-way. Conductor launches; workers return results
  • Workers know about each other: no — each is a leaf in its own context
  • Best for: embarrassingly-parallel work (a migration across modules, N independent reviews)
  • Failure mode: a slow or stuck worker; you kill_process it and move on
  • Substrate: ai-cli-mcp, claude --bare -p matrix, Codex Cloud parallel tasks

Peer-to-peer delegation (plugin guides)

  • Topology: two (or few) agents as conversational equals
  • Direction: bidirectional — they pass work back and forth
  • Workers know about each other: yes — it is a dialogue
  • Best for: cross-model review, a rescue investigation, a second opinion
  • Failure mode: a ping-pong loop that drains both budgets
  • Substrate: the official codex-plugin-cc, the MCP bridge guides

The conductor's instrument: ai-cli-mcp

To fan out across tools from a single Claude Code session, you need one MCP server that can launch other CLI agents as child processes and harvest them. That server is ai-cli-mcp by mkXultra [V] — an MCP bridge whose entire job is to spawn Claude, Codex, Gemini, Forge, or OpenCode CLIs as background processes and give the orchestrator verbs to manage them by PID.

You add it like any stdio MCP server — npx -y ai-cli-mcp@latest — and it exposes a tight set of process-control tools. Three start and observe work; three manage the lifecycle:

  • run — launch a CLI agent on a prompt and return immediately with its PID. This is the fan-out primitive: call it N times to start N workers.
  • peek — observe running children without blocking, returning structured events (optionally including their tool calls).
  • wait — block until a set of PIDs complete (with an optional timeout). This is the join — how the conductor collects everyone before synthesizing.
  • list_processes — list every running and completed agent process the server is tracking.
  • get_result — pull the current output and status of one process by PID.
  • kill_process — terminate a runaway worker by PID and stop its spend.

Two more verbs are pure hygiene: doctor checks which AI CLI binaries are actually installed, and models lists the supported model names and aliases. Run doctor once before your first fan-out so you are not launching workers against a CLI that is not on PATH.
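As a rough local analog of that preflight, the sketch below checks PATH with Python's shutil.which. It illustrates the idea only and is not how doctor is implemented; the preflight and missing helpers are hypothetical names, and the CLI names passed in are simply the binary names this guide mentions.

```python
import shutil
from typing import Optional


def preflight(cli_names: list[str]) -> dict[str, Optional[str]]:
    """Map each CLI name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in cli_names}


def missing(report: dict[str, Optional[str]]) -> list[str]:
    """Names that did not resolve, sorted for stable output."""
    return sorted(name for name, path in report.items() if path is None)


report = preflight(["claude", "codex", "gemini"])
for name in missing(report):
    print(f"{name}: not found on PATH, do not route workers to it")
```

Running a check like this before the first fan-out turns a confusing mid-run failure into a one-line message up front.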

| Verb | Job | Key params |
| --- | --- | --- |
| run | Launch one worker, return its PID immediately | prompt (or prompt_file), workFolder (required), model, reasoning_effort, session_id |
| peek | Observe running children without blocking | pids, peek_time_sec, include_tool_calls |
| wait | Block until the given PIDs finish (the join) | pids, timeout, verbose |
| list_processes | List all running + completed processes | (none) |
| get_result | Fetch output + status of one process | pid, verbose |
| kill_process | Terminate a runaway worker, stop its spend | pid |
The ai-cli-mcp verbs, grouped by job. run is the fan-out primitive; wait is the join; the rest observe and clean up. Apart from list_processes, which takes no parameters, every verb after run addresses workers by PID.
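If the PID-based lifecycle feels abstract, the toy below mimics four of the verbs with plain child processes. This is a hypothetical stand-in written for illustration, not the ai-cli-mcp API: ToyConductor and its method signatures are assumptions, and the "workers" are short Python one-liners rather than agents.

```python
import subprocess
import sys


class ToyConductor:
    """Hypothetical stand-in that mimics run / wait / get_result / kill_process."""

    def __init__(self) -> None:
        self._procs: dict[int, subprocess.Popen] = {}
        self._outputs: dict[int, str] = {}

    def run(self, code: str) -> int:
        """Start a child Python process and return its PID without blocking."""
        proc = subprocess.Popen(
            [sys.executable, "-c", code], stdout=subprocess.PIPE, text=True
        )
        self._procs[proc.pid] = proc
        return proc.pid

    def wait(self, pids: list[int], timeout: float) -> None:
        """The join: block until every listed PID exits (timeout is per process)."""
        for pid in pids:
            if pid not in self._outputs:
                out, _ = self._procs[pid].communicate(timeout=timeout)
                self._outputs[pid] = out

    def get_result(self, pid: int) -> dict:
        """Return the exit code and captured output for one PID."""
        return {
            "pid": pid,
            "exit_code": self._procs[pid].poll(),
            "output": self._outputs.get(pid, "").strip(),
        }

    def kill_process(self, pid: int) -> None:
        """Terminate a runaway worker, then reap it and keep what it printed."""
        proc = self._procs[pid]
        proc.kill()
        out, _ = proc.communicate()
        self._outputs[pid] = out


conductor = ToyConductor()
pids = [conductor.run(f"print('worker {i} finished')") for i in range(3)]
conductor.wait(pids, timeout=30)
outputs = [conductor.get_result(pid)["output"] for pid in pids]
print(outputs)

stuck = conductor.run("import time; time.sleep(60)")  # a worker that hangs
conductor.kill_process(stuck)
print(conductor.get_result(stuck)["exit_code"] != 0)
```

The design choice worth noticing is that run returns before the child has done anything. Everything after that is addressed by PID, which is what lets one conductor track many workers at once.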

run, exactly: the parameters that matter

run is where a fan-out is actually configured, so its parameters are worth knowing cold [V]:

  • prompt — the instruction for the worker (or prompt_file to point at a file instead, which keeps a giant prompt out of the tool call).
  • workFolder (required). The working directory the worker runs in. This is the single most important field for safe fan-out: give each worker its own directory (a git worktree, a per-module checkout) and two workers can never edit the same file. [P]
  • model — which model the worker uses. This is how you route — a cheap fast model for mechanical edits, a strong one for the hard module.
  • reasoning_effort — the effort level for that worker, so you spend deep reasoning only where it earns its tokens.
  • session_id — continue a specific prior session instead of starting clean, when a worker should build on earlier context.

The binary each worker drives is resolved by name: claude, codex, gemini, and so on. The *_CLI_NAME overrides let you point those names at your exact binaries, which is what keeps mixed fleets and CI sane.
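One way to keep those parameters straight is to plan the whole fan-out as data before launching anything. The sketch below builds one request per module, using the field names from the list above. The RunRequest class, the plan_fan_out helper, the model names, and the "low"/"high" effort values are all hypothetical placeholders; substitute whatever your server's models verb reports.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class RunRequest:
    prompt: str
    workFolder: str
    model: str
    reasoning_effort: str


def plan_fan_out(modules: dict[str, str], worktree_root: str) -> list[RunRequest]:
    """Build one request per module; modules maps name -> "easy" or "hard"."""
    routing = {
        "easy": ("fast-model", "low"),     # hypothetical model name and effort
        "hard": ("strong-model", "high"),  # hypothetical model name and effort
    }
    requests = []
    for name, difficulty in sorted(modules.items()):
        model, effort = routing[difficulty]
        requests.append(RunRequest(
            prompt=f"Rename the core API in module {name} and run its tests.",
            workFolder=str(PurePosixPath(worktree_root) / name),
            model=model,
            reasoning_effort=effort,
        ))
    # Safety check: no two workers may share a working directory.
    folders = [r.workFolder for r in requests]
    assert len(folders) == len(set(folders)), "two workers share a workFolder"
    return requests


plan = plan_fan_out({"auth": "hard", "billing": "easy"}, "worktrees")
for request in plan:
    print(request.workFolder, request.model, request.reasoning_effort)
```

Planning first means the isolation check runs before any worker spends a token, and the routing table is the one place to change when you want to move a module to a stronger model.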

Claude-Code-as-conductor: fan a migration across N Codex workers

Here is the pattern end to end, with Claude Code at the center [P]. You are doing the 24-module rename. Claude Code is the conductor; you let it drive Codex workers — a second model family, a second budget — through ai-cli-mcp. The flow is always the same four beats: fan out → wait → harvest → synthesize.

  1. Fan out. Claude calls run once per module: prompt is the rename instruction scoped to that module, workFolder is that module's own directory (so no two workers share files), model is a Codex model, reasoning_effort set per-module difficulty. Each run returns a PID instantly; Claude collects the 24 PIDs.
  2. Wait. Claude calls wait on the full PID list with a timeout. It blocks here — the only place it blocks — while all 24 Codex workers churn in parallel, each in its own context window. Optionally it peeks mid-flight to surface a stuck worker early.
  3. Harvest. As workers finish, Claude pulls each one's output with get_result. The raw per-worker transcripts never enter Claude's main context — only the structured results do.
  4. Synthesize in the main thread. Claude reads the 24 results in its own session and produces the one thing you actually wanted: a cross-module conflict summary — which renames touched a shared signature, where two modules disagree on the new name, which three modules failed their tests. You see that summary. You never see 24 walls of diff.

The win is the last beat. The expensive, context-polluting work happened 24 times off your context budget. The conductor spent its tokens on the part only it can do: reconciling the workers into a single decision.
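In practice the conductor does this reconciliation by reading the results itself, but the logic it applies can be sketched deterministically. The example below assumes each worker returned a small structured result; the field names (module, new_name, tests_passed, shared_signatures) and the synthesize helper are hypothetical, chosen to match the three questions in the summary described above.

```python
from collections import Counter, defaultdict


def synthesize(results: list[dict]) -> dict:
    """Collapse per-module worker results into one cross-module conflict summary."""
    if not results:
        return {"agreed_name": None, "failed_tests": [],
                "name_dissenters": [], "contested_signatures": {}}

    failed = sorted(r["module"] for r in results if not r["tests_passed"])

    # Modules that disagree on the new name: anything outside the majority choice.
    votes = Counter(r["new_name"] for r in results)
    agreed_name, _ = votes.most_common(1)[0]
    dissenters = sorted(r["module"] for r in results if r["new_name"] != agreed_name)

    # A shared signature touched by more than one worker is a potential conflict.
    touched_by = defaultdict(list)
    for r in results:
        for signature in r["shared_signatures"]:
            touched_by[signature].append(r["module"])
    contested = {sig: sorted(mods) for sig, mods in touched_by.items() if len(mods) > 1}

    return {"agreed_name": agreed_name, "failed_tests": failed,
            "name_dissenters": dissenters, "contested_signatures": contested}


summary = synthesize([
    {"module": "auth", "new_name": "fetch_token_v2", "tests_passed": True,
     "shared_signatures": ["TokenStore.get"]},
    {"module": "billing", "new_name": "fetch_token_v2", "tests_passed": False,
     "shared_signatures": ["TokenStore.get"]},
    {"module": "webhooks", "new_name": "get_token_v2", "tests_passed": True,
     "shared_signatures": []},
])
print(summary)
```

The output is a handful of lines no matter how many workers ran, which is the whole reason the synthesis belongs at the center and the diffs stay on the spokes.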

[Interactive demo: Claude Code conducts 24 Codex workers. run × N to fan out, wait to join, get_result to harvest, then Claude synthesizes a cross-module conflict summary in its own context. The per-worker transcripts never touch the main thread.]

A second substrate: Codex Cloud parallel tasks and @codex mentions

ai-cli-mcp fans out across local CLIs. Codex Cloud is a second, complementary fan-out substrate that runs workers in OpenAI's own cloud environment — and the official docs are explicit that "Codex can work on tasks in the background (including in parallel) using its own cloud environment" [V]. You hand it a repo, the setup steps, and the tools it should use; it spins up isolated cloud tasks that do not block your laptop at all.

There are two ways to fan out into it [V]:

  • From the Codex web app or your editor. Kick off multiple cloud tasks at once; each runs in its own configured environment, and you "monitor progress and apply the resulting diffs locally" when they land. This is fan-out where the workers live in the cloud and the conductor is you (or your editor), harvesting diffs.
  • From GitHub, by mention. Tag @codex on issues and pull requests to "spin up tasks and propose changes directly from GitHub" [V]. Drop @codex on five issues and you have fanned out five cloud workers from the GitHub UI — each opens its own PR. The conductor here is your issue tracker; the join is your PR review queue.

The trade against local ai-cli-mcp is control vs. convenience. Local workers run on your machine, on your files, with your exact binaries (those *_CLI_NAME overrides). Cloud workers need no local compute and survive your laptop sleeping, but run in an environment you configure once and reach the public internet only if an admin enables it. A mature fan-out often uses both: local workers for the migration you are watching, @codex mentions for the long-tail issues you want chipped away asynchronously. [P]

The CI fan-out leg: claude --bare -p in a matrix

The third substrate needs no MCP server at all — it is Claude Code's headless mode, run as a build-matrix job [P]. This is the fan-out that runs on every push, unattended, and reports structured results a script can act on. It cross-links directly to the autonomy guide, which covers the permission and safety side of running Claude unattended; here we focus on the fan-out mechanics.

The headless invocation that belongs in a CI matrix adds four pieces to claude -p, all verified [V]:

bash
# One matrix leg — runs per shard/module, fully isolated and budgeted
claude --bare -p "Audit this module for missing error handling and report each gap" \
  --worktree "audit-${MODULE}" \
  --output-format json \
  --json-schema '{"type":"object","properties":{"gaps":{"type":"array","items":{"type":"string"}}},"required":["gaps"]}' \
  --max-budget-usd 2.00 \
  | jq '.structured_output'

Each flag earns its place [V]:

  • --bare skips auto-discovery of hooks, skills, plugins, MCP servers, auto memory, and CLAUDE.md, so every matrix leg starts fast and — critically — gets the same result on every machine, unaffected by whatever a teammate has in their ~/.claude. It is the recommended mode for scripted and SDK calls.
  • --worktree <name> starts the worker in an isolated git worktree at <repo>/.claude/worktrees/<name>. This is the matrix's version of workFolder: per-leg isolation so parallel legs never touch the same files.
  • --output-format json plus --json-schema force the worker to return output conforming to a schema, in the structured_output field — so the matrix produces machine-readable results your aggregation step (or jq) can merge, not prose you have to re-parse.
  • --max-budget-usd 2.00 caps the dollars one leg can spend on API calls before it stops — the per-worker brake that keeps a 24-leg matrix from a runaway bill (next section).
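The per-leg cap also gives you a simple ceiling for the whole matrix. The arithmetic below is a sketch that assumes every leg stops at its cap; the helper names are hypothetical, and the 24 legs and $2.00 cap are the figures used in this guide.

```python
from decimal import Decimal, ROUND_DOWN


def worst_case_spend(legs: int, cap_usd: str) -> Decimal:
    """Upper bound on matrix spend if every leg runs all the way into its cap."""
    return legs * Decimal(cap_usd)


def per_leg_cap(total_budget_usd: str, legs: int) -> Decimal:
    """Largest per-leg cap, rounded down to the cent, that fits a total budget."""
    share = Decimal(total_budget_usd) / legs
    return share.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


print(worst_case_spend(24, "2.00"))  # 48.00
print(per_leg_cap("50", 24))         # 2.08
```

Working backward from a total budget to a per-leg cap is usually the more useful direction: decide what the whole run may cost, then divide.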

The fan-out is the matrix itself: your CI defines one leg per module/shard, the runner executes them in parallel, and a final aggregation job collects every leg's structured_output into one report. Same four beats — fan out, wait (the matrix join), harvest (collect JSON), synthesize (the aggregation job) — with the CI runner as conductor.
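The aggregation job can be very small. The sketch below assumes each leg saved the JSON printed by the command above (the object containing gaps) and that the job receives those strings keyed by module name; the aggregate function and the report layout are hypothetical. A leg whose output does not match the schema is reported rather than silently dropped.

```python
import json


def aggregate(leg_outputs: dict[str, str]) -> dict:
    """Merge each leg's structured output into one report."""
    report = {"gaps_by_module": {}, "clean_modules": [],
              "invalid_legs": [], "total_gaps": 0}
    for module, raw in sorted(leg_outputs.items()):
        try:
            gaps = json.loads(raw)["gaps"]
            if not isinstance(gaps, list) or not all(isinstance(g, str) for g in gaps):
                raise ValueError("gaps must be a list of strings")
        except (KeyError, TypeError, ValueError):
            # json.JSONDecodeError is a ValueError, so malformed JSON lands here too.
            report["invalid_legs"].append(module)
            continue
        if gaps:
            report["gaps_by_module"][module] = gaps
            report["total_gaps"] += len(gaps)
        else:
            report["clean_modules"].append(module)
    return report


report = aggregate({
    "auth": '{"gaps": ["login() swallows IOError", "no timeout on refresh"]}',
    "billing": '{"gaps": []}',
    "webhooks": "not json at all",
})
print(json.dumps(report, indent=2))
```

Treating an unparseable leg as its own category matters in CI: a leg that crashed or hit its budget should fail the report loudly, not read as a clean module.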

Run a clean fan-out: the four beats

  1. Fan out — launch isolated workers

    Call run once per task (or define one CI matrix leg per task). Give each worker its own workFolder / --worktree so no two can edit the same file. Set model and reasoning_effort/--effort per task difficulty, and attach a budget brake (--max-budget-usd for headless, a wait timeout for ai-cli-mcp).

  2. Wait — the join

    Call wait on the full PID list with a timeout (or let the CI matrix join). The conductor blocks only here while every worker runs in parallel. peek mid-flight to catch a stuck worker before the timeout, and kill_process anything clearly dead.

  3. Harvest — pull structured results

    Collect each worker's output with get_result (or each matrix leg's structured_output JSON). Prefer --json-schema / structured output so results are machine-mergeable. The raw transcripts stay out of the conductor's context — only the results come in.

  4. Synthesize — the one job only the conductor can do

    Read the harvested results in the conductor's own context and produce the single consolidated answer: a cross-module conflict summary, a merged audit report, a ranked list of failures. Surface that — never the N walls of intermediate output.
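The wait beat is the one most worth rehearsing, because a single stuck worker can hold the join hostage. Here is a minimal sketch of a join with one shared deadline, using plain child processes as stand-in workers. The join_with_deadline and spawn helpers are hypothetical and are not how ai-cli-mcp implements wait; they only show the policy of waiting up to a deadline and killing whatever is still running.

```python
import subprocess
import sys
import time


def join_with_deadline(procs: dict[str, subprocess.Popen], deadline_sec: float) -> dict[str, str]:
    """Wait for every worker up to one shared deadline, then kill the stragglers."""
    end = time.monotonic() + deadline_sec
    status = {}
    for name, proc in procs.items():
        remaining = max(0.0, end - time.monotonic())
        try:
            proc.wait(timeout=remaining)
            status[name] = "ok" if proc.returncode == 0 else "failed"
        except subprocess.TimeoutExpired:
            proc.kill()   # stop the runaway worker
            proc.wait()   # reap it so no zombie process is left behind
            status[name] = "killed"
    return status


def spawn(seconds: float) -> subprocess.Popen:
    """Stand-in worker: a child process that sleeps, then exits cleanly."""
    return subprocess.Popen([sys.executable, "-c", f"import time; time.sleep({seconds})"])


procs = {"fast": spawn(0), "stuck": spawn(60)}
status = join_with_deadline(procs, deadline_sec=3.0)
print(status)
```

The deadline is shared rather than per worker, so the total wait is bounded by deadline_sec no matter how many workers are in the fleet.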

Senior scenario: a release-blocking audit across 18 services

It's the day before a release and security needs a written answer to one question across 18 microservices: does any service log a raw auth token? Reading 18 repos yourself is a day. Doing it in one Claude session blows the context window by repo 6 and the answers for the early repos rot. This is textbook fan-out — 18 independent reads, one consolidated verdict — and you want it on a leash because it's 18× the spend.

You make Claude Code the conductor over ai-cli-mcp. It runs 18 workers, one per service repo: workFolder is each repo's own checkout (zero cross-contamination), model is a strong reviewer, prompt_file points every worker at the same audit checklist so they grade identically. It waits on all 18 PIDs with a 20-minute timeout, peeking once at the ten-minute mark — one worker is stuck cloning a huge monorepo, so it kill_processes that PID and re-runs it scoped to the relevant subtree. As results land it get_results each one.

Then the part you actually needed: in its own context, Claude reads 18 structured findings and writes the verdict — "16 services clean. payments-api logs the bearer token at DEBUG in AuthFilter.java:88. notifications logs it inside a serialized request object — sneakier, same risk." Two findings, file-and-line precise, out of 18 repos and roughly 18× the tokens of a single review — spent in parallel in twenty minutes, every worker capped, none of the intermediate noise in your thread. You paste those two lines into the release ticket and go home.

Watch a fan-out run

Orchestrated fan-out

One orchestrator drives every worker; nothing flows worker-to-worker. Pick a substrate per lane, give it a prompt, and hit Run. Each lane fills in parallel at its own pace with a live token and cost meter — then a synthesis node merges the results and surfaces only the cross-module conflicts.

The orchestration call

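# ai-cli-mcp: fan out, then wait + get_result to synthesize.
# Pseudocode for the simulator's three lanes. The simulator caps each lane at $0.50;
# the verb table above lists no budget parameter for run, so the cap is a comment here.
pid_1 = run(
    prompt="Migrate auth/ to the v2 token API",
    model="sonnet",
    workFolder="./workers/claude-1",
)  # lane cap: $0.50
pid_2 = run(
    prompt="Migrate billing/ to the v2 token API",
    model="gpt-5.4",
    workFolder="./workers/codex-2",
)  # lane cap: $0.50
pid_3 = run(
    prompt="Migrate webhooks/ to the v2 token API",
    model="codex-cloud",  # simulated cloud lane
    workFolder="./workers/cloud-3",
)  # lane cap: $0.50
wait(pids=[pid_1, pid_2, pid_3])  # block until all 3 workers land
for pid in (pid_1, pid_2, pid_3):
    get_result(pid=pid)  # collect summaries; synthesize in the main thread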

Drive a simulated conductor: launch N workers across Claude and Codex, watch them run in parallel in isolated work folders, wait for the join, then collapse their results into one synthesized summary — without spending a real token.

Knowledge check

You need to apply the same mechanical refactor across 30 independent modules, fast, from one Claude Code session, on a tight budget, without the 30 transcripts polluting your main context. Which setup matches orchestrated fan-out?

The shape to build toward

Fan-out is one move you will reuse constantly once it clicks: one conductor, N isolated leaf workers, four beats — fan out, wait, harvest, synthesize. Reach for ai-cli-mcp when you want a Claude Code session to conduct a mixed local fleet of Claude and Codex workers by PID. Reach for claude --bare -p with --worktree, --output-format json, --json-schema, and --max-budget-usd when the fan-out should run unattended in CI and report machine-readable results. Reach for Codex Cloud parallel tasks and @codex GitHub mentions when you want the workers off your machine entirely.

Give every worker its own work folder so none can clobber another, put a budget and a timeout on every one of them, right-size each worker's model and effort, and let the conductor do the only thing it cannot delegate — turning N results into one decision. Keep the intermediate noise on the spokes; keep only the synthesis at the center. That is the whole discipline.
