The Navigator · 12 min mission
The Agent SDK: Claude Code as a Library
Drive the same agent loop from Python or TypeScript — typed messages, programmatic permissions, hosting.
On this page
- The query() mental model
- ClaudeAgentOptions: the whole agent in one object
- The message stream: what you iterate
- query() vs ClaudeSDKClient: one-shot vs conversation
- Runtime control: canUseTool and programmatic hooks
- Hosting: it's processes, not a stateless API
- Senior scenario: a multi-tenant SDK service
- Where to go from here
The terminal is a UI. Underneath it, Claude Code is a loop: read the prompt, decide on a tool, run it, read the result, decide again — until the work is done. The Agent SDK hands you that exact loop as a library. Same tools, same agent loop, same context management that power the CLI — now callable from Python or TypeScript, inside your own process, with no human watching the terminal.
This is the difference between using an agent and shipping one. A for loop over a list of pull requests, each reviewed by its own Claude agent. A web endpoint that spins up a scoped agent per customer. A CI job that fixes the failing test and pushes the patch. The CLI is for you at your desk; the SDK is for your servers at 3 a.m.
The query() mental model
There is one function you start with, and it is an async generator. You give it a prompt, you iterate the messages it yields, and the loop runs to completion on its own — Claude picks tools, executes them, and streams back everything that happens.
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Find and fix the bug in auth.ts",
options: { allowedTools: ["Read", "Edit", "Bash"] }
})) {
console.log(message); // Claude reads the file, finds the bug, edits it
}Internally this is not the raw Messages API. With the Anthropic Client SDK you write the tool loop yourself — while (response.stop_reason === "tool_use"), execute the tool, feed the result back, repeat [V]. The Agent SDK is that loop, already written, with file-reading, shell, search, and edit tools built in [V]. You describe the goal; the SDK runs the round-trips.
ClaudeAgentOptions: the whole agent in one object
Everything that shapes a run lives in options — ClaudeAgentOptions in Python, the options object in TypeScript. The field names differ only by casing: Python is snake_case (allowed_tools), TypeScript is camelCase (allowedTools). The same five fields carry most of the weight.
allowedTools / allowed_tools is a list of tool names to auto-approve without prompting — ["Read", "Glob", "Grep"] for a read-only analyst. permissionMode / permission_mode sets the global stance: "default", "acceptEdits", "plan", "dontAsk", or "bypassPermissions" [V]. mcpServers / mcp_servers connects external systems over the Model Context Protocol — the same servers you'd add in the CLI, configured inline. agents defines subagents programmatically (a Record / dict of name → definition) so the main agent can delegate. And resume takes a session ID to continue a previous conversation with full context — files read, analysis done, history intact [V].
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition
async def main():
async for message in query(
prompt="Review this codebase with the code-reviewer agent",
options=ClaudeAgentOptions(
allowed_tools=["Read", "Glob", "Grep", "Agent"],
permission_mode="default",
agents={
"code-reviewer": AgentDefinition(
description="Expert reviewer for quality and security.",
prompt="Analyze code quality and suggest improvements.",
tools=["Read", "Glob", "Grep"],
)
},
),
):
if hasattr(message, "result"):
print(message.result)
asyncio.run(main())One sharp detail: subagents are invoked through the Agent tool, so you must include "Agent" in allowed_tools for those invocations to be auto-approved [V]. Forget it and the delegation prompts (or, in dontAsk, dies).
| Python | TypeScript | What it does |
|---|---|---|
allowed_tools | allowedTools | List of tool names auto-approved without prompting |
permission_mode | permissionMode | default · acceptEdits · plan · dontAsk · bypassPermissions |
mcp_servers | mcpServers | Connect external systems over MCP, configured inline |
agents | agents | Define subagents programmatically (name → AgentDefinition) |
resume | resume | Session ID to resume a prior conversation with full context |
can_use_tool | canUseTool | Runtime callback that approves/denies each tool call |
The message stream: what you iterate
Every value the generator yields is a typed message. You don't have to handle all of them, but two matter on day one.
The assistant message (SDKAssistantMessage in TS, AssistantMessage in Python) carries Claude's text and tool-use blocks as the turn unfolds. Crucially, messages produced inside a subagent carry a parent_tool_use_id field, so you can attribute each message to the subagent run that produced it [V] — essential when three subagents stream at once.
The result message is the last one, and it is where you read the outcome. ResultMessage / SDKResultMessage carries result (the final text), total_cost_usd (the cost of the whole run), usage (token counts), num_turns, and a subtype that is "success" or an error like "error_max_turns" or "error_max_budget_usd" [V]. If you asked Claude for a typed answer, it lands in structured_output — parsed data, not a string you regex [V]. Reading total_cost_usd off the result is the simplest cost meter you will ever wire up.
for await (const message of q) {
if (message.type === "result") {
console.log(`Cost: $${message.total_cost_usd}`);
console.log(`Tokens out: ${message.usage.output_tokens}`);
console.log(`Result: ${message.result}`);
}
}query() vs ClaudeSDKClient: one-shot vs conversation
query() is stateless by design — each call is a fresh session. That's perfect for a task that begins and ends: fix this bug, extract this invoice, translate this doc. To carry context across turns you have two routes.
The first is resume: capture the session ID from the first run's init system message, then pass it as options.resume on the next query(). Claude reloads the full transcript and continues. The second — Python's ergonomic path for ongoing chat — is ClaudeSDKClient, which holds one session open across many exchanges. You await client.query(...), iterate client.receive_response(), then ask a follow-up that remembers the first answer. ClaudeSDKClient also supports interrupt() to stop a running turn; plain query() does not [V].
async with ClaudeSDKClient() as client:
await client.query("What's the capital of France?")
async for msg in client.receive_response():
... # "Paris"
# Same session — "that city" resolves to Paris
await client.query("What's the population of that city?")
async for msg in client.receive_response():
...Pick the entry point by session shape
query()
New session every call. One exchange, then done.
- No memory between calls (carry it with
resume) - No
interrupt() - Iterate the async generator directly
- Right for: one-off tasks, CI jobs, fan-out over a list, ephemeral containers
ClaudeSDKClient (Python)
One session, many turns. Context persists automatically.
- Follow-ups remember prior turns via
receive_response() interrupt()supported mid-turn- Async context manager:
async with ClaudeSDKClient() as client - Right for: chat bots, long-running assistants, interactive tools
Runtime control: canUseTool and programmatic hooks
allowedTools is a static allow-list. Real services need a decision at the moment of the call — block writes to /etc, redact a path, deny a shell command that matches a pattern. That's canUseTool (Python can_use_tool): a callback the SDK invokes for any tool call not already resolved by rules, with the tool name and its input. You return a verdict.
canUseTool: async (toolName, input, { signal, toolUseID }) => {
if (toolName === "Write" && String(input.path).includes("/sensitive/")) {
return { behavior: "deny", message: "Cannot write to sensitive paths" };
}
return { behavior: "allow" }; // optionally with updatedInput to rewrite the call
}The return shape is exact: { behavior: "allow", updatedInput? } to permit (and optionally rewrite the tool input), or { behavior: "deny", message, interrupt? } to block with a reason Claude sees [V]. Hooks are the other lever — callbacks bound to lifecycle events (PreToolUse, PostToolUse, Stop, SessionStart, UserPromptSubmit, and more) for logging, auditing, or transforming behavior [V]. A PostToolUse hook matched on Edit|Write is how you write an audit log of every file Claude touched.
Order matters. The SDK evaluates permissions in a fixed sequence: hooks → deny rules → ask rules → permission mode → allow rules → canUseTool [V]. A deny rule blocks a tool even in bypassPermissions mode; allowedTools does not constrain bypassPermissions (that mode approves everything that reaches it) [V]. If you need bypass speed but must block rm, use disallowedTools: ["Bash(rm *)"], not an allow-list.
Hosting: it's processes, not a stateless API
Here is the fact that reorganizes every production decision: calling query() spawns a separate claude CLI subprocess and talks to it over stdio [V]. That subprocess owns a shell, a working directory, and the JSONL session transcript on local disk. One agent session maps to one subprocess; N concurrent sessions means N process trees, each with its own transcript file [V].
Two consequences fall out immediately. First, state is local and ephemeral — session transcripts live in ~/.claude/projects/ (or under CLAUDE_CONFIG_DIR) and do not survive a container restart, scale-down, or node move [V]. To persist a session a user expects to resume, attach a SessionStore adapter (S3, Redis, Postgres, or your own) via the sessionStore / session_store option; it mirrors transcripts to durable storage so an ephemeral container can hydrate by ID on the next request [V]. Caveat worth internalizing: SessionStore mirrors transcripts only — not CLAUDE.md memory files or working-directory artifacts, which need their own volume or object-store sync [V].
Second, concurrency is bounded by RAM, because each session is a real process. The docs give a starting point of ~1 GiB RAM, 5 GiB disk, 1 CPU per agent, and a sizing formula: agents per host = (host RAM − overhead) / per-session RAM ceiling [V]. For long-running sessions you run a pool of containers behind a load balancer and pin each session to one container using consistent hashing on sessionId, so a resumed session keeps hitting the same live subprocess [V].
Senior scenario: a multi-tenant SDK service
You're building a hosted product: every customer gets a Claude agent that works against their data, in their sandbox, and must never see another tenant's context. The naive setup leaks, because default SDK behavior reads settings.json and CLAUDE.md memory from the shared filesystem — one tenant's project memory can bleed into another's system prompt [V]. Here is the documented isolation recipe, applied per query() call.
for await (const message of query({
prompt,
options: {
cwd: tenantDir, // per-tenant working directory, unique per customer
settingSources: [], // load NO filesystem settings/CLAUDE.md
resume: sessionId, // looked up from your DB by this user
sessionStore, // durable transcript store, keyed per tenant
env: {
...process.env, // keep PATH, ANTHROPIC_API_KEY — env REPLACES it in TS
CLAUDE_CONFIG_DIR: configDir, // per-tenant config dir, not shared ~/.claude.json
CLAUDE_CODE_DISABLE_AUTO_MEMORY: "1" // auto-memory loads regardless of settingSources
}
}
})) { /* ... */ }Four SDK-level moves do the isolation [V]: settingSources: [] so no shared filesystem config loads; CLAUDE_CODE_DISABLE_AUTO_MEMORY=1 because [auto memory] loads into the system prompt even when settingSources is empty; CLAUDE_CONFIG_DIR pointed at a per-tenant path so tenants don't share the global ~/.claude.json; and an explicit per-tenant cwd on every call. Then the operational layer: put auth at a gateway in front of the container — the agent should receive pre-authenticated requests and never validate user tokens itself [V] — and route outbound tool calls through an egress proxy that injects credentials after the request leaves the container, so tool secrets never live in the agent's environment and a compromised tenant can't exfiltrate via another's outbound policy [V].
For visibility across all of it, the SDK inherits OpenTelemetry config from the environment: set CLAUDE_CODE_ENABLE_TELEMETRY=1 plus the standard OTEL_* exporter variables at the container level and every query() exports spans, metrics, and logs to your collector — prompt text and tool inputs are excluded by default unless you opt in [V]. Two honest limits to design around: there is no top-level session timeout (bound runs with maxTurns), and large parallel-subagent fan-outs can hit API rate limits — batch the work rather than firing one wide dispatch [V].
Stand up a tenant-scoped SDK session
Authenticate at the edge, not the agent
Terminate user auth at a gateway in front of the container. The agent process receives pre-authenticated requests and never sees raw user tokens [V]. Supply
ANTHROPIC_API_KEYfrom your secret manager — or route model calls through a proxy viaANTHROPIC_BASE_URLso the key lives outside the container [V].Scope the filesystem and config per tenant
Set
settingSources: [],CLAUDE_CONFIG_DIRto a per-tenant directory,CLAUDE_CODE_DISABLE_AUTO_MEMORY=1, and an explicitcwd. These four close the documented cross-tenant context leaks [V].Persist and resume by session ID
Attach a
SessionStore(S3/Redis/Postgres) so ephemeral containers hydrate transcripts on resume. Look thesessionIdup from your DB by user, pass it asresume, and pin routing with consistent hashing onsessionId[V]. Alert onmirror_errorsystem messages if store durability matters [V].Bound, meter, and observe
Set
maxTurns(there is no auto session timeout). Readtotal_cost_usdoff each result for per-session cost; hard-cap to fail witherror_max_budget_usdif needed. ExportOTEL_*telemetry withCLAUDE_CODE_ENABLE_TELEMETRY=1for traces, metrics, and logs [V].
Compose an agent run
Watch delegation happen
The orchestrator hands a slice of work to each subagent. Every subagent runs in its own context window, does the noisy part — searching, reviewing, running tests — and returns only a short summary. Dispatch them and watch the work fan out, then the results pulse home.
The roster
Searches and maps the codebase without editing.
model · Haiku 4.5
Read-only pass for bugs, style, and risk.
model · Sonnet 4.6
Runs the suite and reports failures.
model · Haiku 4.5
Writes the focused change end to end.
model · Opus 4.8
Idle. Four subagents waiting for the orchestrator to dispatch work.
Knowledge check
Your multi-tenant SDK service runs many customers through one shared container. A tenant reports seeing hints of another tenant’s project in its agent’s answers. Which single change most directly addresses the documented leak?
Where to go from here
The arc is short: query() for a one-shot task, ClaudeSDKClient or resume for a conversation, canUseTool and hooks for runtime control, and the hosting playbook — subprocess-per-session, SessionStore for durability, per-tenant isolation, OTEL for sight — the moment it goes to production. Prototype locally with the SDK; if you'd rather not operate the sandbox and session infrastructure yourself, the same agent maps onto Managed Agents, the hosted REST API where Anthropic runs the loop for you [V].
The shift in posture is the whole point. You stop thinking "what should I type next?" and start thinking "what should this service do, unattended, ten thousand times?" The loop is the same loop. You just gave it to a server.
Reach the end and this star joins your charted sky.