The Navigator · 12 min mission

The Agent SDK: Claude Code as a Library

Drive the same agent loop from Python or TypeScript — typed messages, programmatic permissions, hosting.

sdkautomationadvancedFact-checked 2026-06-13

On this page

The query() mental model
ClaudeAgentOptions: the whole agent in one object
The message stream: what you iterate
query() vs ClaudeSDKClient: one-shot vs conversation
Runtime control: canUseTool and programmatic hooks
Hosting: it's processes, not a stateless API
Senior scenario: a multi-tenant SDK service
Where to go from here

The terminal is a UI. Underneath it, Claude Code is a loop: read the prompt, decide on a tool, run it, read the result, decide again — until the work is done. The Agent SDK hands you that exact loop as a library. Same tools, same agent loop, same context management that power the CLI — now callable from Python or TypeScript, inside your own process, with no human watching the terminal.

This is the difference between using an agent and shipping one. A for loop over a list of pull requests, each reviewed by its own Claude agent. A web endpoint that spins up a scoped agent per customer. A CI job that fixes the failing test and pushes the patch. The CLI is for you at your desk; the SDK is for your servers at 3 a.m.

The query() mental model

There is one function you start with, and it is an async generator. You give it a prompt, you iterate the messages it yields, and the loop runs to completion on its own — Claude picks tools, executes them, and streams back everything that happens.

import { query } from "@anthropic-ai/claude-agent-sdk";
 
for await (const message of query({
  prompt: "Find and fix the bug in auth.ts",
  options: { allowedTools: ["Read", "Edit", "Bash"] }
})) {
  console.log(message); // Claude reads the file, finds the bug, edits it
}

Internally this is not the raw Messages API. With the Anthropic Client SDK you write the tool loop yourself — while (response.stop_reason === "tool_use"), execute the tool, feed the result back, repeat [V]. The Agent SDK is that loop, already written, with file-reading, shell, search, and edit tools built in [V]. You describe the goal; the SDK runs the round-trips.

ClaudeAgentOptions: the whole agent in one object

Everything that shapes a run lives in options — ClaudeAgentOptions in Python, the options object in TypeScript. The field names differ only by casing: Python is snake_case (allowed_tools), TypeScript is camelCase (allowedTools). The same five fields carry most of the weight.

allowedTools / allowed_tools is a list of tool names to auto-approve without prompting — ["Read", "Glob", "Grep"] for a read-only analyst. permissionMode / permission_mode sets the global stance: "default", "acceptEdits", "plan", "dontAsk", or "bypassPermissions" [V]. mcpServers / mcp_servers connects external systems over the Model Context Protocol — the same servers you'd add in the CLI, configured inline. agents defines subagents programmatically (a Record / dict of name → definition) so the main agent can delegate. And resume takes a session ID to continue a previous conversation with full context — files read, analysis done, history intact [V].

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition
 
async def main():
    async for message in query(
        prompt="Review this codebase with the code-reviewer agent",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Glob", "Grep", "Agent"],
            permission_mode="default",
            agents={
                "code-reviewer": AgentDefinition(
                    description="Expert reviewer for quality and security.",
                    prompt="Analyze code quality and suggest improvements.",
                    tools=["Read", "Glob", "Grep"],
                )
            },
        ),
    ):
        if hasattr(message, "result"):
            print(message.result)
 
asyncio.run(main())

One sharp detail: subagents are invoked through the Agent tool, so you must include "Agent" in allowed_tools for those invocations to be auto-approved [V]. Forget it and the delegation prompts (or, in dontAsk, dies).

Python	TypeScript	What it does
`allowed_tools`	`allowedTools`	List of tool names auto-approved without prompting
`permission_mode`	`permissionMode`	`default` · `acceptEdits` · `plan` · `dontAsk` · `bypassPermissions`
`mcp_servers`	`mcpServers`	Connect external systems over MCP, configured inline
`agents`	`agents`	Define subagents programmatically (name → `AgentDefinition`)
`resume`	`resume`	Session ID to resume a prior conversation with full context
`can_use_tool`	`canUseTool`	Runtime callback that approves/denies each tool call

The load-bearing ClaudeAgentOptions fields. Python is snake_case, TypeScript camelCase; the meaning is identical.

The message stream: what you iterate

Every value the generator yields is a typed message. You don't have to handle all of them, but two matter on day one.

The assistant message (SDKAssistantMessage in TS, AssistantMessage in Python) carries Claude's text and tool-use blocks as the turn unfolds. Crucially, messages produced inside a subagent carry a parent_tool_use_id field, so you can attribute each message to the subagent run that produced it [V] — essential when three subagents stream at once.

The result message is the last one, and it is where you read the outcome. ResultMessage / SDKResultMessage carries result (the final text), total_cost_usd (the cost of the whole run), usage (token counts), num_turns, and a subtype that is "success" or an error like "error_max_turns" or "error_max_budget_usd" [V]. If you asked Claude for a typed answer, it lands in structured_output — parsed data, not a string you regex [V]. Reading total_cost_usd off the result is the simplest cost meter you will ever wire up.

for await (const message of q) {
  if (message.type === "result") {
    console.log(`Cost: $${message.total_cost_usd}`);
    console.log(`Tokens out: ${message.usage.output_tokens}`);
    console.log(`Result: ${message.result}`);
  }
}

one query() run, end to end

… scroll to run this session

A one-shot agent fixes a bug, then the result message hands you the cost and token counts with no extra accounting.

query() vs ClaudeSDKClient: one-shot vs conversation

query() is stateless by design — each call is a fresh session. That's perfect for a task that begins and ends: fix this bug, extract this invoice, translate this doc. To carry context across turns you have two routes.

The first is resume: capture the session ID from the first run's init system message, then pass it as options.resume on the next query(). Claude reloads the full transcript and continues. The second — Python's ergonomic path for ongoing chat — is ClaudeSDKClient, which holds one session open across many exchanges. You await client.query(...), iterate client.receive_response(), then ask a follow-up that remembers the first answer. ClaudeSDKClient also supports interrupt() to stop a running turn; plain query() does not [V].

async with ClaudeSDKClient() as client:
    await client.query("What's the capital of France?")
    async for msg in client.receive_response():
        ...  # "Paris"
 
    # Same session — "that city" resolves to Paris
    await client.query("What's the population of that city?")
    async for msg in client.receive_response():
        ...

Pick the entry point by session shape

query()

New session every call. One exchange, then done.

No memory between calls (carry it with resume)
No interrupt()
Iterate the async generator directly
Right for: one-off tasks, CI jobs, fan-out over a list, ephemeral containers

ClaudeSDKClient (Python)

One session, many turns. Context persists automatically.

Follow-ups remember prior turns via receive_response()
interrupt() supported mid-turn
Async context manager: async with ClaudeSDKClient() as client
Right for: chat bots, long-running assistants, interactive tools

Runtime control: canUseTool and programmatic hooks

allowedTools is a static allow-list. Real services need a decision at the moment of the call — block writes to /etc, redact a path, deny a shell command that matches a pattern. That's canUseTool (Python can_use_tool): a callback the SDK invokes for any tool call not already resolved by rules, with the tool name and its input. You return a verdict.

canUseTool: async (toolName, input, { signal, toolUseID }) => {
  if (toolName === "Write" && String(input.path).includes("/sensitive/")) {
    return { behavior: "deny", message: "Cannot write to sensitive paths" };
  }
  return { behavior: "allow" };  // optionally with updatedInput to rewrite the call
}

The return shape is exact: { behavior: "allow", updatedInput? } to permit (and optionally rewrite the tool input), or { behavior: "deny", message, interrupt? } to block with a reason Claude sees [V]. Hooks are the other lever — callbacks bound to lifecycle events (PreToolUse, PostToolUse, Stop, SessionStart, UserPromptSubmit, and more) for logging, auditing, or transforming behavior [V]. A PostToolUse hook matched on Edit|Write is how you write an audit log of every file Claude touched.

Order matters. The SDK evaluates permissions in a fixed sequence: hooks → deny rules → ask rules → permission mode → allow rules → canUseTool [V]. A deny rule blocks a tool even in bypassPermissions mode; allowedTools does not constrain bypassPermissions (that mode approves everything that reaches it) [V]. If you need bypass speed but must block rm, use disallowedTools: ["Bash(rm *)"], not an allow-list.

Hosting: it's processes, not a stateless API

Here is the fact that reorganizes every production decision: calling query() spawns a separate claude CLI subprocess and talks to it over stdio [V]. That subprocess owns a shell, a working directory, and the JSONL session transcript on local disk. One agent session maps to one subprocess; N concurrent sessions means N process trees, each with its own transcript file [V].

Two consequences fall out immediately. First, state is local and ephemeral — session transcripts live in ~/.claude/projects/ (or under CLAUDE_CONFIG_DIR) and do not survive a container restart, scale-down, or node move [V]. To persist a session a user expects to resume, attach a SessionStore adapter (S3, Redis, Postgres, or your own) via the sessionStore / session_store option; it mirrors transcripts to durable storage so an ephemeral container can hydrate by ID on the next request [V]. Caveat worth internalizing: SessionStore mirrors transcripts only — not CLAUDE.md memory files or working-directory artifacts, which need their own volume or object-store sync [V].

Second, concurrency is bounded by RAM, because each session is a real process. The docs give a starting point of ~1 GiB RAM, 5 GiB disk, 1 CPU per agent, and a sizing formula: agents per host = (host RAM − overhead) / per-session RAM ceiling [V]. For long-running sessions you run a pool of containers behind a load balancer and pin each session to one container using consistent hashing on sessionId, so a resumed session keeps hitting the same live subprocess [V].

Senior scenario: a multi-tenant SDK service

You're building a hosted product: every customer gets a Claude agent that works against their data, in their sandbox, and must never see another tenant's context. The naive setup leaks, because default SDK behavior reads settings.json and CLAUDE.md memory from the shared filesystem — one tenant's project memory can bleed into another's system prompt [V]. Here is the documented isolation recipe, applied per query() call.

for await (const message of query({
  prompt,
  options: {
    cwd: tenantDir,           // per-tenant working directory, unique per customer
    settingSources: [],       // load NO filesystem settings/CLAUDE.md
    resume: sessionId,        // looked up from your DB by this user
    sessionStore,             // durable transcript store, keyed per tenant
    env: {
      ...process.env,         // keep PATH, ANTHROPIC_API_KEY — env REPLACES it in TS
      CLAUDE_CONFIG_DIR: configDir,        // per-tenant config dir, not shared ~/.claude.json
      CLAUDE_CODE_DISABLE_AUTO_MEMORY: "1" // auto-memory loads regardless of settingSources
    }
  }
})) { /* ... */ }

Four SDK-level moves do the isolation [V]: settingSources: [] so no shared filesystem config loads; CLAUDE_CODE_DISABLE_AUTO_MEMORY=1 because [auto memory] loads into the system prompt even when settingSources is empty; CLAUDE_CONFIG_DIR pointed at a per-tenant path so tenants don't share the global ~/.claude.json; and an explicit per-tenant cwd on every call. Then the operational layer: put auth at a gateway in front of the container — the agent should receive pre-authenticated requests and never validate user tokens itself [V] — and route outbound tool calls through an egress proxy that injects credentials after the request leaves the container, so tool secrets never live in the agent's environment and a compromised tenant can't exfiltrate via another's outbound policy [V].

For visibility across all of it, the SDK inherits OpenTelemetry config from the environment: set CLAUDE_CODE_ENABLE_TELEMETRY=1 plus the standard OTEL_* exporter variables at the container level and every query() exports spans, metrics, and logs to your collector — prompt text and tool inputs are excluded by default unless you opt in [V]. Two honest limits to design around: there is no top-level session timeout (bound runs with maxTurns), and large parallel-subagent fan-outs can hit API rate limits — batch the work rather than firing one wide dispatch [V].

Stand up a tenant-scoped SDK session

Authenticate at the edge, not the agent
Terminate user auth at a gateway in front of the container. The agent process receives pre-authenticated requests and never sees raw user tokens [V]. Supply ANTHROPIC_API_KEY from your secret manager — or route model calls through a proxy via ANTHROPIC_BASE_URL so the key lives outside the container [V].
Scope the filesystem and config per tenant
Set settingSources: [], CLAUDE_CONFIG_DIR to a per-tenant directory, CLAUDE_CODE_DISABLE_AUTO_MEMORY=1, and an explicit cwd. These four close the documented cross-tenant context leaks [V].
Persist and resume by session ID
Attach a SessionStore (S3/Redis/Postgres) so ephemeral containers hydrate transcripts on resume. Look the sessionId up from your DB by user, pass it as resume, and pin routing with consistent hashing on sessionId [V]. Alert on mirror_error system messages if store durability matters [V].
Bound, meter, and observe
Set maxTurns (there is no auto session timeout). Read total_cost_usd off each result for per-session cost; hard-cap to fail with error_max_budget_usd if needed. Export OTEL_* telemetry with CLAUDE_CODE_ENABLE_TELEMETRY=1 for traces, metrics, and logs [V].

Compose an agent run

Watch delegation happen

The orchestrator hands a slice of work to each subagent. Every subagent runs in its own context window, does the noisy part — searching, reviewing, running tests — and returns only a short summary. Dispatch them and watch the work fan out, then the results pulse home.

orchestrator · main thread

ready

The roster

exploreridle

Searches and maps the codebase without editing.

model · Haiku 4.5

revieweridle

Read-only pass for bugs, style, and risk.

model · Sonnet 4.6

testeridle

Runs the suite and reports failures.

model · Haiku 4.5

implementeridle

Writes the focused change end to end.

model · Opus 4.8

Wire up options, subagents, and the message stream, and watch how the agent loop and result message take shape.

Knowledge check

Your multi-tenant SDK service runs many customers through one shared container. A tenant reports seeing hints of another tenant’s project in its agent’s answers. Which single change most directly addresses the documented leak?

Where to go from here

The arc is short: query() for a one-shot task, ClaudeSDKClient or resume for a conversation, canUseTool and hooks for runtime control, and the hosting playbook — subprocess-per-session, SessionStore for durability, per-tenant isolation, OTEL for sight — the moment it goes to production. Prototype locally with the SDK; if you'd rather not operate the sandbox and session infrastructure yourself, the same agent maps onto Managed Agents, the hosted REST API where Anthropic runs the loop for you [V].

The shift in posture is the whole point. You stop thinking "what should I type next?" and start thinking "what should this service do, unattended, ten thousand times?" The loop is the same loop. You just gave it to a server.

Reach the end and this star joins your charted sky.