
Context & Prompt Engineering for Coding Agents

Treat the context window as a finite budget and learn the tool-agnostic craft that makes every coding agent sharper.

Tags: context-engineering, prompt-engineering, foundations, mcp, agents. Fact-checked 2026-06-15.

Context engineering is the practice of curating the exact set of tokens a model sees during inference (system prompt, tool definitions, memory files, retrieved files, tool results, and prior turns), not just the prompt you type. The concrete moves: budget the window, write prompts at the right altitude, design token-efficient tools, ground just-in-time over MCP, decompose into durable artifacts, and keep long sessions coherent. Every technique is tool-agnostic; where tool-specific details matter, the exact commands, config keys, and version pins are given for Claude Code, Codex, Gemini CLI, and IDE agents.

| Term | Anthropic's definition | Scope |
| --- | --- | --- |
| Prompt engineering | "Methods for writing and organizing LLM instructions for optimal outcomes" | The instructions you author |
| Context engineering | "The set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference" | Every token in the window, including those that arrive outside your prompt |

Anthropic's two terms. Prompt engineering is the narrow craft; context engineering is the superset that governs every token in the window.

| What lands in the window | Cost profile | How to keep it lean |
| --- | --- | --- |
| System prompt / instructions | Fixed (every turn) | Aim for the "right altitude"; remove contradictions |
| Tool definitions | Fixed (every turn) | Consolidate and namespace; cut overlapping tools |
| Memory file (CLAUDE.md, AGENTS.md, GEMINI.md) | Fixed (every turn) | Layered and curated, not a junk drawer |
| Retrieved files / pasted data | Variable (grows fast) | Prefer just-in-time retrieval over up-front dumps |
| Tool results + prior turns | Variable (grows fastest) | Clear stale results; compact long sessions |

The five claimants on the window. Fixed overhead is paid on every inference; variable cost grows as the session runs.
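
The budget arithmetic can be sketched in a few lines. This is a minimal illustration, assuming made-up token counts and a hypothetical `WindowBudget` class; none of the numbers are measurements from a real tool.

```python
from dataclasses import dataclass, field


@dataclass
class WindowBudget:
    """Tracks what occupies the window. All numbers here are illustrative."""

    limit: int  # total window size in tokens (assumed)
    fixed: dict[str, int] = field(default_factory=dict)     # re-sent every turn
    variable: dict[str, int] = field(default_factory=dict)  # grows with the session

    def used(self) -> int:
        return sum(self.fixed.values()) + sum(self.variable.values())

    def remaining(self) -> int:
        return self.limit - self.used()

    def fixed_share(self) -> float:
        """Fraction of the whole window consumed before any work begins."""
        return sum(self.fixed.values()) / self.limit


budget = WindowBudget(
    limit=200_000,
    fixed={"system_prompt": 3_000, "tool_definitions": 12_000, "memory_file": 5_000},
    variable={"retrieved_files": 40_000, "tool_results_and_turns": 60_000},
)
print(budget.used(), budget.remaining(), f"{budget.fixed_share():.0%}")
# prints: 120000 80000 10%
```

Separating the two dictionaries makes the trade-off visible: trimming a fixed entry saves tokens on every turn, while trimming a variable entry only helps from that point in the session onward.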

Write the system prompt at the right altitude

Instructions fail in two directions. Too brittle: hardcoded if-this-then-that logic that breaks the moment reality deviates from the script. Too vague: lofty guidance with no concrete behavioral signal to steer on. The target is the right altitude between them: specific enough to steer, flexible enough to hand the model heuristics rather than a flowchart. Structure the prompt into sections with XML tags or Markdown headers (<background_information>, <instructions>, <context_gathering>, <persistence>, or a ## Tool guidance header) so the model can locate each part and weigh it appropriately.

The same instruction at three altitudes

Too brittle / too vague

Brittle: "If the file ends in .test.ts, run jest; if it ends in .spec.ts, run vitest; if there is a Makefile, run make test; otherwise…" — a flowchart that breaks on the first unforeseen case.

Vague: "Write good, well-tested code." — true and unactionable. No signal to steer on.

Right altitude

"Match the existing test setup. Find how tests are run in this repo (check package.json scripts and any Makefile), run the relevant suite before claiming a fix works, and report the command you used."

Concrete enough to steer, flexible enough to survive an unanticipated layout.
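
One way to keep a sectioned prompt maintainable is to assemble it from named parts. The sketch below is an assumption-laden illustration: `build_system_prompt` is a hypothetical helper, and the background text is invented for the example. Only the tag names and the right-altitude instruction come from the text above.

```python
def build_system_prompt(sections: dict[str, str]) -> str:
    """Wrap each named section in a matching XML-style tag pair."""
    parts = []
    for tag, body in sections.items():
        parts.append(f"<{tag}>\n{body.strip()}\n</{tag}>")
    return "\n\n".join(parts)


prompt = build_system_prompt({
    # Invented background, for illustration only.
    "background_information": "TypeScript monorepo; tests live next to source files.",
    "instructions": (
        "Match the existing test setup. Find how tests are run in this repo "
        "(check package.json scripts and any Makefile), run the relevant suite "
        "before claiming a fix works, and report the command you used."
    ),
})
print(prompt)
```

Keeping each section as a separate value makes it easier to audit one part for contradictions without rereading the whole prompt.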

Tool-definition hygiene (Anthropic, writing-tools-for-agents)

  1. Consolidate, don't proliferate

    One tool that performs several related operations under the hood beats ten near-duplicates. It shrinks the fixed-overhead definitions and removes the "which tool do I call?" ambiguity that stalls agents.

  2. Namespace related tools under a common prefix

    Group tools with a shared prefix (e.g. issues_*) so their boundaries are obvious.

  3. Return high-signal results

    Prefer human-interpretable names over low-level identifiers like uuid, 256px_image_url, or mime_type. A result the model can read is a result it can use.

  4. Add a verbosity control

    Offer a response_format enum ("concise" vs "detailed"), plus pagination, range selection, filtering, and truncation with sensible defaults, so the agent spends tokens only when it needs detail.

  5. Write error messages that steer

    Emit messages that guide the model toward correct usage, not cryptic codes.

  6. Name semantically and design evaluation-first

    Make names and arguments "as semantically correct as possible" — semantic_search beats an ambiguous search (OpenAI). A tool wrapping a terminal command works best when its output mirrors the real command (keeps the model in-distribution). Measure tools on real tasks, then refine descriptions from what you observe.
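
Items 4 and 5 can be sketched together: a verbosity control with pagination, and error messages that tell the model how to recover. This is a hedged illustration in plain Python, not an SDK call. The `issues_list` function, its parameters, and the in-memory `ISSUES` data are all hypothetical.

```python
from typing import Literal

# Hypothetical in-memory data standing in for a real issue tracker.
ISSUES = [
    {"number": n, "title": f"Issue {n}", "status": "open", "body": "details " * 50}
    for n in range(1, 31)
]


def issues_list(
    response_format: Literal["concise", "detailed"] = "concise",
    page: int = 1,
    page_size: int = 5,
) -> str:
    """List issues as readable text; errors tell the caller how to recover."""
    if response_format not in ("concise", "detailed"):
        return (
            f"Error: response_format={response_format!r} is not supported. "
            'Use "concise" (titles only) or "detailed" (titles plus bodies).'
        )
    page_size = max(1, min(page_size, 20))    # clamp to a sensible range
    last_page = -(-len(ISSUES) // page_size)  # ceiling division
    if not 1 <= page <= last_page:
        return f"Error: page={page} is out of range. Valid pages are 1 to {last_page}."

    start = (page - 1) * page_size
    lines = []
    for issue in ISSUES[start : start + page_size]:
        line = f"#{issue['number']} [{issue['status']}] {issue['title']}"
        if response_format == "detailed":
            line += "\n" + issue["body"][:200]  # truncate long bodies by default
        lines.append(line)
    if page < last_page:
        lines.append(f"(page {page} of {last_page}; pass page={page + 1} for more)")
    else:
        lines.append(f"(page {page} of {last_page}; end of results)")
    return "\n".join(lines)
```

The footer line doubles as steering: it tells the agent exactly which argument to pass next, so it does not have to guess whether more results exist.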

High-signal vs low-signal tool definition (TypeScript SDK)
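typescript
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

// Hypothetical data-access helpers, declared only so the sketch type-checks;
// supply real implementations before running.
type Issue = { number: number; status: string; title: string; assignee: string; body: string };
declare const db: { searchAll(q: string): Promise<unknown[]> };
declare const issues: { semanticSearch(query: string, limit: number): Promise<Issue[]> };

// The server name and version are placeholders.
const server = new McpServer({ name: 'issue-tools', version: '0.1.0' });

// LOW-SIGNAL: vague name, dumps raw rows, no verbosity control, no limits.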
server.registerTool(
  'search',
  { description: 'Search', inputSchema: { q: z.string() } },
  async ({ q }) => ({
    // Returns every column, including uuids and internal mime types,
    // for an unbounded number of rows — straight into the window.
    content: [{ type: 'text', text: JSON.stringify(await db.searchAll(q)) }],
  }),
);
 
// HIGH-SIGNAL: semantic name, bounded results, human-readable fields,
// and a verbosity knob the agent controls.
server.registerTool(
  'semantic_search_issues',
  {
    description:
      'Search issues by meaning. Returns the top matches with title, ' +
      'status, and assignee. Use detail="full" only when you need bodies.',
    inputSchema: {
      query: z.string().describe('Natural-language description of the issue'),
      limit: z.number().int().min(1).max(20).default(5),
      detail: z.enum(['concise', 'full']).default('concise'),
    },
  },
  async ({ query, limit, detail }) => {
    const hits = await issues.semanticSearch(query, limit);
    const rows = hits.map((i) =>
      detail === 'full'
        ? `#${i.number} [${i.status}] ${i.title} — ${i.assignee}\n${i.body}`
        : `#${i.number} [${i.status}] ${i.title} — ${i.assignee}`,
    );
    return { content: [{ type: 'text', text: rows.join('\n') }] };
  },
);

Ground just-in-time — don't pre-load the world

There are three retrieval postures; the modern agentic default has shifted to the second. Pre-retrieval (up-front) loads data into context before inference (classic RAG with embeddings/indexes) — fast, but risks stale indexes and window pollution. Just-in-time ("agentic search") keeps only lightweight identifiers in context (file paths, URLs, queries) and loads the data at runtime with tools like glob, grep, and file reads — turning metadata into signal and enabling progressive disclosure. Hybrid does both: retrieve a little up front, then explore. Claude Code is the canonical hybrid — CLAUDE.md is dropped in up front while glob/grep/read fetch files just-in-time, sidestepping stale indexing.

| Posture | What sits in context | Trade-off |
| --- | --- | --- |
| Pre-retrieval (up-front RAG) | Data loaded before inference via embeddings/indexes | Fast; risks stale indexes and pollution |
| Just-in-time (agentic search) | Identifiers only (glob/grep/read at runtime) | Lean window, progressive disclosure; needs tool calls |
| Hybrid | CLAUDE.md up front + tools for the rest | "The decision boundary for the right level of autonomy depends on the task" |

The three retrieval postures and when each applies.

Two ways to give an agent the same codebase

Pre-load everything

Paste 4,000 lines across twelve files into the prompt "so it has full context."

The relevant 40 lines drown in 3,960 irrelevant ones. Attention stretches thin, the window is mostly spent before work begins, and you pay for all of it every turn.

Just-in-time retrieval

Point the agent at the directory and let it grep for the symbol, read the two files that matter, and open more only if needed.

The window holds high-signal tokens: a query, two files, the result. Progressive disclosure keeps attention concentrated.
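
The just-in-time posture can be sketched with the standard library alone. The helpers below (`grep_symbol`, `read_window`) are hypothetical stand-ins for an agent's grep and read tools, not any product's implementation; they assume UTF-8 source files and a plain substring match.

```python
from pathlib import Path


def grep_symbol(root: Path, symbol: str, pattern: str = "*.py") -> dict[Path, list[int]]:
    """Return {file: [1-based line numbers]} for lines that mention `symbol`."""
    hits: dict[Path, list[int]] = {}
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        lines = [n for n, line in enumerate(text.splitlines(), start=1) if symbol in line]
        if lines:
            hits[path] = lines
    return hits


def read_window(path: Path, line_no: int, context: int = 2) -> str:
    """Read only the lines around a hit instead of the whole file."""
    lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
    lo = max(0, line_no - 1 - context)
    hi = min(len(lines), line_no + context)
    return "\n".join(f"{i + 1}: {lines[i]}" for i in range(lo, hi))
```

Only paths and line numbers come back from the first call; file contents enter the window when `read_window` is called for a specific hit, which is the progressive disclosure described above.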

MCP: the wire format for what context an agent can reach

The Model Context Protocol (MCP) standardizes how an agent discovers and pulls external context, making "what context does this agent have access to" portable across clients. It defines three server-exposed primitives that map onto the ways context enters a window: Resources (server-exposed data the client loads into context), Tools (model-controlled functions the agent calls to act or retrieve), and Prompts (server-exposed, user-controlled templates — discovered via prompts/list, fetched via prompts/get with arguments, commonly surfaced as slash commands). A prompt can embed resources directly (text, image, audio, or a type:"resource" reference), injecting documentation or code samples into the conversation flow.
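
On the wire, the prompt flow is a pair of JSON-RPC 2.0 requests. The sketch below only builds the request envelopes; `jsonrpc_request` is a hypothetical helper, and the prompt name `review_diff` with its `diff` argument is illustrative. A real client would send these over stdio or Streamable HTTP through an SDK.

```python
import json
from typing import Optional


def jsonrpc_request(request_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request, the envelope MCP messages use."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discover the templates a server offers.
list_prompts = jsonrpc_request(1, "prompts/list")

# Fetch one template, filling in its arguments (illustrative names).
get_prompt = jsonrpc_request(
    2, "prompts/get", {"name": "review_diff", "arguments": {"diff": "- old\n+ new"}}
)

print(list_prompts)
print(get_prompt)
```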

| Item | Current value | Notes |
| --- | --- | --- |
| Spec revision | 2025-11-25 | Supersedes 2025-06-18; 2026-07-28 is a release candidate only |
| Standard transports | stdio, Streamable HTTP | HTTP+SSE from 2024-11-05 is deprecated (back-compat only) |
| Protocol headers | MCP-Protocol-Version: 2025-11-25 | Session id MCP-Session-Id; resume via Last-Event-ID on HTTP GET |
| Python SDK | mcp v1.27.2 | PyPI 2026-05-29; v2.0.0 in alpha (no GA date) |
| TypeScript SDK | @modelcontextprotocol/sdk v1.29.0 | npm latest; v2 pre-alpha, targeting Q3 2026 |

MCP version pins as of 2026-06-15. Build on the v1 SDK lines; v2 is still alpha (Python) or pre-alpha (TypeScript).

Minimal MCP server: one resource + one prompt (Python SDK, mcp v1.27.2)
python
# pip install "mcp[cli]"  — package name is "mcp", stable line v1.x (v1.27.2)
from mcp.server.fastmcp import FastMCP
 
mcp = FastMCP("project-context")
 
 
# A RESOURCE: server-exposed data the client can load into the window
# on demand, identified by a lightweight URI — the just-in-time pattern.
@mcp.resource("docs://coding-standards")
def coding_standards() -> str:
    """The team's coding standards, fetched only when referenced."""
    return (
        "- Prefer composition over inheritance.\n"
        "- Every public function has a docstring and a test.\n"
        "- No new dependencies without sign-off."
    )
 
 
# A PROMPT: a user-controlled template, discovered via prompts/list and
# fetched via prompts/get. Clients commonly surface these as slash commands.
@mcp.prompt()
def review_diff(diff: str) -> str:
    """Structured code-review prompt the user invokes with arguments."""
    return (
        "<task>Review the following diff against our coding standards.</task>\n"
        "<focus>Correctness, then readability. Cite the standard you apply.</focus>\n"
        f"<diff>\n{diff}\n</diff>"
    )
 
 
if __name__ == "__main__":
    # stdio is one of the two standard transports (the other is Streamable HTTP).
    mcp.run(transport="stdio")

Decompose into durable artifacts, not an ever-growing chat

In a single sprawling conversation the chat history is your only memory, and it rots as it grows. Decomposition that matters for context engineering produces durable artifacts — each phase writes a high-signal file that becomes the input to the next, instead of relying on accumulated chat. Two vendor-official methodologies make this concrete: GitHub Spec Kit (a /speckit.* slash-command workflow) and AWS Kiro (three files per feature under .kiro/specs/).

Install GitHub Spec Kit and run the spec-driven flow

  1. Install the CLI

    Requires uv + Python 3.11+. Run uv tool install specify-cli --from git+https://github.com/github/spec-kit.git@vX.Y.Z (the docs recommend pinning a release tag; the bare .git form works but is unpinned).

  2. Initialize the workflow

    Run specify init in the repo. The agent gains the /speckit.* slash commands (works with 30+ CLI and IDE agents).

  3. Optionally set principles

    Run /speckit.constitution to establish governing project principles.

  4. Specify the what

    Run /speckit.specify to define requirements / user stories. Use /speckit.clarify to resolve underspecified requirements (recommended).

  5. Plan the how, then break into tasks

    Run /speckit.plan for the technical implementation plan, then /speckit.tasks to break it into actionable tasks. /speckit.analyze cross-checks artifact consistency; /speckit.checklist generates quality checklists.

  6. Implement

    Run /speckit.implement to execute the tasks. /speckit.taskstoissues converts tasks into GitHub issues. Each phase produces a durable artifact — the spec "becomes executable" and defines the what before the how.

| Methodology | Where artifacts live | The phased flow |
| --- | --- | --- |
| GitHub Spec Kit | specify CLI output + repo | /speckit.specify → /speckit.plan → /speckit.tasks → /speckit.implement |
| AWS Kiro | .kiro/specs/<feature>/ | requirements.md (EARS) → design.md → tasks.md → Run all Tasks |

Spec Kit and Kiro both convert a vague ask into durable, high-signal artifacts.
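
The shared idea behind both workflows, that each phase reads one file and writes the next, can be sketched independently of either tool. This is an illustration under stated assumptions: `run_phase` and `run_pipeline` are hypothetical, the lambdas stand in for agent calls, and only the three file names are borrowed from the Kiro row above.

```python
from pathlib import Path
from typing import Callable, Optional


def run_phase(spec_dir: Path, source: Optional[str], target: str,
              produce: Callable[[str], str]) -> Path:
    """Run one phase: read the previous artifact, write the next one."""
    spec_dir.mkdir(parents=True, exist_ok=True)
    previous = (spec_dir / source).read_text(encoding="utf-8") if source else ""
    out_path = spec_dir / target
    out_path.write_text(produce(previous), encoding="utf-8")
    return out_path


def run_pipeline(spec_dir: Path, ask: str) -> list[Path]:
    """Chain three phases; each lambda is a placeholder for an agent call."""
    return [
        run_phase(spec_dir, None, "requirements.md",
                  lambda _: f"# Requirements\n\n{ask}\n"),
        run_phase(spec_dir, "requirements.md", "design.md",
                  lambda req: f"# Design\n\nDerived from:\n\n{req}"),
        run_phase(spec_dir, "design.md", "tasks.md",
                  lambda design: f"# Tasks\n\n- [ ] Implement the design ({len(design)} chars reviewed)\n"),
    ]
```

Because every phase takes its input from a file rather than from chat history, any phase can be rerun in a fresh session with nothing but the previous artifact in the window.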

Keep long sessions sharp: four techniques

When a task outruns one window, four named techniques (Anthropic) prevent the slide into context rot. Compaction summarizes the conversation near the limit and reinitiates from the summary — maximize recall first, then iterate to improve precision; "one of the safest, lightest-touch forms of compaction is tool result clearing." Structured note-taking (agentic memory) writes notes to durable storage outside the window (a progress file, a to-do list) and pulls them back only when needed. Sub-agent architectures spin up specialized sub-agents with clean context windows that return only a condensed summary — typically ~1,000–2,000 tokens — to the coordinator ("since context is your fundamental constraint, subagents are one of the most powerful tools available"). Cross-session harnesses externalize state so a fresh session resumes cold.
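
Tool result clearing, the lightest of these techniques, can be sketched as a pure function over the message history. The message shape here (a `role` of "user", "assistant", or "tool" plus a `content` string) is a simplifying assumption, and `clear_stale_tool_results` is a hypothetical helper rather than any vendor's API.

```python
from typing import TypedDict


class Message(TypedDict):
    role: str      # "user", "assistant", or "tool" (simplified)
    content: str


CLEARED = "[tool result cleared; re-run the tool if this is needed again]"


def clear_stale_tool_results(history: list[Message], keep_last: int = 2) -> list[Message]:
    """Return a copy of history with all but the newest tool results blanked."""
    tool_indexes = [i for i, m in enumerate(history) if m["role"] == "tool"]
    stale = set(tool_indexes[:-keep_last]) if keep_last > 0 else set(tool_indexes)
    return [
        {"role": m["role"], "content": CLEARED} if i in stale else m
        for i, m in enumerate(history)
    ]
```

The turn structure stays intact, so the model still sees that a tool was called; only the bulky payload is dropped, and the placeholder tells it how to get the data back.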

a cross-session harness resuming cold
A fresh session has no memory of the last one. The harness rebuilds state from durable files — git log, a progress file, and a JSON feature list — instead of from chat history.
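
A cold resume can be sketched as a function that turns those durable files into a short briefing. Everything specific here is an assumption: `build_resume_brief` is hypothetical, the feature list is assumed to be a JSON array of objects with `name` and `done` keys, and the git log is passed in as text (for example, the output of `git log --oneline -5`).

```python
import json
from pathlib import Path


def build_resume_brief(progress_file: Path, features_file: Path, git_log: str) -> str:
    """Assemble a short briefing for a fresh session from durable state."""
    features = json.loads(features_file.read_text(encoding="utf-8"))
    pending = [f["name"] for f in features if not f["done"]]
    done = [f["name"] for f in features if f["done"]]
    return "\n".join([
        "## Recent commits",
        git_log.strip(),
        "## Progress notes",
        progress_file.read_text(encoding="utf-8").strip(),
        "## Features",
        f"Done: {', '.join(done) or 'none'}",
        f"Next up: {pending[0] if pending else 'nothing pending'}",
    ])
```

The briefing is small and regenerated on every start, so the new session begins from current state instead of a stale transcript.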

| Tool | File | Load behavior |
| --- | --- | --- |
| Claude Code | CLAUDE.md | Dropped in up front; tools fetch the rest just-in-time |
| OpenAI Codex | AGENTS.md | Files from ~/.codex + each dir from repo root to cwd; deeper overrides shallower; model "trained to closely adhere" |
| Gemini CLI | GEMINI.md | Concatenated from ~/.gemini/GEMINI.md, ./GEMINI.md, subdir files; /memory show to inspect, /memory reload to reload |
| AWS Kiro | .kiro/steering/*.md | product.md, tech.md, structure.md; "included in every interaction by default" |

The persistent project-context file converges across tools: same idea, different name and load order.
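
The root-to-cwd layering in the table can be sketched as a directory walk. This is a simplified illustration, not any tool's actual loader: `collect_memory_files` and `merge_memory` are hypothetical, the home-directory file is ignored, and "deeper overrides shallower" is modeled simply by placing deeper files later in the merged text.

```python
from pathlib import Path


def collect_memory_files(repo_root: Path, cwd: Path, name: str = "AGENTS.md") -> list[Path]:
    """List memory files from repo root down to cwd, shallowest first."""
    repo_root, cwd = repo_root.resolve(), cwd.resolve()
    relative = cwd.relative_to(repo_root)  # raises ValueError if cwd is outside the repo
    directories = [repo_root]
    for part in relative.parts:
        directories.append(directories[-1] / part)
    return [d / name for d in directories if (d / name).is_file()]


def merge_memory(files: list[Path]) -> str:
    """Concatenate in order, so deeper (later) guidance reads as the override."""
    return "\n\n".join(
        f"# From {p}\n{p.read_text(encoding='utf-8').strip()}" for p in files
    )
```

Printing the merged result is a cheap way to audit what the agent will actually see, and to spot contradictions between layers before they cost tokens.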

Knowledge check

You are 90 minutes into a large refactor and the context window is nearly full. The agent is starting to "forget" decisions it made early in the session. What is the soundest move?

| Footgun | Fix |
| --- | --- |
| "A bigger context window solves it" | It does not; context rot degrades accuracy regardless of max size |
| Dumping the whole repo or giant pasted logs into the prompt | Just-in-time retrieval (paths + tools) or a curated subset |
| Contradictory instructions in the prompt or memory file | Audit for conflicts; the model burns tokens reconciling them |
| Over-stuffed tool lists with overlapping tools | Consolidate and namespace under a common prefix |
| Verbose tool results (raw JSON, uuids, base64) | Return human-readable output; paginate or truncate |
| Letting chat history be the only memory in long tasks | Externalize state to files (progress file, JSON task list) |
| Ending with only a plan | A plan is not a deliverable unless asked; reconcile every TODO first |

The recurring footguns and the fix for each. None are tool-specific.
