The Navigator · 10 min mission

Context Engineering: The Window Is Your Workbench

Control what fills the context window and keep long sessions sharp.

contextperformancesessionsFact-checked 2026-06-13
On this page

Every other skill in this constellation — CLAUDE.md, skills, subagents, hooks — ultimately serves one resource: the context window. It is the single most important thing you manage in a session, and the one beginners think about least.

Anthropic states the constraint plainly: most best practices exist because Claude's context window fills up fast, and performance degrades as it fills. When the window gets crowded, Claude starts "forgetting" earlier instructions and making more mistakes. Context engineering is the craft of deciding what occupies that window — and, just as importantly, what does not.

What the context window actually holds

Think of the window as a workbench, not a filing cabinet. Everything you and Claude touch in a session sits out on it at once: every message you send, every file Claude reads, every command's output, every tool result. A single debugging session or codebase exploration can generate and consume tens of thousands of tokens without you noticing.

That has a counterintuitive consequence. Asking Claude to "investigate" something vague isn't free — it reads dozens of files and every one of them lands on the bench. The window is finite, so a cluttered bench leaves less room for the thing you actually care about.

What loads before you type a word

The bench is never empty when a session starts. A baseline always loads first: the system prompt, tool definitions, and your CLAUDE.md project memory. That is why the docs push you to keep CLAUDE.md under 200 lines — those tokens are resident in every conversation, even when they are irrelevant to the task at hand. A bloated memory file does double damage: it crowds the window and, because rules get lost in the noise, Claude starts ignoring half of it.

MCP servers are the other quiet baseline cost. By default Claude Code defers MCP tool definitions — only the tool names enter context until Claude actually uses a specific tool — but connected servers still add up. Run /context to see exactly what is consuming space, and /mcp to disable servers you are not using.

inspecting what is loaded
… scroll to run this session
Two commands make the invisible visible: /context shows the breakdown, /usage shows the running token total for the session.

Auto-compaction: the safety net

You will rarely hit a hard wall, because Claude Code compacts automatically. When the conversation approaches the context limit, auto-compaction triggers: Claude summarizes the history — preserving what matters most, including code patterns, file states, and key decisions — and frees the rest of the space. The conversation continues; the bench gets tidied without you stopping work.

Auto-compaction is a safety net, not a strategy. The summary is lossy by design: detail gets traded for room. If you wait for it to fire on its own, you don't control what survives. The better move is to manage context before the net catches you — which is what the next two commands are for.

/clear vs /compact — two different jobs

/clear — reset

Wipes the window back to its base load (system prompt, tools, CLAUDE.md). Nothing from the conversation survives.

Use it when you switch to unrelated work, or when a task has gone in circles. A clean window with a sharp prompt almost always beats a long one full of detours.

/compact — summarize

Condenses the current conversation into a summary and keeps going. The thread of work survives, just compressed.

Use it when you want to stay on the same task but the window is filling. Steer what is kept: /compact Focus on the API changes tells Claude what to preserve.

Fill the window, then reclaim it

Watch the context window

The context window holds the whole conversation — every message, every file read, every command output. Add load and watch it fill. As it nears the limit, performance starts to slip — that's when /compact and /clear earn their keep.

12k/ 200k

tokens in context · Healthy headroom

Summary + base (12k)Live context (0k)

Add to the conversation

Add load and watch the bar move through healthy, warm, and hot. Then feel the difference: /compact leaves a condensed summary segment behind, while /clear resets to the base load.

Keeping a marathon session sharp on a large codebase

  1. Clear between unrelated tasks

    The "kitchen sink" session — one task, then an unrelated question, then back to the first — is the most common failure pattern. /clear between unrelated tasks keeps each one working from a clean bench.

  2. Clear after you correct twice

    If you have corrected Claude more than twice on the same issue, the window is now polluted with failed approaches. /clear and restart with a sharper prompt that folds in what you just learned. It outperforms grinding on a cluttered thread.

  3. Delegate exploration to subagents

    Reading many files to "investigate" something fills your window. Instead: "use a subagent to investigate how our auth handles token refresh." The subagent reads in its own context and returns only a summary — the file reads never touch your main thread.

  4. Offload verbose output

    Test logs, build output, and fetched docs are token-heavy. Push them to subagents or filter them with a hook so only the signal — the failures, the answer — comes back. A hook that greps a 10,000-line log for ERROR can cut tens of thousands of tokens to hundreds.

  5. Spec, then start fresh

    For a big feature, have Claude interview you and write a spec to a file. Then /clear and execute the spec in a clean session. Implementation runs on context focused entirely on the task, with the written spec as the source of truth.

Command / actionWhat it doesReach for it when
/contextShows what is currently consuming the windowYou want to see where tokens are going
/usageShows the running token total for the sessionTracking spend or session size
/clearResets the window to base loadSwitching to unrelated work
/compact <focus>Summarizes the conversation, keeping the restSame task, window filling up
/rewind / Esc EscRestore or summarize from a checkpointUndo a wrong turn or condense one end
SubagentExplores in a separate context, returns a summaryHeavy reads or verbose output
The context-management toolkit at a glance.

Knowledge check

You have spent 40 minutes fixing one tricky bug, the window is getting full, but you are still mid-fix and the history matters. What is the right move?

Develop the instinct

The rules here are defaults, not laws. Sometimes you should let context accumulate because you are deep in one complex problem and the whole history is load-bearing. The skill is noticing the trade. A bench with room to work beats a bench buried in tools you finished using an hour ago — but an empty bench in the middle of delicate joinery is its own mistake.

Watch what happens. When Claude produces sharp output, notice how much was on the bench. When it starts forgetting your instructions or repeating mistakes, treat that as the signal it usually is: the window is too full, and it is time to /compact, /clear, or hand the next investigation to a subagent.

What actually survives a compaction

Auto-compaction does not just shrink the conversation — it rebuilds the bench from the base load and then layers the summary on top. The summary is lossy, but the base load is reconstructed deterministically, and that is where most "why did it forget that?" surprises come from. Some context is re-injected verbatim after every compaction; some is quietly dropped until something pulls it back in.

The distinction maps onto the loading rules you already know [V]: your project-root CLAUDE.md and any unscoped rules in it are part of the base load, so they come back every time. Path-scoped rules and nested (child-directory) CLAUDE.md files are loaded on demand — Claude only pulls a child CLAUDE.md when it reads a file in that directory — so after a compaction they are gone until the next read re-triggers them. Skill bodies are re-injected only while a skill is invoked; the long listing of every skill's name and description is treated as discardable scaffolding.

ContextAfter compactionWhy
Project-root CLAUDE.mdRe-injectedPart of the base load; read at the start of every conversation
Unscoped CLAUDE.md rulesRe-injectedAlways-on memory — resident in every turn regardless of task
MEMORY.md (up to ~200 lines)Re-injectedPersistent memory is reloaded; the ~200-line budget keeps it cheap enough to always carry
Invoked skill bodiesRe-injected (capped ~5k tokens/skill, ~25k total)A skill in active use is reloaded; caps stop one skill — or many — from swallowing the window
Path-scoped rulesLostLoaded on demand for a matching path; not part of the base load
Nested (child-directory) CLAUDE.mdLostPulled in only when Claude reads a file in that directory — re-triggered by the next read there
The skill-descriptions listingLostDiscardable scaffolding; Claude re-discovers a skill when the task calls for it
What the base load rebuilds after an auto-compaction, and what falls off until something re-triggers it. [V]

Ask without spending: /btw

Not every question is worth a turn. When you want a quick answer about the work in front of you but don't want it on the bench, use /btw [V]. The answer appears in a dismissible overlay and never enters the conversation history, so checking a detail costs you nothing in resident context.

Two properties make it precise. A side question has full visibility into the current conversation — it can answer about code Claude already read or a decision it made earlier — but it has no tool access: it cannot read files, run commands, or search, only reason over what is already in context [V]. That makes /btw the exact inverse of a subagent, which starts empty but has full tools. There are no follow-up turns in the overlay; a side question gets a single response. If the thread is worth continuing, press f to fork it into a new session that inherits the parent conversation plus this exchange as real transcript turns — with full tool access — while the original session is preserved under /resume [V]. You can even run /btw while Claude is mid-task; it runs independently and does not interrupt the turn.

a side question that leaves no trace
… scroll to run this session
The question and answer live in an overlay only. Dismiss with Space, or press f to promote the thread into its own session.

Tune the MCP startup cost: ENABLE_TOOL_SEARCH

Tool search is what keeps a wall of MCP servers from eating your window: by default Claude Code defers MCP tool definitions, loading only tool names and server instructions at startup and discovering full schemas on demand. For most setups that is the right default and you never touch it. But it is a lever, and the ENABLE_TOOL_SEARCH environment variable lets you trade startup context against in-session search calls [V].

The middle setting is the useful one: ENABLE_TOOL_SEARCH=auto switches to threshold mode — tool schemas load upfront if they fit within 10% of the context window, and only the overflow is deferred [V]. You get instant access to a modest toolset without a search step, while a sprawling one still stays off the bench. Tune the threshold with auto:N (a percentage, e.g. auto:5 for 5%). The endpoints round it out: true forces full deferral, false loads every tool upfront with no deferral.

ValueStartup behaviorReach for it when
(unset)All MCP tools deferred, discovered on demand (the default)Most setups — minimal startup cost
autoLoad upfront if tools fit within 10% of the window; defer the overflowA modest toolset you want instantly, without a search step
auto:NSame threshold mode with a custom percentage (e.g. auto:5)You want to tune exactly how much startup budget tools may claim
trueAll MCP tools deferredForcing deferral through a proxy or on Vertex AI
falseAll MCP tools loaded upfront, no deferralSmall, fixed toolset Claude needs on every turn
ENABLE_TOOL_SEARCH — choosing where MCP tools live at startup. [V]

Reach the end and this star joins your charted sky.