The Navigator · 10 min mission
Context Engineering: The Window Is Your Workbench
Control what fills the context window and keep long sessions sharp.
On this page
Every other skill in this constellation — CLAUDE.md, skills, subagents, hooks — ultimately serves one resource: the context window. It is the single most important thing you manage in a session, and the one beginners think about least.
Anthropic states the constraint plainly: most best practices exist because Claude's context window fills up fast, and performance degrades as it fills. When the window gets crowded, Claude starts "forgetting" earlier instructions and making more mistakes. Context engineering is the craft of deciding what occupies that window — and, just as importantly, what does not.
What the context window actually holds
Think of the window as a workbench, not a filing cabinet. Everything you and Claude touch in a session sits out on it at once: every message you send, every file Claude reads, every command's output, every tool result. A single debugging session or codebase exploration can generate and consume tens of thousands of tokens without you noticing.
That has a counterintuitive consequence. Asking Claude to "investigate" something vague isn't free — it reads dozens of files and every one of them lands on the bench. The window is finite, so a cluttered bench leaves less room for the thing you actually care about.
What loads before you type a word
The bench is never empty when a session starts. A baseline always loads first: the system prompt, tool definitions, and your CLAUDE.md project memory. That is why the docs push you to keep CLAUDE.md under 200 lines — those tokens are resident in every conversation, even when they are irrelevant to the task at hand. A bloated memory file does double damage: it crowds the window and, because rules get lost in the noise, Claude starts ignoring half of it.
MCP servers are the other quiet baseline cost. By default Claude Code defers MCP tool definitions — only the tool names enter context until Claude actually uses a specific tool — but connected servers still add up. Run /context to see exactly what is consuming space, and /mcp to disable servers you are not using.
Auto-compaction: the safety net
You will rarely hit a hard wall, because Claude Code compacts automatically. When the conversation approaches the context limit, auto-compaction triggers: Claude summarizes the history — preserving what matters most, including code patterns, file states, and key decisions — and frees the rest of the space. The conversation continues; the bench gets tidied without you stopping work.
Auto-compaction is a safety net, not a strategy. The summary is lossy by design: detail gets traded for room. If you wait for it to fire on its own, you don't control what survives. The better move is to manage context before the net catches you — which is what the next two commands are for.
/clear vs /compact — two different jobs
/clear — reset
Wipes the window back to its base load (system prompt, tools, CLAUDE.md). Nothing from the conversation survives.
Use it when you switch to unrelated work, or when a task has gone in circles. A clean window with a sharp prompt almost always beats a long one full of detours.
/compact — summarize
Condenses the current conversation into a summary and keeps going. The thread of work survives, just compressed.
Use it when you want to stay on the same task but the window is filling. Steer what is kept: /compact Focus on the API changes tells Claude what to preserve.
Fill the window, then reclaim it
Watch the context window
The context window holds the whole conversation — every message, every file read, every command output. Add load and watch it fill. As it nears the limit, performance starts to slip — that's when /compact and /clear earn their keep.
tokens in context · Healthy headroom
Add to the conversation
Keeping a marathon session sharp on a large codebase
Clear between unrelated tasks
The "kitchen sink" session — one task, then an unrelated question, then back to the first — is the most common failure pattern.
/clearbetween unrelated tasks keeps each one working from a clean bench.Clear after you correct twice
If you have corrected Claude more than twice on the same issue, the window is now polluted with failed approaches.
/clearand restart with a sharper prompt that folds in what you just learned. It outperforms grinding on a cluttered thread.Delegate exploration to subagents
Reading many files to "investigate" something fills your window. Instead: "use a subagent to investigate how our auth handles token refresh." The subagent reads in its own context and returns only a summary — the file reads never touch your main thread.
Offload verbose output
Test logs, build output, and fetched docs are token-heavy. Push them to subagents or filter them with a hook so only the signal — the failures, the answer — comes back. A hook that greps a 10,000-line log for
ERRORcan cut tens of thousands of tokens to hundreds.Spec, then start fresh
For a big feature, have Claude interview you and write a spec to a file. Then
/clearand execute the spec in a clean session. Implementation runs on context focused entirely on the task, with the written spec as the source of truth.
| Command / action | What it does | Reach for it when |
|---|---|---|
/context | Shows what is currently consuming the window | You want to see where tokens are going |
/usage | Shows the running token total for the session | Tracking spend or session size |
/clear | Resets the window to base load | Switching to unrelated work |
/compact <focus> | Summarizes the conversation, keeping the rest | Same task, window filling up |
/rewind / Esc Esc | Restore or summarize from a checkpoint | Undo a wrong turn or condense one end |
| Subagent | Explores in a separate context, returns a summary | Heavy reads or verbose output |
Knowledge check
You have spent 40 minutes fixing one tricky bug, the window is getting full, but you are still mid-fix and the history matters. What is the right move?
Develop the instinct
The rules here are defaults, not laws. Sometimes you should let context accumulate because you are deep in one complex problem and the whole history is load-bearing. The skill is noticing the trade. A bench with room to work beats a bench buried in tools you finished using an hour ago — but an empty bench in the middle of delicate joinery is its own mistake.
Watch what happens. When Claude produces sharp output, notice how much was on the bench. When it starts forgetting your instructions or repeating mistakes, treat that as the signal it usually is: the window is too full, and it is time to /compact, /clear, or hand the next investigation to a subagent.
What actually survives a compaction
Auto-compaction does not just shrink the conversation — it rebuilds the bench from the base load and then layers the summary on top. The summary is lossy, but the base load is reconstructed deterministically, and that is where most "why did it forget that?" surprises come from. Some context is re-injected verbatim after every compaction; some is quietly dropped until something pulls it back in.
The distinction maps onto the loading rules you already know [V]: your project-root CLAUDE.md and any unscoped rules in it are part of the base load, so they come back every time. Path-scoped rules and nested (child-directory) CLAUDE.md files are loaded on demand — Claude only pulls a child CLAUDE.md when it reads a file in that directory — so after a compaction they are gone until the next read re-triggers them. Skill bodies are re-injected only while a skill is invoked; the long listing of every skill's name and description is treated as discardable scaffolding.
| Context | After compaction | Why |
|---|---|---|
Project-root CLAUDE.md | Re-injected | Part of the base load; read at the start of every conversation |
| Unscoped CLAUDE.md rules | Re-injected | Always-on memory — resident in every turn regardless of task |
MEMORY.md (up to ~200 lines) | Re-injected | Persistent memory is reloaded; the ~200-line budget keeps it cheap enough to always carry |
| Invoked skill bodies | Re-injected (capped ~5k tokens/skill, ~25k total) | A skill in active use is reloaded; caps stop one skill — or many — from swallowing the window |
| Path-scoped rules | Lost | Loaded on demand for a matching path; not part of the base load |
Nested (child-directory) CLAUDE.md | Lost | Pulled in only when Claude reads a file in that directory — re-triggered by the next read there |
| The skill-descriptions listing | Lost | Discardable scaffolding; Claude re-discovers a skill when the task calls for it |
Ask without spending: /btw
Not every question is worth a turn. When you want a quick answer about the work in front of you but don't want it on the bench, use /btw [V]. The answer appears in a dismissible overlay and never enters the conversation history, so checking a detail costs you nothing in resident context.
Two properties make it precise. A side question has full visibility into the current conversation — it can answer about code Claude already read or a decision it made earlier — but it has no tool access: it cannot read files, run commands, or search, only reason over what is already in context [V]. That makes /btw the exact inverse of a subagent, which starts empty but has full tools. There are no follow-up turns in the overlay; a side question gets a single response. If the thread is worth continuing, press f to fork it into a new session that inherits the parent conversation plus this exchange as real transcript turns — with full tool access — while the original session is preserved under /resume [V]. You can even run /btw while Claude is mid-task; it runs independently and does not interrupt the turn.
Tune the MCP startup cost: ENABLE_TOOL_SEARCH
Tool search is what keeps a wall of MCP servers from eating your window: by default Claude Code defers MCP tool definitions, loading only tool names and server instructions at startup and discovering full schemas on demand. For most setups that is the right default and you never touch it. But it is a lever, and the ENABLE_TOOL_SEARCH environment variable lets you trade startup context against in-session search calls [V].
The middle setting is the useful one: ENABLE_TOOL_SEARCH=auto switches to threshold mode — tool schemas load upfront if they fit within 10% of the context window, and only the overflow is deferred [V]. You get instant access to a modest toolset without a search step, while a sprawling one still stays off the bench. Tune the threshold with auto:N (a percentage, e.g. auto:5 for 5%). The endpoints round it out: true forces full deferral, false loads every tool upfront with no deferral.
| Value | Startup behavior | Reach for it when |
|---|---|---|
| (unset) | All MCP tools deferred, discovered on demand (the default) | Most setups — minimal startup cost |
auto | Load upfront if tools fit within 10% of the window; defer the overflow | A modest toolset you want instantly, without a search step |
auto:N | Same threshold mode with a custom percentage (e.g. auto:5) | You want to tune exactly how much startup budget tools may claim |
true | All MCP tools deferred | Forcing deferral through a proxy or on Vertex AI |
false | All MCP tools loaded upfront, no deferral | Small, fixed toolset Claude needs on every turn |
Reach the end and this star joins your charted sky.