The Forge · 9 min mission

Codex Best Practices: The Official Playbook, Decoded

Apply OpenAI's own best practices — context, planning, review, MCP.

best practices · workflow · codex · Fact-checked 2026-06-13

OpenAI publishes a best-practices page for Codex, and most people skim it once and forget it. That is a mistake — it is the closest thing to a manual for how to get reliable work out of an agent, and it rewards being read as a system rather than a list of tips. This guide takes that official guidance and rebuilds it into a loop you can actually run on every task.

The shape is simple. You write a prompt that states what "done" means, you let Codex plan before it touches code, you push repeated knowledge out of the prompt and into reusable files, you wire in live context only where you truly need it, and you make Codex review its own work before you do. Five moves. The rest of this guide is each move in detail, with the official wording where it matters.

Why a playbook beats a clever prompt

The temptation with any coding agent is to chase the perfect one-shot prompt — a giant block of instructions you paste and pray over. The official guidance pushes the opposite instinct. As it puts it, "clear prompting isn't required to get value, but it does make results more reliable, especially in larger codebases." Reliability comes from structure that repeats: the same prompt contract, the same plan-then-build rhythm, the same reusable guidance file, the same review pass. A playbook is what turns a lucky session into a dependable one.

Move one: the four-part prompt contract

The official guidance gives prompts a spine. A strong request, it says, answers four questions, and you can treat them as a checklist before you hit enter:

Goal — what you are trying to change or build, stated as an outcome, not a vibe.
Context — the files, folders, docs, examples, or error output relevant to the task. Point Codex at the evidence instead of making it hunt.
Constraints — the standards, architecture, safety requirements, and conventions the change must respect.
Done when — the completion criteria: tests passing, a bug no longer reproducing, a specific behavior verified.

The "Done when" line is the one beginners skip and the one that changes results the most. An agent with no definition of done stops when it thinks it is finished; an agent with explicit criteria keeps working (running the tests, checking the behavior) until the criteria are actually met. You are not writing a longer prompt; you are writing a prompt the agent can self-check against.

Paired with this is reasoning effort, which you tune to the task rather than maxing out by reflex. The guidance is to choose a low level for faster, well-scoped work; medium or high for complex changes and debugging; and an extra-high level for long, reasoning-heavy agentic tasks. Low is not "worse": for a well-scoped edit it is faster and usually just as correct.

Vague ask vs. the four-part contract

Vague ask

"Make the checkout flow faster."

No files, no constraints, no definition of done. Codex has to guess which flow, what "faster" means, and when it is allowed to stop. You will burn a turn just negotiating scope.

Goal · Context · Constraints · Done when

"Goal: cut the checkout API's p95 latency. Context: src/checkout/handler.ts and the slow query flagged in logs/perf.txt. Constraints: keep the public response shape unchanged; no new dependencies. Done when: npm run bench:checkout shows p95 under 200ms and npm test passes."

The agent knows where to look, what it may not break, and exactly when it is finished.
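To make the contract a habit, it can help to treat it as a template that refuses to render while a part is blank. The following is a minimal sketch in Python, not anything Codex provides: the `PromptContract` class and its method names are hypothetical, and the example values are taken from the prompt above.

```python
from dataclasses import dataclass


@dataclass
class PromptContract:
    """The four parts of the contract; field names mirror the labels above."""
    goal: str
    context: str
    constraints: str
    done_when: str

    def missing_parts(self) -> list[str]:
        """Return the names of any parts left blank, in declaration order."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def render(self) -> str:
        """Format the contract as one prompt, refusing incomplete contracts."""
        missing = self.missing_parts()
        if missing:
            raise ValueError(f"incomplete contract, missing: {', '.join(missing)}")
        return (
            f"Goal: {self.goal}\n"
            f"Context: {self.context}\n"
            f"Constraints: {self.constraints}\n"
            f"Done when: {self.done_when}"
        )


checkout = PromptContract(
    goal="cut the checkout API's p95 latency",
    context="src/checkout/handler.ts and the slow query flagged in logs/perf.txt",
    constraints="keep the public response shape unchanged; no new dependencies",
    done_when="npm run bench:checkout shows p95 under 200ms and npm test passes",
)
print(checkout.render())
```

The vague ask fails this check immediately: with only a goal filled in, `missing_parts()` reports context, constraints, and done_when, which is the same gap Codex would otherwise have to guess its way across.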

Move two: plan before code

For anything complex, ambiguous, or hard to describe well, the official advice is direct: ask Codex to plan before it starts coding. Planning is where an agent surfaces the assumptions it would otherwise bake silently into an implementation — which is far cheaper to fix in a paragraph than in a diff.

The guidance offers three on-ramps, in rough order of ease. The first and recommended default is Plan Mode, described as "the easiest and most effective option" for most users: a mode where Codex gathers context, asks clarifying questions, and produces a plan before writing anything. Its no-edit behavior is a prompt-level instruction to the model — not a runtime sandbox — so for genuinely untrusted work pair it with the separate Read-only approval mode, which actually blocks writes. In the CLI you enter Plan Mode with the /plan slash command — or by toggling with Shift+Tab — and you can pass an inline prompt as the first planning request. The second approach is the interview: instead of asking for a plan, you ask Codex to question you, dragging a fuzzy idea into focus before any code exists. The third, for advanced multi-step work, is a written PLANS.md the agent fills in and works from.

The common thread is that you read the plan before you approve the build. A plan you skim and rubber-stamp is no safer than no plan at all; the value is entirely in catching the wrong assumption while it is still just words.

plan mode, then build

In Plan Mode, Codex explores and proposes a plan and is instructed not to edit until you approve. The clarifying question is the point — it is cheaper to answer here than to unwind a wrong assumption from a diff.

Move three: stop repeating yourself — use AGENTS.md

If you find yourself pasting the same context into every prompt — the build command, the test runner, the "never touch the legacy billing module" rule — that knowledge does not belong in your prompt. It belongs in AGENTS.md, an open-format instruction file that Codex loads into context automatically at the start of every task. The official line is the whole philosophy in one sentence: "a short, accurate AGENTS.md is more useful than a long file full of vague rules."

AGENTS.md is a stack, not a single file, and Codex merges it from broad to specific. Your global personal defaults live in ~/.codex/AGENTS.md. The repository file lives at the repo root as plain AGENTS.md — note it is the project root, not inside .codex/ — and ships to your team through version control. Subdirectory AGENTS.md files add local rules for a specific package or service. Codex concatenates them from the repository root down toward your working directory, and files closer to where you are working override earlier guidance because they appear later in the combined prompt. Codex stops once the combined files reach a size cap (a 32 KiB default), which is another reason to keep each file lean.

You do not have to start from a blank file: the /init slash command scaffolds a starter AGENTS.md by inspecting the repo, and you refine from there. And when Codex repeats a mistake, the official move is a retrospective — ask it what guidance would have prevented the error, then add that line to AGENTS.md so the lesson sticks for every future run.

Scope	Location	Good for
Global	`~/.codex/AGENTS.md`	Your personal defaults across every project on the machine
Repository	`AGENTS.md` at the repo root	Build/test/lint commands, layout, conventions, PR expectations — shared via git
Subdirectory	`AGENTS.md` inside a package or service folder	Local rules that apply only to that part of the tree

The AGENTS.md stack, merged from broadest to most specific. Closer-to-your-work files win on conflicts because they land later in the combined prompt.
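The merge order is easier to reason about when written out. The sketch below is an illustrative model in Python of the behavior described above, not Codex's actual implementation: the function names are hypothetical, and stopping at the first file that would exceed the cap is a simplifying assumption about how the limit is applied.

```python
from pathlib import Path

MAX_BYTES = 32 * 1024  # the 32 KiB default cap mentioned above


def collect_agents_files(repo_root: Path, cwd: Path, home: Path) -> list[Path]:
    """List existing AGENTS.md files from broadest to most specific."""
    root = repo_root.resolve()
    # Raises ValueError if cwd is not inside the repository.
    relative = cwd.resolve().relative_to(root)
    candidates = [home / ".codex" / "AGENTS.md", root / "AGENTS.md"]
    current = root
    # Walk from the repo root down to the working directory.
    for part in relative.parts:
        current = current / part
        candidates.append(current / "AGENTS.md")
    return [path for path in candidates if path.is_file()]


def merge_guidance(files: list[Path], max_bytes: int = MAX_BYTES) -> str:
    """Concatenate files in order, stopping once the size cap would be exceeded."""
    merged: list[str] = []
    used = 0
    for path in files:
        text = path.read_text(encoding="utf-8")
        size = len(text.encode("utf-8"))
        if used + size > max_bytes:
            break  # assumption: later (more specific) files are dropped first
        merged.append(text)
        used += size
    return "\n\n".join(merged)
```

Because the most specific files come last, they are also the first to be lost if the broader files are bloated, which is a practical argument for keeping the global and repository files short.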

Move four: add live context with MCP — sparingly

Some context cannot live in a file because it changes. The status of a ticket, the current schema in a database, the latest entries in an observability tool — pasting these into a prompt means they are stale the moment you send it. This is what the Model Context Protocol (MCP) is for: it connects Codex to external tools and systems so it can pull live information itself instead of relying on what you copied in.

The official guidance is clear about when MCP earns its place: reach for it when the context lives outside the repository, when the data changes frequently, when a task is better served by Codex using a tool than by following pasted instructions, or when you need the same integration to be repeatable across people and projects. You add a server from the Codex app under Settings → MCP servers, or in the CLI with codex mcp add (giving it a name, the server URL, and connection details); Codex supports STDIO and Streamable HTTP servers, including OAuth.

But the warning that follows is the part people ignore, so it is worth quoting exactly: "Add tools only when they unlock a real workflow. Do not start by wiring in every tool you use." Every connected server is more surface area, more for the agent to consider, and more that can go wrong. The discipline is to add a tool the day a real workflow needs it — not the day you set up Codex.
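The four criteria above work as a checklist you can run against any tool before connecting it. Here is a small Python sketch of that checklist; the `McpCandidate` class, its field names, and the example tools are all hypothetical, chosen only to mirror the wording of the guidance.

```python
from dataclasses import dataclass


@dataclass
class McpCandidate:
    """A tool you are considering connecting. All fields are illustrative."""
    name: str
    context_lives_outside_repo: bool = False
    data_changes_frequently: bool = False
    tool_use_beats_pasted_instructions: bool = False
    needs_repeatable_integration: bool = False


def reasons_to_add(candidate: McpCandidate) -> list[str]:
    """Return the criteria this candidate meets; an empty list means wait."""
    checks = {
        "context lives outside the repository": candidate.context_lives_outside_repo,
        "data changes frequently": candidate.data_changes_frequently,
        "tool use beats pasted instructions": candidate.tool_use_beats_pasted_instructions,
        "integration must be repeatable": candidate.needs_repeatable_integration,
    }
    return [reason for reason, met in checks.items() if met]


tracker = McpCandidate(
    name="issue-tracker",
    context_lives_outside_repo=True,
    data_changes_frequently=True,
)
unused = McpCandidate(name="design-tool")

print(reasons_to_add(tracker))  # two criteria met: a real workflow exists
print(reasons_to_add(unused))   # empty list: do not wire it in yet
```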

Move five: close the loop with review

Generation is the middle of the job, not the end. The official guidance is to keep going past the first diff: ask Codex to create or update tests, run the relevant suites, execute lint and formatting and type checks, confirm the behavior matches the request, and review the change for bugs and risky patterns. That last item is the highest-leverage habit in the whole playbook — Codex reviewing its own work catches a surprising share of issues before a human ever looks.

The dedicated tool for this is the /review slash command. It can review uncommitted changes, review a specific commit, run a PR-style review against a base branch, or follow custom review instructions you supply. In the Codex app you can also toggle a diff panel to walk the changes yourself and click a row to leave feedback that feeds back into the next turn.

To make review consistent rather than ad hoc, the official pattern is to write a code_review.md describing what a good review looks like for your project and reference it from AGENTS.md, so review behavior stays the same across repositories. The standard worth aspiring to is the one OpenAI reports running internally: every pull request gets a Codex review before a human signs off. Review is not the step you bolt on at the end — it is part of how the work gets done.
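One way to make the mechanical half of review easy to request is a single script that runs the checks in order and stops at the first failure. This is a hedged sketch in Python, not part of Codex: `run_checks` is a hypothetical helper, and the demo commands are stand-ins that only exit with a status code so the example runs anywhere. Replace them with your project's real test, lint, and type-check commands.

```python
import subprocess
import sys


def run_checks(checks: dict[str, list[str]]) -> dict[str, bool]:
    """Run each named command in order and record whether it exited with 0."""
    results: dict[str, bool] = {}
    for name, command in checks.items():
        completed = subprocess.run(command, capture_output=True, text=True)
        results[name] = completed.returncode == 0
        if not results[name]:
            break  # stop at the first failure so it is fixed before later checks
    return results


# Stand-in commands: the "lint" step is rigged to fail for demonstration.
demo_checks = {
    "tests": [sys.executable, "-c", "print('tests ok')"],
    "lint": [sys.executable, "-c", "raise SystemExit(1)"],
    "types": [sys.executable, "-c", "print('types ok')"],
}
print(run_checks(demo_checks))  # {'tests': True, 'lint': False}
```

A script like this covers only the pass/fail checks. The judgment part of review, reading the diff for bugs and risky patterns, still belongs to /review and to the human who signs off.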

The loop, end to end

1. Write the contract. State Goal, Context, Constraints, and Done when. Pick a reasoning level that matches the difficulty: low for scoped edits, higher for genuinely hard or ambiguous work.
2. Plan first if it is non-trivial. For anything complex or ambiguous, enter Plan Mode with /plan (or Shift+Tab). Read the plan, answer the clarifying questions, and approve before any code is written.
3. Lift repeated context into AGENTS.md. Anything you typed that you would type again belongs in AGENTS.md, not the next prompt. Scaffold with /init; trim ruthlessly toward short and accurate.
4. Add MCP only when a workflow needs it. When context is live or external, connect a server with codex mcp add or via Settings. Add tools that unlock a real workflow, not every tool you own.
5. Review before you do. Have Codex run tests, lint, and type checks, then /review the diff for bugs and risky patterns. Standardize it with a code_review.md referenced from AGENTS.md.

Knowledge check

You keep pasting your build command, test runner, and a "do not modify the legacy billing module" rule into every Codex prompt. According to the official best practices, what should you do?
