The Forge · 9 min mission

Prompting Codex: Patterns That Work

Frame tasks so Codex plans, implements, and verifies the way you want.

Tags: prompting, patterns, templates · Fact-checked 2026-06-13

A coding agent is only as good as the brief you hand it. Codex will read your repo, write code, run commands, and keep going on almost anything you type — but the difference between a vague ask and a sharp one is the difference between three rounds of "no, not like that" and a clean diff on the first pass. This guide is about writing that sharp brief.

The good news from OpenAI's own guidance: you do not need a perfect prompt to get value. As the official best-practices doc puts it, "clear prompting isn't required to get value, but it does make results more reliable, especially in larger codebases or higher-stakes tasks." So treat what follows as a set of dials, not a gate. On a one-file change you can be terse; on a refactor that spans a service, the structure below is what keeps Codex on the rails.

The shape of a task Codex can finish

Codex is built to work the full loop on its own. The prompt that ships inside the CLI frames the model as an "autonomous senior engineer" that, "once the user gives a direction," will "proactively gather context, plan, implement, test, and refine without waiting for additional prompts at each step." That design choice has a direct consequence for how you write: your job is not to script every move. Your job is to make the destination unambiguous — what done looks like — and then let Codex find the path.

The single most common failure mode is the opposite. People write a goal so thin that "done" is undefined ("clean up the auth code"), then act surprised when the agent guesses at scope. Pin the destination and most of the guessing disappears.

The four-part brief: Goal, Context, Constraints, Done-when

OpenAI's best-practices doc recommends a four-element structure for any non-trivial task, and it maps cleanly onto how an experienced engineer would brief a teammate. Use it as a checklist, not a form — you rarely need all four spelled out, but knowing which one you skipped tells you why a result drifted.

  • Goal — what you are trying to change or build, in one plain sentence. This is the part people get right.
  • Context — which files, folders, docs, examples, or errors matter. In the Codex CLI you can @ mention specific files to pull them in directly, which is far more reliable than hoping the agent greps its way to the right module.
  • Constraints — the standards, architecture, safety requirements, and conventions Codex should respect. "Keep the public API unchanged." "No new dependencies." "Match the existing error-handling style."
  • Done when — the part most briefs are missing. What must be true before the task is complete? Tests passing, a specific behavior changed, a bug no longer reproducible. This is the line that turns "implement, then stop and ask" into "implement, verify, and report."

The "Done when" clause does double duty. It defines success, and it tells Codex to verify its own work — to run the tests or reproduce the bug rather than declaring victory after the edit. That is the verification half of plan-implement-verify, and you get it almost for free just by writing the clause.
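One way to keep the four parts honest is to hold them as separate fields and render the prompt from them. The sketch below is a hypothetical helper, not part of Codex or any OpenAI tooling; the `Brief` class and its method names are invented for illustration. It shows how an empty field surfaces as a named gap before the prompt is sent.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    """Hypothetical container for the four parts of a task brief."""

    goal: str
    context: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    done_when: list[str] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Name the parts left empty, so a drifting result can be traced to a gap."""
        parts = {
            "Goal": self.goal.strip(),
            "Context": self.context,
            "Constraints": self.constraints,
            "Done when": self.done_when,
        }
        return [name for name, value in parts.items() if not value]

    def render(self) -> str:
        """Render the brief as plain prompt text, skipping empty parts."""
        lines = [f"Goal: {self.goal.strip()}"]
        for label, items in (
            ("Context", self.context),
            ("Constraints", self.constraints),
            ("Done when", self.done_when),
        ):
            if items:
                lines.append(f"{label}: " + "; ".join(items))
        return "\n".join(lines)


# Example values borrowed from the rate-limiting prompt in this guide.
brief = Brief(
    goal="Add rate limiting to the public API.",
    context=["@src/server.ts wires the routes"],
    constraints=["no new dependencies"],
)
print(brief.render())
print("Missing:", brief.missing_parts())
```

Here the done-clause was left out on purpose, so `missing_parts()` reports it. Skipping a part is sometimes a reasonable choice for a small change; the point of the check is only that the omission is deliberate.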

Plan first when the task is fuzzy

Plan-then-implement is not always the right default. For a small, well-scoped change, asking for a plan first just adds a round trip. But for ambiguous, multi-step, or hard-to-describe work, planning up front is where the official guidance points you — and skipping it on complex tasks is on OpenAI's list of common mistakes.

The Codex CLI has a dedicated Plan Mode for exactly this: toggle it with the /plan command (or Shift+Tab). In Plan Mode, Codex investigates and proposes an approach without editing your files, so you can correct the direction before any code is written — the cheapest possible place to catch a misunderstanding. Once the plan looks right, you let it implement.

There is an asymmetry worth internalizing here. The interactive plan you ask for is a checkpoint for you. But you should not instruct Codex to always narrate a plan and pause inside a normal run: OpenAI's prompting guide warns that prompting for "an upfront plan, preambles, or other status updates during the rollout … can cause the model to stop abruptly before the rollout is complete." Use Plan Mode as a deliberate gate; do not bake "tell me your plan first" into every prompt. And when Codex does plan internally, it is built to follow through: its instructions say to "never end the interaction with only a plan."

Four repeatable templates

The four-part brief specializes into a handful of shapes you will reuse constantly. Treat these as starting points — copy, fill the blanks, delete what does not apply.

Implement a feature. Lead with the outcome and the anchor file, then the constraint and the runnable done-clause: "Add rate limiting to the public API. Context: @src/server.ts wires the routes; follow the middleware pattern in @src/middleware/auth.ts. Constraint: no new dependencies — use the existing Redis client. Done when requests over 100/min get a 429 and npm test rate-limit passes."

Refactor safely. The make-or-break clause for refactors is behavioral preservation: "Extract the duplicated date-parsing logic in @src/reports/ into one helper. Constraint: pure refactor — no behavior change, and the public exports stay identical. Done when the full test suite passes unchanged and the diff touches only the parsing call sites."

Debug a failure. Give Codex the evidence and ask it to reproduce before fixing, so the fix is grounded in a real failing case: "POST /checkout intermittently returns 500 under load — stack trace below. First reproduce it with a test, then fix the root cause, not the symptom. Done when the new test fails on main and passes after your fix."

Review a diff. Codex reviews as readily as it writes. The built-in /review command runs a PR-style review against a base branch, your uncommitted changes, or a specific commit, and it accepts custom instructions — so you can point it at exactly the risks you care about: "/review the staged changes for missing input validation, race conditions, and any swallowed errors."

Thin ask vs. four-part brief

Thin ask

"Refactor the reports module, it's a mess."

No anchor file, no constraint, no definition of done. Codex has to invent the scope (which functions, how far to go, whether behavior may change), and whatever it picks is unlikely to match what you had in mind. You spend the next three turns negotiating.

Four-part brief

"Extract the duplicated date-parsing in @src/reports/ into one helper. Pure refactor: no behavior change, public exports identical. Done when the suite passes unchanged and the diff only touches the call sites."

Goal, context, constraint, runnable done-clause. Codex knows the boundary and can verify itself. The loop converges in one pass.

Plan Mode catches the scope question before any file is touched; the runnable done-clause lets Codex close the loop on its own.

Durable rules go in AGENTS.md, not the prompt

There is a clean line between what belongs in a prompt and what belongs in your project's instruction file. A prompt is for this task. AGENTS.md is for every task. If you find yourself typing "use pnpm, not npm" or "we never edit the legacy/ folder" into prompt after prompt, that is a durable rule, and re-typing it is one of OpenAI's documented common mistakes: "overloading the prompt with durable rules instead of moving them into AGENTS.md or a skill."

The payoff is compounding. Move the build command, the test command, the conventions, and the do-not rules into AGENTS.md once, and every future prompt gets shorter and more reliable, because Codex already carries that context before you type. Your prompts then shrink to the part that is genuinely task-specific — which is exactly what they should be. (The AGENTS.md guide in this constellation covers what earns a place in that file.)

How prompting Codex differs from prompting Claude

The four-part brief is good practice for any agent — Claude included. But two practical differences are worth holding in mind when you move between them.

Codex leans hard toward autonomy, so resist over-scripting. OpenAI's prompting guide is explicit that the model is tuned to "default to implementing with reasonable assumptions" and not to "end your turn with clarifications unless truly blocked." Crucially, it warns that prompting the model to narrate plans and status updates mid-run can make it "stop abruptly before the rollout is complete." So the instinct to write a long, step-by-step checklist — which can help a less autonomous setup — works against you here. Give Codex the destination and the constraints; let it choose the route.

The instruction file and review tooling are named differently. Codex reads AGENTS.md (an open, cross-tool standard); Claude Code reads CLAUDE.md. Codex ships a first-class /review command for PR-style diff review against a base branch. The underlying skill — be specific, define done, push durable rules into the project file — transfers completely; only the file names, the slash commands, and Codex's stronger bias-to-action change how you phrase things in practice.

| Aspect | Codex | Claude Code |
| --- | --- | --- |
| Project instruction file | AGENTS.md (open standard) | CLAUDE.md |
| Plan-first gate | Plan Mode (/plan or Shift+Tab) | Plan mode (Shift+Tab) |
| Built-in diff review | /review against a base branch | /review and review tooling |
| Autonomy posture | Strong bias to action; avoid over-scripting | Directable; benefits from explicit constraints |
| Effort dial | Low · Medium · High · Extra High | Model choice (Opus 4.8, Sonnet 4.6, Haiku 4.5) |
The same prompting skill, two agents. The technique transfers; the surface details differ. Values verified against each tool's official docs.

Knowledge check

You ask Codex to "improve performance in the data layer." It refactors three files, but not the ones you cared about, and you are not sure it actually got faster. Which single change to your prompt would have helped most?

A blocking-condition registry: tell Codex the few times to stop

Codex's bias to action is the point — but it has hard edges. The sandbox already brokers the obvious cases: in Auto mode "Codex can read files, make edits, and run commands in the workspace" but "requires approval to edit outside the workspace or to access network," and untrusted, state-mutating commands — destructive Git operations among them — are gated regardless of mode [V]. What the sandbox does not know is your domain's irreversibility: dropping a migration, breaking a public contract, or touching billing all look like ordinary edits to a file-permission check.

So senior teams encode that judgement where it persists — in AGENTS.md, the one file Codex reads on every run [V]. The pattern [P] is a short, explicit blocking-condition registry: a closed list of irreversible actions that do earn a pause, paired with the standing instruction that everything else proceeds under the four-part brief. This inverts the usual fear about autonomous agents. You are not asking Codex to check in constantly — that would invite the abrupt-stop failure mode the prompting guide warns about. You are naming the handful of moves where "ask first" beats "ask forgiveness," and freeing it to run on the rest.

Keep the list tight and concrete. A registry of twenty soft "be carefuls" trains the model to ignore it; five sharp, recognizable triggers — each tied to an observable action, not a vibe — is a rule it can actually apply.

AGENTS.md — blocking conditions (stop and confirm before these)
```markdown
## Autonomy & blocking conditions

Default: implement under the brief without pausing for approval.
Do NOT stop to narrate plans or ask for reassurance mid-task.

STOP and ask for explicit confirmation ONLY before these irreversible actions:

- Dropping, renaming, or rewriting a database migration, or running a
  destructive migration against any non-local database.
- Changing a public API surface: a route, exported function signature,
  CLI flag, or response schema that an external caller depends on.
- Modifying billing, payments, pricing, or quota/entitlement logic.
- Deleting user data, or any bulk/irreversible data mutation.
- Force-pushing, rewriting shared git history, or deleting a branch.

For everything else: proceed, verify against the Done-when clause, and report.
```
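To keep the registry from growing past the point where it is useful, a team could lint it in CI. The sketch below is one possible check, not an official tool: the function name, the limit of five, and the parsing rule (count top-level `- ` bullets after the first line that starts with `STOP`) are all assumptions that fit the layout shown above and would need adjusting for a different AGENTS.md structure.

```python
MAX_TRIGGERS = 5  # assumption: mirrors the "five sharp triggers" rule of thumb


def count_blocking_triggers(agents_md: str) -> int:
    """Count top-level bullets that follow the first line starting with 'STOP'."""
    count = 0
    in_registry = False
    for line in agents_md.splitlines():
        if line.startswith("STOP"):
            in_registry = True
            continue
        if not in_registry:
            continue
        if line.startswith("- "):
            count += 1
        elif line.strip() and not line[0].isspace() and count:
            break  # a non-indented, non-bullet line after the bullets ends the list
    return count


# Shortened, hypothetical registry used only to exercise the function.
SAMPLE = """\
## Autonomy & blocking conditions

STOP and ask for explicit confirmation ONLY before these irreversible actions:

- Dropping or rewriting a database migration.
- Changing a public API surface.
- Modifying billing or
  entitlement logic.

For everything else: proceed, verify, and report.
"""

found = count_blocking_triggers(SAMPLE)
status = "ok" if found <= MAX_TRIGGERS else "too long, consider trimming"
print(f"{found} blocking conditions: {status}")
```

A count cannot tell a sharp trigger from a vague one, so this only guards the length of the list; whether each entry names an observable action still needs a human read.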

Two patterns that close the last gaps: empty-result recovery and the completeness contract

Two failure modes survive even a sharp four-part brief, and OpenAI's docs are silent on both — so treat these as field-tested practitioner habits [P], not official rules.

  • Empty-result recovery. When a search, grep, or test discovery turns up nothing, a thin prompt lets Codex treat "found nothing" as "nothing to do" and stop. Pre-empt it in the brief [P]: "If you find no matches, do not stop — widen the search (try synonyms, adjacent directories, and the test folder), and if it is still empty, report what you searched and why you think it is absent before ending." That single clause converts a silent dead-end into either a real result or an auditable account of the gap.
  • Completeness-contract verification. "Done" and "all of it done" are different claims. Bound the job with a countable contract [P] and make Codex prove it hit the count: "There are 7 call sites of parseDate — migrate every one; before finishing, grep for parseDate and confirm zero old call sites remain." A verifiable invariant ("zero remaining matches", "every handler has a test") turns a partial sweep into one Codex can check itself — the same close-the-loop move as a runnable Done-when, applied to coverage rather than correctness.
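Both habits above reduce to checks that can be run outside the agent as well. The following is a minimal sketch under stated assumptions: `find_call_sites` and `search_with_widening` are hypothetical helpers, the regex treats any `name(` as a call site (so it also counts a function definition), and only `.ts` and `.js` files are scanned. It illustrates the countable invariant and the "report what you searched" audit trail, not how Codex itself searches.

```python
import re
import tempfile
from pathlib import Path


def find_call_sites(
    root: Path, name: str, suffixes: tuple[str, ...] = (".ts", ".js")
) -> list[tuple[Path, int]]:
    """Return (file, line number) for every line under root containing `name(`."""
    pattern = re.compile(rf"\b{re.escape(name)}\s*\(")
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8")
            for number, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    hits.append((path, number))
    return hits


def search_with_widening(
    roots: list[Path], names: list[str]
) -> tuple[list[tuple[Path, int]], list[str]]:
    """Try each root and name in order; return the first hits plus an audit trail."""
    searched = []
    for root in roots:
        for name in names:
            searched.append(f"{name} in {root}")
            hits = find_call_sites(root, name)
            if hits:
                return hits, searched
    return [], searched  # empty result, but with a record of what was tried


# Throwaway files so the sketch runs anywhere; the names are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "tests").mkdir()
    (root / "src" / "report.ts").write_text(
        "const d = parseDate(raw);\n", encoding="utf-8"
    )
    (root / "tests" / "report.test.ts").write_text(
        "expect(parseDate('x')).toBeNull();\n", encoding="utf-8"
    )

    # Completeness contract: the migration is done only when this reaches zero.
    remaining = find_call_sites(root, "parseDate")
    print(f"{len(remaining)} call sites remain")

    # Empty-result recovery: widen across names and directories, keep the trail.
    hits, searched = search_with_widening(
        [root / "src", root / "tests"], ["parseIsoDate", "parseDate"]
    )
    nothing, audit = search_with_widening([root / "src"], ["formatDate"])
    print(f"found {len(hits)} after {len(searched)} searches")
    print(f"found {len(nothing)}; searched: {len(audit)} location(s)")
```

The second return value is the part that matters for an empty result: even when nothing is found, the caller gets a list of what was tried, which is the auditable account the prompt clause asks Codex to produce.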
