The Forge · 11 min mission

Claude Code vs Codex: An Honest Comparison

Know which agent fits which job — and how to run both in one repo.

Tags: comparison, claude, codex · Fact-checked 2026-06-13

Pick a forum thread about agentic coding tools and you will find someone insisting Claude Code is obviously better, someone equally sure Codex is, and almost nobody explaining why either claim might be true. This guide is the version that does not pick a jersey. Both are excellent. They share the same core idea — describe an outcome, the agent reads your repo, edits files, runs commands, and iterates — and they diverge in ways that genuinely matter once you live in one daily.

The honest summary up front: there is no universal winner. There is the tool that fits your model preference, your billing situation, and the surfaces your team already works in. What follows is the comparison I wish existed when I was choosing — concrete, specific, and fair to both.

The shared shape

Before the differences, respect the similarity, because it is the most important fact here. Both tools are agents, not autocomplete. Both run the same read-act-observe loop: gather context, take an action, observe the result, repeat until done. Both install as a terminal CLI, ship an IDE extension, run in the cloud for long jobs, and read a plain-markdown instructions file from your repo. If you are fluent in one, you are most of the way to fluent in the other. The skills you build — writing tight instructions, scoping tasks, reviewing diffs — transfer almost entirely. Switching tools is closer to switching editors than switching languages.

Instruction files: CLAUDE.md vs AGENTS.md

Both tools solve the same problem — an agent starts every session knowing nothing about your repo — with the same mechanism: a plain-markdown file you commit to the repo that the agent reads before you type a word. Claude Code reads CLAUDE.md; Codex reads AGENTS.md. The content advice is identical for both: list the build/test/lint commands, the non-obvious setup, the hard constraints, and what "done" means — and cut anything the agent could infer by reading the code.

The meaningful difference is ownership of the format. CLAUDE.md is Anthropic's file for Anthropic's tool. AGENTS.md is an open standard — stewarded under the Linux Foundation and read by Codex plus a long list of other agents (Jules, Cursor, Aider, GitHub Copilot, and more), used across tens of thousands of open-source projects. So if your team runs a mix of tools, one AGENTS.md guides all of them, while CLAUDE.md guides Claude Code specifically.

Both also support nested files: a per-directory instructions file deeper in the tree that applies when the agent works there. The merge rule is the same intuition in both — the more specific, more local file wins. Codex documents this precisely: it builds an instruction chain from the git root downward, concatenating files so the closest one lands last and takes precedence, and it also reads a global file at ~/.codex/AGENTS.md for machine-wide defaults. Claude Code walks the directory tree the same way and adds a user-level ~/.claude/CLAUDE.md.

| Dimension | Claude Code | Codex |
| --- | --- | --- |
| File name | CLAUDE.md | AGENTS.md |
| Format | Anthropic convention, Claude Code-specific | Open standard, read by many agents |
| Global / user-level file | ~/.claude/CLAUDE.md | ~/.codex/AGENTS.md |
| Nested files | Walks the directory tree; most-specific wins | Chain from git root down; closest file wins |
| Size discipline | Target under ~200 lines; bloat reduces adherence | Capped by project_doc_max_bytes (32 KiB default) |
The instruction-file split. Both are plain markdown read at the start of a task; the differences are ownership, scope, and a few file-discovery specifics drawn from each tool’s official docs.
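The chain-building rule described for Codex can be sketched in a few lines of Python. This is an illustrative model of the behavior as the article describes it, not Codex's actual implementation: the function name `build_instruction_chain` is hypothetical, the way the byte cap is applied (truncate and stop once the budget is spent) is an assumption, and the global ~/.codex/AGENTS.md is left out to keep the example short.

```python
from pathlib import Path

# Default cap named in the comparison above (project_doc_max_bytes, 32 KiB).
DEFAULT_MAX_BYTES = 32 * 1024


def build_instruction_chain(git_root: Path, cwd: Path,
                            filename: str = "AGENTS.md",
                            max_bytes: int = DEFAULT_MAX_BYTES) -> str:
    """Concatenate instruction files from git_root down to cwd.

    Files closer to cwd are appended later, so the most local guidance
    is the last thing read. Raises ValueError if cwd is outside git_root.
    """
    git_root = git_root.resolve()
    cwd = cwd.resolve()

    # Directories from the git root down to the working directory, in order.
    directories = [git_root]
    for part in cwd.relative_to(git_root).parts:
        directories.append(directories[-1] / part)

    chunks: list[str] = []
    remaining = max_bytes
    for directory in directories:
        candidate = directory / filename
        if not candidate.is_file():
            continue
        # Assumption: once the byte budget is spent, later files are dropped.
        data = candidate.read_bytes()[:remaining]
        chunks.append(data.decode("utf-8", errors="ignore"))
        remaining -= len(data)
        if remaining <= 0:
            break
    return "\n\n".join(chunks)
```

Calling `build_instruction_chain(Path("."), Path("services/api"))` from a repo root would return the root file followed by any file found in `services/` and then `services/api/`, which is one way to see why a bloated root file can crowd out the local one under a fixed cap.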

Extensibility: where the two philosophies show

This is where the tools stop rhyming and start differing in flavor. Claude Code exposes a broad, named toolkit of extension points, each with its own file convention. Skills are reusable procedures: a SKILL.md file under .claude/skills/ whose body loads only when invoked (via /skill-name) or when Claude decides it is relevant — they follow the open Agent Skills standard. Hooks are deterministic shell commands, HTTP calls, or prompts wired to lifecycle events like PreToolUse, PostToolUse, and Stop, configured in settings.json; they enforce behavior the way instructions cannot — block a write, run lint before a commit, refuse a dangerous command. Subagents are specialized workers defined as markdown-with-frontmatter under .claude/agents/, each running in its own context window with its own tool access and even its own model, so you can route cheap tasks to Haiku and keep your main context clean.
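To make the hook idea concrete, here is a minimal sketch of a PreToolUse hook script. It assumes the hook receives the pending tool call as JSON on stdin with `tool_name` and `tool_input` fields, and that exiting with status 2 blocks the call and hands stderr back to the agent; both are worth confirming against the current hooks documentation. The blocked-command list is a hypothetical policy invented for the example.

```python
import json
import sys

# Hypothetical policy for illustration: command fragments this repo never allows.
BLOCKED_FRAGMENTS = ("rm -rf /", "git push --force")


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one hook invocation.

    Exit code 0 lets the tool call proceed; 2 is assumed to block it.
    """
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for fragment in BLOCKED_FRAGMENTS:
        if fragment in command:
            return 2, f"Blocked by repo policy: command contains {fragment!r}"
    return 0, ""


def main() -> int:
    """Read the hook payload from stdin and report the decision."""
    payload = json.load(sys.stdin)
    exit_code, message = decide(payload)
    if message:
        print(message, file=sys.stderr)
    return exit_code

# When saved as a hook script, finish the file with:
#     sys.exit(main())
```

Keeping the decision in a pure function (`decide`) separate from the stdin handling makes the policy easy to unit-test without launching the agent.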

Codex covers much of the same ground, and the surface has grown: the "Codex is the leaner one" line you may have read in older comparisons is no longer accurate. It speaks MCP (the Model Context Protocol) for connecting external tools and data, exactly as Claude Code does; that is the one extension mechanism both share natively and identically. For automation, Codex leans on codex exec, a non-interactive mode that runs a single task without the conversational loop, which fills the niche Claude Code's hooks and headless -p mode occupy. But Codex now also ships its own subagents (a primary agent spawns child agents, bounded by max_threads and max_depth under [agents] in config), an official SDK for TypeScript (@openai/codex-sdk) and Python (openai-codex) so you can drive the agent from your own process, hosted cloud tasks that run in isolated containers and return a pull request, and an official GitHub Action (openai/codex-action@v1) for CI. The practical read in 2026: both tools now field a broad extension surface, and they differ in shape. Claude Code names more distinct primitives (skills, hooks, subagents) as repo files, while Codex concentrates its surface around codex exec, the SDK, and the cloud. Neither is the "small core" anymore.
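For scripted use, both non-interactive modes can be driven from the same small wrapper. The sketch below uses only the two invocations named in the text (`codex exec` and `claude -p`) and deliberately passes no other flags; the helper names `build_command` and `run_headless` are hypothetical, and real automation would likely need extra options from each tool's documentation.

```python
import shutil
import subprocess

# Only the invocations named in the text; further flags are left out on purpose.
HEADLESS_COMMANDS = {
    "codex": ["codex", "exec"],
    "claude": ["claude", "-p"],
}


def build_command(tool: str, prompt: str) -> list[str]:
    """Return the argv for a single non-interactive run of the given tool."""
    if tool not in HEADLESS_COMMANDS:
        raise ValueError(f"unknown tool: {tool!r}")
    return [*HEADLESS_COMMANDS[tool], prompt]


def run_headless(tool: str, prompt: str, repo_dir: str = ".") -> str:
    """Run one task in repo_dir and return whatever the tool printed."""
    argv = build_command(tool, prompt)
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} is not installed or not on PATH")
    result = subprocess.run(
        argv, cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return result.stdout
```

A call such as `run_headless("codex", "Run the test suite and summarize the failures")` would then behave the same way from a local script or a CI job, provided the CLI is installed and authenticated there.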

| Extension point | Claude Code | Codex |
| --- | --- | --- |
| Reusable procedures | Skills (SKILL.md under .claude/skills/) | Prompt files + codex exec recipes |
| Lifecycle enforcement | Hooks at PreToolUse / PostToolUse / Stop | codex exec in a wrapper script or the GitHub Action |
| Parallel subagents | Subagents under .claude/agents/, own context + model | Subagents via [agents] (max_threads 6, max_depth 1) |
| Programmatic SDK | Headless -p / TS Agent SDK | @openai/codex-sdk (TS) + openai-codex (Python) |
| Cloud / background tasks | Claude Code on the web, long-running jobs | Cloud tasks in isolated containers → a PR |
| CI integration | GitHub Action + headless CLI | Official openai/codex-action@v1 GitHub Action |
| External tools / data | MCP (native) | MCP (native, identical protocol) |
Extension surface, honest as of 2026. Once a real gap, the Codex column has filled in — subagents, an official SDK, cloud tasks, and a GitHub Action all now ship. The split is shape, not size: named repo-file primitives on the left, a CLI/SDK/cloud spine on the right.

Claude Code vs Codex, feature by feature

The same agentic-coding ideas, two houses.

| Dimension | What it covers | Claude Code | Codex |
| --- | --- | --- | --- |
| Instruction file | The repo file each tool reads to learn your project. | CLAUDE.md at the repo root (plus ~/.claude/CLAUDE.md and auto memory). | AGENTS.md in the repo, with config.toml for settings. |
| Extensibility | How you package and share repeatable workflows. | Plugins, skills and marketplaces, sharable across a team. | Custom prompts and config profiles. |
| Subagents | Delegating side-work to isolated helper agents. | Markdown agents in .claude/agents/, each with its own context, tools and model. | Supported for delegating focused subtasks. |
| Hooks | Run your own commands around the agent's actions. | Shell hooks on lifecycle events, wired up in settings.json. | Hooks for workflow automation. |
| MCP | Connecting external tools via Model Context Protocol. | MCP servers; tool definitions deferred until first used. | MCP servers for third-party tools and context. |
| Autonomy surfaces | Where each tool can run. | Terminal, VS Code + JetBrains, Desktop, Web, GitHub Actions, Agent SDK. | CLI, IDE extension, Cloud/Web, GitHub Action, Codex SDK, Chrome. |
| Models | The model families each tool drives. | Opus 4.8, Sonnet 4.6, Haiku 4.5, Fable 5. | GPT-5.5, GPT-5.4, GPT-5.4 mini. |
| Pricing model | How you pay to use it. | Claude Pro / Max / Team / Enterprise, or Anthropic API tokens. | ChatGPT Plus / Pro / Business / Edu / Enterprise, or API tokens. |
| Best for | Where each one tends to shine. | Deep, multi-surface workflows tuned through CLAUDE.md, skills and hooks. | Teams already living in ChatGPT and the OpenAI stack. |

Verified against code.claude.com/docs and developers.openai.com/codex — current as of 2026. Both tools ship fast; check the docs for the latest.

Scan for the dimensions that actually differ: instruction file, extension primitives, autonomy controls, models, and billing. Find the one or two rows that matter for your situation rather than trying to crown a winner.

Autonomy surfaces: how each tool lets you off the leash

Both tools let you dial how much the agent can do without asking — and both default to a careful, ask-first posture. The difference is the shape of the controls.

Codex exposes two explicit, orthogonal knobs in ~/.codex/config.toml. sandbox_mode decides what the agent is capable of touching (read-only, workspace-write, or danger-full-access), and approval_policy decides when it must stop and ask (untrusted, on-request, or never). The sane local default is workspace-write plus on-request: edit inside the repo freely, pause before anything riskier. It is a clean mental model — capability and permission are separate dials you set independently.

Claude Code expresses autonomy through permission rules (allow/ask/deny lists for tools and commands) layered with hooks that can hard-block actions programmatically, plus interactive permission prompts and a plan-first mode. The philosophy is the same — least privilege by default, widen deliberately — but the surface is rules-and-hooks rather than two enum keys. If you like a single config file with two obvious dials, Codex's model is satisfying. If you want fine-grained, per-command rules and the ability to enforce policy with a script, Claude Code's model gives you more reach.
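A rough picture of the rules-based side, sketched as a Python script that emits a permissions block for settings.json. The `permissions` object with `allow`, `ask`, and `deny` lists follows the structure described above; the individual rule strings are illustrative examples of the rule syntax and should be checked against the current permissions documentation before use. The helper name `permissions_block` is hypothetical.

```python
import json


def permissions_block(allow: list[str], ask: list[str], deny: list[str]) -> dict:
    """Build the permissions section of a settings file as a plain dict."""
    overlap = (set(allow) & set(ask)) | (set(allow) & set(deny)) | (set(ask) & set(deny))
    if overlap:
        # A rule listed twice makes the intended posture ambiguous to readers.
        raise ValueError(f"rule appears in more than one list: {sorted(overlap)}")
    return {"permissions": {"allow": allow, "ask": ask, "deny": deny}}


# Illustrative rules only: run tests freely, ask before pushing, never read secrets.
settings = permissions_block(
    allow=["Bash(npm run lint)", "Bash(npm run test:*)"],
    ask=["Bash(git push:*)"],
    deny=["Read(./.env)", "Bash(curl:*)"],
)
settings_json = json.dumps(settings, indent=2)
```

Printing `settings_json` gives a fragment that could be merged into a project settings file by hand; generating it from code is mainly useful when several repos should share one posture.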

Autonomy controls, side by side

Codex

Two orthogonal enum keys in ~/.codex/config.toml:

sandbox_mode = read-only | workspace-write | danger-full-access: what it can touch.

approval_policy = untrusted | on-request | never: when it must ask.

Sane local default: workspace-write + on-request. Clean, two-dial mental model.

Claude Code

Permission rules (allow / ask / deny lists for tools and commands) in settings.json, layered with hooks that can programmatically block an action at PreToolUse.

Plus interactive prompts and a plan-first mode.

More moving parts, but finer-grained — you can deny one exact command or enforce policy with a script.

The models, and why they are the real choice

Strip away the tooling and the decision often comes down to: which model do you want writing your code? Claude Code runs Anthropic's lineup — Fable 5 for the hardest work, Opus 4.8 and Sonnet 4.6 as strong all-rounders, Haiku 4.5 for fast, cheap tasks. Codex runs OpenAI's — GPT-5.5 as the recommended default for complex coding, with gpt-5.4-mini as the faster, cheaper option for lighter tasks or subagents. Both tools let you switch models mid-session (/model) and pin a default in config.

I will not tell you one model family is universally better, because that is genuinely workload-dependent and it changes with every release. The honest advice is empirical: if you have a paid plan for both, run the same real task through each — a bug you understand, a refactor you can verify — and judge the diffs, not the marketing. Most teams find each model has a personality, and the right question is which personality fits your codebase and your reviewing style, not which benchmark leads this month.

Pricing philosophy

The two tools share a billing shape but differ in the details. Both bundle the agent into the company's paid subscription, and both offer an API-key path that bills per token for programmatic and CI use. Codex is included across ChatGPT plans (Plus, Pro, Business, Edu, Enterprise, and lighter tiers), metered with rolling usage windows; Claude Code runs on a Claude subscription or an Anthropic API key. The decision rule is refreshingly simple: if your team already pays for ChatGPT, Codex adds no new bill; if you already pay for Claude, Claude Code adds none. For automated, headless, or high-volume use, both point you to usage-based API-key billing rather than a flat seat.

You do not actually have to choose

The framing as a war is mostly marketing. The two files — CLAUDE.md and AGENTS.md — do not conflict, and nothing stops a repo from carrying both. Plenty of teams keep both agents installed and reach for whichever model suits the task, or run one as a second opinion on the other's diff. Here is the concrete setup.

Run both in one repo

  1. Write AGENTS.md as the shared source of truth

    Put the build/test/lint commands, setup gotchas, and hard constraints in AGENTS.md at the repo root. Because it is the open standard, Codex and many other agents read it directly — so it is the broadest-reach file to maintain.

  2. Point CLAUDE.md at it

    Keep a short CLAUDE.md that imports the shared file with @AGENTS.md so Claude Code reads the same instructions, then add only the few Claude-specific notes on top (a skill to prefer, a hook to respect). One source of truth, no drift between the two files.

  3. Set each tool’s autonomy to the same posture

    Give Codex sandbox_mode = "workspace-write" and approval_policy = "on-request" in ~/.codex/config.toml; give Claude Code an equivalent ask-before-risky permission setup. Matching postures means the agents behave consistently no matter which one you grab.

  4. Use them for what each does best

    Run the model you trust for the task at hand, and use the other as a reviewer — have one implement and the other critique the diff. Two independent agents catch different mistakes; the second opinion is the whole point of keeping both.

A CLAUDE.md that imports the shared AGENTS.md keeps a single source of truth. Codex reads AGENTS.md natively; Claude Code reads it through the import.
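The first two steps can be scripted so every new repo starts with the same layout. This is a minimal sketch: the helper name `scaffold` is hypothetical, and the file contents (the `make test` and `make lint` commands and the Claude-specific note) are placeholders to replace with your project's real instructions. Only the `@AGENTS.md` import line comes from the setup described above.

```python
from pathlib import Path

# Placeholder content: swap in your project's real commands and constraints.
AGENTS_MD = """# Project instructions

## Commands
- Test: make test
- Lint: make lint

## Constraints
- Do not edit generated files.
"""

# CLAUDE.md stays short: import the shared file, then add Claude-only notes.
CLAUDE_MD = """@AGENTS.md

## Claude-specific notes
- Respect the repo's hooks; do not bypass them.
"""


def scaffold(repo_root: Path) -> list[Path]:
    """Create AGENTS.md and CLAUDE.md if missing; return the files written."""
    written = []
    for name, content in (("AGENTS.md", AGENTS_MD), ("CLAUDE.md", CLAUDE_MD)):
        target = repo_root / name
        if target.exists():
            continue  # never overwrite instructions someone already wrote
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written
```

Because existing files are skipped, running `scaffold(Path("."))` in a repo that already has an AGENTS.md only adds the thin CLAUDE.md that points at it.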

They call each other now

The "second opinion" pattern stopped being a manual copy-paste chore: each agent can now invoke the other as a subagent and hand back the result. Claude Code can delegate a task to Codex and read its diff; Codex can do the reverse. That turns "run both" from a discipline you maintain into a single command. The two tandem guides walk through the exact wiring in both directions: calling Codex from Claude Code and calling Claude Code from Codex. The setup is symmetric: pick whichever agent you live in as the driver, and reach for the other when you want an independent pass on the same problem.

Knowledge check

You want one set of project instructions that both Codex and Claude Code follow, with no second copy to keep in sync. What is the cleanest setup?
