The Forge · 11 min mission

Codex Cloud, GitHub & CI Automation

Delegate to isolated cloud containers, review the diffs, and wire Codex into GitHub and CI.

cloud · ci · automation · Fact-checked 2026-06-13

On this page

What Codex Cloud actually is
The environment: setup vs. maintenance scripts
Internet access: off by default, proxied, allowlisted
@codex on GitHub
The GitHub Action: openai/codex-action@v1
The senior move: a nightly bot that fixes failing tests while you sleep
The terminal bridge: codex cloud and codex apply
Putting it together

Your laptop runs one Codex at a time and you babysit it. Codex Cloud flips that: you describe five tasks, hit go, and each runs in its own isolated container in OpenAI's cloud while you keep working. You come back to five diffs, review them like pull requests, and merge the ones that are right. The agent that fixes your flaky test suite can do it at 3am, in CI, on a branch you have never checked out — and open the PR before you wake up.

This guide is the operations manual for that. It covers where Codex runs in the cloud (chatgpt.com/codex); how to configure the container so your build actually works; the single most-misunderstood security behavior (secrets are stripped before the agent phase starts); how @codex works on GitHub; the openai/codex-action@v1 GitHub Action; a real autofix-on-CI-failure workflow; and the codex cloud / codex apply bridge that pulls a cloud task's diff onto your machine.

What Codex Cloud actually is

Codex Cloud is a hosted environment where Codex works on tasks in the background, including in parallel, each in an isolated container. [V] You start tasks three ways: from the web app at chatgpt.com/codex, by mentioning @codex on a GitHub issue or PR, or from your terminal/IDE by delegating a cloud task. [V] However a task starts, the output is the same shape — Codex makes the changes in its container and turns the result into a pull request you review before anything merges. [V]

The mental model that matters: a cloud task is not your shell. It is a fresh, sandboxed checkout of your repo with no access to your local secrets, your SSH agent, or your network — only what you explicitly grant through the environment config. Get the environment right and tasks "just work." Get it wrong and every task fails at npm install for the same reason. The rest of this guide is mostly about getting it right.

Codex Cloud is included with ChatGPT Plus, Pro, Business, Edu, and Enterprise plans, and requires connecting your GitHub account so it can read repos and open PRs. [V]

The environment: setup vs. maintenance scripts

Every cloud task boots a container. The default image is universal, built from the open-source openai/codex-universal repo, and you can pin versions of Python, Node.js, and other runtimes in the environment settings. [V] But a generic image does not know your repo needs pnpm, a specific Postgres version, or a uv sync step. That is what scripts are for, and there are two distinct kinds.

A setup script runs when a container is first created — it installs dependencies and tools, runs your npm ci or poetry install, and bakes the result into a cached image. [V] A maintenance script is optional and runs only when a cached container is resumed, to update dependencies if the repo changed since the cache was created (think git pull drift — a new lockfile, a new migration). [V]

Why two phases? Caching. Container state is cached for up to 12 hours, and the cache automatically invalidates if setup scripts, maintenance scripts, environment variables, or secrets change. You can also manually Reset cache when repo changes make the cached state incompatible. [V] The split lets the expensive cold-start work (compiling, downloading) happen once in setup, while the cheap "catch up to latest" work happens in maintenance on every resume.
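The caching rules above fit in a small decision function. This is a toy sketch, not Codex's implementation: the `EnvConfig` and `CachedContainer` names, the SHA-256 fingerprint, and the exact expiry comparison are assumptions made for illustration, and the manual Reset cache button is not modeled. It encodes only the documented behavior: a cache younger than 12 hours whose inputs are unchanged resumes with the maintenance script; anything else is a cold start with the setup script.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

CACHE_TTL = timedelta(hours=12)  # documented upper bound on cached container state


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical bundle of the inputs whose change invalidates the cache."""
    setup_script: str
    maintenance_script: str
    env_vars: Dict[str, str]
    secrets: Dict[str, str]


@dataclass(frozen=True)
class CachedContainer:
    key: str              # fingerprint of the EnvConfig the cache was built from
    created_at: datetime


def fingerprint(cfg: EnvConfig) -> str:
    # Hash every cache-relevant input; changing any one of them changes the key.
    payload = json.dumps(
        [cfg.setup_script, cfg.maintenance_script, cfg.env_vars, cfg.secrets],
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_boot(cfg: EnvConfig, cache: Optional[CachedContainer], now: datetime) -> str:
    """Which script runs: 'setup' on a cold start, 'maintenance' on a resume."""
    if cache is None:
        return "setup"                    # nothing cached yet
    if now - cache.created_at > CACHE_TTL:
        return "setup"                    # cache too old
    if cache.key != fingerprint(cfg):
        return "setup"                    # scripts, env vars, or secrets changed
    return "maintenance"                  # warm resume: only catch up on repo drift
```

Note that rotating a secret counts as a config change here, exactly as in the documented rule, so it forces a fresh setup run.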

The one thing that trips everyone up is not the scripts. It is what is available to them.

Danger zone: the two-phase secrets model (read this twice)

This is the single most-misunderstood behavior in Codex Cloud. Environment variables are set for the full duration of the task — both the setup scripts and the agent phase. Secrets are similar, except they are only available to setup scripts: for security reasons, secrets are removed before the agent phase starts. [V]

Concretely: an NPM_TOKEN stored as a secret works during npm ci in setup, then disappears. The agent cannot read it, cannot leak it into a log, and cannot exfiltrate it through a prompt injection hidden in some file it reads. If your agent-phase code (a test, a script the agent runs) needs a value at runtime, that value must be an environment variable, not a secret, which means accepting that the agent can see it. Mis-file an API key as an env var when it should be a secret and you have widened your blast radius; mis-file a build-time token as a secret when a test needs it and your tests fail with a confusing undefined.

Kind	Available in setup scripts	Available in agent phase	Use it for
Environment variable	Yes	Yes — agent can read it	Values a test or build needs at runtime, non-sensitive config
Secret	Yes	No — stripped before agent starts	Registry tokens, deploy keys, anything only `npm ci` / setup needs

Environment variables vs. secrets in a Codex Cloud environment. The phase column is the whole point: secrets are stripped before the agent runs.
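A minimal sketch of the two-phase model, assuming a task's environment can be pictured as a plain dict per phase. `phase_environment` is a hypothetical helper, not a Codex API; it only encodes the rule in the table: environment variables reach both phases, secrets reach setup only.

```python
from typing import Dict

Env = Dict[str, str]


def phase_environment(env_vars: Env, secrets: Env, phase: str) -> Env:
    """Toy model of which values a process can read in each phase of a cloud task.

    "setup": environment variables and secrets (e.g. a registry token for npm ci).
    "agent": environment variables only; secrets were stripped before this phase.
    """
    if phase == "setup":
        # Assumption of this sketch: if a name is in both, the secret wins.
        return {**env_vars, **secrets}
    if phase == "agent":
        return dict(env_vars)
    raise ValueError(f"unknown phase: {phase!r}")
```

Looking up the token in the agent-phase dict returns nothing, which is this sketch's analogue of the confusing undefined your test would hit.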

Internet access: off by default, proxied, allowlisted

Network access is phased the same way as secrets, and the default is the safe one.

During the setup script phase, internet access is available so you can install dependencies. [V] During the agent phase, internet access is off by default — but you can configure limited or unrestricted access. [V] When you do grant it, all traffic runs behind an HTTP/HTTPS network proxy for security and abuse prevention. [V]

"Limited" is the setting you actually want in most repos: a domain allowlist so the agent can reach, say, your private package registry and nothing else. [P] That keeps a prompt-injected agent from POSTing your code to an arbitrary host while still letting a legitimate pip install from your internal index succeed. The honest tradeoff: if your tests hit live third-party APIs, you will either widen the allowlist or mock them — and widening the allowlist is a real attack surface, so widen deliberately, not "to make it work."
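To make the phasing concrete, here is a toy policy check. The mode names mirror the three agent-phase settings described above; the rule that an allowlist entry also covers its subdomains is an assumption of this sketch (verify how your allowlist is actually matched), and the HTTP/HTTPS proxy is not modeled.

```python
from typing import Iterable
from urllib.parse import urlsplit


def may_connect(url: str, phase: str, mode: str = "off", allowlist: Iterable[str] = ()) -> bool:
    """Toy model of phased network policy.

    phase: "setup" or "agent"; mode (agent phase only): "off", "limited", "unrestricted".
    """
    if phase == "setup":
        return True                          # setup scripts get internet to install deps
    if phase != "agent":
        raise ValueError(f"unknown phase: {phase!r}")
    if mode == "off":
        return False                         # the default for the agent phase
    if mode == "unrestricted":
        return True                          # still proxied in reality; not modeled here
    if mode != "limited":
        raise ValueError(f"unknown mode: {mode!r}")
    host = (urlsplit(url).hostname or "").lower()
    # Assumption: an entry matches that host and its subdomains, but a lookalike
    # such as "notpypi.org" does not match "pypi.org".
    return any(host == d.lower() or host.endswith("." + d.lower()) for d in allowlist)
```

The lookalike case is why a naive substring check is the wrong way to think about an allowlist: the boundary has to sit on a dot.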

Standing up a working cloud environment

Connect the repo and pick the image
At chatgpt.com/codex, connect your GitHub account and select the repository. The default container is universal; pin your Python / Node versions in the environment settings so the cloud matches your local toolchain. [V]
Write the setup script
Put the cold-start install here — npm ci, uv sync, apt-get install of system libs. Internet is available in this phase, and secrets are available here, so this is the only place a registry token works. [V]
File secrets vs. env vars correctly
Registry tokens, deploy keys → secret (setup-only, stripped before the agent). Values a test reads at runtime → environment variable (whole task, agent can see it). This decision is the difference between "tasks just work" and a day of confusing failures. [V]
Decide agent-phase internet
Leave it off unless a task genuinely needs the network. If it does, prefer a domain allowlist over unrestricted access — all of it is force-proxied anyway. [V][P]
(Optional) Add a maintenance script
If cached containers go stale between runs, add a maintenance script to re-sync dependencies on resume — it runs only when a cached container is reused, not on cold start. [V]
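If you write your maintenance script in Python, "re-sync only on drift" can be a lockfile hash comparison. This is a minimal sketch, assuming an npm project (pass `["uv", "sync"]` or similar instead) and a hypothetical stamp file, kept somewhere that persists in the cached container, that records the lockfile hash from the last install. Nothing here is a Codex API; `run` is injectable only so the logic can be tested without npm.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def _digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def maintain(
    lockfile: Path,
    stamp: Path,
    install_cmd: Sequence[str],
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Re-run the install only if the lockfile changed since the last recorded install.

    Returns True if an install ran. `stamp` (hypothetical) holds the previous hash.
    """
    current = _digest(lockfile)
    previous = stamp.read_text().strip() if stamp.exists() else ""
    if current == previous:
        return False                        # cached dependencies are already in sync
    run(list(install_cmd), check=True)      # e.g. ["npm", "ci"]; raises on failure
    stamp.write_text(current)               # remember what we installed
    return True
```

The same call can run in the setup script: with no stamp yet it installs and records the hash, so later resumes skip the install until the lockfile moves.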

@codex on GitHub

Once a repo is set up for Codex Cloud, you drive it from GitHub by mentioning @codex in a comment. There are exactly two behaviors, and the keyword is the switch. [V]

Comment @codex review on a pull request and Codex reviews the PR diff, follows your repository guidance, and posts a standard GitHub code review focused on serious issues. [V] Mention @codex with anything other than review and Codex starts a cloud task using your pull request as context — "@codex make the error messages user-facing" turns into a container, a diff, and a PR update. [V]

A few refinements worth knowing. Enable Automatic reviews in settings and Codex posts a review whenever someone opens a new PR, without needing an @codex review comment. [V] You can also steer a one-off review inline: @codex review for security regressions focuses that single pass. [V] And Codex reads an AGENTS.md with a "Review guidelines" section, so your house rules ("flag any new `any` type", "require tests for new endpoints") shape every review without being re-typed. [V]
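The keyword switch is simple enough to express as a function. This is an illustrative sketch of the documented rule, not GitHub's or OpenAI's actual comment parser; the `CodexCommand` type and the whitespace and case handling are assumptions.

```python
import re
from typing import NamedTuple, Optional


class CodexCommand(NamedTuple):
    kind: str   # "review" or "task"
    text: str   # review focus (may be empty) or the task instruction


_MENTION = re.compile(r"@codex\b\s*(.*)", re.IGNORECASE | re.DOTALL)


def route_comment(body: str) -> Optional[CodexCommand]:
    """Toy router: '@codex review ...' posts a review; any other mention starts a task."""
    m = _MENTION.search(body)
    if m is None:
        return None                                  # no mention: nothing to do
    rest = m.group(1).strip()
    parts = rest.split(maxsplit=1)
    if parts and parts[0].lower() == "review":
        focus = parts[1].strip() if len(parts) > 1 else ""
        return CodexCommand("review", focus)         # e.g. "for security regressions"
    return CodexCommand("task", rest)                # anything else becomes a cloud task
```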

@codex on a pull request


Two keywords, two behaviors. "review" posts a GitHub code review of the diff; anything else spins up a cloud task that updates the PR.

The GitHub Action: openai/codex-action@v1

@codex is great for human-in-the-loop moments. For automation — run on every PR, on a schedule, on a CI failure — you want the openai/codex-action@v1 GitHub Action, which runs Codex inside your own GitHub Actions runner using codex exec under the hood. [V]

The non-negotiables: actions/checkout@v5 must run before the Codex step so the repo contents exist on disk, and you pass an OpenAI API key, stored as a GitHub secret, through the openai-api-key input. [V] You give it a prompt (inline text) or a prompt-file (a path committed in the repo), and it exposes a final-message output you can post back as a comment. [V]

Two safety dials, each with a safe default:

sandbox — read-only, workspace-write (default), or danger-full-access. This is the same --sandbox mode the CLI uses: workspace-write lets Codex edit files in the workspace but not roam the machine. [V]
safety-strategy — drop-sudo (default), unprivileged-user, read-only, or unsafe. drop-sudo removes sudo irreversibly before the agent runs. [V] (Windows runners require unsafe. [V])

And permissions: the action needs only contents: read to read the repo, plus issues: write and pull-requests: write only if it posts comments or reviews. [V] By default only collaborators with write access can trigger it; tighten further with allow-users. [V]
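One way to keep these dials honest across many workflows is a small lint over each Codex step's `with:` inputs. `lint_codex_step` is a hypothetical helper, not part of codex-action: it hard-codes only the values and defaults listed above, parses no YAML itself (you would load the inputs from your workflow file), and flags danger-full-access for a human to confirm rather than rejecting it.

```python
from typing import Dict, List

SANDBOX_MODES = {"read-only", "workspace-write", "danger-full-access"}
SAFETY_STRATEGIES = {"drop-sudo", "unprivileged-user", "read-only", "unsafe"}


def lint_codex_step(inputs: Dict[str, str], runner_os: str = "Linux") -> List[str]:
    """Hypothetical lint for an openai/codex-action@v1 step; [] means no problems found."""
    problems = []
    sandbox = inputs.get("sandbox", "workspace-write")     # documented default
    safety = inputs.get("safety-strategy", "drop-sudo")    # documented default
    if "openai-api-key" not in inputs:
        problems.append("openai-api-key is required")
    if sandbox not in SANDBOX_MODES:
        problems.append(f"unknown sandbox: {sandbox}")
    if safety not in SAFETY_STRATEGIES:
        problems.append(f"unknown safety-strategy: {safety}")
    if runner_os == "Windows" and safety != "unsafe":
        problems.append("Windows runners require safety-strategy: unsafe")
    if sandbox == "danger-full-access":
        problems.append("danger-full-access: confirm this job really needs it")
    return problems
```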

.github/workflows/codex-review.yml — review every opened PR

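name: Codex PR review
on:
  pull_request:
    types: [opened]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write   # only because we post the result back
    steps:
      # checkout MUST come first: the action needs repo contents on disk.
      # fetch-depth: 0 brings in history so the base commit exists to diff against.
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0

      - name: Run Codex
        id: codex
        uses: openai/codex-action@v1
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          sandbox: read-only            # a reviewer shouldn't write files
          safety-strategy: drop-sudo    # default; irreversibly drops sudo
          prompt: |
            Review the changes in this pull request: the diff between
            ${{ github.event.pull_request.base.sha }} and
            ${{ github.event.pull_request.head.sha }}. Focus on correctness,
            security, and missing tests. Be concise; only flag real issues.

      - name: Post the review
        # skip if Codex failed or returned nothing to post
        if: steps.codex.outputs.final-message != ''
        uses: actions/github-script@v7
        env:
          # hand the text over via env instead of interpolating it into the
          # script, so backticks or template syntax in the output can't break the JS
          CODEX_FINAL_MESSAGE: ${{ steps.codex.outputs.final-message }}
        with:
          script: |
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: process.env.CODEX_FINAL_MESSAGE,
            });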

The senior move: a nightly bot that fixes failing tests while you sleep

Here is the scenario that sells the whole stack. Your test suite goes red on main overnight — a flaky time-zone test, a dependency bump that broke a snapshot. Instead of you debugging it bleary-eyed at 9am, a bot triages it at 3am and leaves a green PR waiting.

The trigger is the key. You do not run the fixer on every push — you run it when CI fails, using GitHub's workflow_run event, which fires after another workflow completes and lets you gate on its conclusion. [V][P] The fixer checks out the repo, runs the Action with sandbox: workspace-write so codex exec can actually edit code, and then opens a pull request with the fix — which you review in the morning like any other PR. The agent never merges; it only proposes. [P]

.github/workflows/codex-autofix.yml — fix the suite when CI fails

name: Codex autofix on CI failure
on:
  workflow_run:
    workflows: ["CI"]        # the name: of your test workflow
    types: [completed]

jobs:
  autofix:
    # only act when the upstream CI run actually FAILED, and only for branches
    # in this repo (never run fork code with write permissions)
    if: >-
      github.event.workflow_run.conclusion == 'failure' &&
      github.event.workflow_run.head_repository.full_name == github.repository
    runs-on: ubuntu-latest
    permissions:
      contents: write          # needs to push a branch
      pull-requests: write     # needs to open the PR
    steps:
      - uses: actions/checkout@v5
        with:
          ref: ${{ github.event.workflow_run.head_branch }}

      - name: Let Codex fix the failing tests
        uses: openai/codex-action@v1
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          sandbox: workspace-write   # a fixer must be able to edit files
          prompt: |
            The test suite is failing on this branch. Run the tests,
            find the root cause, and make the minimal change that makes
            them pass. Do not weaken or delete tests to go green.

      - name: Open a PR with the fix
        # no PR is opened if Codex changed nothing. PRs opened with the default
        # GITHUB_TOKEN do not trigger other workflows, so CI won't rerun on it.
        uses: peter-evans/create-pull-request@v6
        with:
          branch: codex/autofix-${{ github.run_id }}
          commit-message: "Codex: autofix failing tests"
          title: "Codex: autofix failing tests"
          body: "Automated fix from the nightly Codex bot. Review before merge."
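For unit-testing or reuse elsewhere, the gate can be written as a plain function over the `workflow_run` webhook payload. It checks the conclusion just as the job's `if:` does; the same-repository check and the skip for the bot's own `codex/autofix-` branches are guards this sketch adds as assumptions. `should_autofix` is hypothetical, and it reads only the payload fields named in its comments.

```python
from typing import Any, Dict


def should_autofix(event: Dict[str, Any], this_repo: str) -> bool:
    """Decide whether a workflow_run event should launch the autofix job."""
    if event.get("action") != "completed":
        return False                       # ignore requested / in_progress events
    run = event.get("workflow_run") or {}
    if run.get("conclusion") != "failure":
        return False                       # success, cancelled, skipped: do nothing
    if (run.get("head_repository") or {}).get("full_name") != this_repo:
        return False                       # never run fork code with write permissions
    if str(run.get("head_branch") or "").startswith("codex/autofix-"):
        return False                       # don't try to "fix" the bot's own branches
    return True
```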

The terminal bridge: codex cloud and codex apply

You do not have to live in the browser to use Codex Cloud. The CLI can launch a cloud task, choose its environment, and pull the resulting diff onto your machine without opening the interactive TUI. [V]

codex cloud exec --env <environment> "<task>" submits a task to a named cloud environment directly from your shell — the --env flag selects which environment config (image, scripts, secrets, network policy) the container boots with. [V]
codex cloud list returns recent cloud tasks as scriptable output, so you can poll or pipe task state into your own tooling. [V]
codex apply (alias codex a) applies the latest diff generated by a Codex Cloud task to your local working tree — so you can review, run, and tweak the cloud's work locally before committing. [V]

This closes the loop: kick off heavy or parallel work in the cloud where it is isolated, then codex apply to land it in front of you for a real local test run. (codex cloud is marked experimental in the CLI; codex apply is stable. [V])
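Because the CLI is scriptable, fanning several tasks out to one environment is a short loop. A minimal sketch, assuming the `codex` binary is on PATH and that `codex cloud exec` returns once the task is submitted (the fire-and-forget behavior described above); it uses only the `cloud exec --env` form shown earlier, and `fan_out`, the environment name, and the task strings are placeholders. Building an argv list instead of a shell string means quotes or `$` in a task description are passed through literally.

```python
import subprocess
from typing import Callable, List, Sequence


def cloud_exec_argv(env: str, task: str) -> List[str]:
    """argv for: codex cloud exec --env <environment> "<task>"."""
    return ["codex", "cloud", "exec", "--env", env, task]


def fan_out(env: str, tasks: Sequence[str], run: Callable[..., object] = subprocess.run) -> None:
    """Submit independent tasks to one cloud environment; each gets its own container."""
    for task in tasks:
        # no shell involved, so the task text needs no escaping
        run(cloud_exec_argv(env, task), check=True)
```

Once the tasks finish, `codex cloud list` shows them and `codex apply` lands a diff in your working tree for review.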

fire-and-forget from the terminal, then pull it home


Launch a task into a named cloud environment, list recent tasks, then apply the latest cloud diff to your local tree for a real test run.

When to delegate to the cloud vs. run the local CLI

Codex Cloud

Parallel, fire-and-forget — many tasks at once, each isolated. [V]
CI and automation — the GitHub Action, the nightly autofix bot.
No local secrets exposed — agent phase has no secrets and no internet by default. [V]
Output is a PR you review before merge. [V]
Slower to start (container boot, setup script).

Local `codex` CLI

Interactive, in-the-loop — you watch and steer each step.
Full local context — your real files, env, network, and credentials.
Instant — no container boot.
You own the blast radius — it edits your working tree with your access.
Use codex apply to land a cloud task here for a local test run. [V]

Knowledge check

Your cloud environment installs private packages with a registry token during setup, and a test in the agent phase reads `process.env.FEATURE_FLAG`. Where do the registry token and the feature flag belong?

Putting it together

The shape to aim for: a clean environment with secrets and env vars filed correctly and agent-phase internet locked to an allowlist; @codex review and Automatic reviews for human-in-the-loop moments; the openai/codex-action@v1 Action with the least sandbox each job needs for repeatable automation; a workflow_run autofix bot that proposes PRs but never merges them; and the codex cloud / codex apply bridge so the cloud's isolated, parallel work can always land back on your machine for a real test.

The throughline is the same one that makes the whole thing safe: the agent never has more reach than you grant it, and a human merges the diff. Secrets disappear before the agent runs, internet is off by default, sandbox is least-privilege, and every output is a pull request. Delegate boldly — the leash is short by design.
