The Cartographer · 11 min mission

Gemini CLI: Models & Context Windows

Master the Gemini model lineup, auto-routing, the 1M-token window, and which auth path unlocks which models and quotas.

Tags: gemini-cli · models · context-window · quotas · model-routing · authentication
Fact-checked 2026-06-15

Gemini CLI runs in auto mode by default: it classifies each prompt as simple or complex and routes it to a Flash or Pro model on its own. This guide covers the model lineup you can reach, how to pin or switch models, how auto-routing decides, and how your auth method sets your daily quota. After reading it, you can select a specific model, read which one actually ran, and budget your requests.

| Alias | Resolves to | Use it for |
| --- | --- | --- |
| auto | gemini-2.5-pro or gemini-3-pro-preview | Default. Gemini 3 model if preview features are enabled, else standard Pro |
| pro | gemini-2.5-pro or gemini-3-pro-preview | Complex reasoning. Same Gemini 3 promotion as auto when preview is on |
| flash | gemini-2.5-flash | Fast, balanced model for everyday tasks |
| flash-lite | gemini-2.5-flash-lite | Fastest, lightest model for simple tasks |

The four model aliases and what each resolves to (`docs/cli/cli-reference.md`). `auto` is the documented default for `--model`.
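
The alias table can be read as a small lookup. The sketch below is illustrative only, not the CLI's actual code: the function name `resolve_alias` and the `preview_enabled` boolean are assumptions introduced here to model the "preview features enabled" condition.

```python
# Each alias maps to (standard model, model used when preview features are on).
# The pairs come from the alias table; the data structure itself is hypothetical.
ALIASES = {
    "auto": ("gemini-2.5-pro", "gemini-3-pro-preview"),
    "pro": ("gemini-2.5-pro", "gemini-3-pro-preview"),
    "flash": ("gemini-2.5-flash", "gemini-2.5-flash"),
    "flash-lite": ("gemini-2.5-flash-lite", "gemini-2.5-flash-lite"),
}


def resolve_alias(name: str, preview_enabled: bool = False) -> str:
    """Return the concrete model ID for an alias.

    Anything that is not a known alias is treated as a concrete model ID
    and passed through unchanged.
    """
    if name not in ALIASES:
        return name
    standard, preview = ALIASES[name]
    return preview if preview_enabled else standard
```

Only `auto` and `pro` change with the preview flag; `flash` and `flash-lite` resolve to the same 2.5 model either way.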

| Model ID | Input tokens | Output tokens | Knowledge cutoff |
| --- | --- | --- | --- |
| gemini-3-pro-preview | 1,000,000 (1M) | 64k | January 2025 |
| gemini-3.1-pro-preview | 1,000,000 (1M) | 64k | January 2025 |
| gemini-3-flash-preview | 1,000,000 (1M) | 64k | January 2025 |
| gemini-2.5-pro | 1,048,576 | 65,536 | |
| gemini-2.5-flash | 1,048,576 | 65,536 | |
| gemini-2.5-flash-lite | 1,048,576 | 65,536 | |

Context windows by model. Gemini 3 is documented as the round "1M / 64k" (`ai.google.dev/gemini-api/docs/gemini-3`); the 2.5 series uses the exact 2^20 figure.

All current Gemini coding models share a ~1M-token input window, so model choice is about speed, cost, and reasoning depth, not how much of your repo fits. The figure is written two ways: the Gemini 3 family is documented as the round 1M input / 64k output, the 2.5 family as the exact 1,048,576 input (2^20) / 65,536 output (2^16). The literal limit appears in errors: an oversized 2.5 prompt fails with `input token count … exceeds the maximum number of tokens allowed (1048576)`.
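
A rough pre-flight check against the 2.5-series limits might look like the sketch below. The 4-characters-per-token ratio is an assumed rule of thumb, not the real tokenizer, and the helper names are hypothetical; treat the result as an estimate only.

```python
import math

# Exact limits for the 2.5 family, as stated above.
GEMINI_25_INPUT_LIMIT = 2 ** 20   # 1,048,576 tokens
GEMINI_25_OUTPUT_LIMIT = 2 ** 16  # 65,536 tokens


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate a token count from character length.

    ASSUMPTION: ~4 characters per token. Real counts depend on the
    model's tokenizer and on the content (code, prose, language).
    """
    return math.ceil(len(text) / chars_per_token)


def fits_input_window(text: str, limit: int = GEMINI_25_INPUT_LIMIT) -> bool:
    """True if the estimated token count is within the input limit."""
    return estimate_tokens(text) <= limit
```

Under this heuristic, about 4 MB of plain text sits right at the limit, which is why most repositories fit only in part and not in full.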

Select or pin a model

  1. Pin one model for a session

    Pass `-m` / `--model` at launch; it always wins: `gemini -m gemini-2.5-flash`. Accepts an alias or a concrete ID.

  2. Set a default via environment

    Export `GEMINI_MODEL` (used only when no `-m` flag is given): `export GEMINI_MODEL=gemini-2.5-pro`.

  3. Set a persistent default in settings

    Add `model.name` to `~/.gemini/settings.json` (used only when neither `-m` nor `GEMINI_MODEL` is set): `{ "model": { "name": "gemini-2.5-flash" } }`.

  4. Switch mid-session

    Run `/model` to open the Auto/Manual dialog; the change applies to all subsequent interactions in that session.
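
The precedence described in steps 1 to 3 (flag, then environment, then settings file, then the `auto` default) can be sketched as follows. This is a model of the documented order, not the CLI's implementation; `pick_model` and `load_settings` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Mapping, Optional


def load_settings(path: Path) -> dict:
    """Read a JSON settings file; return an empty dict if it does not exist."""
    if not path.is_file():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def pick_model(cli_flag: Optional[str], env: Mapping[str, str], settings: Mapping) -> str:
    """Apply the precedence: -m flag > GEMINI_MODEL > model.name > "auto"."""
    if cli_flag:                       # step 1: -m / --model always wins
        return cli_flag
    env_model = env.get("GEMINI_MODEL")
    if env_model:                      # step 2: used only without a flag
        return env_model
    configured = settings.get("model", {}).get("name")
    if configured:                     # step 3: used only without flag or env
        return configured
    return "auto"                      # documented default for --model
```

In a real script you would likely call it with `os.environ` and `load_settings(Path.home() / ".gemini" / "settings.json")`. A `/model` switch mid-session is not modelled here, since it changes the running session and not the startup defaults.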

| Option | Behaviour | Models in play |
| --- | --- | --- |
| Auto (Gemini 3) | System picks the best Gemini 3 model for the prompt | gemini-3-pro-preview, gemini-3-flash-preview |
| Auto (Gemini 2.5) | System picks the best Gemini 2.5 model for the prompt | gemini-2.5-pro, gemini-2.5-flash |
| Manual | You pick one specific model and it stays put | Any available model |

The `/model` dialog: three top-level options and the concrete models each can reach (`docs/cli/model.md`).

Auto routing is on by default, managed by the ModelAvailabilityService. It classifies each prompt and routes accordingly: simple → gemini-2.5-flash; complex → gemini-3-pro-preview if Gemini 3 is enabled, else gemini-2.5-pro. Picking Pro in /model instead biases toward the most capable available model (Gemini 3 Pro when enabled). The CLI's own internal calls — prompt completion, classification — use gemini-2.5-flash-lite and silently fall back through gemini-2.5-flash to gemini-2.5-pro without prompting or changing your configured model.
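
The routing rule in the paragraph above reduces to a small decision function. The sketch below takes the classification as an input because the classifier itself is not described here; `route` and `first_available` are hypothetical names, and the availability check is a stand-in callback.

```python
from typing import Callable, Optional, Sequence

# Order used for the CLI's own internal calls, per the paragraph above.
INTERNAL_FALLBACK_CHAIN = (
    "gemini-2.5-flash-lite",
    "gemini-2.5-flash",
    "gemini-2.5-pro",
)


def route(complexity: str, gemini3_enabled: bool) -> str:
    """Map a prompt classification ("simple" or "complex") to a model ID."""
    if complexity == "simple":
        return "gemini-2.5-flash"
    if complexity == "complex":
        return "gemini-3-pro-preview" if gemini3_enabled else "gemini-2.5-pro"
    raise ValueError(f"unknown complexity: {complexity!r}")


def first_available(chain: Sequence[str], is_available: Callable[[str], bool]) -> Optional[str]:
    """Walk a fallback chain and return the first model that is available."""
    for model in chain:
        if is_available(model):
            return model
    return None
```

The two functions are kept separate on purpose: user prompts go through `route`, while the silent fallback applies only to internal calls and never changes the configured model.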

| Trigger | CLI offers |
| --- | --- |
| Gemini 3 Pro daily limit reached | Switch to Gemini 2.5 Pro / upgrade / stop |
| Gemini 2.5 Pro daily limit reached | Fall back to Gemini 2.5 Flash |
| Gemini 3 Pro temporarily overloaded | "Keep trying" (exponential backoff) or fall back to 2.5 Pro |

Routing fallback prompts when a limit or capacity issue is hit (`docs/cli/model-routing.md`, `docs/get-started/gemini-3.md`).
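
For readers unfamiliar with the term, exponential backoff means each retry waits roughly twice as long as the previous one. The sketch below shows the general pattern only; the base delay, the cap, and the function name are assumptions, and the CLI's actual retry timings are not documented here.

```python
from typing import Iterator


def backoff_delays(attempts: int, base_seconds: float = 1.0, cap_seconds: float = 30.0) -> Iterator[float]:
    """Yield one wait time per retry, doubling each time up to a cap.

    ASSUMPTION: base of 1 s and cap of 30 s are illustrative values,
    not the CLI's real settings.
    """
    for attempt in range(attempts):
        yield min(base_seconds * (2 ** attempt), cap_seconds)
```

With the assumed defaults, seven attempts wait 1, 2, 4, 8, 16, 30 and 30 seconds. Production retry loops usually add random jitter as well, which this sketch omits.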

Demo: auto routing picks the model for you. Two prompts, one session. The router sends the trivial question to Flash and the architecture task to the most capable Pro model available, without switching anything.

Pick a model and see what routing selects

Compare the model tiers and routing outcomes. In Gemini CLI the equivalent controls are `/model` → Auto (Gemini 3) / Manual, the `-m` flag, `GEMINI_MODEL`, and `model.name` in `~/.gemini/settings.json`.

| Auth method | Tier | Requests / user / day | Notes |
| --- | --- | --- | --- |
| Google account (Login with Google) | Code Assist Individual (free) | 1,000 | 60 req/min, across the full model family |
| Google account | Google AI Pro | 1,500 | Paid, fixed-price subscription |
| Google account | Google AI Ultra | 2,000 | Paid |
| Gemini API key | Free (unpaid) | 250 | Flash model only, 10 req/min |
| Gemini API key | Pay-as-you-go | No daily cap | Billed per token/call; avoids interruption |
| Vertex AI | Express mode (free) | Account-specific | Free for 90 days, then billing required |
| Vertex AI | Pay-as-you-go | No daily cap | Shared/provisioned quota, billed on usage |
| Google Workspace | Code Assist Standard | 1,500 | Paid, license seats |
| Google Workspace | Code Assist Enterprise | 2,000 | Paid |

CLI quotas by auth method and tier (`docs/resources/quota-and-pricing.md`; corroborated on `developers.google.com/gemini-code-assist/resources/quotas`). Daily limits are aggregated across the model family.

Your sign-in method, not the model menu, is the biggest lever on what you can run. The CLI supports three: Sign in with Google (OAuth), a Gemini API key (export GEMINI_API_KEY="...", key from aistudio.google.com/app/apikey), and Vertex AI (needs GOOGLE_CLOUD_PROJECT + the Vertex AI API enabled). The Google-account path gives the Code Assist Individual free tier — 1,000 requests/day across the full family, the only free way to reach the Pro models. The free API-key path is Flash-only at 250/day. Pay-as-you-go on an API key or Vertex removes the daily cap and bills per token.
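
To budget requests, the two free tiers can be compared with a little arithmetic. The sketch below copies the daily and per-minute figures from the quota table; the dictionary keys, the `Quota` class, and the helper names are hypothetical labels chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quota:
    per_day: Optional[int]     # None = no fixed daily cap
    per_minute: Optional[int]  # None = not stated in the quota table


# Figures from the quota table; keys are illustrative labels only.
QUOTAS = {
    ("google-account", "code-assist-individual"): Quota(1000, 60),
    ("google-account", "google-ai-pro"): Quota(1500, None),
    ("google-account", "google-ai-ultra"): Quota(2000, None),
    ("gemini-api-key", "free"): Quota(250, 10),
    ("gemini-api-key", "pay-as-you-go"): Quota(None, None),
}


def remaining_today(quota: Quota, used: int) -> Optional[int]:
    """Requests left today, or None when the tier has no daily cap."""
    if quota.per_day is None:
        return None
    return max(quota.per_day - used, 0)


def minutes_to_exhaust(quota: Quota) -> Optional[float]:
    """Minutes needed to burn the daily quota at the full per-minute rate."""
    if quota.per_day is None or quota.per_minute is None:
        return None
    return quota.per_day / quota.per_minute
```

At full speed the free API key's daily allowance lasts 25 minutes and the free Google-account tier's lasts under 17, so an agentic session that fires many requests per task can hit either cap well before the day ends.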

Enable Gemini 3 in the CLI

  1. Update the CLI to a supported version

    Gemini 3 requires Gemini CLI 0.21.1 or later. Upgrade with `npm install -g @google/gemini-cli@latest` (current stable is v0.46.0, published 2026-06-10).

  2. Switch routing to Gemini 3

    Launch `gemini`, run `/model`, choose Auto (Gemini 3). Complex prompts then route to `gemini-3-pro-preview`, and `auto`/`pro` resolve to the Gemini 3 model.

  3. On Code Assist Standard / Enterprise, flip the preview switches

    Managed accounts need more: an admin sets the release channel to Preview (Admin for Gemini → Settings), then you set Preview Features = true via `/settings` and restart. Gemini 3 will not appear from upgrading alone on a managed account.
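
The version requirement in step 1 is easy to get wrong with plain string comparison, since "0.9.9" sorts after "0.21.1" as text. A minimal sketch of a numeric check follows; it assumes simple `major.minor.patch` version strings with an optional leading `v` and an optional `-suffix`, and the function names are hypothetical.

```python
from typing import Tuple

# Minimum CLI version for Gemini 3, as stated in step 1.
MIN_GEMINI3_CLI = (0, 21, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "v0.46.0" or "0.21.1-preview.2" into a tuple of ints.

    ASSUMPTION: versions follow a plain numeric major.minor.patch shape;
    any "-suffix" is ignored rather than ordered.
    """
    core = text.strip().lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def supports_gemini3(cli_version: str) -> bool:
    """True if the given CLI version is 0.21.1 or later."""
    return parse_version(cli_version) >= MIN_GEMINI3_CLI
```

A passing version check is necessary but not sufficient on managed accounts, where the preview switches in step 3 still apply.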

Two free paths, very different ceilings

Sign in with Google (free)

1,000 requests/day, 60/min, across the whole family — Flash and Pro (Gemini 3 Pro when enabled).

Most individual accounts need no Google Cloud project. Start here.

Bare Gemini API key (free)

250 requests/day, Flash only, 10/min.

No Pro access, a quarter of the quota. Useful for scripting against a key you already have — not a way to unlock the best model for free.

Knowledge check

You sign into Gemini CLI with a free personal Google account, enable Gemini 3, leave the model on `auto`, and ask a hard refactoring question. Which model handles it, and against which budget?
