The Cartographer · 11 min mission

Gemini CLI: Models & Context Windows

Master the Gemini model lineup, auto-routing, the 1M-token window, and which auth path unlocks which models and quotas.

gemini-cli · models · context-window · quotas · model-routing · authentication · Fact-checked 2026-06-15

Gemini CLI runs in auto mode by default: it classifies each prompt as simple or complex and routes it to a Flash or Pro model on its own. This guide covers the model lineup you can reach, how to pin or switch models, how auto-routing decides, and how your auth method sets your daily quota. After it you can select a specific model, read which one actually ran, and budget your requests.

Alias	Resolves to	Use it for
`auto`	`gemini-2.5-pro` or `gemini-3-pro-preview`	Default. Gemini 3 model if preview features are enabled, else standard Pro
`pro`	`gemini-2.5-pro` or `gemini-3-pro-preview`	Complex reasoning. Same Gemini 3 promotion as `auto` when preview is on
`flash`	`gemini-2.5-flash`	Fast, balanced model for everyday tasks
`flash-lite`	`gemini-2.5-flash-lite`	Fastest, lightest model for simple tasks

The four model aliases and what each resolves to (`docs/cli/cli-reference.md`). `auto` is the documented default for `--model`.

Model ID	Input tokens	Output tokens	Knowledge cutoff
`gemini-3-pro-preview`	1,000,000 (1M)	64k	January 2025
`gemini-3.1-pro-preview`	1,000,000 (1M)	64k	January 2025
`gemini-3-flash-preview`	1,000,000 (1M)	64k	January 2025
`gemini-2.5-pro`	1,048,576	65,536	—
`gemini-2.5-flash`	1,048,576	65,536	—
`gemini-2.5-flash-lite`	1,048,576	65,536	—

Context windows by model. Gemini 3 is documented as the round "1M / 64k" (`ai.google.dev/gemini-api/docs/gemini-3`); the 2.5 series uses the exact 2^20 figure.

All current Gemini coding models share a ~1M-token input window, so model choice is about speed, cost, and reasoning depth, not how much of your repo fits. The figure is written two ways: the Gemini 3 family is documented as the round 1M input / 64k output, the 2.5 family as the exact 1,048,576 input (2^20) / 65,536 output (2^16). The literal limit appears in errors: an oversized 2.5 prompt fails with `input token count … exceeds the maximum number of tokens allowed (1048576)`.
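A rough pre-flight check can catch an oversized prompt before the API rejects it. The sketch below is illustrative only: `estimate_tokens` and `fits_window` are hypothetical helpers, and the 4-bytes-per-token ratio is a common rule of thumb, not the Gemini tokenizer.

```shell
# Documented input limit for the 2.5 family (2^20 tokens).
MAX_INPUT_TOKENS=1048576

# ASSUMPTION: ~4 bytes per token is a crude heuristic, not Gemini's tokenizer.
# $1 = payload size in bytes
estimate_tokens() {
  echo $(( $1 / 4 ))
}

# Succeeds (exit 0) when the estimated token count fits the input window.
fits_window() {
  [ "$(estimate_tokens "$1")" -le "$MAX_INPUT_TOKENS" ]
}

# Example: a 3,000,000-byte bundle estimates to 750,000 tokens.
if fits_window 3000000; then
  echo "fits"
else
  echo "too large"
fi
```

In practice the byte count could come from something like `wc -c < bundle.txt` (a hypothetical file name). Because the estimate is approximate, leave headroom rather than aiming for the exact limit.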

Select or pin a model

Pin one model for a session
Pass `-m` / `--model` at launch; it always wins: `gemini -m gemini-2.5-flash`. It accepts an alias or a concrete ID.
Set a default via environment
Export `GEMINI_MODEL` (used only when no `-m` flag is given): `export GEMINI_MODEL=gemini-2.5-pro`.
Set a persistent default in settings
Add `model.name` to `~/.gemini/settings.json` (used only when neither `-m` nor `GEMINI_MODEL` is set): `{ "model": { "name": "gemini-2.5-flash" } }`.
Switch mid-session
Run `/model` to open the Auto/Manual dialog; the change applies to all subsequent interactions in that session.
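The precedence among the launch-time options can be sketched as a small function. `effective_model` is a hypothetical helper written for illustration, not part of the CLI; it encodes the order stated above (flag, then environment variable, then settings file) and assumes `auto` as the final fallback because that is the documented default for `--model`.

```shell
# Hypothetical helper mirroring the stated precedence:
#   -m/--model flag  >  GEMINI_MODEL  >  model.name in settings.json  >  auto
# $1 = value passed to -m (empty if none)
# $2 = value of GEMINI_MODEL (empty if unset)
# $3 = model.name from ~/.gemini/settings.json (empty if absent)
effective_model() {
  if [ -n "$1" ]; then
    echo "$1"
  elif [ -n "$2" ]; then
    echo "$2"
  elif [ -n "$3" ]; then
    echo "$3"
  else
    echo "auto"   # assumed fallback: the documented default for --model
  fi
}

effective_model "gemini-2.5-flash" "gemini-2.5-pro" ""   # flag wins
effective_model "" "gemini-2.5-pro" "gemini-2.5-flash"   # env beats settings
effective_model "" "" ""                                 # nothing set
```

The three calls print `gemini-2.5-flash`, `gemini-2.5-pro`, and `auto` in that order. A mid-session `/model` change sits outside this chain, since it applies only to the running session.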

Option	Behaviour	Models in play
Auto (Gemini 3)	System picks the best Gemini 3 model for the prompt	`gemini-3-pro-preview`, `gemini-3-flash-preview`
Auto (Gemini 2.5)	System picks the best Gemini 2.5 model for the prompt	`gemini-2.5-pro`, `gemini-2.5-flash`
Manual	You pick one specific model and it stays put	Any available model

The `/model` dialog: three top-level options and the concrete models each can reach (`docs/cli/model.md`).

Auto routing is on by default, managed by the `ModelAvailabilityService`. It classifies each prompt and routes accordingly: simple → `gemini-2.5-flash`; complex → `gemini-3-pro-preview` if Gemini 3 is enabled, else `gemini-2.5-pro`. Picking Pro in `/model` instead biases routing toward the most capable available model (Gemini 3 Pro when enabled). The CLI's own internal calls, such as prompt completion and classification, use `gemini-2.5-flash-lite` and silently fall back through `gemini-2.5-flash` to `gemini-2.5-pro` without prompting or changing your configured model.
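The routing rule in the paragraph above reduces to a two-input decision. The function below is a hypothetical model of that rule for reasoning about outcomes; the real classification happens inside the CLI, and `route_model` and its arguments are assumptions made for this sketch.

```shell
# Hypothetical model of the routing rule described above.
# $1 = "simple" or "complex" (the CLI performs this classification itself)
# $2 = "true" if Gemini 3 is enabled, anything else otherwise
route_model() {
  case "$1" in
    simple)
      echo "gemini-2.5-flash"
      ;;
    complex)
      if [ "$2" = "true" ]; then
        echo "gemini-3-pro-preview"
      else
        echo "gemini-2.5-pro"
      fi
      ;;
    *)
      echo "unknown complexity: $1" >&2
      return 1
      ;;
  esac
}

route_model simple true     # prints gemini-2.5-flash
route_model complex true    # prints gemini-3-pro-preview
route_model complex false   # prints gemini-2.5-pro
```

Note that enabling Gemini 3 changes only the complex branch in this model, which matches the text: simple prompts still go to Flash.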

Trigger	CLI offers
Gemini 3 Pro daily limit reached	Switch to Gemini 2.5 Pro / upgrade / stop
Gemini 2.5 Pro daily limit reached	Fall back to Gemini 2.5 Flash
Gemini 3 Pro temporarily overloaded	"Keep trying" (exponential backoff) or fall back to 2.5 Pro

Routing fallback prompts when a limit or capacity issue is hit (`docs/cli/model-routing.md`, `docs/get-started/gemini-3.md`).

auto routing picks the model for you

Two prompts, one session. The router sends the trivial question to Flash and the architecture task to the most capable Pro model available — without switching anything.

Model selection in Gemini CLI comes down to four controls: `/model` → Auto (Gemini 3) / Manual, the `-m` flag, `GEMINI_MODEL`, and `model.name` in `~/.gemini/settings.json`.

Auth method	Tier	Requests / user / day	Notes
Google account (Login with Google)	Code Assist Individual — free	1,000	60 req/min, across the full model family
Google account	Google AI Pro	1,500	Paid, fixed-price subscription
Google account	Google AI Ultra	2,000	Paid
Gemini API key	Free (unpaid)	250	Flash model only, 10 req/min
Gemini API key	Pay-as-you-go	No daily cap	Billed per token/call — avoids interruption
Vertex AI	Express mode (free)	Account-specific	Free for 90 days, then billing required
Vertex AI	Pay-as-you-go	No daily cap	Shared/provisioned quota, billed on usage
Google Workspace	Code Assist Standard	1,500	Paid, license seats
Google Workspace	Code Assist Enterprise	2,000	Paid

CLI quotas by auth method and tier (`docs/resources/quota-and-pricing.md`; corroborated on `developers.google.com/gemini-code-assist/resources/quotas`). Daily limits are aggregated across the model family.

Your sign-in method, not the model menu, is the biggest lever on what you can run. The CLI supports three: Sign in with Google (OAuth), a Gemini API key (`export GEMINI_API_KEY="..."`, key from aistudio.google.com/app/apikey), and Vertex AI (needs `GOOGLE_CLOUD_PROJECT` plus the Vertex AI API enabled). The Google-account path gives the Code Assist Individual free tier: 1,000 requests/day across the full family, and the only free way to reach the Pro models. The free API-key path is Flash-only at 250/day. Pay-as-you-go on an API key or Vertex removes the daily cap and bills per token.
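To budget requests against these caps, a lookup plus a subtraction is enough. The sketch below copies the daily figures from the quota table; the tier keys (`individual`, `ai-pro`, and so on) and both function names are hypothetical labels chosen for this example, and the CLI does not expose them.

```shell
# Daily request caps copied from the quota table above.
# ASSUMPTION: the tier keys are made-up labels for this sketch.
daily_cap() {
  case "$1" in
    individual)          echo 1000 ;;  # Google account, Code Assist Individual (free)
    ai-pro|standard)     echo 1500 ;;  # Google AI Pro / Code Assist Standard
    ai-ultra|enterprise) echo 2000 ;;  # Google AI Ultra / Code Assist Enterprise
    api-key-free)        echo 250 ;;   # free Gemini API key, Flash only
    *)
      echo "no fixed daily cap listed for: $1" >&2
      return 1
      ;;
  esac
}

# Requests left today; $1 = tier key, $2 = requests already used.
remaining_today() {
  cap=$(daily_cap "$1") || return 1
  echo $(( cap - $2 ))
}

remaining_today individual 340     # prints 660
remaining_today api-key-free 200   # prints 50
```

Pay-as-you-go tiers deliberately fall through to the error branch, since the table lists no daily cap for them.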

Enable Gemini 3 in the CLI

Update the CLI to a supported version
Gemini 3 requires Gemini CLI 0.21.1 or later. Upgrade with `npm install -g @google/gemini-cli@latest` (current stable is v0.46.0, published 2026-06-10).
Switch routing to Gemini 3
Launch `gemini`, run `/model`, choose Auto (Gemini 3). Complex prompts then route to `gemini-3-pro-preview`, and `auto`/`pro` resolve to the Gemini 3 model.
On Code Assist Standard / Enterprise, flip the preview switches
Managed accounts need more: an admin sets the release channel to Preview (Admin for Gemini → Settings), then you set Preview Features = true via `/settings` and restart. Gemini 3 will not appear from upgrading alone on a managed account.
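The version requirement in the first step can be checked with a version-aware comparison. This is a sketch under stated assumptions: `version_ge` is a hypothetical helper, it relies on `sort -V` (version sort, as in GNU coreutils), and it is shown with literal version strings rather than the CLI's own output.

```shell
# Minimum CLI version for Gemini 3, as stated in the first step above.
MIN_VERSION="0.21.1"

# True (exit 0) when version $1 >= version $2.
# ASSUMPTION: `sort -V` is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example with a literal version. In practice the first argument might come
# from `gemini --version` (assumed here to print a bare version string).
if version_ge "0.46.0" "$MIN_VERSION"; then
  echo "Gemini 3 capable"
else
  echo "upgrade with: npm install -g @google/gemini-cli@latest"
fi
```

A passing version check is necessary but not sufficient on managed accounts, which also need the Preview release channel and the Preview Features setting described in the last step.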

Two free paths, very different ceilings

Google account, Code Assist Individual (free)

1,000 requests/day, 60/min, across the whole family: Flash and Pro (Gemini 3 Pro when enabled).

Most individual accounts need no Google Cloud project. Start here.

Bare Gemini API key (free)

250 requests/day, Flash only, 10/min.

No Pro access, a quarter of the quota. Useful for scripting against a key you already have — not a way to unlock the best model for free.

Knowledge check

You sign into Gemini CLI with a free personal Google account, enable Gemini 3, leave the model on `auto`, and ask a hard refactoring question. Which model handles it, and against which budget?
