Gemini CLI: Models & Context Windows
Master the Gemini model lineup, auto-routing, the 1M-token window, and which auth path unlocks which models and quotas.
Gemini CLI runs in auto mode by default: it classifies each prompt as simple or complex and routes it to a Flash or Pro model on its own. This guide covers the model lineup you can reach, how to pin or switch models, how auto-routing decides, and how your auth method sets your daily quota. After it you can select a specific model, read which one actually ran, and budget your requests.
| Alias | Resolves to | Use it for |
|---|---|---|
| auto | gemini-2.5-pro or gemini-3-pro-preview | Default. Gemini 3 model if preview features are enabled, else standard Pro |
| pro | gemini-2.5-pro or gemini-3-pro-preview | Complex reasoning. Same Gemini 3 promotion as auto when preview is on |
| flash | gemini-2.5-flash | Fast, balanced model for everyday tasks |
| flash-lite | gemini-2.5-flash-lite | Fastest, lightest model for simple tasks |
| Model ID | Input tokens | Output tokens | Knowledge cutoff |
|---|---|---|---|
| gemini-3-pro-preview | 1,000,000 (1M) | 64k | January 2025 |
| gemini-3.1-pro-preview | 1,000,000 (1M) | 64k | January 2025 |
| gemini-3-flash-preview | 1,000,000 (1M) | 64k | January 2025 |
| gemini-2.5-pro | 1,048,576 | 65,536 | — |
| gemini-2.5-flash | 1,048,576 | 65,536 | — |
| gemini-2.5-flash-lite | 1,048,576 | 65,536 | — |
All current Gemini coding models share a ~1M-token input window, so model choice is about speed, cost, and reasoning depth, not how much of your repo fits. The figure is written two ways: the Gemini 3 family is documented as 1M input / 64k output, the 2.5 family as the exact 1,048,576 input (2^20) / 65,536 output (2^16). The literal limit appears in errors: an oversized 2.5 prompt fails with `input token count … exceeds the maximum number of tokens allowed (1048576)`.
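The limits in the table can be turned into a quick pre-flight check. The following is a minimal sketch, assuming you already have a token count for your prompt from some other source; the names `INPUT_LIMITS`, `fits_input_window`, and `headroom` are hypothetical helpers for illustration and are not part of Gemini CLI.

```python
# Documented input-token limits, copied from the table above.
# The Gemini 3 figure is the rounded "1M" as documented; the 2.5 figure is exact (2**20).
INPUT_LIMITS = {
    "gemini-3-pro-preview": 1_000_000,
    "gemini-3.1-pro-preview": 1_000_000,
    "gemini-3-flash-preview": 1_000_000,
    "gemini-2.5-pro": 2**20,
    "gemini-2.5-flash": 2**20,
    "gemini-2.5-flash-lite": 2**20,
}


def fits_input_window(model: str, prompt_tokens: int) -> bool:
    """Return True if prompt_tokens is within the documented input limit."""
    return prompt_tokens <= INPUT_LIMITS[model]


def headroom(model: str, prompt_tokens: int) -> int:
    """Tokens left before the limit; a negative value means the prompt is too large."""
    return INPUT_LIMITS[model] - prompt_tokens
```

Because every model shares roughly the same window, a prompt that fails this check on one model will most likely fail on the others too; switching models does not buy more input room.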
Select or pin a model
Pin one model for a session
Pass `-m`/`--model` at launch; it always wins: `gemini -m gemini-2.5-flash`. Accepts an alias or a concrete ID.
Set a default via environment
Export `GEMINI_MODEL` (used only when no `-m` flag is given): `export GEMINI_MODEL=gemini-2.5-pro`.
Set a persistent default in settings
Add `model.name` to `~/.gemini/settings.json` (used only when neither `-m` nor `GEMINI_MODEL` is set): `{ "model": { "name": "gemini-2.5-flash" } }`.
Switch mid-session
Run `/model` to open the Auto/Manual dialog; the change applies to all subsequent interactions in that session.
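The launch-time precedence described above (flag, then environment variable, then settings file, then the auto default) can be sketched as a small function. This is an illustration of the ordering only, not the CLI's actual implementation; `resolve_model` and its parameters are hypothetical names, and the settings file is passed in as a JSON string rather than read from disk.

```python
import json
from typing import Mapping, Optional


def resolve_model(
    cli_flag: Optional[str],
    env: Mapping[str, str],
    settings_json: str,
    default: str = "auto",
) -> str:
    """Pick a model using the documented precedence.

    Order: -m/--model flag, then GEMINI_MODEL, then model.name in settings,
    then the default (auto mode).
    """
    if cli_flag:
        return cli_flag
    if env.get("GEMINI_MODEL"):
        return env["GEMINI_MODEL"]
    # An empty string stands in for a missing settings file.
    settings = json.loads(settings_json) if settings_json.strip() else {}
    name = settings.get("model", {}).get("name")
    return name or default
```

For example, with `GEMINI_MODEL=flash` exported and `gemini-2.5-flash` in settings, passing `-m gemini-2.5-pro` still yields `gemini-2.5-pro`. A mid-session `/model` change is outside this sketch, since it applies after launch.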
| Option | Behaviour | Models in play |
|---|---|---|
| Auto (Gemini 3) | System picks the best Gemini 3 model for the prompt | gemini-3-pro-preview, gemini-3-flash-preview |
| Auto (Gemini 2.5) | System picks the best Gemini 2.5 model for the prompt | gemini-2.5-pro, gemini-2.5-flash |
| Manual | You pick one specific model and it stays put | Any available model |
Auto routing is on by default, managed by the ModelAvailabilityService. It classifies each prompt and routes accordingly: simple → gemini-2.5-flash; complex → gemini-3-pro-preview if Gemini 3 is enabled, else gemini-2.5-pro. Picking Pro in /model instead biases toward the most capable available model (Gemini 3 Pro when enabled). The CLI's own internal calls — prompt completion, classification — use gemini-2.5-flash-lite and silently fall back through gemini-2.5-flash to gemini-2.5-pro without prompting or changing your configured model.
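The routing rule and the internal fallback chain described above can be written out as a short sketch. It assumes the simple/complex classification has already been made and is passed in as a boolean (how the CLI classifies prompts is not covered here); `route`, `INTERNAL_FALLBACK_CHAIN`, and `pick_internal_model` are hypothetical names, not CLI internals.

```python
from typing import Iterable, Optional


def route(is_complex: bool, gemini3_enabled: bool) -> str:
    """Mirror the described auto-routing decision for a user prompt."""
    if not is_complex:
        return "gemini-2.5-flash"
    return "gemini-3-pro-preview" if gemini3_enabled else "gemini-2.5-pro"


# Order used for the CLI's own internal calls, as described in the text.
INTERNAL_FALLBACK_CHAIN = [
    "gemini-2.5-flash-lite",
    "gemini-2.5-flash",
    "gemini-2.5-pro",
]


def pick_internal_model(available: Iterable[str]) -> Optional[str]:
    """Return the first model in the chain that is currently available."""
    usable = set(available)
    for model in INTERNAL_FALLBACK_CHAIN:
        if model in usable:
            return model
    return None
```

Keeping the two paths separate reflects the text: the internal fallback never changes the model you configured for your own prompts.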
| Trigger | CLI offers |
|---|---|
| Gemini 3 Pro daily limit reached | Switch to Gemini 2.5 Pro / upgrade / stop |
| Gemini 2.5 Pro daily limit reached | Fall back to Gemini 2.5 Flash |
| Gemini 3 Pro temporarily overloaded | "Keep trying" (exponential backoff) or fall back to 2.5 Pro |
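To make the "Keep trying" option concrete, here is a generic sketch of an exponential backoff schedule. The base delay, growth factor, and cap are illustrative assumptions; the values the CLI actually uses are not stated in this guide, and `backoff_delays` is a hypothetical helper.

```python
from typing import List


def backoff_delays(
    attempts: int,
    base: float = 1.0,    # assumed first delay in seconds
    factor: float = 2.0,  # assumed growth per retry
    cap: float = 30.0,    # assumed upper bound on any single delay
) -> List[float]:
    """Return the wait before each retry: base * factor**i, limited to cap."""
    return [min(cap, base * factor**i) for i in range(attempts)]
```

With these assumed defaults, six retries wait 1, 2, 4, 8, 16, and 30 seconds. If the waits grow longer than you are willing to tolerate, falling back to 2.5 Pro is the other offered option.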
| Auth method | Tier | Requests / user / day | Notes |
|---|---|---|---|
| Google account (Login with Google) | Code Assist Individual — free | 1,000 | 60 req/min, across the full model family |
| Google account | Google AI Pro | 1,500 | Paid, fixed-price subscription |
| Google account | Google AI Ultra | 2,000 | Paid |
| Gemini API key | Free (unpaid) | 250 | Flash model only, 10 req/min |
| Gemini API key | Pay-as-you-go | No daily cap | Billed per token/call — avoids interruption |
| Vertex AI | Express mode (free) | Account-specific | Free for 90 days, then billing required |
| Vertex AI | Pay-as-you-go | No daily cap | Shared/provisioned quota, billed on usage |
| Google Workspace | Code Assist Standard | 1,500 | Paid, license seats |
| Google Workspace | Code Assist Enterprise | 2,000 | Paid |
Your sign-in method, not the model menu, is the biggest lever on what you can run. The CLI supports three: Sign in with Google (OAuth), a Gemini API key (export GEMINI_API_KEY="...", key from aistudio.google.com/app/apikey), and Vertex AI (needs GOOGLE_CLOUD_PROJECT + the Vertex AI API enabled). The Google-account path gives the Code Assist Individual free tier — 1,000 requests/day across the full family, the only free way to reach the Pro models. The free API-key path is Flash-only at 250/day. Pay-as-you-go on an API key or Vertex removes the daily cap and bills per token.
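The two free quotas can be compared with a little arithmetic. The sketch below uses only the daily and per-minute figures quoted above; the `Quota` class, the dictionary keys, and the helper names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    per_day: int
    per_minute: int


# Free-tier figures as quoted in this guide.
FREE_QUOTAS = {
    "google_login": Quota(per_day=1000, per_minute=60),
    "api_key_free": Quota(per_day=250, per_minute=10),
}


def remaining_today(path: str, used: int) -> int:
    """Requests left today on the given auth path (never negative)."""
    return max(0, FREE_QUOTAS[path].per_day - used)


def minutes_to_exhaust(path: str) -> float:
    """Minutes needed to burn the whole daily quota at the per-minute limit."""
    quota = FREE_QUOTAS[path]
    return quota.per_day / quota.per_minute
```

At full speed the Google-login quota lasts about 17 minutes (1,000 / 60) and the free API-key quota 25 minutes (250 / 10), so for a script that makes requests continuously the daily cap, not the per-minute rate, is the constraint to plan around.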
Enable Gemini 3 in the CLI
Update the CLI to a supported version
Gemini 3 requires Gemini CLI 0.21.1 or later. Upgrade with `npm install -g @google/gemini-cli@latest` (current stable is v0.46.0, published 2026-06-10).
Switch routing to Gemini 3
Launch `gemini`, run `/model`, choose Auto (Gemini 3). Complex prompts then route to `gemini-3-pro-preview`, and `auto`/`pro` resolve to the Gemini 3 model.
On Code Assist Standard / Enterprise, flip the preview switches
Managed accounts need more: an admin sets the release channel to Preview (Admin for Gemini → Settings), then you set Preview Features = `true` via `/settings` and restart. Gemini 3 will not appear from upgrading alone on a managed account.
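The minimum-version requirement above is easy to get wrong with plain string comparison (as strings, "0.9.5" sorts after "0.21.1"). Here is a minimal sketch of a numeric check, assuming you supply the installed version string yourself; `parse_version` and `supports_gemini3` are hypothetical helpers, and any pre-release suffix is simply ignored.

```python
from typing import Tuple

MIN_GEMINI3 = (0, 21, 1)  # minimum CLI version stated in this guide


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'v0.46.0' or '0.21.1-preview.2' into a tuple of integers."""
    core = text.strip().lstrip("v").split("-")[0]  # drop 'v' prefix and any suffix
    return tuple(int(part) for part in core.split("."))


def supports_gemini3(installed: str) -> bool:
    """True if the installed version is at least the stated minimum."""
    return parse_version(installed) >= MIN_GEMINI3
```

A passing check only covers the version requirement; on managed accounts the preview settings described above are still needed.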
Two free paths, very different ceilings
Sign in with Google (free)
1,000 requests/day, 60/min, across the whole family — Flash and Pro (Gemini 3 Pro when enabled).
Most individual accounts need no Google Cloud project. Start here.
Bare Gemini API key (free)
250 requests/day, Flash only, 10/min.
No Pro access, a quarter of the quota. Useful for scripting against a key you already have — not a way to unlock the best model for free.
Knowledge check
You sign into Gemini CLI with a free personal Google account, enable Gemini 3, leave the model on `auto`, and ask a hard refactoring question. Which model handles it, and against which budget?