Gemini 2.0 Flash is shutting down: 2026 API pricing and migration options

Google marks Gemini 2.0 Flash and 2.0 Flash-Lite for shutdown on June 1, 2026. This guide compares 2.5 Flash-Lite, 2.5 Flash, Gemini 3 Flash, and 3.1 Flash-Lite as migration paths.

1. Why this needs attention now

Google's Gemini deprecations page lists both gemini-2.0-flash and gemini-2.0-flash-lite with a shutdown date of June 1, 2026. That is not a marketing label. Once a model is shut down, the endpoint stops being available, so production apps should migrate before that date rather than waiting for errors to show up in logs.

The direct replacements are simple on paper: move 2.0 Flash to 2.5 Flash, and 2.0 Flash-Lite to 2.5 Flash-Lite. The cost story is less simple. 2.5 Flash-Lite keeps the old 2.0 Flash price class, while 2.5 Flash is much more expensive on output tokens. Newer Gemini 3-family options add another trade-off: better long-term runway, but higher token pricing.

2. Replacement models at a glance

Current model Default replacement Use when
gemini-2.0-flash-lite gemini-2.5-flash-lite High-volume classification, extraction, routing, translation, and simple multimodal jobs.
gemini-2.0-flash gemini-2.5-flash Chat, RAG, agent workflows, and workloads that need the 1M-token context window.
Planning a second migration gemini-3.1-flash-lite Teams that want a newer low-cost Gemini 3-family model and can afford higher token pricing.

Gemini 3 Flash Preview and Gemini 3.5 Flash are not drop-in cost replacements for most 2.0 Flash users. Use them when quality, grounding, or newer model behavior matters more than preserving the old bill.

3. Where the pricing changes

The numbers below use Google's Gemini API pricing page as checked on May 24, 2026. Prices are USD per 1M tokens for text/image/video unless noted.

Model Input Output Cached input
Gemini 2.0 Flash $0.10 $0.40 $0.025
Gemini 2.0 Flash-Lite $0.075 $0.30 Not available
Gemini 2.5 Flash-Lite $0.10 $0.40 $0.01
Gemini 2.5 Flash $0.30 $2.50 $0.03
Gemini 3.1 Flash-Lite $0.25 $1.50 $0.025
Gemini 3 Flash Preview $0.50 $3.00 $0.05

The important jump is output price. Moving from 2.0 Flash to 2.5 Flash triples input price, but raises output price from $0.40 to $2.50. If your app produces long answers, explanations, or agent traces, the migration can be more than a small version bump.

Search grounding is a separate line item. The 2.5 Flash family keeps the older paid-tier search grounding structure: 1,500 free grounded prompts per day, then $35 per 1,000 grounded prompts. Gemini 3-family pricing uses a monthly free bucket for all Gemini 3 models, then $14 per 1,000 search queries. A single prompt can trigger more than one search query, so measure this from billing data before turning grounding on by default.

4. Pick by workload

For bulk classification, extraction, moderation, routing, and translation, start with 2.5 Flash-Lite. It is the closest practical replacement for both 2.0 Flash-Lite and many lightweight 2.0 Flash workloads.

For customer support chat and RAG, start with 2.5 Flash. The higher output price hurts, but the 1M-token context window and context caching can matter more than sticker price when you repeat long instructions, policies, or retrieved documents across many requests.

For agents and tool-use workflows, test 2.5 Flash-Lite and 2.5 Flash side by side. The right answer depends on how often the model retries, calls tools, or emits long intermediate reasoning. Gemini output pricing includes thinking tokens where the pricing page says so, so cost can move faster than visible response length.

For search-grounded answers, compare the token bill and the grounding bill separately. Gemini 3-family search pricing can be cheaper per listed unit, but the model token price is higher and each prompt may issue multiple search queries.

5. Migration checklist

  1. Replace the model id in one staging environment first, not in production.
  2. Replay real prompts and log input tokens, output tokens, thinking tokens, cache hits, and tool calls.
  3. Compare monthly cost with your actual input/output ratio, not only the pricing table.
  4. Check whether your workload depends on preview models, rate limits, or search grounding behavior.
  5. Run both old and new models for at least a few days before June 1, 2026, then switch traffic gradually.

If you only need a default answer: migrate 2.0 Flash-Lite to 2.5 Flash-Lite, and migrate 2.0 Flash to 2.5 Flash only after you have measured output length. If the new output bill is too high, try 2.5 Flash-Lite before jumping to a Gemini 3 model.