[DeepMind] Gemini 2.5: Major Updates to Thinking Models

Today, we are excited to share updates across the Gemini 2.5 model family:

Gemini 2.5 Pro is now generally available and stable (no changes from the 06-05 preview).
Gemini 2.5 Flash is generally available and stable (no changes from the 05-20 preview, see pricing updates below).
Gemini 2.5 Flash-Lite is now available in preview.

Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. Each model exposes control over the thinking budget, allowing developers to choose when and how much the model “thinks” before generating a response.

Introducing Gemini 2.5 Flash-Lite

Today, we’re introducing 2.5 Flash-Lite in preview with the lowest latency and cost in the 2.5 model family. It’s designed as a cost-effective upgrade from our previous 1.5 and 2.0 Flash models. It also offers better performance across most evaluations, with a lower time to first token and higher decode throughput (tokens per second). This model is great for high-throughput tasks like classification or summarization at scale.

2.5 Flash-Lite is a reasoning model, which allows for dynamic control of the thinking budget with an API parameter. Because Flash-Lite is optimized for cost and speed, “thinking” is off by default, unlike our other models. 2.5 Flash-Lite also supports all of our native tools like Grounding with Google Search, Code Execution, and URL Context in addition to function calling.
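As a rough illustration of setting the thinking budget through an API parameter, the sketch below only builds a request body and sends nothing. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) and the helper `build_generate_request` are assumptions for illustration and are not taken from this announcement; check the current API reference before relying on them.

```python
import json


def build_generate_request(prompt, thinking_budget=None):
    """Build a JSON-serializable request body (hypothetical helper).

    When thinking_budget is None the thinking field is omitted, so the
    model's default applies (per the text above, thinking is off by
    default for 2.5 Flash-Lite).
    """
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        # Assumed field names; verify against the API reference.
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": int(thinking_budget)}
        }
    return body


# High-throughput classification: leave the model default in place.
default_request = build_generate_request(
    "Classify this ticket: 'refund not received'"
)

# A harder prompt: explicitly allow up to 1024 thinking tokens.
thinking_request = build_generate_request(
    "Plan a three-step data migration.", thinking_budget=1024
)

print(json.dumps(thinking_request, indent=2))
```

Omitting the field rather than sending an explicit value keeps the sketch neutral about what each model treats as its default.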

Updates to Gemini 2.5 Flash and Pricing

Over the last year, our research teams have continued to push the Pareto frontier with our Flash model series. When 2.5 Flash was initially announced, we had not yet finalized the capabilities for 2.5 Flash-Lite. We launched with a “thinking” and a “non-thinking” price, which led to developer confusion. With the stable version of Gemini 2.5 Flash rolling out (which is the same 05-20 model preview we made available at Google I/O), we are updating the pricing for 2.5 Flash:

$0.30 / 1M input tokens (*up from $0.15 input)
$2.50 / 1M output tokens (*down from $3.50 output)

We removed the thinking vs. non-thinking price difference and kept a single price tier regardless of input token size. While we strive to maintain consistent pricing between preview and stable releases to minimize disruption, this specific adjustment reflects Flash’s exceptional value, still offering the best cost-per-intelligence available.
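To see how the change plays out for a given workload, here is a small cost sketch using only the four figures quoted above. The function name `cost_usd`, the tier labels, and the example token counts are illustrative assumptions; the "previous" tier pairs the $0.15 input price with the $3.50 output price exactly as the list states them.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# (input price, output price) in USD per 1M tokens, from the list above.
PRICES = {
    "stable": (Decimal("0.30"), Decimal("2.50")),
    "previous": (Decimal("0.15"), Decimal("3.50")),
}


def cost_usd(input_tokens, output_tokens, tier="stable"):
    """Return the cost in USD for a token count under the given tier."""
    in_price, out_price = PRICES[tier]
    return (in_price * input_tokens + out_price * output_tokens) / MILLION


# Hypothetical workload: 1M input tokens and 200k output tokens.
new_cost = cost_usd(1_000_000, 200_000, "stable")    # 0.30 + 0.50
old_cost = cost_usd(1_000_000, 200_000, "previous")  # 0.15 + 0.70

print(f"stable: ${new_cost:.2f}, previous: ${old_cost:.2f}")
```

With these figures the stable price is cheaper whenever output tokens exceed 15% of input tokens, since the extra $0.15 per 1M input is offset by the $1.00 saved per 1M output.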

Continued Growth of Gemini 2.5 Pro

Growth in demand for Gemini 2.5 Pro continues to be the steepest of any model we have ever seen. To allow more customers to build on this model in production, we are making the 06-05 version of the model stable, with the same Pareto frontier price point as before. We expect Pro to shine where you need the highest intelligence and most capabilities, such as coding and agentic tasks.

If you are using 2.5 Pro Preview 05-06, the model will remain available until June 19, 2025, and then will be turned off. If you are using 2.5 Pro Preview 06-05, you can simply update your model string to “gemini-2.5-pro”. We can’t wait to see even more domains benefit from the intelligence of 2.5 Pro and look forward to sharing more about scaling beyond Pro in the near future.
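The migration described above could be sketched as a small helper. Only “gemini-2.5-pro” appears in the announcement; the two preview identifiers below, the function names, and the reading of “until June 19, 2025” as inclusive are assumptions for illustration.

```python
from datetime import date

STABLE_PRO = "gemini-2.5-pro"                   # from the announcement
PREVIEW_0605 = "gemini-2.5-pro-preview-06-05"   # assumed identifier
PREVIEW_0506 = "gemini-2.5-pro-preview-05-06"   # assumed identifier
PREVIEW_0506_LAST_DAY = date(2025, 6, 19)       # assumed inclusive


def migrate_model_string(model):
    """Map the 06-05 preview to the stable name; leave others unchanged."""
    return STABLE_PRO if model == PREVIEW_0605 else model


def is_turned_off(model, today):
    """Return True if the 05-06 preview is past its last available day."""
    return model == PREVIEW_0506 and today > PREVIEW_0506_LAST_DAY


print(migrate_model_string(PREVIEW_0605))
print(is_turned_off(PREVIEW_0506, date(2025, 6, 20)))
```

The helper deliberately does not remap the 05-06 preview, because the announcement says only that it will be turned off, not which model its users should move to.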

Blogger's Review: The updates to the Gemini 2.5 series showcase DeepMind's ongoing innovation in reasoning models, particularly with the introduction of Flash-Lite, which significantly enhances efficiency and cost-effectiveness, providing developers with more options to fit various application scenarios. This flexibility will further promote the adoption and development of AI models in practical applications.