Today, we are releasing the stable version of Gemini 2.5 Flash-Lite, our fastest and lowest-cost model in the Gemini 2.5 family ($0.10 input per 1M, $0.40 output per 1M). We built 2.5 Flash-Lite to push the frontier of intelligence per dollar, with native reasoning capabilities that can be toggled on for more demanding use cases.
Key Highlights
- Best in-class speed: Gemini 2.5 Flash-Lite has lower latency than both 2.0 Flash-Lite and 2.0 Flash across a broad sample of prompts.
- Cost-efficiency: It’s our lowest-cost 2.5 model yet, priced at $0.10 for 1M input tokens and $0.40 for output tokens, enabling affordable handling of large volumes of requests.
- Smart and small: It demonstrates overall higher quality than 2.0 Flash-Lite across various benchmarks, including coding, math, science, reasoning, and multimodal understanding.
- Fully featured: Building with 2.5 Flash-Lite provides access to a 1 million-token context window, controllable thinking budgets, and support for native tools like Grounding with Google Search, Code Execution, and URL Context.
Real-world Applications
Since the launch of 2.5 Flash-Lite, we have already seen some incredibly successful deployments:
- Satlyt: Building a decentralized space computing platform, leveraging 2.5 Flash-Lite for a 45% reduction in latency for critical onboard diagnostics and a 30% decrease in power consumption.
- HeyGen: Using AI to create avatars for video content, automating video planning, and translating videos into over 180 languages.
- DocsHound: Transforming product demos into documentation using 2.5 Flash-Lite to process long videos and extract screenshots quickly.
- Evertune: Utilizing 2.5 Flash-Lite to dramatically speed up analysis and report generation, providing clients with dynamic insights.
You can start using 2.5 Flash-Lite by specifying “gemini-2.5-flash-lite” in your code. If you are using the preview version, switch to “gemini-2.5-flash-lite.” We plan to remove the preview alias on August 25th. Ready to start building? Try the stable version of Gemini 2.5 Flash-Lite now in Google AI Studio and Vertex AI.
Blogger's Review: The launch of Gemini 2.5 Flash-Lite signifies a crucial balance between cost and performance in AI models, especially for applications requiring efficient processing. Its low latency and high efficiency are set to drive innovation across various industries, making it a must-explore for developers.