[DeepMind] Gemini 3.1 Flash-Lite: High-Efficiency AI Model for Scalable Intelligence

Gemini 3.1 Flash-Lite is now available in preview to developers via the Gemini API in Google AI Studio and for enterprises via Vertex AI. Priced at $0.25/1M input tokens and $1.50/1M output tokens, it is cost-efficient and faster than 2.5 Flash. This model is suitable for tasks such as translation, content moderation, user interface generation, and simulation creation.
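To make the pricing concrete, here is a minimal cost-estimation sketch in Python. The two prices come from the announcement; the workload figures (10,000 requests at 800 input and 200 output tokens each) and the function name `estimate_cost` are hypothetical and chosen only for illustration.

```python
# Prices quoted in the announcement, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# Hypothetical workload: 10,000 requests, 800 input and 200 output tokens each.
requests = 10_000
total = estimate_cost(requests * 800, requests * 200)
print(f"Estimated cost: ${total:.2f}")  # prints "Estimated cost: $5.00"
```

Output tokens cost six times as much as input tokens at these rates, so in this sketch the 2M output tokens ($3.00) cost more than the 8M input tokens ($2.00).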

Gemini 3.1 Flash-Lite is Google's latest AI model designed for high-frequency developer workloads, delivering high quality at a reduced cost. According to the Artificial Analysis benchmark, it achieves a 2.5X faster Time to First Answer Token and a 45% increase in output speed compared to 2.5 Flash, making it ideal for low-latency real-time experiences.
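As a rough illustration of how the two speed figures combine, the sketch below estimates end-to-end response time as time-to-first-token plus generation time. The baseline values (1.0 s to first token, 200 tokens/s, a 500-token response) are hypothetical placeholders, not measurements of 2.5 Flash, and the sketch assumes "2.5X faster" means the time to first token is divided by 2.5.

```python
def response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Estimate total response time: time to first token plus generation time."""
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical baseline figures, NOT measured values for 2.5 Flash.
BASE_TTFT_S = 1.0
BASE_TOKENS_PER_S = 200.0
OUTPUT_TOKENS = 500

# Relative improvements quoted in the announcement.
TTFT_SPEEDUP = 2.5        # assumed to mean: time to first token divided by 2.5
OUTPUT_SPEED_GAIN = 0.45  # 45% more output tokens per second

baseline = response_time(BASE_TTFT_S, BASE_TOKENS_PER_S, OUTPUT_TOKENS)
improved = response_time(
    BASE_TTFT_S / TTFT_SPEEDUP,
    BASE_TOKENS_PER_S * (1 + OUTPUT_SPEED_GAIN),
    OUTPUT_TOKENS,
)

print(f"baseline: {baseline:.2f}s, improved: {improved:.2f}s")
```

Under these assumptions the shorter the response, the more the time-to-first-token gain dominates, which is why that metric matters most for real-time experiences.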

Additionally, 3.1 Flash-Lite achieved an Elo score of 1432 on the Arena.ai Leaderboard. It outperforms similar-tier models across reasoning and multimodal understanding benchmarks, scoring 86.9% on GPQA Diamond and 76.8% on MMMU Pro, and even surpasses larger models from previous generations such as 2.5 Flash.

Using the thinking levels available in AI Studio and Vertex AI, developers can choose how much the model “thinks” for a given task, which is crucial for managing high-frequency workloads. 3.1 Flash-Lite can handle large-scale tasks such as high-volume translation and content moderation, as well as more complex workloads that require deeper reasoning, such as generating user interfaces and dashboards, creating simulations, or following instructions.
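One way to picture per-task thinking levels is a small routing table that sends high-volume tasks to a lighter setting and reasoning-heavy tasks to a deeper one. The sketch below is illustrative only: the level labels (`"low"`, `"high"`), the task names, and the request dictionary are all hypothetical and do not reflect the actual Gemini API schema.

```python
# Hypothetical mapping from task type to thinking level.
# The labels and task names are assumptions made for illustration,
# not values taken from the Gemini API.
THINKING_LEVEL_BY_TASK = {
    "translation": "low",
    "content_moderation": "low",
    "ui_generation": "high",
    "simulation": "high",
}
DEFAULT_LEVEL = "low"


def pick_thinking_level(task_type: str) -> str:
    """Return the thinking level for a task, falling back to the default."""
    return THINKING_LEVEL_BY_TASK.get(task_type, DEFAULT_LEVEL)


def build_request(task_type: str, prompt: str) -> dict:
    """Assemble an illustrative request payload (not a real API schema)."""
    return {"prompt": prompt, "thinking_level": pick_thinking_level(task_type)}


print(build_request("translation", "Translate 'hello' into French."))
print(build_request("ui_generation", "Generate a sales dashboard layout."))
```

Keeping the mapping in one table makes it easy to tune the cost and latency trade-off per workload without touching the calling code.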

For instance, 3.1 Flash-Lite can instantly fill an e-commerce wireframe with numerous products and quickly analyze and sort large volumes of content such as images. Early testers praised its efficiency and reasoning capabilities, noting that it follows instructions and handles complex inputs with the precision of a larger-tier model. We look forward to seeing what you build with 3.1 Flash-Lite and the rest of the Gemini 3 series models.

Blogger's Review: The launch of Gemini 3.1 Flash-Lite marks a significant advancement for scalable intelligent applications. Its low cost and strong performance will open up more innovative possibilities for developers, especially in real-time processing and complex task handling, making it worth close attention.
