[DeepMind] Gemma 4 12B: A Revolutionary Encoder-Free Multimodal Model

Published at: 2026-06-14 22:00
Last updated: 2026-06-15 01:28
#AI #Machine Learning #Open Source

Today, we introduce Gemma 4 12B, our latest model designed to bring agentic multimodal intelligence directly to laptops. Bridging the gap between our edge-friendly E4B and our more advanced 26B Mixture of Experts (MoE), Gemma 4 12B packages powerful capabilities inside a reduced memory footprint. It is also our first mid-sized model to feature native audio inputs. Thanks to the developer community, Gemma 4 models have now crossed 150 million downloads. We are excited to see what you build with this latest addition.

What Makes Gemma 4 12B Unique

These features bring advanced multimodal capabilities to everyday hardware without sacrificing speed or reasoning.

Run State-of-the-Art Agents Locally

Gemma 4 12B delivers performance nearing our larger 26B MoE model on standard benchmarks, but at less than half the total memory footprint. It is small enough to run locally on consumer laptops with 16GB of RAM, unlocking powerful multimodal and agentic experiences right on your machine.
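As a rough check on the 16GB claim, the sketch below estimates the memory needed for the model weights alone at a few numeric precisions. It rests on assumptions not stated in the announcement: that "12B" means roughly 12 billion parameters, and that the listed bit widths are options a user might pick. It ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB (bytes / 2**30)."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: "12B" is read as about 12 billion parameters.
NUM_PARAMS = 12e9
# Laptop RAM budget named in the text.
RAM_GIB = 16

for bits in (16, 8, 4):  # hypothetical precisions, not confirmed by the source
    gib = weight_memory_gib(NUM_PARAMS, bits)
    verdict = "under" if gib < RAM_GIB else "over"
    print(f"{bits:>2}-bit weights: ~{gib:.1f} GiB ({verdict} {RAM_GIB} GiB before overhead)")
```

Under these assumptions, 16-bit weights alone would exceed 16 GiB, so running on a 16GB laptop would likely depend on a lower-precision format. The announcement does not say which format is intended.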

Uniquely Efficient Unified Architecture

What makes Gemma 4 12B stand out is its streamlined approach to processing visual and audio inputs. Traditional multimodal models typically rely on separate encoders to translate images and audio before passing those representations to the language model. Because these split encoders add latency and increase memory usage, we trained Gemma 4 12B with an encoder-free architecture to integrate audio and vision input directly.
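The announcement does not describe how the encoder-free input path works. One common way to feed raw inputs to a language model without a separate encoder is to cut the image into patches and map each patch into the model's embedding space with a single linear projection, so that image and text share one token sequence. The toy sketch below illustrates that general idea only. Every name and size in it (`patchify`, `linear_project`, `D_MODEL`, `PATCH`, `VOCAB`) is hypothetical and is not taken from Gemma.

```python
import random


def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            patches.append([image[top + dy][left + dx]
                            for dy in range(patch) for dx in range(patch)])
    return patches


def linear_project(vectors, weights):
    """Project each vector to the embedding width; weights holds one row per output dim."""
    return [[sum(x * w for x, w in zip(vec, row)) for row in weights]
            for vec in vectors]


def embed_text(token_ids, table):
    """Look up one embedding row per text token id."""
    return [table[t] for t in token_ids]


random.seed(0)
D_MODEL, PATCH, VOCAB = 8, 4, 16  # toy sizes, purely illustrative

# Randomly initialised parameters standing in for learned ones.
proj = [[random.gauss(0, 0.02) for _ in range(PATCH * PATCH)] for _ in range(D_MODEL)]
table = [[random.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(VOCAB)]

image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]  # 8x8 toy image
image_tokens = linear_project(patchify(image, PATCH), proj)  # no separate encoder stack
text_tokens = embed_text([1, 5, 2], table)

# Image and text embeddings share one sequence for the language model.
sequence = image_tokens + text_tokens
print(len(sequence), len(sequence[0]))  # prints "7 8"
```

In a design like this, the only modality-specific parameters are the projection weights, which is one plausible source of the latency and memory savings the text attributes to dropping split encoders.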

Figure: How Gemma 4 12B Processes Multimodal Inputs

Get Started with Gemma 4 12B

Unlock Agentic Development

To help agents build with the latest Gemma advancements, we are releasing our official Skills Repository, a library of skills designed specifically to enable agents to build with Gemma models.

Deploy your way: Spin up production endpoints on Google Cloud through Gemini Enterprise Agent Platform Model Garden, Cloud Run, and GKE.

Original Source: https://deepmind.google/blog/introducing-gemma-4-12b-a-unified-encoder-free-multimodal-model/
