
[DeepMind] Revolutionary Voice Translation: Gemini 3.5 Live Translate

Published at: 2026-06-14 22:00 Last updated: 2026-06-15 01:28
#AI #Machine Learning #Open Source

Gemini 3.5 Live Translate is Google DeepMind's latest audio model, delivering near real-time speech-to-speech translation in over 70 languages. The model detects languages automatically and generates smooth, natural-sounding translated speech that preserves the speaker's intonation, pacing, and pitch. Unlike traditional turn-by-turn systems, 3.5 Live Translate generates speech continuously, balancing two competing goals: waiting for more context to improve quality, and translating immediately to stay in sync with the speaker. The result is fluid audio without awkward pauses that stays just a few seconds behind the speaker throughout the session.
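
The trade-off between waiting for context and translating immediately can be illustrated with a toy model. The sketch below uses a fixed "wait-k" schedule, a simple policy from the simultaneous-translation literature. It is not how Gemini 3.5 Live Translate works internally, which the source does not describe. The function names and the assumption of one target token per source token are hypothetical simplifications made for illustration.

```python
from typing import List, Tuple


def wait_k_schedule(num_source_tokens: int, k: int) -> List[Tuple[int, int]]:
    """Return (source_tokens_read, target_tokens_already_emitted) per output step.

    Toy wait-k policy: read k source tokens before emitting the first target
    token, then read one more source token before each further emission until
    the source is exhausted. Assumes one target token per source token.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    schedule = []
    for emitted in range(num_source_tokens):
        # Never read past the end of the source utterance.
        read = min(emitted + k, num_source_tokens)
        schedule.append((read, emitted))
    return schedule


def average_lag(num_source_tokens: int, k: int) -> float:
    """Average number of source tokens read beyond what has been emitted."""
    schedule = wait_k_schedule(num_source_tokens, k)
    if not schedule:
        return 0.0
    return sum(read - emitted for read, emitted in schedule) / len(schedule)


if __name__ == "__main__":
    # Small k: low lag, little context. Large k: more context, more lag.
    for k in (1, 3, 6):
        print(f"k={k}: average lag = {average_lag(6, k):.2f} tokens")
```

For a six-token utterance, k=1 gives an average lag of 1.00 token, k=3 gives 2.50, and k=6 (wait for the whole utterance, as a turn-by-turn system would) gives 3.50. A continuous system aims for the lower end of that range without giving up the context that larger k provides.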

Gemini 3.5 Live Translate is rolling out across Google products.

The model translates speech as it is streamed rather than waiting for the speaker to finish. It handles multilingual input without manual configuration, and it is robust enough to background noise to keep working in loud environments. It can be used for live interpretation in multilingual calls, meetings, lessons, broadcasts, and more.

Developer platforms such as Agora, Fishjam, and LiveKit build on the Gemini Live API so that developers can build and deploy voice translation apps more easily. These integrations handle the real-time media streaming infrastructure, leaving developers free to focus on user experience.

Feedback has been positive: companies including Grab, CJ ENM, and LiveKit have praised 3.5 Live Translate's translation quality, accuracy, and low latency. Speech translation in Google Meet will soon use 3.5 Live Translate, offering over 2,000 language combinations and an updated interface for instant access.
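
As a rough sanity check, the "over 2,000 language combinations" figure can be compared with the "over 70 languages" figure. The source does not say how Google Meet counts combinations or how many languages Meet itself will support, so the sketch below simply counts language pairs under two assumed conventions: unordered pairs, and ordered source-to-target pairs. The helper names are illustrative only.

```python
from math import comb, perm


def pair_count(num_languages: int, directed: bool) -> int:
    """Count language pairs; directed=True treats A->B and B->A as distinct."""
    return perm(num_languages, 2) if directed else comb(num_languages, 2)


def min_languages_for(target_pairs: int, directed: bool) -> int:
    """Smallest number of languages whose pair count exceeds target_pairs."""
    n = 2
    while pair_count(n, directed) <= target_pairs:
        n += 1
    return n


if __name__ == "__main__":
    print(pair_count(70, directed=False))          # unordered pairs among 70 languages
    print(pair_count(70, directed=True))           # ordered pairs among 70 languages
    print(min_languages_for(2000, directed=False))
    print(min_languages_for(2000, directed=True))
```

Seventy languages yield 2,415 unordered pairs or 4,830 ordered pairs, so "over 2,000" is consistent with either convention. Exceeding 2,000 needs at least 64 languages if pairs are unordered, or 46 if they are ordered.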

Android users will also get a new 'listening mode' with 3.5 Live Translate, which lets them hear translations directly through their phone's earpiece without headphones. All audio generated by Google's models is watermarked with SynthID, so that AI-generated content remains detectable and is harder to use for misinformation.

Blogger's Review: The launch of Gemini 3.5 Live Translate is a significant advance in voice translation, notable for its fluid, continuous output and broad multilingual support. If it works as described, it should make cross-language communication more accessible. The range of applications is promising, though how well it performs in real-world conditions remains to be seen.

Original Source: https://deepmind.google/blog/fluid-natural-voice-translation-with-gemini-35-live-translate/
