[CS.AI] MLLP-VRAIN's Breakthrough in IWSLT 2026 Simultaneous Speech Translation

Published at: 2026-06-18 22:00 Last updated: 2026-06-20 13:47
#AI #Machine Learning #Open Source

Abstract

This work describes the participation of the MLLP-VRAIN research group in the shared task of the IWSLT 2026 Simultaneous Speech Translation track. Our submission utilizes the recently released Parakeet and Qwen 3.5 models to create a robust, cascaded solution for long-form SimulST through the use of adaptive "black-box" policies. We explore relaxations of these policies to achieve better quality-latency trade-offs.

Compared to last year, we participate in all language directions. In addition, for the En$\rightarrow${De, It, Zh} directions, we also participate in this year's new context track employing a combination of ASR word-boosting and a RAG mechanism of offline pre-translated exemplars to guide generation and enrich our system with domain-specific context. Finally, we provide a detailed latency analysis of our system. Compared to last year, results on the MCIF En$\rightarrow$De test set show a substantial quality improvement of +5.82 XCOMET-XL. Our context track processing further improves performance by +1.03.
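To make the exemplar idea more concrete, here is a minimal sketch of how retrieval over pre-translated sentence pairs could feed a translation prompt. It is not the authors' implementation: the abstract does not describe the retriever, the similarity measure, or the prompt format, so the bag-of-words cosine similarity, the function names (`retrieve_exemplars`, `build_prompt`), and the prompt layout are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; a real system would
    # likely use subword tokens or dense embeddings).
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_exemplars(query, store, k=2):
    # `store` is a hypothetical list of (source, translation) pairs
    # that were translated offline ahead of time.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(src))), src, tgt) for src, tgt in store]
    scored.sort(key=lambda item: item[0], reverse=True)
    # Keep only exemplars that share at least one token with the query.
    return [(src, tgt) for score, src, tgt in scored[:k] if score > 0]


def build_prompt(query, exemplars):
    # Hypothetical few-shot prompt layout for an En->De translation model.
    parts = ["Translate English to German."]
    for src, tgt in exemplars:
        parts.append(f"English: {src}\nGerman: {tgt}")
    parts.append(f"English: {query}\nGerman:")
    return "\n\n".join(parts)
```

In use, the retrieved pairs are placed ahead of the current source segment so the model sees domain-specific terminology before it generates. In a simultaneous setting the query would be the partial transcript available so far, which this sketch does not model.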

Blogger's Review: The MLLP-VRAIN team's approach to simultaneous speech translation, particularly its adaptive black-box policies and context enhancement mechanisms, shows how latency can be reduced while translation quality is maintained. The reported gains on the MCIF En$\rightarrow$De test set make this a useful reference point for future real-time translation systems, and the work is worth following.

Original Source: https://arxiv.org/abs/2606.17255
