Abstract
This work describes the participation of the MLLP-VRAIN research group in the shared task of the IWSLT 2026 Simultaneous Speech Translation track. Our submission uses the recently released Parakeet and Qwen 3.5 models to build a robust cascaded system for long-form SimulST, driven by adaptive "black-box" policies. We explore relaxations of these policies to achieve better quality-latency trade-offs.
Compared to last year, we participate in all language directions. In addition, for the En$\rightarrow${De, It, Zh} directions, we also participate in this year's new context track employing a combination of ASR word-boosting and a RAG mechanism of offline pre-translated exemplars to guide generation and enrich our system with domain-specific context. Finally, we provide a detailed latency analysis of our system. Compared to last year, results on the MCIF En$\rightarrow$De test set show a substantial quality improvement of +5.82 XCOMET-XL. Our context track processing further improves performance by +1.03.
Blogger's Review: The MLLP-VRAIN team's approach to simultaneous speech translation, in particular its adaptive black-box policies and context-enhancement mechanisms, shows how latency can be reduced while maintaining translation quality. It lays a solid foundation for future real-time translation systems and is worth following.