[CS.AI] MoDiCoL: A Modular Dataset for Robust Speech Recognition

Modern Automatic Speech Recognition (ASR) systems have made remarkable progress on standard benchmarks, yet performance gaps have emerged under real-world distribution shifts caused by recording conditions, accents, speech impairments, and noise. Existing datasets and benchmarks typically isolate these factors, overlooking their co-occurrence in real-world applications.

In this paper, we argue that model robustness can be treated as a dynamic capability that continually develops, and we introduce MoDiCoL, a Modular Diagnostic Continual Learning dataset designed for controlled analysis of linguistic content, speaker characteristics, and acoustic environments. Furthermore, we propose a real-world-inspired continual learning curriculum to simulate incremental updates and study how robustness is acquired, transferred, and forgotten.

We evaluate three continual learning strategies and provide detailed insights into robustness under evolving conditions.
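To make "acquired, transferred, and forgotten" concrete, here is a minimal sketch of how forgetting and backward transfer are commonly computed in continual learning, adapted to word error rate (WER), where lower is better. The abstract does not say which metrics the paper uses, so the function names, the three-stage curriculum, and every number below are illustrative assumptions, not the authors' method or results.

```python
from typing import List

# wer[i][j] = WER (%) on the test set of stage j, measured after the model
# has been trained through stage i of the curriculum.
# Hypothetical 3-stage curriculum with made-up numbers, for illustration only.
example_wer: List[List[float]] = [
    [10.0, 30.0, 40.0],  # after stage 0
    [12.0, 15.0, 35.0],  # after stage 1
    [15.0, 18.0, 20.0],  # after stage 2 (final model)
]


def forgetting(wer: List[List[float]]) -> List[float]:
    """Per-stage forgetting: final WER minus the lowest WER reached on that
    stage between learning it and the second-to-last stage.
    Positive values mean the model got worse on earlier material."""
    n = len(wer)
    result = []
    for j in range(n - 1):  # the last stage cannot be forgotten yet
        best = min(wer[i][j] for i in range(j, n - 1))
        result.append(wer[-1][j] - best)
    return result


def backward_transfer(wer: List[List[float]]) -> float:
    """Average change on earlier stages between the moment each was learned
    and the end of training. With WER, positive means later stages helped,
    negative means they hurt."""
    n = len(wer)
    return sum(wer[j][j] - wer[-1][j] for j in range(n - 1)) / (n - 1)


if __name__ == "__main__":
    print(forgetting(example_wer))         # [5.0, 3.0]
    print(backward_transfer(example_wer))  # -4.0
```

In this toy matrix, the final model is 5 and 3 WER points worse on the first two stages than it was at its best, so later updates cost robustness on earlier conditions. A curriculum study of the kind described above would fill such a matrix with measured values for each strategy.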

Blogger's Review: MoDiCoL gives researchers a way to study how speech recognition models behave in complex, changing environments. Its modular design lets them vary linguistic content, speaker characteristics, and acoustic environment in a controlled way, which should make robustness failures easier to diagnose and fix in real-world applications. The approach is both novel and practical.
