[CS.AI] Unveiling Audio Models: Entropy-Guided Explainability

Abstract

Transformer-based automatic speech recognition (ASR) models like Whisper are highly accurate, yet their predictions remain challenging to interpret. Existing explainable AI (XAI) methods often lack faithfulness and precise temporal grounding. We propose Listening with Entropy-guided Attention for Faithful explainability (LEAF-X), a model-intrinsic XAI framework for transformer-based ASR. LEAF-X combines entropy-guided attention weighting, multi-layer attention rollout, and optional causal ablations to identify low-entropy, high-impact heads and layers, producing sparse token-to-frame attributions.
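To make the entropy-guided weighting idea concrete, here is a minimal sketch in plain Python. It is not the LEAF-X implementation: the abstract does not give the exact formulation, so the softmax over negative mean entropy, the `temperature` parameter, and all function names below are illustrative assumptions. Attention rollout and causal ablations are left out. Each head is a token-by-frame attention matrix whose rows sum to 1.

```python
import math

def row_entropy(row):
    """Shannon entropy (in nats) of one attention distribution over frames."""
    return -sum(p * math.log(p) for p in row if p > 0.0)

def head_entropy(attn):
    """Mean row entropy of one head's attention matrix (tokens x frames)."""
    return sum(row_entropy(r) for r in attn) / len(attn)

def entropy_weights(heads, temperature=1.0):
    """Assumed weighting rule: softmax over negative mean entropy,
    so focused (low-entropy) heads receive larger weights."""
    scores = [-head_entropy(h) / temperature for h in heads]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_heads(heads, weights):
    """Weighted average of the head maps: one tokens x frames attribution."""
    n_tokens, n_frames = len(heads[0]), len(heads[0][0])
    return [
        [sum(w * h[t][f] for w, h in zip(weights, heads)) for f in range(n_frames)]
        for t in range(n_tokens)
    ]

# Toy input: 2 output tokens, 4 audio frames, 2 heads (hypothetical values).
focused = [[0.97, 0.01, 0.01, 0.01],
           [0.01, 0.01, 0.97, 0.01]]
diffuse = [[0.25, 0.25, 0.25, 0.25],
           [0.25, 0.25, 0.25, 0.25]]

weights = entropy_weights([focused, diffuse])
attribution = combine_heads([focused, diffuse], weights)

for token_index, row in enumerate(attribution):
    top_frame = max(range(len(row)), key=row.__getitem__)
    print(f"token {token_index}: strongest frame = {top_frame}")
```

In this toy case the focused head dominates the combined map, so each token's attribution concentrates on a single frame. Lowering `temperature` would sharpen that preference for low-entropy heads, and raising it would move the weights toward a uniform average.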

Unlike perturbation-based explainers or raw attention maps, LEAF-X exploits the internal structure of encoder-decoder and speech-augmented decoder-only models to generate explanations that better reflect model computation. Results show a 32% improvement in faithfulness, a 35-39% increase in locality/sparsity, and the most stable attributions, supporting more transparent and auditable ASR.

Blogger's Review: LEAF-X advances explainability in automatic speech recognition and offers a new perspective on model transparency through its entropy-guided mechanism. In practical applications, this approach could strengthen user trust in model decisions, and it warrants further exploration and optimization.
