[CS.AI] Disentangling Hallucinations with Orthogonal Semantic Projection

As Vision-Language Models (VLMs) are increasingly deployed in safety-critical applications, the trustworthiness of their explanations becomes crucial. Explainable AI (XAI) methods for VLMs often suffer from semantic hallucination: attribution maps highlight prominent image regions even when the model is prompted with an incorrect text description (e.g., highlighting a dog when prompted with "cat"). This problem is not specific to a single architecture; it is a fundamental consequence of Linear Semantic Leakage in high-dimensional embedding spaces. We propose a unified theoretical framework, Linear Semantic Attribution (LSA), which generalizes across discriminative methods. We then introduce Orthogonal Semantic Projection (OSP), a geometric intervention that uses the residual property of Orthogonal Matching Pursuit (OMP) to disentangle unique semantic signals from shared concepts. We prove, and demonstrate empirically, that OSP minimizes hallucination by orthogonalizing the query vector against distractor concepts, which makes the attribution model blind to shared features while preserving fidelity for correct prompts. Our code is available in a GitHub repository.
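The core OSP step can be illustrated with a minimal NumPy sketch. It assumes, for illustration only, that an attribution map is linear in the text query (a dot product between each patch embedding and the query embedding, in the spirit of the LSA framing), and it uses hand-built 4-dimensional toy embeddings in which "cat" and "dog" share an "animal" direction. The helper names `orthogonal_semantic_projection` and `attribution` are hypothetical and are not the authors' code. The projection is computed as the least-squares residual q - D(DᵀD)⁻¹Dᵀq, which is the residual OMP maintains after selecting atoms; the sketch skips OMP's greedy atom selection and simply projects out all given distractors.

```python
import numpy as np


def orthogonal_semantic_projection(query, distractors):
    """Project `query` onto the orthogonal complement of span(distractors).

    query:       (d,) text embedding of the prompt.
    distractors: (d, k) matrix whose columns are distractor concept embeddings.

    The least-squares fit gives the same residual OMP keeps once these atoms
    are selected: r = q - D (D^T D)^{-1} D^T q.
    """
    coef, *_ = np.linalg.lstsq(distractors, query, rcond=None)
    return query - distractors @ coef


def attribution(patches, query):
    """Toy linear attribution (assumption): one score per patch, <patch, query>."""
    return patches @ query


# Toy 4-d embedding space, assumed for illustration:
# axis 0 = shared "animal" concept, 1 = cat-specific, 2 = dog-specific, 3 = background.
cat_text = np.array([1.0, 1.0, 0.0, 0.0])
dog_text = np.array([1.0, 0.0, 1.0, 0.0])

# A toy "image of a dog": two dog patches and one background patch.
dog_image = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Naive query: dog patches score > 0 purely through the shared "animal" axis.
naive_map = attribution(dog_image, cat_text)

# OSP query: "cat" orthogonalized against the distractor "dog" (one column).
q_osp = orthogonal_semantic_projection(cat_text, dog_text[:, None])
osp_map = attribution(dog_image, q_osp)

# Fidelity check: a genuine cat patch still scores positively.
cat_patch = np.array([1.0, 1.0, 0.0, 0.0])

print("naive 'cat' map on dog image:", naive_map)
print("OSP 'cat' map on dog image:", osp_map)
print("OSP score on a cat patch:", cat_patch @ q_osp)
```

In this toy setup the naive "cat" query highlights the dog patches through the shared "animal" direction, while the projected query scores them at (numerically) zero because it is orthogonal to the distractor. A cat patch still receives a positive score, which mirrors the abstract's claim that fidelity for correct prompts is preserved. Using `np.linalg.lstsq` rather than an explicit inverse keeps the projection well-behaved when several distractor embeddings are nearly collinear.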

Blogger's Review: The proposed Orthogonal Semantic Projection (OSP) offers a novel approach to semantic hallucination in Vision-Language Model explanations, targeting the leakage of shared concepts in high-dimensional embedding spaces. Pairing theoretical analysis with empirical results strengthens its credibility, and future work should focus on optimizing the method for more complex scenarios.
