[CS.AI] Mirage Probes: Unveiling False Visual Understanding in Vision Models

Vision-language models (VLMs) can answer image-based questions confidently and often correctly, even when no image is provided. This mirage behavior inflates benchmark scores without reflecting true visual grounding. Prior work treats this as a single failure mode, but we argue it is two. Using Mirage Probes, a contrastive probing framework that pairs paraphrased question variants with matched mirage and non-mirage labels on the same image, we show that mirage behavior is linearly decodable from internal activations across residual stream, MLP, post-attention, and attention-head sites in two open-source VLMs. We demonstrate that a Naive Bayes text baseline cannot recover this signal, ruling out surface lexical confounds. Cross-benchmark separability patterns, along with a novel Prior Harnessing Index (PHI) that measures how much a model can answer from text alone, expose two distinct regimes: textual biases, where the model answers from language priors without engaging visual representations, and spurious images, where it constructs false visual content in latent space and answers as if grounded. This distinction has direct mitigation consequences: text-distribution cleaning can address the first regime but cannot reach the second, since spurious-image mirages reside in the model's visual representations rather than its text. Faithful visual grounding will require interventions at the representational level.
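To make "linearly decodable from internal activations" concrete, here is a minimal sketch of a linear probe. It is an illustration under stated assumptions, not the paper's implementation: the activations are synthetic Gaussian vectors standing in for cached features from one site (for example, the residual stream at one layer), the probe is plain logistic regression fit by gradient descent, and every function name and parameter below is hypothetical. The shuffled-label control is a generic sanity check added here; it is not the Naive Bayes text baseline the abstract describes, and the sketch does not model the paired paraphrase design.

```python
import numpy as np


def make_synthetic_activations(n=1000, d=32, shift=1.5, seed=0):
    """Hypothetical stand-in for cached VLM activations at one site.

    Label 1 = mirage, 0 = non-mirage. The two classes differ only by a
    shift along one random unit direction, so the signal is linear by
    construction (an assumption of this toy setup).
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    signs = 2 * labels - 1  # map {0, 1} to {-1, +1}
    acts = rng.normal(size=(n, d)) + shift * np.outer(signs, direction)
    return acts, labels


def train_linear_probe(x, y, lr=0.1, steps=500, l2=1e-3):
    """Fit logistic regression with full-batch gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(mirage)
        w -= lr * (x.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def probe_accuracy(w, b, x, y):
    """Fraction of examples whose predicted label matches y."""
    return float(np.mean(((x @ w + b) > 0).astype(int) == y))


def run_probe(seed=0, shuffle_labels=False):
    """Train on 70% of the data and report held-out accuracy."""
    x, y = make_synthetic_activations(seed=seed)
    if shuffle_labels:
        # Control: break the link between activations and labels.
        y = np.random.default_rng(seed + 1).permutation(y)
    split = int(0.7 * len(y))
    w, b = train_linear_probe(x[:split], y[:split])
    return probe_accuracy(w, b, x[split:], y[split:])


if __name__ == "__main__":
    print("probe accuracy:  ", run_probe())
    print("shuffled control:", run_probe(shuffle_labels=True))
```

In this toy setting the probe should score well above chance on held-out data while the shuffled-label control stays near 50%. With real models, the same comparison would be repeated per site and per layer, and the text-only baseline would be run on the question strings rather than on activations.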

Blogger's Review: This paper exposes a weakness in how vision-language models handle visual information: they can produce plausible answers with no actual visual input. Using Mirage Probes, it argues for closer examination of how visual and textual signals actually interact during model training and evaluation, and it pushes future research toward more genuine visual understanding.
