NeFut

[CS.AI] MODE-RAG: Manifold Outlier Diagnosis and Energy-based Evaluation

Published at: 2026-06-18 22:00
Last updated: 2026-06-20 13:49
#algorithm #AI #Machine Learning

Abstract

While Multimodal Retrieval-Augmented Generation (M-RAG) enhances Large Vision-Language Models, it remains highly susceptible to cross-modal hallucinations, causal fabrications, and sycophancy. Furthermore, existing mitigation pipelines often face an intervention paradox: static rules tend to disrupt accurate generations unnecessarily, whereas leaving the multimodal reasoning completely unguided allows existing mismatches to cascade into severe logical fabrications.

To quantify and mitigate these hallucinations, we propose MODE-RAG, a multi-agent system that uses Variational Free Energy (VFE) and internal attention states to dynamically gate interventions. High-risk queries are routed to five stage-specific agents, which integrate Monte Carlo Tree Search (MCTS) for rigorous causal derivation and logit perturbations to penalize sycophancy.

Dedicated Correction and Overseer agents ensure formatting stability and perform post-hoc factual verification. To objectively evaluate our approach, we introduce ModeVent, a challenging subset derived from the MultiVent dataset. Extensive experiments indicate that our system effectively reduces hallucination rates and logical fabrication, significantly improving the robustness of M-RAG systems.

Blogger's Review: MODE-RAG addresses critical issues in multimodal generation by introducing a dynamic intervention mechanism and a multi-agent system, showcasing strong potential in reducing logical fabrications. This approach not only enriches existing generative model techniques but also opens new avenues for future research.
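
To make the gating idea more concrete, here is a minimal sketch of an energy-based gate. It is only an illustration under stated assumptions: the abstract does not give the VFE formulation, so the sketch swaps in the simpler softmax free energy, -T * log(sum(exp(logit / T))), computed over a single vector of output logits. The function names `free_energy` and `should_intervene`, the `threshold` parameter, and the example logits are all hypothetical and do not come from the paper.

```python
import math


def free_energy(logits, temperature=1.0):
    """Softmax free energy: -T * log(sum(exp(logit / T))).

    Assumption: a stand-in for the paper's VFE score, which the abstract
    does not define. Lower (more negative) values mean the model puts
    strong evidence behind at least one option.
    """
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return -temperature * log_sum_exp


def should_intervene(logits, threshold, temperature=1.0):
    """Hypothetical gate: intervene only when the energy exceeds the threshold."""
    return free_energy(logits, temperature) > threshold


# A peaked distribution has low energy and is left alone.
confident = [10.0, 0.0, 0.0]
# A flat distribution has higher energy and is routed for intervention.
uncertain = [0.0, 0.0, 0.0]

print(should_intervene(confident, threshold=-5.0))
print(should_intervene(uncertain, threshold=-5.0))
```

With a gate of this shape, confident generations pass through untouched and only high-energy queries would be handed to the downstream agents, which is one plausible way to avoid the intervention paradox described in the abstract. In practice the threshold would have to be tuned on held-out data.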

Original Source: https://arxiv.org/abs/2606.17449
