
[CS.AI] Enhancing Pathological VLMs with Cross-scale Reasoning

Published at: 2026-06-18 22:00
Last updated: 2026-06-20 13:49
#AI #Machine Learning #Open Source

Abstract

Pathological images are inherently multi-scale, requiring pathologists to integrate evidence ranging from global tissue architecture at low magnification to cellular morphology at higher magnification for accurate diagnosis. While existing pathological datasets for vision-language models (VLMs) include images at various scales, they often lack an explicit cross-scale reasoning objective. This limitation prevents VLMs from capturing essential cross-scale representations and learning evidence-based reasoning.

To bridge this gap, we introduce the first cross-scale training and evaluation paradigm that formulates pathology interpretation as multi-magnification reasoning. However, creating such a task reveals a critical challenge: multi-image visual question answering (VQA) is prone to text-only shortcuts, which allow models to guess answers using magnification-dependent artifacts rather than visual evidence. To address this, we propose a leakage-aware curation pipeline that combines adversarial text-only screening with constraint-guided question design.
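To make the idea of adversarial text-only screening concrete, here is a minimal sketch, not the authors' pipeline. It assumes each question is a dict with hypothetical keys `question`, `options`, and `answer` (an option index), and that a "guesser" is any function that picks an option without seeing the images. The two toy guessers, the `max_correct_fraction` threshold, and the sample items are all illustrative assumptions; in practice the guessers would presumably be text-only language models.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A guesser sees only the question text and the options, never the images.
Guesser = Callable[[str, Sequence[str]], int]


def keyword_guesser(question: str, options: Sequence[str]) -> int:
    """Toy shortcut: exploit a magnification cue in the question wording."""
    if "high magnification" in question.lower():
        for i, opt in enumerate(options):
            if "nuclear" in opt.lower() or "cell" in opt.lower():
                return i
    return 0


def longest_option_guesser(question: str, options: Sequence[str]) -> int:
    """Toy shortcut: always pick the longest option."""
    return max(range(len(options)), key=lambda i: len(options[i]))


def is_leaky(item: Dict, guessers: Sequence[Guesser],
             max_correct_fraction: float = 0.5) -> bool:
    """Flag a question if too many text-only guessers answer it correctly."""
    correct = sum(
        1 for g in guessers
        if g(item["question"], item["options"]) == item["answer"]
    )
    return correct / len(guessers) > max_correct_fraction


def screen(items: Sequence[Dict], guessers: Sequence[Guesser],
           max_correct_fraction: float = 0.5) -> Tuple[List[Dict], List[Dict]]:
    """Split questions into (kept, flagged) by text-only answerability."""
    kept: List[Dict] = []
    flagged: List[Dict] = []
    for item in items:
        target = flagged if is_leaky(item, guessers, max_correct_fraction) else kept
        target.append(item)
    return kept, flagged


# Illustrative items only; not taken from Scale-VQA.
ITEMS = [
    {
        "question": "At high magnification, what is seen?",
        "options": ["glandular architecture", "nuclear atypia in cells", "tissue folds"],
        "answer": 1,
    },
    {
        "question": "Which pattern do the two views jointly support?",
        "options": ["a long distractor option", "B", "C"],
        "answer": 2,
    },
]
GUESSERS = [keyword_guesser, longest_option_guesser]

if __name__ == "__main__":
    kept, flagged = screen(ITEMS, GUESSERS)
    print(f"kept={len(kept)} flagged={len(flagged)}")
```

Flagged questions would then be rewritten or dropped, which is where the constraint-guided question design mentioned above would come in. The majority-vote threshold is a design choice of this sketch; a stricter pipeline might flag a question as soon as any single guesser beats chance on it.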

Using this pipeline, we construct Scale-VQA, a high-quality benchmark with 4,685 multiple-choice questions grounded in 2,537 pathology images across multiple magnification levels. Finally, we present ScaleReasoner-R1, a model trained via reinforcement learning to optimize performance on the cross-scale VQA task. ScaleReasoner-R1 achieves state-of-the-art performance on our cross-scale reasoning benchmark and also reaches state-of-the-art results on established single-scale benchmarks. These findings suggest that even limited cross-scale supervision can significantly improve pathological understanding. The code and demos will be open-sourced.
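The abstract does not say how the reinforcement learning reward is defined. As a rough illustration of how a multiple-choice VQA task can supply a verifiable reward, the sketch below scores a model completion by exact match on the chosen option letter. The `<answer>` tag format, the A-D option range, and the function names are assumptions made for this example, not details from the paper.

```python
import re
from typing import Optional

# Assumed output convention: the final choice is wrapped in <answer> tags.
_ANSWER_RE = re.compile(r"<answer>\s*([A-D])\s*</answer>")


def extract_choice(completion: str) -> Optional[str]:
    """Return the option letter inside <answer>...</answer>, or None if absent."""
    match = _ANSWER_RE.search(completion)
    return match.group(1) if match else None


def choice_reward(completion: str, gold: str) -> float:
    """1.0 if the extracted letter equals the gold letter, else 0.0."""
    return 1.0 if extract_choice(completion) == gold else 0.0


if __name__ == "__main__":
    sample = "Low power shows crowded glands. <answer>B</answer>"
    print(choice_reward(sample, "B"))
```

A completion that names the right letter without the expected tags scores zero here, so the same function also nudges the model toward a parseable output format.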

Blogger's Review: This study introduces a crucial breakthrough in pathological image analysis by integrating cross-scale reasoning. The effective design of VQA and reinforcement learning strategies enhances model reasoning capabilities in complex scenarios, holding significant promise for practical applications. The open-source code will further foster related research advancements.

Original Source: https://arxiv.org/abs/2606.17412
