
[CS.AI] Geometry-Consistent Endoscopic Representations for Navigation

Published at: 2026-06-18 22:00
Last updated: 2026-06-20 13:47
#AI #Machine Learning #Open Source

Accurate vision-based navigation in monocular endoscopy is difficult: depth cues are limited, tissue texture is weak, tissue deforms non-rigidly, and appearance varies substantially across domains. These factors complicate pose estimation, depth prediction, and image-to-anatomy alignment. Recent vision foundation models show promise, but their learned representations often lack geometric consistency, which hinders stable feature correspondence and limits reliability in downstream navigation tasks.

We propose a unified framework for learning geometry-consistent and domain-robust image representations for monocular endoscopy. The framework combines a synthetic data pipeline that provides accurate geometric supervision with Hierarchy-Aware Geometry-Semantic Adaptation, a structured alternative to standard LoRA. The adaptation selectively inserts low-rank adapters across the transformer hierarchy and pairs them with layer-wise training objectives that encourage geometric correspondence in intermediate features and semantic consistency in deeper features.
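To make the idea of selective, layer-aware low-rank adaptation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say which layers receive adapters or how the objectives are defined. The names `LowRankAdapter`, `plan_adapters`, and `total_loss`, the layer ranges, and the loss weights are all hypothetical. The sketch shows only the standard LoRA update W + (alpha / r) * B A and one possible way to tag intermediate layers with a "geometric" objective and deeper layers with a "semantic" one.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LowRankAdapter:
    """LoRA-style update: W_eff = W + (alpha / rank) * B @ A, with W kept frozen."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the adapter
        # leaves the frozen weight unchanged at initialization.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]


def plan_adapters(num_layers, mid_range, deep_range):
    """Map layer index -> objective label; layers outside both ranges get no adapter.

    The half-open ranges are an assumption made for illustration only.
    """
    plan = {}
    for i in range(num_layers):
        if mid_range[0] <= i < mid_range[1]:
            plan[i] = "geometric"
        elif deep_range[0] <= i < deep_range[1]:
            plan[i] = "semantic"
    return plan


def total_loss(layer_losses, plan, weights):
    """Weighted sum of per-layer losses, counting only layers that carry an adapter."""
    return sum(weights[plan[i]] * loss for i, loss in layer_losses.items() if i in plan)


# Hypothetical 12-layer transformer: adapt layers 4-7 and 8-11, skip 0-3.
plan = plan_adapters(12, mid_range=(4, 8), deep_range=(8, 12))
```

In this sketch the early layers stay untouched, and each adapted layer contributes to the training loss according to its label. A real system would compute the per-layer losses from feature correspondences and semantic targets, which the abstract does not detail.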

Experiments on public and proprietary datasets demonstrate improved geometric and semantic representation quality, leading to enhanced performance in downstream navigation tasks, including pose estimation and monocular depth estimation. The learned representations show favorable synthetic-to-real transfer in clinical bronchoscopy and provide a useful initialization for adaptation to sinus endoscopy and colonoscopy under limited supervision. The framework also exhibits favorable scaling with model size and training data. These results support hierarchy-aware, geometry-guided adaptation as a practical approach for endoscopic representation learning.

Blogger's Review: The proposed framework combines synthetic data that provides geometric supervision with hierarchy-aware adaptation, and the authors report improved performance on navigation tasks such as pose estimation and monocular depth estimation. It offers a new perspective for medical image processing and shows potential for clinical use, particularly in the reported synthetic-to-real transfer for bronchoscopy. The geometry-guided feature learning suggests that future endoscopic navigation could become more precise and reliable.

Original Source: https://arxiv.org/abs/2606.17340
