
[CS.AI] SANA Framework: Unveiling Key Issues for QA Agents in Data Lakes

Published at: 2026-06-16 22:00
Last updated: 2026-06-17 01:38
#algorithm #Machine Learning #Open Source

Abstract

Exploratory question answering (EQA) over data lakes requires an LLM agent to discover relevant sources, analyze retrieved data, and adapt its actions based on intermediate results. End-to-end accuracy alone cannot distinguish failures in search, planning, data analysis, or the agent's Action Policy: its decisions about what to do next and when to submit an answer.

We present SANA (Search Agent Navigation Ablation framework), a diagnostic ablation framework that transforms EQA tasks into runtime profiles containing gold source sequences, sanitized subquestions, and execution records. SANA uses these profiles to construct idealized search, planning, and data-analysis tools, so that each component can be ablated in turn; the gap that remains is diagnostic evidence for policy failures.
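To make the idea of a runtime profile and an idealized tool concrete, here is a minimal Python sketch. It is not SANA's actual code: the class names, field names, and the `search` signature are all assumptions made for illustration, based only on the three profile contents named in the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RuntimeProfile:
    """Hypothetical container for one EQA task (all field names are assumptions)."""
    question: str
    gold_sources: list[str]   # gold source sequence, in the order it is needed
    subquestions: list[str]   # sanitized subquestions
    execution_records: list[str] = field(default_factory=list)


class OracleSearch:
    """Idealized search tool (hypothetical): ignores the query and returns
    the next source from the profile's gold sequence."""

    def __init__(self, profile: RuntimeProfile) -> None:
        self._sources = list(profile.gold_sources)
        self._next = 0

    def search(self, query: str) -> str | None:
        if self._next >= len(self._sources):
            return None  # gold sequence exhausted
        source = self._sources[self._next]
        self._next += 1
        return source
```

Swapping an agent's real search tool for something like `OracleSearch` removes search as a source of error, so any remaining failures must come from the other components or from the Action Policy.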

To illustrate SANA as a reusable evaluation framework, we adapted two recent EQA benchmarks, LakeQA and KramaBench, and evaluated lightweight and mid-sized agents under fixed prompts, budgets, data lakes, and runtimes. Across both benchmarks, data analysis is a consistent bottleneck while planning is less so. Search is a major limitation in LakeQA's large data-lake setting, but less so in the smaller-scale KramaBench. Thus, SANA decomposes end-to-end task accuracy into a diagnosis of where data-lake agents fail, allowing for systematic comparisons of progress in search, planning, data analysis, and agent design.
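One way to read this kind of decomposition is as simple arithmetic over accuracy scores. The sketch below rests on assumptions: the function names are hypothetical, and treating the residual gap as one minus the accuracy with every component idealized is one plausible interpretation of the abstract, not a formula taken from the paper. The numbers are made up for illustration and are not results from the paper.

```python
from __future__ import annotations


def ablation_gains(baseline: float, ablated: dict[str, float]) -> dict[str, float]:
    """Accuracy gained when one component is replaced by its idealized tool."""
    return {name: acc - baseline for name, acc in ablated.items()}


def residual_gap(all_idealized_accuracy: float) -> float:
    """Accuracy still missing when every component is idealized
    (assumed here to point at the Action Policy)."""
    return 1.0 - all_idealized_accuracy


# Illustrative values only, NOT results from the paper.
gains = ablation_gains(
    baseline=0.30,
    ablated={"search": 0.45, "planning": 0.33, "data_analysis": 0.50},
)
bottleneck = max(gains, key=gains.get)  # component whose idealization helps most
policy_gap = residual_gap(0.70)
```

A large gain for a component suggests that component is a bottleneck; a large residual gap suggests the agent's decisions about what to do next, or when to submit, are the problem.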

Blogger's Review: SANA shows where existing EQA agents fall short by ablating search, planning, and data analysis one at a time, which gives future research a clear target. The approach makes it easier to understand and optimize question-answering systems over data lakes.

Original Source: https://arxiv.org/abs/2606.13904
