[CS.AI] Building Trust in Legal AI: Typed Hallucination A...

Abstract

AI systems deployed in legal workflows hallucinate at an aggregate rate of roughly 52%, but this average conceals where errors concentrate and in which direction they run (omission versus invention), leaving compliance officers without an actionable signal for trustworthy deployment. We present LegalHalluLens, an auditing framework with three components:

Typed Hallucination Profiles: Hallucination rates broken down across four legally motivated claim categories (numeric, temporal, obligation/entitlement, factual) on the CUAD dataset.
Risk Direction Index (RDI): Reduces omission-versus-invention bias to a single scalar that can be compared across deployments.
Typed Debate Pipeline: A multi-agent debate pipeline calibrated to both the magnitude and the direction of the measured errors.
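
The abstract does not give the RDI formula, so the following is only a minimal sketch under an assumed definition: the signed share of errors that are inventions rather than omissions, ranging from -1 (all omissions) to +1 (all inventions). The names `ErrorCounts`, `error_rate`, and `risk_direction_index`, as well as the toy counts, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorCounts:
    omissions: int   # supported claims the system failed to report
    inventions: int  # claims the system asserted without support in the contract
    total: int       # all clause-level instances evaluated


def error_rate(c: ErrorCounts) -> float:
    """Aggregate hallucination rate: every error counts the same."""
    return (c.omissions + c.inventions) / c.total


def risk_direction_index(c: ErrorCounts) -> float:
    """ASSUMED definition, not the paper's: (inventions - omissions) / errors."""
    errors = c.omissions + c.inventions
    if errors == 0:
        return 0.0  # no errors, so no directional bias
    return (c.inventions - c.omissions) / errors


# Toy counts invented for illustration: same aggregate rate, opposite direction.
system_a = ErrorCounts(omissions=40, inventions=12, total=100)
system_b = ErrorCounts(omissions=12, inventions=40, total=100)

for name, counts in (("A", system_a), ("B", system_b)):
    print(name, error_rate(counts), round(risk_direction_index(counts), 3))
```

Under this assumed definition, both toy systems share the same aggregate rate while their indices have opposite signs, which is the kind of difference a single averaged metric cannot express.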

Across 510 contracts and 249,252 clause-level instances, we measure a within-model gap of approximately 38-40 percentage points (pp) between obligation/numeric and temporal claims, a gap that aggregate reporting hides, and we show that two systems with matched 52% rates can carry opposite RDIs. The debate pipeline reduces fabricated detections by 45%, with per-category gains tracking the diagnosis, and it matches commercial APIs while using a substantially smaller backbone (4B active parameters).
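
To illustrate how a per-category breakdown can expose a gap that an aggregate rate hides, here is a small sketch. The record format, the function names `typed_profile` and `aggregate_rate`, and the toy counts are all assumptions made for illustration. The numbers are invented and say nothing about which category the paper found to be worse.

```python
from collections import defaultdict


def typed_profile(records):
    """Per-category hallucination rate.

    records: iterable of (category, is_hallucinated) pairs (assumed format).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for category, is_hallucinated in records:
        totals[category] += 1
        errors[category] += int(is_hallucinated)
    return {category: errors[category] / totals[category] for category in totals}


def aggregate_rate(records):
    """Single averaged rate over all instances, ignoring category."""
    records = list(records)
    return sum(int(flag) for _, flag in records) / len(records)


# Invented toy data: 10 instances per category.
toy_records = (
    [("temporal", True)] * 3 + [("temporal", False)] * 7
    + [("obligation", True)] * 7 + [("obligation", False)] * 3
)

print(aggregate_rate(toy_records))  # one number for the whole system
print(typed_profile(toy_records))   # the same errors, split by claim type
```

In this toy case the aggregate sits midway between two very different per-category rates, so a reader who sees only the average cannot tell that one claim type fails far more often than the other.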

Typed profiles and RDI surface failure modes that aggregate metrics hide. These diagnostics serve as calibration inputs for multi-agent debate pipelines, where Skeptic challenges and asymmetric gates targeted at measured failure modes outperform generically tuned debate. The framework supports direction-aware procurement, accountability, and agent design for legal AI deployed in the wild.
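
The abstract does not describe how the asymmetric gates are implemented. One possible reading, sketched here purely as an assumption, is an acceptance margin that shifts with a category's measured direction: when inventions dominate (positive RDI), a claim must beat the Skeptic by a wider margin, and when omissions dominate (negative RDI), the gate is relaxed. The functions `acceptance_margin` and `gate`, the confidence scores, and the `base` and `strength` parameters are hypothetical.

```python
def acceptance_margin(rdi: float, base: float = 0.0, strength: float = 0.2) -> float:
    """Margin by which the proposer must beat the Skeptic (assumed rule).

    rdi is assumed to lie in [-1, 1], with positive values meaning
    the category leans toward invented claims.
    """
    if not -1.0 <= rdi <= 1.0:
        raise ValueError("rdi must lie in [-1, 1]")
    return base + strength * rdi


def gate(proposer_conf: float, skeptic_conf: float, rdi: float) -> bool:
    """Accept a claim only if it clears the direction-dependent margin."""
    return (proposer_conf - skeptic_conf) >= acceptance_margin(rdi)


# Same debate outcome, different category diagnosis (toy values).
print(gate(0.7, 0.6, rdi=0.8))   # invention-prone category: stricter gate
print(gate(0.7, 0.6, rdi=-0.8))  # omission-prone category: looser gate
```

The point of the sketch is only the asymmetry: an identical proposer-versus-Skeptic result can be rejected in one category and accepted in another, depending on the failure mode measured for that category.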

Blogger's Review: LegalHalluLens makes legal AI more transparent and trustworthy by combining typed hallucination analysis with multi-agent debate. The RDI offers a novel, direction-aware signal for compliance monitoring, which gives the framework significant practical value and application potential.

[CS.AI] Building Trust in Legal AI: Typed Hallucination Auditing and Multi-Agent Debate Framework

Abstract