Legal case retrieval remains challenging due to the complexity of legal language and the need for precise lexical alignment between queries and relevant cases. Although dense retrieval models have made notable progress, empirical studies show that BM25 continues to serve as a strong baseline. This motivates us to propose a self-evolving framework for rule-driven query rewriting that enhances BM25 without any parameter training.
The framework equips an LLM-based agent with an automatic evaluation environment, enabling it to iteratively create rewriting rules, plan validation experiments over rule combinations, and eliminate ineffective rules based on historical feedback. We evaluate our method on the Chinese legal case retrieval benchmark LeCaRD-v2.
Experimental results demonstrate that the proposed framework outperforms non-evolutionary baselines, including human-designed rules and greedy rule selection, particularly when powered by a high-capacity core LLM. We also conduct detailed analyses to investigate the mechanisms underlying self-evolution. Our findings reveal that the LLM's ability to leverage previous experimental results and its intrinsic knowledge about which rules to eliminate play critical roles in refining the rule set via self-evolution.
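To make the create/validate/eliminate loop described above more concrete, here is a minimal Python sketch. It is not the authors' implementation: the LLM agent is replaced by a fixed, hand-written rule pool and a simple threshold heuristic, the corpus is a four-document English toy rather than LeCaRD-v2, and every name in it (`bm25_scores`, `evolve`, `RULES`, the rule functions, the MRR metric) is a hypothetical choice made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in dict.fromkeys(query):  # unique terms, stable order
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

# Toy rewriting rules (assumptions, standing in for LLM-generated rules).
SYNONYMS = {"theft": "larceny", "car": "vehicle"}
def add_synonyms(q): return q + [SYNONYMS[t] for t in q if t in SYNONYMS]
def drop_filler(q): return [t for t in q if t not in {"please", "find", "cases"}]
def append_generic_terms(q): return q + ["procedure", "appeal"]  # deliberately harmful

RULES = {f.__name__: f for f in (add_synonyms, drop_filler, append_generic_terms)}

DOCS = [
    ["find", "cases", "procedure", "appeal", "filing"],
    ["contract", "breach", "damages", "awarded"],
    ["larceny", "vehicle", "defendant", "sentenced"],
    ["assault", "injury", "defendant", "acquitted"],
]
# Each query is paired with the index of its single relevant document.
QUERIES = [(["please", "find", "theft", "car", "cases"], 2),
           (["contract", "breach", "cases"], 1)]

def mrr(rule_names, rules, queries, docs):
    """Mean reciprocal rank after applying the named rules in order."""
    total = 0.0
    for query, relevant in queries:
        for name in rule_names:
            query = rules[name](query)
        scores = bm25_scores(query, docs)
        ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
        total += 1.0 / (ranking.index(relevant) + 1)
    return total / len(queries)

def evolve(rules, queries, docs, max_rounds=3):
    """Validate rule combinations, then eliminate rules using the history."""
    active = sorted(rules)
    history = {(): mrr((), rules, queries, docs)}  # () is the unrewritten baseline
    for _ in range(max_rounds):
        # Plan experiments: every single active rule and every pair.
        for size in (1, 2):
            for combo in combinations(active, size):
                if combo not in history:
                    history[combo] = mrr(combo, rules, queries, docs)
        # Eliminate rules that hurt on their own (stand-in for the LLM's judgment).
        survivors = [r for r in active if history[(r,)] >= history[()]]
        if survivors == active:
            break
        active = survivors
    # Pick the best recorded combination built only from surviving rules.
    best = max((c for c in history if set(c) <= set(active)), key=history.get)
    return active, best, history

active, best, history = evolve(RULES, QUERIES, DOCS)
print("surviving rules:", active)
print("best combination:", best, "MRR:", round(history[best], 3))
```

In this toy setting the harmful rule is dropped because its solo experiment scores below the unrewritten baseline. A real agent would presumably also propose new rules each round and weigh combination results, which this sketch leaves out.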
Blogger's Review: This research shows how a self-evolution mechanism can make legal case retrieval more effective despite the complexity of legal language. By leveraging an LLM, the system not only rewrites queries so that BM25 retrieves better results but also refines its own rule set over successive rounds, all without training any parameters. It brings a fresh perspective to the legal tech field and is worth watching.