Abstract
We propose Base Sequence Analysis, a framework that encodes the runtime behavior of LLM-powered autonomous agents into compact symbolic sequences using a four-letter alphabet: X (Explore), E (Execute), P (Plan), and V (Verify).
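The encoding step can be pictured as a lookup from agent action types to bases. The mapping below (`ACTION_TO_BASE`, `encode_trace`, and all tool names) is entirely hypothetical. The abstract does not state how individual actions are classified, so treat this as a placeholder rather than the paper's rule set. Unknown actions fall back to `default`, which is set to "E" here as an arbitrary choice.

```python
# Hypothetical mapping from agent action types to XEPV bases.
# The paper's real classification rules are not given in the abstract.
ACTION_TO_BASE = {
    "search": "X",
    "read_file": "X",
    "list_dir": "X",
    "edit_file": "E",
    "run_command": "E",
    "think": "P",
    "plan": "P",
    "run_tests": "V",
    "check_output": "V",
}


def encode_trace(actions: list[str], default: str = "E") -> str:
    """Encode a list of action names as an XEPV base sequence string."""
    # Actions missing from the mapping are assigned `default` (an assumption).
    return "".join(ACTION_TO_BASE.get(a, default) for a in actions)


if __name__ == "__main__":
    trace = ["plan", "search", "read_file", "plan", "edit_file", "run_tests"]
    print(encode_trace(trace))  # PXXPEV
```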
Drawing an analogy to genomic sequence analysis, we apply n-gram pattern mining, Markov transition matrices, and point-biserial correlation to 347 real-world execution traces collected from a production ReAct agent system over 8 days.
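Two of the sequence statistics named above can be sketched in plain Python: overlapping n-gram counts over a corpus of base sequences, and a row-normalized first-order Markov transition matrix estimating P(next base | current base). The function names and toy sequences are illustrative assumptions, not the released toolkit's API.

```python
from collections import Counter

BASES = "XEPV"


def ngram_counts(seqs: list[str], n: int) -> Counter:
    """Count overlapping n-grams across a collection of XEPV sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    return counts


def transition_matrix(seqs: list[str]) -> dict[str, dict[str, float]]:
    """Estimate first-order Markov probabilities P(next | current) from bigram counts."""
    pairs = {a: {b: 0 for b in BASES} for a in BASES}
    for s in seqs:
        for cur, nxt in zip(s, s[1:]):
            pairs[cur][nxt] += 1
    matrix = {}
    for cur, row in pairs.items():
        total = sum(row.values())
        # A base that never has a successor keeps an all-zero row.
        matrix[cur] = {nxt: (c / total if total else 0.0) for nxt, c in row.items()}
    return matrix


if __name__ == "__main__":
    toy = ["PXPXPE", "XEEV", "PXPEV"]
    print(ngram_counts(toy, 3).most_common(1))  # [('PXP', 3)]
    print(transition_matrix(toy)["E"]["V"])     # 0.6666666666666666
```

In this formulation, the E-V transition probability reported in the findings that follow is the entry in row E, column V. Sequences containing characters outside XEPV raise a `KeyError` in `transition_matrix`.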
Our analysis reveals that:
- The trigram P-X-P is the only statistically significant high-risk pattern, lowering success rate by 10.4%;
- P-ratio is the strongest negative predictor of success (r=-0.256, p < 0.01);
- The E-V transition probability (the chance that an Execute step is immediately followed by a Verify step) is only 2.1%, indicating a systemic verification deficit.
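A result like the P-ratio correlation can be computed with a point-biserial correlation, which is Pearson's r between a continuous feature and a 0/1 outcome. This sketch assumes P-ratio means the fraction of P steps in a trace; the abstract does not define it, so `p_ratio` is a hypothetical reading, and the toy traces and outcomes are invented for illustration. For a p-value as well as the coefficient, SciPy provides `scipy.stats.pointbiserialr`.

```python
from statistics import mean, pstdev


def p_ratio(seq: str) -> float:
    """Hypothetical P-ratio: fraction of steps in a trace that are Plan (P) steps."""
    return seq.count("P") / len(seq) if seq else 0.0


def point_biserial(x: list[float], y: list[int]) -> float:
    """Point-biserial correlation between continuous x and binary y in {0, 1}.

    r_pb = (M1 - M0) / s * sqrt(n1 * n0 / n**2), where M1 and M0 are the means of x
    in each outcome group and s is the population standard deviation of x.
    This equals Pearson's r computed with y coded as 0/1.
    """
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    n, n1, n0 = len(x), len(x1), len(x0)
    s = pstdev(x)
    if n1 == 0 or n0 == 0 or s == 0:
        raise ValueError("need both outcome classes and a non-constant x")
    return (mean(x1) - mean(x0)) / s * (n1 * n0 / n**2) ** 0.5


if __name__ == "__main__":
    traces = ["PXPXPE", "XEEV", "PXPEV", "XEV"]
    success = [0, 1, 0, 1]  # toy outcomes, not data from the paper
    r = point_biserial([p_ratio(t) for t in traces], success)
    print(f"r = {r:.3f}")  # negative: higher P-ratio goes with failure in this toy set
```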
Based on these findings, we design Governor, a three-layer runtime intervention system comprising a rule engine, a statistical accumulator, and a chi-square-based threshold adaptor. In a natural-experiment before/after deployment comparison (N=101 vs. N=246 traces), Governor raises task success rate by 6.2 percentage points (absolute) while reducing average token consumption by 44%.
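The three Governor layers can be pictured for a single pattern such as P-X-P with the hypothetical `PatternGuard` class below. The rule layer fires when the latest steps end in the pattern, the accumulator tallies finished traces by whether the pattern occurred and whether they succeeded, and the adaptor keeps the rule enabled only while a 2x2 chi-square test shows the pattern is significantly associated with failure. This is one plausible reading of the abstract, not Governor's actual design; the class and method names, the simple on/off adaptation, and the 0.05 significance level are all assumptions.

```python
from dataclasses import dataclass

# Chi-square critical value for df=1 at alpha=0.05 (assumed significance level).
CHI2_CRIT_DF1_P05 = 3.841


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


@dataclass
class PatternGuard:
    """Hypothetical rule + accumulator + chi-square adaptor for one base pattern."""

    pattern: str = "PXP"
    enabled: bool = True
    # Accumulator: outcome counts for traces with / without the pattern.
    fail_with: int = 0
    ok_with: int = 0
    fail_without: int = 0
    ok_without: int = 0

    def should_intervene(self, recent: str) -> bool:
        """Rule layer: fire when the most recent steps end with the pattern."""
        return self.enabled and recent.endswith(self.pattern)

    def record(self, seq: str, success: bool) -> None:
        """Accumulator layer: tally a finished trace by pattern presence and outcome."""
        if self.pattern in seq:
            if success:
                self.ok_with += 1
            else:
                self.fail_with += 1
        elif success:
            self.ok_without += 1
        else:
            self.fail_without += 1

    def adapt(self) -> float:
        """Adaptor layer: keep the rule only if the pattern is significantly tied to failure."""
        chi2 = chi_square_2x2(self.fail_with, self.ok_with,
                              self.fail_without, self.ok_without)
        n_with = self.fail_with + self.ok_with
        n_without = self.fail_without + self.ok_without
        # Require the association to point toward failure, not just be significant.
        harmful = (n_with > 0 and n_without > 0
                   and self.fail_with / n_with > self.fail_without / n_without)
        self.enabled = harmful and chi2 >= CHI2_CRIT_DF1_P05
        return chi2
```

Pearson's chi-square is unreliable when expected cell counts are small (a common rule of thumb is at least 5 per cell), so a real adaptor would likely wait for enough traces before toggling a rule. `scipy.stats.chi2_contingency` computes the same test but applies Yates' continuity correction to 2x2 tables by default.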
To validate cross-system generality, we apply the XEPV encoding to 2,000 public SWE-agent trajectories on SWE-bench, confirming that exploration spirals and the E-V verification deficit replicate in an independent system. We outline six research directions, including base sequence language models, cross-agent behavioral fingerprinting, and reward shaping, and we release an open-source toolkit for reproducibility.
Blogger's Review: This paper offers a novel perspective on understanding and optimizing LLM agent behavior through genomic-style sequence analysis. It not only uncovers high-risk behavior patterns but also proposes an effective intervention system, making it valuable for both research and practice. The open-source toolkit should further facilitate research and practical adoption in this area.