[CS.AI] Revolutionary Tech: SkillAudit Enables Ground-Truth-Free Skill Evolution

Abstract

SkillAudit is a framework for evolving agent skills without ground-truth feedback. Agent skills are structured procedural packages that guide frozen LLM agents through specific workflows. However, skills often prove insufficient after deployment because of edge cases, API changes, and deployment constraints, which makes skill evolution necessary.

Existing methods rely on privileged feedback such as held-out validation scores, hidden test outcomes, or environment rewards, which are often unavailable when practitioners have only task descriptions and workspace data. The key idea of SkillAudit is paired trajectory auditing: at each iteration, the same task is executed with and without the candidate skill, which isolates how the skill alters agent behavior without requiring external labels.
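To make the pairing concrete, here is a minimal sketch of one audit step. It is not the paper's implementation: the `Agent` signature, the `Divergence` record, and `toy_agent` are hypothetical names introduced for illustration, and a trajectory is simplified to a list of action strings compared with Python's standard `difflib`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Optional

# Assumption: a trajectory is simply the ordered list of actions the agent took.
Trajectory = List[str]
# Assumption: an agent is a callable (task, skill or None) -> trajectory.
Agent = Callable[[str, Optional[str]], Trajectory]


@dataclass
class Divergence:
    kind: str                 # difflib opcode tag: "insert", "delete", or "replace"
    without_skill: List[str]  # actions seen only in the skill-free run
    with_skill: List[str]     # actions seen only in the skill-guided run


def paired_audit(agent: Agent, task: str, skill: str) -> List[Divergence]:
    """Run the same task with and without the skill and return the differences."""
    baseline = agent(task, None)
    guided = agent(task, skill)
    matcher = SequenceMatcher(a=baseline, b=guided, autojunk=False)
    return [
        Divergence(tag, baseline[i1:i2], guided[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the spans where behavior changed
    ]


def toy_agent(task: str, skill: Optional[str]) -> Trajectory:
    """Hypothetical stand-in for a frozen LLM agent."""
    steps = ["read_task", "load_csv", "write_report"]
    if skill is not None and "validate" in skill:
        steps.insert(2, "validate_schema")
    return steps


divergences = paired_audit(toy_agent, "summarize sales", "Always validate the schema.")
```

Because both runs share the task and the agent, any entry in `divergences` can be attributed to the skill itself, which is the property the auditing step relies on.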

To translate behavioral differences into edit guidance, SkillAudit employs Process-Aligned Contrastive Evaluation (PACE), a set of evaluators that maps trajectory divergences to diagnostic signals tied to specific passages in the skill document. A structural verifier, compiled once from the task specification and then frozen, checks task constraints and rolls back harmful updates.
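The two mechanisms in this paragraph can be sketched roughly as follows. Everything here is an illustrative assumption rather than the paper's design: `link_to_passage` stands in for PACE with a naive word-overlap heuristic, and `compile_verifier` reduces the structural verifier to a check that required output files exist and are non-empty.

```python
import re
from typing import Callable, Dict, List, Optional

Workspace = Dict[str, str]  # assumption: output filename -> file content


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def link_to_passage(action: str, passages: List[str]) -> Optional[int]:
    """Return the index of the skill passage sharing the most words with a
    divergent action, or None when nothing overlaps (toy stand-in for PACE)."""
    action_tokens = _tokens(action)
    scores = [len(action_tokens & _tokens(p)) for p in passages]
    best = max(range(len(passages)), key=scores.__getitem__, default=None)
    return best if best is not None and scores[best] > 0 else None


def compile_verifier(required_files: List[str]) -> Callable[[Workspace], bool]:
    """Build the verifier once from the task specification; it never changes."""
    required = tuple(required_files)  # frozen copy

    def verify(workspace: Workspace) -> bool:
        return all(name in workspace and workspace[name].strip() for name in required)

    return verify


def apply_or_rollback(
    current_skill: str,
    candidate_skill: str,
    run_with_skill: Callable[[str], Workspace],
    verify: Callable[[Workspace], bool],
) -> str:
    """Keep the candidate only if its run still satisfies the task constraints."""
    workspace = run_with_skill(candidate_skill)
    return candidate_skill if verify(workspace) else current_skill
```

Keeping the verifier fixed means a later edit cannot loosen the acceptance criteria, so a rollback always compares against the same constraints.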

SkillAudit applies edits through two pipelines: Refine removes noisy or irrelevant guidance from skills that are broadly useful, while Repair replaces passages that conflict with the task. Across 89 containerized tasks spanning 8 professional domains, SkillAudit achieves an average task reward of 73.9%, outperforming an agent without skills (40.9%) by 33.0 percentage points and the static expert skill (56.7%) by 17.2 points. These improvements are obtained without accessing hidden tests, reference solutions, or external scoring functions during evolution.
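The difference between the two pipelines is easiest to see as operations on a skill document. The sketch below assumes a skill is a list of passages and that the diagnostics have already flagged passage indices; the function names `refine` and `repair` mirror the pipeline names but are hypothetical, not the paper's API.

```python
from typing import Dict, List, Set


def refine(passages: List[str], noisy: Set[int]) -> List[str]:
    """Refine: drop passages flagged as noisy or irrelevant, keep the rest."""
    return [p for i, p in enumerate(passages) if i not in noisy]


def repair(passages: List[str], replacements: Dict[int, str]) -> List[str]:
    """Repair: swap passages that conflict with the task for corrected text."""
    return [replacements.get(i, p) for i, p in enumerate(passages)]


skill = [
    "Load the input with pandas.",
    "Email the result to the team.",   # irrelevant to this task
    "Write the report as report.txt.",  # conflicts with a task that needs report.md
]
refined = refine(skill, noisy={1})
repaired = repair(refined, {1: "Write the report as report.md."})
```

Refine only shortens the document, while Repair preserves its length and changes content, so the two edits can be reviewed and rolled back separately.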

Blogger's Review: Paired trajectory auditing gives SkillAudit a way to evolve skills in environments that lack ground-truth feedback. Combined with Process-Aligned Contrastive Evaluation, it raises task reward and reduces the uncertainty of editing skills without labels, which suggests the approach could apply broadly.
