[CS.AI] Closing the Feedback Loop in Verbal Reinforcement Learning

Training-free verbal reinforcement learning allows LLM agents to learn from world feedback, such as dynamic task outcomes, market returns, or demand forecasts, by extracting verbal rules from experience and injecting them as context, updating the agent's behavior without parameter changes.
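To make the "inject rules as context" idea concrete, here is a minimal sketch. It is not the paper's implementation: the `VerbalMemory` class, its method names, and the prompt layout are all hypothetical, chosen only to show that learning happens in the prompt rather than in the model weights.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VerbalMemory:
    """Hypothetical store of natural-language rules distilled from outcomes."""
    rules: List[str] = field(default_factory=list)

    def add_rule(self, rule: str) -> None:
        # Skip exact duplicates so the injected context does not grow needlessly.
        if rule not in self.rules:
            self.rules.append(rule)

    def build_prompt(self, task: str) -> str:
        # Rules are injected as plain context; no model parameters are updated.
        if not self.rules:
            return f"Task: {task}"
        lessons = "\n".join(f"- {r}" for r in self.rules)
        return f"Lessons from past episodes:\n{lessons}\n\nTask: {task}"
```

After each episode, an agent built this way would call `add_rule` with whatever lesson it extracted from the outcome, then use `build_prompt` for the next task. How the lesson is extracted (typically by another LLM call) is left out of the sketch.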

However, in non-stationary environments, these agents face a retention-forgetting dilemma: retaining stale insights leads to negative transfer, while discarding them results in catastrophic forgetting when conditions recur.

We identify four key requirements to navigate this dilemma: outcome-driven evaluation, persistent structured evidence, a non-monotonic knowledge lifecycle, and compositional governance. Current methods invest heavily in experience extraction but underinvest in insight governance.

We propose a three-layer architecture—rules, evidence, and skills—connected by a feedback-driven curation loop that closes the governance gap. Rules capture distilled experience from world outcomes; evidence logs track each rule's reliability across episodes; skills govern which rules to apply, how to resolve conflicts, and when to abstain.
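The three layers and the curation loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the class and function names, the boolean "helped" signal, the five-episode window, and every threshold are hypothetical. Conflict resolution between rules is omitted, and the sketch assumes dormant rules keep being scored in the background so they can be revived when conditions recur.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rule:
    """Rules layer: one distilled insight in natural language."""
    rule_id: str
    text: str
    status: str = "active"  # "active" or "dormant"; dormant rules are kept, not deleted


@dataclass
class EvidenceLog:
    """Evidence layer: per-rule outcomes, one boolean per episode."""
    outcomes: Dict[str, List[bool]] = field(default_factory=dict)

    def record(self, rule_id: str, helped: bool) -> None:
        self.outcomes.setdefault(rule_id, []).append(helped)

    def reliability(self, rule_id: str, window: int = 5) -> Optional[float]:
        # Fraction of the most recent `window` episodes in which the rule helped.
        recent = self.outcomes.get(rule_id, [])[-window:]
        return sum(recent) / len(recent) if recent else None


def curate(rules: List[Rule], log: EvidenceLog,
           demote_below: float = 0.4, revive_above: float = 0.6) -> None:
    """Curation loop: demote unreliable rules, revive ones that work again."""
    for rule in rules:
        score = log.reliability(rule.rule_id)
        if score is None:
            continue  # no evidence yet, leave the rule as it is
        if rule.status == "active" and score < demote_below:
            rule.status = "dormant"
        elif rule.status == "dormant" and score > revive_above:
            rule.status = "active"


def select_rules(rules: List[Rule], log: EvidenceLog,
                 min_reliability: float = 0.5) -> List[Rule]:
    """Skills layer: pick rules to apply; an empty list means abstain."""
    chosen = []
    for rule in rules:
        if rule.status != "active":
            continue
        score = log.reliability(rule.rule_id)
        # Untested rules are given the benefit of the doubt (an assumption).
        if score is None or score >= min_reliability:
            chosen.append(rule)
    return chosen
```

The point of the `dormant` status is the non-monotonic lifecycle: a rule that stops working is set aside rather than erased, which avoids both negative transfer from stale insights and forgetting when the old regime returns. The gap between `demote_below` and `revive_above` is a simple hysteresis choice to keep rules from flipping state on every episode.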

Using financial forecasting as a case study, we demonstrate that the same accumulated experience can either degrade performance below the zero-shot baseline or substantially improve accuracy and risk-adjusted returns, depending on whether the curation loop is present.

Blogger's Review: This paper tackles the retention-forgetting dilemma in verbal reinforcement learning with a three-layer architecture and makes a clear case that governance, not just extraction, determines whether accumulated experience helps or hurts. The feedback-driven curation loop gives agents a principled way to manage and apply that experience, which matters most in non-stationary settings such as financial markets, and it looks promising for practical use.
