[CS.AI] Controlling LLM-Generated Policies with Auditable Evidence Review

Allowing LLMs to directly control costly and irreversible scientific experiments can lead to unsafe exploration and unstable performance. However, discarding LLM creativity entirely sacrifices significant optimization potential. Thus, we introduce CARE (Controlling LLM-Generated Policies through Auditable Review of Evidence in Scientific Experimentation), an auditable controller for high-throughput experimentation (HTE) optimization that maintains a non-LLM incumbent optimizer as the default action path while using LLMs to revise challenger ranking policies.

Before each outcome is revealed, a public-evidence intervention gate compares the challenger with the incumbent. The challenger is authorized for selection only when the available evidence supports the change, and the decision is recorded in the audit log. CARE outperforms all other evaluated methods on the Minerva/Olympus and ChemLex benchmarks: relative to the public incumbent, the final-best score improves from 80.0 to 88.5 on Minerva/Olympus and from 83.9 to 92.1 on ChemLex (gains of 8.5 and 8.2 points, respectively).
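To make the gate concrete, here is a minimal Python sketch of the idea described above. It is not the paper's implementation. The names `replay_score` and `gated_select`, the `margin` threshold, and the choice of "mean outcome of the top-k replayed experiments" as the evidence measure are all assumptions made for illustration. The summary does not say how CARE actually scores evidence.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]
Policy = Callable[[Features], float]  # higher score = run this experiment sooner


def replay_score(policy: Policy, history: List[dict], k: int = 2) -> float:
    """Mean observed outcome of the k past experiments the policy ranks highest.

    Assumed stand-in for "public evidence": it only uses outcomes already revealed.
    """
    if not history:
        return 0.0
    ranked = sorted(history, key=lambda rec: policy(rec["x"]), reverse=True)
    top = ranked[:k]
    return sum(rec["y"] for rec in top) / len(top)


def gated_select(
    incumbent: Policy,
    challenger: Policy,
    candidates: List[Features],
    history: List[dict],
    audit_log: List[dict],
    margin: float = 5.0,  # hypothetical evidence threshold
) -> Features:
    """Pick the next experiment; the incumbent is the default action path."""
    inc_score = replay_score(incumbent, history)
    cha_score = replay_score(challenger, history)
    # The challenger is authorized only if past evidence favors it by the margin.
    authorized = bool(history) and cha_score >= inc_score + margin
    policy = challenger if authorized else incumbent
    choice = max(candidates, key=policy)
    # Every decision is logged before the new outcome is known.
    audit_log.append({
        "incumbent_evidence": inc_score,
        "challenger_evidence": cha_score,
        "margin": margin,
        "authorized": authorized,
        "choice": choice,
    })
    return choice


# Toy usage: the incumbent ranks by temperature, the challenger by concentration.
incumbent = lambda x: x["temp"]
challenger = lambda x: x["conc"]
history = [
    {"x": {"temp": 1.0, "conc": 0.1}, "y": 10.0},
    {"x": {"temp": 0.2, "conc": 0.9}, "y": 90.0},
    {"x": {"temp": 0.8, "conc": 0.2}, "y": 20.0},
    {"x": {"temp": 0.1, "conc": 0.8}, "y": 80.0},
]
candidates = [{"temp": 0.9, "conc": 0.1}, {"temp": 0.1, "conc": 0.95}]

log: List[dict] = []
first = gated_select(incumbent, challenger, candidates, [], log)       # no evidence yet
later = gated_select(incumbent, challenger, candidates, history, log)  # evidence available
```

With no history the gate falls back to the incumbent's pick. Once the replayed evidence favors the challenger by more than the margin, the challenger's pick is used, and both decisions appear in `log`. The key design point is that the LLM-revised policy never acts unless a check that can be inspected afterwards has approved it.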

Our experiments indicate that LLM self-evolution is more reliable when it expands the proposal space under an auditable controller, rather than directly choosing experiments.

Blogger's Review: CARE balances LLM creativity with experimental safety by placing LLM-generated policies behind an auditable control mechanism. Its strong results on both benchmarks demonstrate the practical value of this approach for optimizing high-throughput experiments.
