Abstract
People make decisions differently in strategic interactions. Some update beliefs like a Bayesian; others exhibit biases like motivated reasoning. Although creators of large language models use simulated humans for safety evaluations and training, these simulations often fail to cover this breadth of human behavior. We argue that cognitive science and economics provide a convenient tool for covering it: mathematical models of human decision-making.
We propose an approach called Equation-to-Behavior Prompting for guiding large language models to match cognitive models, and we evaluate it on persuasion games based on legal decision-making. We find that large models can approximate equation-based specifications -- Bayesian updating, affine distortion, motivated updating, and Grether's $\alpha$-$\beta$ model -- using prompting, but small models fail to do so. However, Equation-to-Behavior RL, which trains small models with reinforcement learning to adhere to mathematical rules, reduces belief error by 26.5% on out-of-distribution parameterizations.
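To make the idea of an equation-based specification concrete, the following is a minimal sketch of two of the rules named above: Bayesian updating and a Grether-style update in odds form, where posterior odds equal the likelihood ratio raised to one exponent times the prior odds raised to another. The function names, argument names, and the assignment of $\alpha$ to the likelihood ratio and $\beta$ to the prior odds are assumptions made for illustration; the abstract does not give the parameterization, and the affine-distortion and motivated-updating rules are omitted because their forms are not stated here.

```python
def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H | E) from Bayes' rule for a binary hypothesis H.

    Assumes 0 < prior < 1 and strictly positive likelihoods.
    """
    numerator = prior * p_e_given_h
    denominator = numerator + (1.0 - prior) * p_e_given_not_h
    return numerator / denominator


def grether_posterior(prior, p_e_given_h, p_e_given_not_h, alpha=1.0, beta=1.0):
    """Grether-style update in odds form (illustrative parameterization).

    posterior_odds = likelihood_ratio ** alpha * prior_odds ** beta

    ASSUMPTION: alpha weights the evidence and beta weights the prior;
    the source does not say which exponent is which.
    With alpha == beta == 1 this reduces to Bayes' rule.
    """
    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = p_e_given_h / p_e_given_not_h
    posterior_odds = (likelihood_ratio ** alpha) * (prior_odds ** beta)
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical numbers: a 50% prior and evidence four times as likely under H.
    print(bayes_posterior(0.5, 0.8, 0.2))
    print(grether_posterior(0.5, 0.8, 0.2, alpha=0.5, beta=1.0))
```

Under this assumed parameterization, an evidence exponent below 1 yields a smaller belief shift than Bayes' rule for the same evidence, which is one way a simulated decision-maker could be made to under-react to an argument.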
We show that these simulations can help create diverse training environments; training small models to consider different kinds of decision-makers improves average belief change by 2.5%--12% over Bayesian-only training, even when persuading GPT-5-mini. Our work could improve human simulations for training and evaluation in increasingly realistic settings and could also enable novel research into more complicated mathematical models of human decision-making.
Blogger's Review: This paper shows how integrating cognitive models with language model training can make simulated human decision-making more realistic. The method broadens the range of language model applications and opens new directions for future research, particularly at the intersection of law and psychology, where it has significant potential.