[CS.AI] Evaluation of AI Systems on Research-Level Mathematics Problems

Abstract

To assess the ability of current AI systems to correctly solve research-level mathematics problems, we tested several AI systems on a set of ten problems in a broad range of mathematical fields; these problems arose naturally in the research process of the contributors. This document includes the problems, our methodology, and the results of our testing. We provide links to supplementary documents including the human solutions, the AI-generated solutions, and the referee reports and logs for the AI-generated solutions.

The ten problems were contributed by the following mathematicians:

Dariusz Kalociński and Theodore A. Slaman
Richard Schwartz
Aleksa Milojevic and Benny Sudakov
Larry Guth
Oleg Butkovsky, Jonathan Mattingly, and Lorenzo Zambotti
Joshua Evan Greene and Duncan McCoy
Sucharit Sarkar
Sam Payne and Jidong (Jayden) Wang
Sylvie Corteel and John Lentfer
Srivatsav Kunnawalkam Elayavalli

Blogger's Review: This paper evaluates how well current AI systems solve research-level mathematics problems, using ten problems that arose in the contributors' own research. It points to the potential for collaboration between AI and human mathematicians and to the prospects for applying AI in real research. The results should offer useful insight for future improvements to AI systems.

[CS.AI] Evaluation of AI Systems on Research-Level Mathematics Problems

Abstract