NeFut

[CS.AI] Evaluation of AI Systems on Research-Level Mathematics Problems

Published at: 2026-06-18 22:00 Last updated: 2026-06-20 13:47
#AI #Machine Learning #optimization

Abstract

To assess the ability of current AI systems to correctly solve research-level mathematics problems, we tested several AI systems on a set of ten problems in a broad range of mathematical fields; these problems arose naturally in the research process of the contributors. This document includes the problems, our methodology, and the results of our testing. We provide links to supplementary documents including the human solutions, the AI-generated solutions, and the referee reports and logs for the AI-generated solutions.

The ten problems were contributed by the following mathematicians:

  1. Dariusz Kalociński and Theodore A. Slaman
  2. Richard Schwartz
  3. Aleksa Milojevic and Benny Sudakov
  4. Larry Guth
  5. Oleg Butkovsky, Jonathan Mattingly, and Lorenzo Zambotti
  6. Joshua Evan Greene and Duncan McCoy
  7. Sucharit Sarkar
  8. Sam Payne and Jidong (Jayden) Wang
  9. Sylvie Corteel and John Lentfer
  10. Srivatsav Kunnawalkam Elayavalli

Blogger's Review: This paper systematically evaluates how well current AI systems solve research-level mathematics problems, drawn from ten contributors' own research across a broad range of fields. It points to the collaborative potential between AI systems and human mathematicians and to the prospects for using AI in real research. The test results should offer useful insights for future improvements to AI systems.

Original Source: https://arxiv.org/abs/2606.18119
