[CS.AI] Challenges of Semi-Autonomous Formalization: An Expert Review Case Study

Large language models can often close proof gaps in interactive theorem provers, but a verified theorem is not the same as a reusable library contribution. This paper examines that distinction through a detailed case study: a semi-autonomous formalization of Grothendieck's vanishing theorem. The initial version compiled without errors, yet an expert review found serious problems in its definitions, theorem generality, file organization, and API.

We then ran a review-driven refactor and compression process and obtained a second expert review. The before-and-after comparison shows a sharp split: agents adapted well to local, mechanically checkable feedback, but remained weak at choosing definitions and designing APIs. We argue that autoformalization should be evaluated not only by whether errors are closed, but by whether the resulting formalization survives expert review.

Blogger's Review: This article highlights the technical challenges of autoformalization, particularly in choosing definitions and designing APIs. Feedback from expert review proves crucial for improving formalization quality, which underscores the importance of human-machine collaboration.
