Related papers: LEAN-GitHub: Compiling GitHub LEAN repositories fo…

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

The math abilities of large language models can represent their abstract reasoning ability. In this paper, we introduce and open-source our math reasoning LLMs, InternLM-Math, which are continually pre-trained from InternLM2. We unify…

Computation and Language · Computer Science · 2024-05-27 · Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin

Lean Workbook: A large-scale Lean problem set formalized from natural language math problems

Large language models have demonstrated impressive capabilities across various natural language processing tasks, especially in solving mathematical problems. However, large language models are not good at math theorem proving using formal…

Computation and Language · Computer Science · 2025-06-19 · Huaiyuan Ying, Zijian Wu, Yihan Geng, Zheng Yuan, Dahua Lin, Kai Chen

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Proof assistants like Lean have revolutionized mathematical proof verification, ensuring high accuracy and reliability. Although large language models (LLMs) show promise in mathematical reasoning, their advancement in formal theorem…

Artificial Intelligence · Computer Science · 2024-05-24 · Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, Xiaodan Liang

Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques

The challenge of formal proof generation has a rich history, but with modern techniques, we may finally be at the stage of making actual progress in real-life mathematical problems. This paper explores the integration of ChatGPT and basic…

Logic in Computer Science · Computer Science · 2025-02-20 · Sangjun Han, Taeil Hur, Youngmi Hur, Kathy Sangkyung Lee, Myungyoon Lee, Hyojae Lim

A Lean Dataset for International Math Olympiad: Small Steps towards Writing Math Proofs for Hard Problems

Using AI to write formal proofs for mathematical problems is a challenging task that has seen some advancements in recent years. Automated systems such as Lean can verify the correctness of proofs written in formal language, yet writing the…

Machine Learning · Computer Science · 2025-03-04 · Roozbeh Yousefzadeh, Xuenan Cao, Azim Ospanov

StepFun-Prover Preview: Let's Think and Verify Step by Step

We present StepFun-Prover Preview, a large language model designed for formal theorem proving through tool-integrated reasoning. Using a reinforcement learning pipeline that incorporates tool-based interactions, StepFun-Prover can achieve…

Artificial Intelligence · Computer Science · 2025-08-14 · Shijie Shang, Ruosi Wan, Yue Peng, Yutong Wu, Xiong-hui Chen, Jie Yan, Xiangyu Zhang

Herald: A Natural Language Annotated Lean 4 Dataset

Verifiable formal languages like Lean have profoundly impacted mathematical reasoning, particularly through the use of large language models (LLMs) for automated reasoning. A significant challenge in training LLMs for these formal languages…

Computation and Language · Computer Science · 2025-02-28 · Guoxiong Gao, Yutong Wang, Jiedong Jiang, Qi Gao, Zihan Qin, Tianyi Xu, Bin Dong

On the effectiveness of Large Language Models for GitHub Workflows

GitHub workflows or GitHub CI is a popular continuous integration platform that enables developers to automate various software engineering tasks by specifying them as workflows, i.e., YAML files with a list of jobs. However, engineering…

Software Engineering · Computer Science · 2024-03-20 · Xinyu Zhang, Siddharth Muralee, Sourag Cherupattamoolayil, Aravind Machiry

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

Formal mathematical reasoning remains a critical challenge for artificial intelligence, hindered by limitations of existing benchmarks in scope and scale. To address this, we present FormalMATH, a large-scale Lean4 benchmark comprising…

Artificial Intelligence · Computer Science · 2025-05-06 · Zhouliang Yu, Ruotian Peng, Keyi Ding, Yizhe Li, Zhongyuan Peng, Minghao Liu, Yifan Zhang, Zheng Yuan, Huajian Xin, Wenhao Huang, Yandong Wen, Ge Zhang, Weiyang Liu

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

Proving mathematical theorems using computer-verifiable formal languages like Lean significantly impacts mathematical reasoning. One approach to formal theorem proving involves generating complete proofs using Large Language Models (LLMs)…

Formal Languages and Automata Theory · Computer Science · 2024-10-07 · Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, Tong Zhang

Hilbert: Recursively Building Formal Proofs with Informal Reasoning

Large Language Models (LLMs) demonstrate impressive mathematical reasoning abilities, but their solutions frequently contain errors that cannot be automatically checked. Formal theorem proving systems such as Lean 4 offer automated…

Artificial Intelligence · Computer Science · 2026-03-18 · Sumanth Varambally, Thomas Voice, Yanchao Sun, Zhifeng Chen, Rose Yu, Ke Ye

From Informal to Formal -- Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs

The research in AI-based formal mathematical reasoning has shown an unstoppable growth trend. These studies have excelled in mathematical competitions like IMO and have made significant progress. This paper focuses on formal verification,…

Artificial Intelligence · Computer Science · 2025-06-10 · Jialun Cao, Yaojie Lu, Meiziniu Li, Haoyang Ma, Haokun Li, Mengda He, Cheng Wen, Le Sun, Hongyu Zhang, Shengchao Qin, Shing-Chi Cheung, Cong Tian

OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text

There is growing evidence that pretraining on high quality, carefully thought-out tokens such as code or mathematics plays an important role in improving the reasoning abilities of large language models. For example, Minerva, a PaLM model…

Artificial Intelligence · Computer Science · 2023-10-11 · Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, Jimmy Ba

Lean Finder: Semantic Search for Mathlib That Understands User Intents

We present Lean Finder, a semantic search engine for Lean and mathlib that understands and aligns with the intents of mathematicians. Progress in formal theorem proving is often hindered by the difficulty of locating relevant theorems and…

Machine Learning · Computer Science · 2026-02-24 · Jialin Lu, Kye Emond, Kaiyu Yang, Swarat Chaudhuri, Weiran Sun, Wuyang Chen

REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning

Nowadays, formal theorem provers have made monumental progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise…

Computation and Language · Computer Science · 2025-11-25 · Ziju Shen, Naohao Huang, Fanyi Yang, Yutong Wang, Guoxiong Gao, Tianyi Xu, Jiedong Jiang, Wanyi He, Pu Yang, Mengzhou Sun, Haocheng Ju, Peihao Wu, Bryan Dai, Bin Dong

MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving

Solving mathematical problems using computer-verifiable languages like Lean has significantly impacted the mathematical and computer science communities. State-of-the-art methods utilize a single Large Language Model (LLM) to generate…

Computation and Language · Computer Science · 2025-05-28 · Ruida Wang, Rui Pan, Yuxin Li, Jipeng Zhang, Yizhen Jia, Shizhe Diao, Renjie Pi, Junjie Hu, Tong Zhang

miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward

We perform a thorough analysis of the formal and informal statements in the miniF2F benchmark from the perspective of an AI system that is tasked to participate in a math Olympiad consisting of the problems in miniF2F. In such a setting, the…

Artificial Intelligence · Computer Science · 2025-11-06 · Azim Ospanov, Farzan Farnia, Roozbeh Yousefzadeh

Learning to Reason with Insight for Informal Theorem Proving

Although most of the automated theorem-proving approaches depend on formal proof systems, informal theorem proving can align better with large language models' (LLMs) strength in natural language processing. In this work, we identify a…

Artificial Intelligence · Computer Science · 2026-04-20 · Yunhe Li, Hao Shi, Bowen Deng, Wei Wang, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Siyang Gao, Chao Wang, Shuang Qiu, Linqi Song

Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean 4

Formalizing mathematical proofs using computerized verification languages like Lean 4 has the potential to significantly impact the field of mathematics; it offers prominent capabilities for advancing mathematical reasoning. However,…

Computation and Language · Computer Science · 2024-11-11 · Xichen Tang

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

We introduce Goedel-Prover, an open-source language model that achieves state-of-the-art (as of April 5 2025) performance in automated formal proof generation for mathematical problems. A key challenge in this field is the scarcity of…

Machine Learning · Computer Science · 2025-04-22 · Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin