Related papers: Agentic Proof Automation: A Case Study

Prover Agent: An Agent-Based Framework for Formal Mathematical Proofs

We present Prover Agent, a novel AI agent for automated theorem proving that integrates large language models (LLMs) with a formal proof assistant, Lean. Prover Agent coordinates an informal reasoning LLM, a formal prover model, and…

Artificial Intelligence · Computer Science 2026-02-18 Kaito Baba, Chaoran Liu, Shuhei Kurita, Akiyoshi Sannai

ImProver: Agent-Based Automated Proof Optimization

Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proof assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, depending on its…

Artificial Intelligence · Computer Science 2026-05-22 Riyaz Ahuja, Jeremy Avigad, Prasad Tetali, Sean Welleck

Agentic Verification of Software Systems

Automatically generated code is gaining traction recently, owing to the prevalence of Large Language Models (LLMs). Further, the AlphaProof initiative has demonstrated the possibility of using AI for general mathematical reasoning.…

Software Engineering · Computer Science 2026-04-14 Haoxin Tu, Huan Zhao, Yahui Song, Mehtab Zafar, Ruijie Meng, Abhik Roychoudhury

The Potential of LLMs in Automating Software Testing: From Generation to Reporting

Having high-quality software is essential in software engineering, which requires robust validation and verification processes during testing activities. Manual testing, while effective, can be time-consuming and costly, leading to an…

Software Engineering · Computer Science 2025-01-03 Betim Sherifi, Khaled Slhoub, Fitzroy Nembhard

Agent Laboratory: Using LLM Agents as Research Assistants

Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research…

Human-Computer Interaction · Computer Science 2025-06-18 Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, Emad Barsoum

Advancing Mathematics Research with AI-Driven Formal Proof Search

Large language models (LLMs) increasingly excel at mathematical reasoning, but their unreliability limits their utility in mathematics research. A mitigation is using LLMs to generate formal proofs in languages like Lean. We perform the…

Artificial Intelligence · Computer Science 2026-05-22 George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Anja Surina, Moritz Firsching, Gergely Bérczi, Francisco J. R. Ruiz, Arun Suggala, Adam Zsolt Wagner, Eric Wieser, Lei Yu, Aja Huang, Miklós Z. Horváth, Andrew Ferrauiolo, Henryk Michalewski, Codrut Grosu, Thomas Hubert, Matej Balog, Pushmeet Kohli, Swarat Chaudhuri

LLM-based Agentic Reasoning Frameworks: A Survey from Methods to Scenarios

Recent advances in the intrinsic reasoning capabilities of large language models (LLMs) have given rise to LLM-based agent systems that exhibit near-human performance on a variety of automated tasks. However, although these systems share…

Artificial Intelligence · Computer Science 2025-08-26 Bingxi Zhao, Lin Geng Foo, Ping Hu, Christian Theobalt, Hossein Rahmani, Jun Liu

Automating Formal Verification with Agent-Guided Tree Search

Formal verification offers a path to provably correct software, but writing verified code remains expensive enough that the technique is rarely used in production. Recent large language models can accelerate this work, and recent benchmarks…

Logic in Computer Science · Computer Science 2026-05-28 Leo Yao

Sound Agentic Science Requires Adversarial Experiments

LLM-based agents are rapidly being adopted for scientific data analysis, automating tasks once limited by human time and expertise. This capability is often framed as an acceleration of discovery, but it also accelerates a familiar failure…

Artificial Intelligence · Computer Science 2026-05-21 Dionizije Fa, Marko Culjak

An Agentic Framework for Autonomous Materials Computation

Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery, yet their static knowledge and hallucination issues hinder autonomous research applications. Recent advances integrate LLMs into agentic…

Artificial Intelligence · Computer Science 2025-12-23 Zeyu Xia, Jinzhe Ma, Congjie Zheng, Shufei Zhang, Yuqiang Li, Hang Su, P. Hu, Changshui Zhang, Xingao Gong, Wanli Ouyang, Lei Bai, Dongzhan Zhou, Mao Su

Towards LLM-based Generation of Human-Readable Proofs in Polynomial Formal Verification

Verification is one of the central tasks in circuit and system design. While simulation and emulation are widely used, complete correctness can only be ensured based on formal proof techniques. But these approaches often have very high run…

Logic in Computer Science · Computer Science 2025-05-30 Rolf Drechsler

Towards Autonomous Testing Agents via Conversational Large Language Models

Software testing is an important part of the development cycle, yet it requires specialized expertise and substantial developer effort to adequately test software. Recent discoveries of the capabilities of large language models (LLMs)…

Software Engineering · Computer Science 2023-09-06 Robert Feldt, Sungmin Kang, Juyeon Yoon, Shin Yoo

Agentic Software Issue Resolution with Large Language Models: A Survey

Software issue resolution aims to address real-world issues in software repositories (e.g., bug fixing and efficiency optimization) based on natural language descriptions provided by users, representing a key aspect of software maintenance.…

Software Engineering · Computer Science 2025-12-30 Zhonghao Jiang, David Lo, Zhongxin Liu

A Self-Improving Coding Agent

Recent advancements in Large Language Models (LLMs) have spurred interest in deploying LLM agents to undertake tasks in the world. LLMs are often deployed in agent systems: code that orchestrates LLM calls and provides them with tools. We…

Artificial Intelligence · Computer Science 2025-05-20 Maxime Robeyns, Martin Szummer, Laurence Aitchison

AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios

Large Language Models (LLMs) have demonstrated advanced capabilities in real-world agentic applications. Growing research efforts aim to develop LLM-based agents to address practical demands, introducing a new challenge: agentic scenarios…

Artificial Intelligence · Computer Science 2025-05-23 Yunjia Qi, Hao Peng, Xiaozhi Wang, Amy Xin, Youfeng Liu, Bin Xu, Lei Hou, Juanzi Li

Automating Data-Driven Modeling and Analysis for Engineering Applications using Large Language Model Agents

Modern engineering increasingly relies on vast datasets generated by experiments and simulations, driving a growing demand for efficient, reliable, and broadly applicable modeling strategies. There is also heightened interest in developing…

Artificial Intelligence · Computer Science 2025-10-03 Yang Liu, Zaid Abulawi, Abhiram Garimidi, Doyeong Lim

A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System

The integration of Large Language Models (LLMs) into software engineering has driven a transition from traditional rule-based systems to autonomous agentic systems capable of solving complex problems. However, systematic progress is…

Software Engineering · Computer Science 2025-10-24 Jiale Guo, Suizhi Huang, Mei Li, Dong Huang, Xingsheng Chen, Regina Zhang, Zhijiang Guo, Han Yu, Siu-Ming Yiu, Pietro Lio, Kwok-Yan Lam

PROMISE: Proof Automation as Structural Imitation of Human Reasoning

Automated proof generation for formal software verification remains largely unresolved despite advances in large language models (LLMs). While LLMs perform well in NLP, vision, and code generation, formal verification still requires…

Logic in Computer Science · Computer Science 2026-04-10 Youngjoo Ahn, Sangyeop Yeo, Gijung Im, Jongmin Lee, Jinyoung Yeo, Jieung Kim

Agents4PLC: Automating Closed-loop PLC Code Generation and Verification in Industrial Control Systems using LLM-based Agents

In industrial control systems, the generation and verification of Programmable Logic Controller (PLC) code are critical for ensuring operational efficiency and safety. While Large Language Models (LLMs) have made strides in automated code…

Software Engineering · Computer Science 2024-12-30 Zihan Liu, Ruinan Zeng, Dongxia Wang, Gengyun Peng, Jingyi Wang, Qiang Liu, Peiyu Liu, Wenhai Wang

Automating Security Audit Using Large Language Model based Agent: An Exploration Experiment

In the current rapidly changing digital environment, businesses are under constant stress to ensure that their systems are secured. Security audits help to maintain a strong security posture by ensuring that policies are in place, controls…

Cryptography and Security · Computer Science 2025-05-19 Jia Hui Chin, Pu Zhang, Yu Xin Cheong, Jonathan Pan