Related papers: SciNav: A General Agent Framework for Scientific C…

SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents

Recent advances in large language models (LLMs) have enabled agentic systems that translate natural language intent into executable scientific visualization (SciVis) tasks. Despite rapid progress, the community lacks a principled and…

Artificial Intelligence · Computer Science · 2026-04-01 · Kuangshi Ai, Haichao Miao, Kaiyuan Tang, Nathaniel Gorski, Jianxin Sun, Guoxi Liu, Helgi I. Ingolfsson, David Lenz, Hanqi Guo, Hongfeng Yu, Teja Leburu, Michael Molash, Bei Wang, Tom Peterka, Chaoli Wang, Shusen Liu

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

The advancements of large language models (LLMs) have piqued growing interest in developing LLM-based language agents to automate scientific discovery end-to-end, which has sparked both excitement and skepticism about their true…

Computation and Language · Computer Science · 2025-04-01 · Ziru Chen, Shijie Chen, Yuting Ning, Qianheng Zhang, Boshi Wang, Botao Yu, Yifei Li, Zeyi Liao, Chen Wei, Zitong Lu, Vishal Dey, Mingyi Xue, Frazier N. Baker, Benjamin Burns, Daniel Adu-Ampratwum, Xuhui Huang, Xia Ning, Song Gao, Yu Su, Huan Sun

Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents

As scientific research becomes increasingly complex, innovative tools are needed to manage vast data, facilitate interdisciplinary collaboration, and accelerate discovery. Large language models (LLMs) are now evolving into LLM-based…

Artificial Intelligence · Computer Science · 2026-02-03 · Shuo Ren, Can Xie, Pu Jian, Zhenjiang Ren, Chunlin Leng, Jiajun Zhang

LLM Agents Making Agent Tools

Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks by dynamically utilising external software components. However, these tools must be implemented in advance by human developers,…

Computation and Language · Computer Science · 2025-06-02 · Georg Wölflein, Dyke Ferber, Daniel Truhn, Ognjen Arandjelović, Jakob Nikolas Kather

AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains…

Computation and Language · Computer Science · 2025-10-09 · Yixin Ou, Yujie Luo, Jingsheng Zheng, Lanning Wei, Zhuoyun Yu, Shuofei Qiao, Jintian Zhang, Da Zheng, Yuren Mao, Yunjun Gao, Huajun Chen, Ningyu Zhang

CodeNav: Beyond tool-use to using real-world codebases with LLM agents

We present CodeNav, an LLM agent that navigates and leverages previously unseen code repositories to solve user queries. In contrast to tool-use LLM agents that require "registration" of all relevant tools via manual descriptions within…

Artificial Intelligence · Computer Science · 2024-06-19 · Tanmay Gupta, Luca Weihs, Aniruddha Kembhavi

An Evaluation-Centric Paradigm for Scientific Visualization Agents

Recent advances in multi-modal large language models (MLLMs) have enabled increasingly sophisticated autonomous visualization agents capable of translating user intentions into data visualizations. However, measuring progress and comparing…

Human-Computer Interaction · Computer Science · 2025-09-19 · Kuangshi Ai, Haichao Miao, Zhimin Li, Chaoli Wang, Shusen Liu

An Agentic Framework for Autonomous Materials Computation

Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery, yet their static knowledge and hallucination issues hinder autonomous research applications. Recent advances integrate LLMs into agentic…

Artificial Intelligence · Computer Science · 2025-12-23 · Zeyu Xia, Jinzhe Ma, Congjie Zheng, Shufei Zhang, Yuqiang Li, Hang Su, P. Hu, Changshui Zhang, Xingao Gong, Wanli Ouyang, Lei Bai, Dongzhan Zhou, Mao Su

A Cloud-based Multi-Agentic Workflow for Science

As Large Language Models (LLMs) become ubiquitous across various scientific domains, their lack of ability to perform complex tasks like running simulations or to make complex decisions limits their utility. LLM-based agents bridge this gap…

Computation and Language · Computer Science · 2026-01-21 · Anurag Acharya, Timothy Vega, Rizwan A. Ashraf, Anshu Sharma, Derek Parker, Robert Rallo

Agent Laboratory: Using LLM Agents as Research Assistants

Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research…

Human-Computer Interaction · Computer Science · 2025-06-18 · Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, Emad Barsoum

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research, yet their progress is shaped by the complex nature of scientific data. This survey presents a…

Computation and Language · Computer Science · 2025-10-21 · Ming Hu, Chenglong Ma, Wei Li, Wanghan Xu, Jiamin Wu, Jucheng Hu, Tianbin Li, Guohang Zhuang, Jiaqi Liu, Yingzhou Lu, Ying Chen, Chaoyang Zhang, Cheng Tan, Jie Ying, Guocheng Wu, Shujian Gao, Pengcheng Chen, Jiashi Lin, Haitao Wu, Lulu Chen, Fengxiang Wang, Yuanyuan Zhang, Xiangyu Zhao, Feilong Tang, Encheng Su, Junzhi Ning, Xinyao Liu, Ye Du, Changkai Ji, Pengfei Jiang, Cheng Tang, Ziyan Huang, Jiyao Liu, Jiaqi Wei, Yuejin Yang, Xiang Zhang, Guangshuai Wang, Yue Yang, Huihui Xu, Ziyang Chen, Yizhou Wang, Chen Tang, Jianyu Wu, Yuchen Ren, Siyuan Yan, Zhonghua Wang, Zhongxing Xu, Shiyan Su, Shangquan Sun, Runkai Zhao, Zhisheng Zhang, Dingkang Yang, Jinjie Wei, Jiaqi Wang, Jiahao Xu, Jiangtao Yan, Wenhao Tang, Hongze Zhu, Yu Liu, Fudi Wang, Yiqing Shen, Yuanfeng Ji, Yanzhou Su, Tong Xie, Hongming Shan, Chun-Mei Feng, Zhi Hou, Diping Song, Lihao Liu, Yanyan Huang, Lequan Yu, Bin Fu, Shujun Wang, Xiaomeng Li, Xiaowei Hu, Yun Gu, Ben Fei, Benyou Wang, Yuewen Cao, Minjie Shen, Jie Xu, Haodong Duan, Fang Yan, Hongxia Hao, Jielan Li, Jiajun Du, Yanbo Wang, Imran Razzak, Zhongying Deng, Chi Zhang, Lijun Wu, Conghui He, Zhaohui Lu, Jinhai Huang, Wenqi Shao, Yihao Liu, Siqi Luo, Yi Xin, Xiaohong Liu, Fenghua Ling, Yuqiang Li, Aoran Wang, Siqi Sun, Qihao Zheng, Nanqing Dong, Tianfan Fu, Dongzhan Zhou, Yan Lu, Wenlong Zhang, Jin Ye, Jianfei Cai, Yirong Chen, Wanli Ouyang, Yu Qiao, Zongyuan Ge, Shixiang Tang, Junjun He, Chunfeng Song, Lei Bai, Bowen Zhou

A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System

The integration of Large Language Models (LLMs) into software engineering has driven a transition from traditional rule-based systems to autonomous agentic systems capable of solving complex problems. However, systematic progress is…

Software Engineering · Computer Science · 2025-10-24 · Jiale Guo, Suizhi Huang, Mei Li, Dong Huang, Xingsheng Chen, Regina Zhang, Zhijiang Guo, Han Yu, Siu-Ming Yiu, Pietro Lio, Kwok-Yan Lam

AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents

The advances made by Large Language Models (LLMs) have led to the pursuit of LLM agents that can solve intricate, multi-step reasoning tasks. As with any research pursuit, benchmarking and evaluation are key cornerstones to efficient and…

Artificial Intelligence · Computer Science · 2024-04-10 · Luca Gioacchini, Giuseppe Siracusano, Davide Sanvito, Kiril Gashteovski, David Friede, Roberto Bifulco, Carolin Lawrence

SciAgent: Tool-augmented Language Models for Scientific Reasoning

Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning.…

Computation and Language · Computer Science · 2024-02-22 · Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen

Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs

Given the remarkable performance of Large Language Models (LLMs), an important question arises: Can LLMs conduct human-like scientific research and discover new knowledge, and act as an AI scientist? Scientific discovery is an iterative…

Machine Learning · Computer Science · 2025-02-24 · Tingting Chen, Srinivas Anumasa, Beibei Lin, Vedant Shah, Anirudh Goyal, Dianbo Liu

Re4: Scientific Computing Agent with Rewriting, Resolution, Review and Revision

Large language models (LLMs) serve as an active and promising field of generative artificial intelligence and have demonstrated abilities to perform complex tasks in multiple domains, including mathematical and scientific reasoning. In this…

Artificial Intelligence · Computer Science · 2026-03-03 · Ao Cheng, Lei Zhang, Guowei He

SciML Agents: Write the Solver, Not the Solution

Recent work in scientific machine learning aims to tackle scientific tasks directly by predicting target values with neural networks (e.g., physics-informed neural networks, neural ODEs, neural operators, etc.), but attaining high accuracy…

Machine Learning · Computer Science · 2026-04-15 · Saarth Gaonkar, Xiang Zheng, Haocheng Xi, Rishabh Tiwari, Kurt Keutzer, Dmitriy Morozov, Michael W. Mahoney, Amir Gholami

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

Most of the existing Large Language Model (LLM) benchmarks on scientific problem reasoning focus on problems grounded in high-school subjects and are confined to elementary algebraic operations. To systematically examine the reasoning…

Computation and Language · Computer Science · 2024-07-01 · Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, Wei Wang

From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

Large language models and autonomous AI agents have evolved rapidly, resulting in a diverse array of evaluation benchmarks, frameworks, and collaboration protocols. Driven by the growing need for standardized evaluation and integration, we…

Artificial Intelligence · Computer Science · 2026-03-10 · Mohamed Amine Ferrag, Norbert Tihanyi, Merouane Debbah

DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models

We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. This benchmark features three core elements: First, the tasks within DA-Code are inherently challenging, setting them…

Computation and Language · Computer Science · 2024-10-14 · Yiming Huang, Jianwen Luo, Yan Yu, Yitong Zhang, Fangyu Lei, Yifan Wei, Shizhu He, Lifu Huang, Xiao Liu, Jun Zhao, Kang Liu