Related papers: Benchmarking Data Science Agents

200 papers

Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) have demonstrated impressive language/vision reasoning abilities, igniting the recent trend of building agents for targeted applications such as shopping assistants or AI…

Artificial Intelligence · Computer Science 2025-04-14 Liqiang Jing, Zhehui Huang, Xiaoyang Wang, Wenlin Yao, Wenhao Yu, Kaixin Ma, Hongming Zhang, Xinya Du, Dong Yu

In recent years, data science agents powered by Large Language Models (LLMs), known as "data agents," have shown significant potential to transform the traditional data analysis paradigm. This survey provides an overview of the evolution,…

Artificial Intelligence · Computer Science 2025-12-01 Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, Jian Huang

Recent advances in large language models (LLMs) have enabled a new class of AI agents that automate multiple stages of the data science workflow by integrating planning, tool use, and multimodal reasoning across text, code, tables, and…

Recent LLM-based data agents aim to automate data science tasks ranging from data analysis to deep learning. However, the open-ended nature of real-world data science problems, which often span multiple taxonomies and lack standard answers,…

Artificial Intelligence · Computer Science 2026-01-21 Maojun Sun, Yifei Xie, Yue Wu, Ruijian Han, Binyan Jiang, Defeng Sun, Yancheng Yuan, Jian Huang

Data science aims to extract insights from data to support decision-making processes. Recently, Large Language Models (LLMs) have been increasingly used as assistants for data science, by suggesting ideas, techniques and small code…

Artificial Intelligence · Computer Science 2025-10-23 Irene Testini, José Hernández-Orallo, Lorenzo Pacchiardi

Data analysis has become an indispensable part of scientific research. To discover the latent knowledge and insights hidden within massive datasets, we need to perform deep exploratory analysis to realize their full value. With the advent…

Artificial Intelligence · Computer Science 2026-05-29 Zhenghao Zhu, Yuanfeng Song, Xin Chen, Chengzhong Liu, Yakun Cui, Caleb Chen Cao, Sirui Han, Yike Guo

The rapid advancement of Large Language Models (LLMs) has driven novel applications across diverse domains, with LLM-based agents emerging as a crucial area of exploration. This survey presents a comprehensive analysis of LLM-based agents…

Artificial Intelligence · Computer Science 2025-11-25 Ke Chen, Peiran Wang, Yaoning Yu, Xianyang Zhan, Haohan Wang

Recent advances in large language models (LLMs) have significantly impacted data science workflows, giving rise to specialized data science agents designed to automate analytical tasks. Despite rapid adoption, systematic benchmarks…

Artificial Intelligence · Computer Science 2025-08-08 Ram Mohan Rao Kadiyala, Siddhant Gupta, Jebish Purbey, Giulio Martini, Ali Shafique, Suman Debnath, Hamza Farooq

Large Language Models (LLMs) show promise as data analysis agents, but existing benchmarks overlook the iterative nature of the field, where experts' decisions evolve with deeper insights of the dataset. To address this, we introduce…

Computation and Language · Computer Science 2025-06-09 Hanyu Li, Haoyu Liu, Tingyu Zhu, Tianyu Guo, Zeyu Zheng, Xiaotie Deng, Michael I. Jordan

Data governance ensures data quality, security, and compliance through policies and standards, a critical foundation for scaling modern AI development. Recently, large language models (LLMs) have emerged as a promising solution for…

Artificial Intelligence · Computer Science 2025-12-09 Zhou Liu, Zhaoyang Han, Guochen Yan, Hao Liang, Bohan Zeng, Xing Chen, Yuanfeng Song, Wentao Zhang

In recent years, Large Language Models (LLMs) have emerged as transformative tools across numerous domains, impacting how professionals approach complex analytical tasks. This systematic mapping study comprehensively examines the…

Computers and Society · Computer Science 2025-08-19 Sai Sanjna Chintakunta, Nathalia Nascimento, Everton Guimaraes

Autonomous data science, from raw data sources to analyst-grade deep research reports, has been a long-standing challenge, and is now becoming feasible with the emergence of powerful large language models (LLMs). Recent workflow-based data…

Artificial Intelligence · Computer Science 2025-10-21 Shaolei Zhang, Ju Fan, Meihao Fan, Guoliang Li, Xiaoyong Du

We introduce DABstep, a novel benchmark for evaluating AI agents on realistic multi-step data analysis tasks. DABstep comprises over 450 real-world challenges derived from a financial analytics platform, requiring models to combine…

Machine Learning · Computer Science 2025-07-01 Alex Egg, Martin Iglesias Goyanes, Friso Kingma, Andreu Mora, Leandro von Werra, Thomas Wolf

While large language models (LLMs) have shown promise in automating data science, existing agents often struggle with the complexity of real-world workflows that require exploring multiple sources and synthesizing open-ended insights. In…

Artificial Intelligence · Computer Science 2026-02-25 Jaehyun Nam, Jinsung Yoon, Jiefeng Chen, Raj Sinha, Jinwoo Shin, Tomas Pfister

The quality of datasets plays an increasingly crucial role in the research and development of modern artificial intelligence (AI). Despite the proliferation of open dataset platforms nowadays, data quality issues, such as incomplete…

Artificial Intelligence · Computer Science 2025-05-28 Benhao Huang, Yingzhuo Yu, Jin Huang, Xingjian Zhang, Jiaqi Ma

Despite the utility of Large Language Models (LLMs) across a wide range of tasks and scenarios, developing a method for reliably evaluating LLMs across varied contexts continues to be challenging. Modern evaluation approaches often use LLMs…

Computation and Language · Computer Science 2024-01-31 Steffi Chern, Ethan Chern, Graham Neubig, Pengfei Liu

Existing large language model (LLM) agents for automating data science show promise, but they remain constrained by narrow task scopes, limited generalization across tasks and models, and over-reliance on state-of-the-art (SOTA) LLMs. We…

Computation and Language · Computer Science 2025-10-06 Ziming You, Yumiao Zhang, Dexuan Xu, Yiwei Lou, Yandong Yan, Wei Wang, Huaming Zhang, Yu Huang

In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks. These tasks require agents to solve complex tasks end-to-end by interacting with an execution…

Computation and Language · Computer Science 2024-03-12 Xueyu Hu, Ziyu Zhao, Shuang Wei, Ziwei Chai, Qianli Ma, Guoyin Wang, Xuwu Wang, Jing Su, Jingjing Xu, Ming Zhu, Yao Cheng, Jianbo Yuan, Jiwei Li, Kun Kuang, Yang Yang, Hongxia Yang, Fei Wu

Large language model (LLM) agents have shown promising performance in generating code for solving complex data science problems. Recent studies primarily focus on enhancing in-context learning through improved search, sampling, and planning…

Artificial Intelligence · Computer Science 2025-05-21 He Wang, Alexander Hanbo Li, Yiqun Hu, Sheng Zhang, Hideo Kobayashi, Jiani Zhang, Henry Zhu, Chung-Wei Hang, Patrick Ng

The advancements of large language models (LLMs) have piqued growing interest in developing LLM-based language agents to automate scientific discovery end-to-end, which has sparked both excitement and skepticism about their true…
