Related papers: DA-Code: Agent Data Science Code Generation Benchm…

200 papers

Code generation agents powered by large language models (LLMs) are revolutionizing the software development paradigm. Distinct from previous code generation techniques, code generation agents are characterized by three core features. 1)…

Software Engineering · Computer Science 2025-10-01 Yihong Dong , Xue Jiang , Jiaru Qian , Tian Wang , Kechi Zhang , Zhi Jin , Ge Li

Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) have demonstrated impressive language/vision reasoning abilities, igniting the recent trend of building agents for targeted applications such as shopping assistants or AI…

Artificial Intelligence · Computer Science 2025-04-14 Liqiang Jing , Zhehui Huang , Xiaoyang Wang , Wenlin Yao , Wenhao Yu , Kaixin Ma , Hongming Zhang , Xinya Du , Dong Yu

We introduce DSCodeBench, a new benchmark designed to evaluate large language models (LLMs) on complicated and realistic data science code generation tasks. DSCodeBench consists of 1,000 carefully constructed problems sourced from realistic…

Software Engineering · Computer Science 2025-11-18 Shuyin Ouyang , Dong Huang , Jingwen Guo , Zeyu Sun , Qihao Zhu , Jie M. Zhang

Large Language Models (LLMs) show promise as data analysis agents, but existing benchmarks overlook the iterative nature of the field, where experts' decisions evolve with deeper insights of the dataset. To address this, we introduce…

Computation and Language · Computer Science 2025-06-09 Hanyu Li , Haoyu Liu , Tingyu Zhu , Tianyu Guo , Zeyu Zheng , Xiaotie Deng , Michael I. Jordan

Large Language Models (LLMs) have shown promise in automated code generation but typically excel only in simpler tasks such as generating standalone code units. Real-world software development, however, often involves complex code…

Software Engineering · Computer Science 2024-08-12 Kechi Zhang , Jia Li , Ge Li , Xianjie Shi , Zhi Jin

Large Language Models (LLMs) have shown remarkable capabilities in code generation tasks, yet they face significant limitations in handling complex, long-context programming challenges and demonstrating complex compositional reasoning…

Artificial Intelligence · Computer Science 2025-01-14 Amr Almorsi , Mohanned Ahmed , Walid Gomaa

Pre-trained on massive amounts of code and text data, large language models (LLMs) have demonstrated remarkable achievements in performing code generation tasks. With additional execution-based feedback, these models can act as agents with…

Computation and Language · Computer Science 2024-11-14 Jierui Li , Hung Le , Yingbo Zhou , Caiming Xiong , Silvio Savarese , Doyen Sahoo

This paper presents DataSciBench, a comprehensive benchmark for evaluating Large Language Model (LLM) capabilities in data science. Recent related benchmarks have primarily focused on single tasks, easily obtainable ground truth, and…

Computation and Language · Computer Science 2025-02-20 Dan Zhang , Sining Zhoubian , Min Cai , Fengzu Li , Lekang Yang , Wei Wang , Tianjiao Dong , Ziniu Hu , Jie Tang , Yisong Yue

Large language model (LLM) coding agents increasingly operate at the repository level, motivating benchmarks that evaluate their ability to optimize entire codebases under realistic constraints. Existing code benchmarks largely rely on…

Software Engineering · Computer Science 2026-05-18 Atharva Sehgal , James Hou , Akanksha Sarkar , Ishaan Mantripragada , Swarat Chaudhuri , Jennifer J. Sun , Yisong Yue

In this work, we investigate the potential of large language models (LLMs) based agents to automate data science tasks, with the goal of comprehending task requirements, then building and training the best-fit machine learning models.…

Machine Learning · Computer Science 2024-05-29 Siyuan Guo , Cheng Deng , Ying Wen , Hechang Chen , Yi Chang , Jun Wang

The rapid advancement of Large Language Models (LLMs) has driven novel applications across diverse domains, with LLM-based agents emerging as a crucial area of exploration. This survey presents a comprehensive analysis of LLM-based agents…

Artificial Intelligence · Computer Science 2025-11-25 Ke Chen , Peiran Wang , Yaoning Yu , Xianyang Zhan , Haohan Wang

Large Language Models (LLMs) are used for many tasks, including those related to coding. An important aspect of being able to utilize LLMs is the ability to assess their fitness for specific usages. The common practice is to evaluate LLMs…

Artificial Intelligence · Computer Science 2024-07-30 Marcel Zalmanovici , Orna Raz , Eitan Farchi , Iftach Freund

In this paper, we present a novel approach to improving software quality and efficiency through a Large Language Model (LLM)-based model designed to review code and identify potential issues. Our proposed LLM-based AI agent model is trained…

Autonomous science agents built on large language models (LLMs) are increasingly used to generate hypotheses, design experiments, and produce reports. However, prior work mainly targets open-ended scientific problems with subjective outputs…

Computation and Language · Computer Science 2026-03-24 Tianshu Zhang , Huan Sun

Large language models (LLMs) and autonomous coding agents are increasingly used to generate software across a wide range of domains. Yet a core requirement remains unmet: ensuring that generated code is secure without compromising its…

Software Engineering · Computer Science 2025-11-27 Abhijeet Pathak , Suvadra Barua , Dinesh Gudimetla , Rupam Patir , Jiawei Guo , Hongxin Hu , Haipeng Cai

This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development,…

Software Engineering · Computer Science 2025-03-05 Liguo Chen , Qi Guo , Hongrui Jia , Zhengran Zeng , Xin Wang , Yijiang Xu , Jian Wu , Yidong Wang , Qing Gao , Jindong Wang , Wei Ye , Shikun Zhang

The use of large language models (LLMs) for automated code generation has emerged as a significant focus within AI research. As these pretrained models continue to evolve, their ability to understand and generate complex code structures has…

Software Engineering · Computer Science 2025-05-06 Nazmus Ashrafi , Salah Bouktif , Mohammed Mediani

With the rise of large language models (LLMs), researchers are increasingly exploring their applications in various vertical domains, such as software engineering. LLMs have achieved remarkable success in areas including code generation…

Software Engineering · Computer Science 2025-04-15 Haolin Jin , Linghan Huang , Haipeng Cai , Jun Yan , Bo Li , Huaming Chen

Users across enterprises increasingly rely on AI agents to query their data through natural language. However, building reliable data agents remains difficult because real-world data is often fragmented across multiple heterogeneous…

Autonomous coding agents built on large language models (LLMs) can now solve many general software and machine learning tasks, but they remain ineffective on complex, domain-specific scientific problems. Medical imaging is a particularly…

Computer Vision and Pattern Recognition · Computer Science 2025-12-22 Roshan Kenia , Xiaoman Zhang , Pranav Rajpurkar