Related papers: Dataforge: Agentic Platform for Autonomous Data En…

DeepPrep: An LLM-Powered Agentic System for Autonomous Data Preparation

Data preparation, which aims to transform heterogeneous and noisy raw tables into analysis-ready data, remains a major bottleneck in data science. Recent approaches leverage large language models (LLMs) to automate data preparation from…

Databases · Computer Science 2026-02-10 Meihao Fan, Ju Fan, Yuxin Zhang, Shaolei Zhang, Xiaoyong Du, Jie Song, Peng Li, Fuxin Jiang, Tieying Zhang, Jianjun Chen

Autonomous Data Agents: A New Opportunity for Smart Data

As data continues to grow in scale and complexity, preparing, transforming, and analyzing it remains labor-intensive, repetitive, and difficult to scale. Since data contains knowledge and AI learns knowledge from it, the alignment between…

Artificial Intelligence · Computer Science 2025-10-07 Yanjie Fu, Dongjie Wang, Wangyang Ying, Xinyuan Wang, Xiangliang Zhang, Huan Liu, Jian Pei

Can Agentic AI Match the Performance of Human Data Scientists?

Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) have significantly automated data science workflows, but a fundamental…

Machine Learning · Computer Science 2025-12-25 An Luo, Jin Du, Fangqiao Tian, Xun Xian, Robert Specht, Ganghua Wang, Xuan Bi, Charles Fleming, Jayanth Srinivasa, Ashish Kundu, Mingyi Hong, Jie Ding

A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge

The emergence of LLMs has catalyzed a paradigm shift in autonomous agent development, enabling systems capable of reasoning, planning, and executing complex multi-step tasks. However, existing agent frameworks often suffer from…

Artificial Intelligence · Computer Science 2026-01-21 Akbar Anbar Jafari, Cagri Ozcinar, Gholamreza Anbarjafari

DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

Autonomous data science, from raw data sources to analyst-grade deep research reports, has been a long-standing challenge, and is now becoming feasible with the emergence of powerful large language models (LLMs). Recent workflow-based data…

Artificial Intelligence · Computer Science 2025-10-21 Shaolei Zhang, Ju Fan, Meihao Fan, Guoliang Li, Xiaoyong Du

AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions

Data science tasks involving tabular data present complex challenges that require sophisticated problem-solving approaches. We propose AutoKaggle, a powerful and user-centric framework that assists data scientists in completing daily data…

Artificial Intelligence · Computer Science 2024-11-07 Ziming Li, Qianbo Zang, David Ma, Jiawei Guo, Tuney Zheng, Minghao Liu, Xinyao Niu, Yue Wang, Jian Yang, Jiaheng Liu, Wanjun Zhong, Wangchunshu Zhou, Wenhao Huang, Ge Zhang

DataMaster: Data-Centric Autonomous AI Research

As model families, training recipes, and compute budgets become increasingly standardized, further gains in machine learning systems depend increasingly on data. Yet data engineering remains largely manual and ad hoc: practitioners…

Machine Learning · Computer Science 2026-05-14 Yaxin Du, Xiyuan Yang, Zhifan Zhou, Wanxu Liu, Zixing Lei, Zimeng Chen, Fenyi Liu, Haotian Wu, Yuzhu Cai, Zexi Liu, Xinyu Zhu, WenHao Wang, Linfeng Zhang, Chen Qian, Siheng Chen

AgenticData: An Agentic Data Analytics System for Heterogeneous Data

Existing unstructured data analytics systems rely on experts to write code and manage complex analysis workflows, making them both expensive and time-consuming. To address these challenges, we introduce AgenticData, an innovative agentic…

Databases · Computer Science 2025-08-08 Ji Sun, Guoliang Li, Peiyao Zhou, Yihui Ma, Jingzhe Xu, Yuan Li

An Agentic Framework for Autonomous Materials Computation

Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery, yet their static knowledge and hallucination issues hinder autonomous research applications. Recent advances integrate LLMs into agentic…

Artificial Intelligence · Computer Science 2025-12-23 Zeyu Xia, Jinzhe Ma, Congjie Zheng, Shufei Zhang, Yuqiang Li, Hang Su, P. Hu, Changshui Zhang, Xingao Gong, Wanli Ouyang, Lei Bai, Dongzhan Zhou, Mao Su

KForge: Program Synthesis for Diverse AI Hardware Accelerators

GPU kernels are critical for ML performance but difficult to optimize across diverse accelerators. We present KForge, a platform-agnostic framework built on two collaborative LLM-based agents: a generation agent that produces and…

Machine Learning · Computer Science 2025-11-18 Taras Sereda, Tom St. John, Burak Bartan, Natalie Serrino, Sachin Katti, Zain Asgar

LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers

Automated feature engineering plays a critical role in improving predictive model performance for tabular learning tasks. Traditional automated feature engineering methods are limited by their reliance on pre-defined transformations within…

Machine Learning · Computer Science 2026-05-12 Nikhil Abhyankar, Parshin Shojaee, Chandan K. Reddy

AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design

Developing a reinforcement learning (RL) agent often involves identifying values for numerous parameters, covering the policy, reward function, environment, and agent-internal architecture. Since these parameters are interrelated in complex…

Machine Learning · Computer Science 2025-04-03 Francisco Erivaldo Fernandes Junior, Antti Oulasvirta

Towards Agentic Intelligence for Materials Science

The convergence of artificial intelligence and materials science presents a transformative opportunity, but achieving true acceleration in discovery requires moving beyond task-isolated, fine-tuned models toward agentic systems that plan,…

Materials Science · Physics 2026-02-09 Huan Zhang, Yizhan Li, Wenhao Huang, Ziyu Hou, Yu Song, Xuye Liu, Farshid Effaty, Jinya Jiang, Sifan Wu, Qianggang Ding, Izumi Takahara, Leonard R. MacGillivray, Teruyasu Mizoguchi, Tianshu Yu, Lizi Liao, Yuyu Luo, Yu Rong, Jia Li, Ying Diao, Heng Ji, Bang Liu

ProfiliTable: Profiling-Driven Tabular Data Processing via Agentic Workflows

Table processing (including cleaning, transformation, augmentation, and matching) is a foundational yet error-prone stage in real-world data pipelines. While recent LLM-based approaches show promise for automating such tasks, they often…

Artificial Intelligence · Computer Science 2026-05-13 Wei Liu, Yang Gu, Xi Yan, Zihan Nan, Beicheng Xu, Keyao Ding, Bin Cui, Wentao Zhang

TestForge: Feedback-Driven, Agentic Test Suite Generation

Automated test generation holds great promise for alleviating the burdens of manual test creation. However, existing search-based techniques compromise on test readability, while LLM-based approaches are prohibitively expensive in practice.…

Software Engineering · Computer Science 2025-03-20 Kush Jain, Claire Le Goues

TAGAL: Tabular Data Generation using Agentic LLM Methods

The generation of data is a common approach to improve the performance of machine learning tasks, among which is the training of models for classification. In this paper, we present TAGAL, a collection of methods able to generate synthetic…

Machine Learning · Computer Science 2025-09-05 Benoît Ronval, Pierre Dupont, Siegfried Nijssen

TheoremForge: Scaling up Formal Data Synthesis with Low-Budget Agentic Workflow

The high cost of agentic workflows in formal mathematics hinders large-scale data synthesis, exacerbating the scarcity of open-source corpora. To address this, we introduce TheoremForge, a cost-effective formal data synthesis…

Artificial Intelligence · Computer Science 2026-01-27 Yicheng Tao, Hongteng Xu

AgentForge: Execution-Grounded Multi-Agent LLM Framework for Autonomous Software Engineering

Large language models generate plausible code but cannot verify correctness. Existing multi-agent systems simulate execution or leave verification optional. We introduce execution-grounded verification as a first-class principle: every code…

Software Engineering · Computer Science 2026-04-16 Rajesh Kumar, Waqar Ali, Junaid Ahmed, Najma Imtiaz Ali, Shaban Usman

Towards Next-Generation LLM Training: From the Data-Centric Perspective

Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks and domains, with data playing a central role in enabling these advances. Despite this success, the preparation and effective utilization of…

Computation and Language · Computer Science 2026-03-17 Hao Liang, Zhengyang Zhao, Zhaoyang Han, Meiyi Qiang, Xiaochen Ma, Bohan Zeng, Qifeng Cai, Zhiyu Li, Linpeng Tang, Weinan E, Wentao Zhang

Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics

The rapid advancement of LLMs has led to the creation of diverse agentic systems in data analysis, utilizing LLMs' capabilities to improve insight generation and visualization. In this paper, we present an agentic system that automates the…

Artificial Intelligence · Computer Science 2025-05-30 Ran Zhang, Mohannad Elhamod