English
Related papers

Related papers: Dataforge: Agentic Platform for Autonomous Data En…

200 papers

Data preparation, which aims to transform heterogeneous and noisy raw tables into analysis-ready data, remains a major bottleneck in data science. Recent approaches leverage large language models (LLMs) to automate data preparation from…

Databases · Computer Science 2026-02-10 Meihao Fan , Ju Fan , Yuxin Zhang , Shaolei Zhang , Xiaoyong Du , Jie Song , Peng Li , Fuxin Jiang , Tieying Zhang , Jianjun Chen

As data continues to grow in scale and complexity, preparing, transforming, and analyzing it remains labor-intensive, repetitive, and difficult to scale. Since data contains knowledge and AI learns knowledge from it, the alignment between…

Artificial Intelligence · Computer Science 2025-10-07 Yanjie Fu , Dongjie Wang , Wangyang Ying , Xinyuan Wang , Xiangliang Zhang , Huan Liu , Jian Pei

Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) have significantly automated data science workflows, but a fundamental…

The emergence of LLMs has catalyzed a paradigm shift in autonomous agent development, enabling systems capable of reasoning, planning, and executing complex multi-step tasks. However, existing agent frameworks often suffer from…

Artificial Intelligence · Computer Science 2026-01-21 Akbar Anbar Jafari , Cagri Ozcinar , Gholamreza Anbarjafari

Autonomous data science, from raw data sources to analyst-grade deep research reports, has been a long-standing challenge, and is now becoming feasible with the emergence of powerful large language models (LLMs). Recent workflow-based data…

Artificial Intelligence · Computer Science 2025-10-21 Shaolei Zhang , Ju Fan , Meihao Fan , Guoliang Li , Xiaoyong Du

Data science tasks involving tabular data present complex challenges that require sophisticated problem-solving approaches. We propose AutoKaggle, a powerful and user-centric framework that assists data scientists in completing daily data…

As model families, training recipes, and compute budgets become increasingly standardized, further gains in machine learning systems depend increasingly on data. Yet data engineering remains largely manual and ad hoc: practitioners…

Existing unstructured data analytics systems rely on experts to write code and manage complex analysis workflows, making them both expensive and time-consuming. To address these challenges, we introduce AgenticData, an innovative agentic…

Databases · Computer Science 2025-08-08 Ji Sun , Guoliang Li , Peiyao Zhou , Yihui Ma , Jingzhe Xu , Yuan Li

Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery, yet their static knowledge and hallucination issues hinder autonomous research applications. Recent advances integrate LLMs into agentic…

Artificial Intelligence · Computer Science 2025-12-23 Zeyu Xia , Jinzhe Ma , Congjie Zheng , Shufei Zhang , Yuqiang Li , Hang Su , P. Hu , Changshui Zhang , Xingao Gong , Wanli Ouyang , Lei Bai , Dongzhan Zhou , Mao Su

GPU kernels are critical for ML performance but difficult to optimize across diverse accelerators. We present KForge, a platform-agnostic framework built on two collaborative LLM-based agents: a generation agent that produces and…

Machine Learning · Computer Science 2025-11-18 Taras Sereda , Tom St. John , Burak Bartan , Natalie Serrino , Sachin Katti , Zain Asgar

Automated feature engineering plays a critical role in improving predictive model performance for tabular learning tasks. Traditional automated feature engineering methods are limited by their reliance on pre-defined transformations within…

Machine Learning · Computer Science 2026-05-12 Nikhil Abhyankar , Parshin Shojaee , Chandan K. Reddy

Developing a reinforcement learning (RL) agent often involves identifying values for numerous parameters, covering the policy, reward function, environment, and agent-internal architecture. Since these parameters are interrelated in complex…

Machine Learning · Computer Science 2025-04-03 Francisco Erivaldo Fernandes Junior , Antti Oulasvirta

The convergence of artificial intelligence and materials science presents a transformative opportunity, but achieving true acceleration in discovery requires moving beyond task-isolated, fine-tuned models toward agentic systems that plan,…

Table processing-including cleaning, transformation, augmentation, and matching-is a foundational yet error-prone stage in real-world data pipelines. While recent LLM-based approaches show promise for automating such tasks, they often…

Artificial Intelligence · Computer Science 2026-05-13 Wei Liu , Yang Gu , Xi Yan , Zihan Nan , Beicheng Xu , Keyao Ding , Bin Cui , Wentao Zhang

Automated test generation holds great promise for alleviating the burdens of manual test creation. However, existing search-based techniques compromise on test readability, while LLM-based approaches are prohibitively expensive in practice.…

Software Engineering · Computer Science 2025-03-20 Kush Jain , Claire Le Goues

The generation of data is a common approach to improve the performance of machine learning tasks, among which is the training of models for classification. In this paper, we present TAGAL, a collection of methods able to generate synthetic…

Machine Learning · Computer Science 2025-09-05 Benoît Ronval , Pierre Dupont , Siegfried Nijssen

The high cost of agentic workflows in formal mathematics hinders large-scale data synthesis, exacerbating the scarcity of open-source corpora. To address this, we introduce \textbf{TheoremForge}, a cost-effective formal data synthesis…

Artificial Intelligence · Computer Science 2026-01-27 Yicheng Tao , Hongteng Xu

Large language models generate plausible code but cannot verify correctness. Existing multi-agent systems simulate execution or leave verification optional. We introduce execution-grounded verification as a first-class principle: every code…

Software Engineering · Computer Science 2026-04-16 Rajesh Kumar , Waqar Ali , Junaid Ahmed , Najma Imtiaz Ali , Shaban Usman

Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks and domains, with data playing a central role in enabling these advances. Despite this success, the preparation and effective utilization of…

Computation and Language · Computer Science 2026-03-17 Hao Liang , Zhengyang Zhao , Zhaoyang Han , Meiyi Qiang , Xiaochen Ma , Bohan Zeng , Qifeng Cai , Zhiyu Li , Linpeng Tang , Weinan E , Wentao Zhang

The rapid advancement of LLMs has led to the creation of diverse agentic systems in data analysis, utilizing LLMs' capabilities to improve insight generation and visualization. In this paper, we present an agentic system that automates the…

Artificial Intelligence · Computer Science 2025-05-30 Ran Zhang , Mohannad Elhamod
‹ Prev 1 2 3 10 Next ›