Related papers: DocETL: Agentic Query Rewriting and Evaluation for…

Multi-Objective Agentic Rewrites for Unstructured Data Processing

One year ago, we open-sourced DocETL, a declarative system for LLM-powered data processing that, as of March 2026, has 3.7K GitHub stars and users across domains (e.g., journalism, law, medicine, policy, finance, and urban planning). In…

Databases · Computer Science · 2026-04-03 · Lindsey Linxi Wei, Shreya Shankar, Sepanta Zeighami, Yeounoh Chung, Fatma Ozcan, Aditya G. Parameswaran

Steering Semantic Data Processing With DocWrangler

Unstructured text has long been difficult to automatically analyze at scale. Large language models (LLMs) now offer a way forward by enabling semantic data processing, where familiar data processing operators (e.g., map, reduce,…

Human-Computer Interaction · Computer Science · 2025-04-22 · Shreya Shankar, Bhavya Chopra, Mawil Hasan, Stephen Lee, Björn Hartmann, Joseph M. Hellerstein, Aditya G. Parameswaran, Eugene Wu

DOCUEVAL: An LLM-based AI Engineering Tool for Building Customisable Document Evaluation Workflows

Foundation models, such as large language models (LLMs), have the potential to streamline evaluation workflows and improve their performance. However, practical adoption faces challenges, such as customisability, accuracy, and scalability.…

Information Retrieval · Computer Science · 2025-11-11 · Hao Zhang, Qinghua Lu, Liming Zhu

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Large language models (LLMs) have shown impressive performance on general-purpose tasks, yet adapting them to specific domains remains challenging due to the scarcity of high-quality domain data. Existing data synthesis tools often struggle…

Computation and Language · Computer Science · 2025-07-08 · Ziyang Miao, Qiyu Sun, Jingyuan Wang, Yuchen Gong, Yaowei Zheng, Shiqi Li, Richong Zhang

QUEST: Query Optimization in Unstructured Document Analysis

Most recently, researchers have started building large language model (LLM)-powered data systems that allow users to analyze unstructured text documents as if working with a database, because LLMs are very effective in extracting attributes…

Databases · Computer Science · 2025-07-14 · Zhaoze Sun, Qiyan Deng, Chengliang Chai, Kaisen Jin, Xinyu Guo, Han Han, Ye Yuan, Guoren Wang, Lei Cao

DataPuzzle: Breaking Free from the Hallucinated Promise of LLMs in Data Analysis

Large language models (LLMs) are increasingly applied to multi-modal data analysis -- not necessarily because they offer the most precise answers, but because they provide fluent, flexible interfaces for interpreting complex inputs. Yet…

Computation and Language · Computer Science · 2025-09-30 · Zhengxuan Zhang, Zhuowen Liang, Yin Wu, Teng Lin, Yuyu Luo, Nan Tang

AGENTIQL: An Agent-Inspired Multi-Expert Framework for Text-to-SQL Generation

LLMs have advanced text-to-SQL generation, yet monolithic architectures struggle with complex reasoning and schema diversity. We propose AGENTIQL, an agent-inspired multi-expert framework that combines a reasoning agent for question…

Computation and Language · Computer Science · 2025-10-15 · Omid Reza Heidari, Siobhan Reid, Yassine Yaakoubi

StrucText-Eval: Evaluating Large Language Model's Reasoning Ability in Structure-Rich Text

The effective utilization of structured data, integral to corporate data strategies, has been challenged by the rise of large language models (LLMs) capable of processing unstructured information. This shift prompts the question: can LLMs…

Computation and Language · Computer Science · 2024-10-22 · Zhouhong Gu, Haoning Ye, Xingzhou Chen, Zeyang Zhou, Hongwei Feng, Yanghua Xiao

Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools

We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents. Agentic Reasoning dynamically leverages web search, code execution, and structured memory to address…

Artificial Intelligence · Computer Science · 2025-07-16 · Junde Wu, Jiayuan Zhu, Yuyuan Liu, Min Xu, Yueming Jin

LLM/Agent-as-Data-Analyst: A Survey

Large language models (LLMs) and agent techniques have brought a fundamental shift in the functionality and development paradigm of data analysis tasks (a.k.a. LLM/Agent-as-Data-Analyst), demonstrating substantial impact across both academia…

Artificial Intelligence · Computer Science · 2025-10-28 · Zirui Tang, Weizheng Wang, Zihang Zhou, Yang Jiao, Bangrui Xu, Boyu Niu, Dayou Zhou, Xuanhe Zhou, Guoliang Li, Yeye He, Wei Zhou, Yitong Song, Cheng Tan, Xue Yang, Chunwei Liu, Bin Wang, Conghui He, Xiaoyang Wang, Fan Wu

AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline, such as optimal model search and hyperparameter tuning. Existing AutoML systems often require technical expertise to set up…

Machine Learning · Computer Science · 2025-06-09 · Patara Trirat, Wonyong Jeong, Sung Ju Hwang

Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs

Agents based on large language models (LLMs) have demonstrated effectiveness in solving a wide range of tasks by integrating LLMs with key modules such as planning, memory, and tool usage. Increasingly, customers are adopting LLM agents…

Artificial Intelligence · Computer Science · 2024-04-30 · Zhenlan Ji, Daoyuan Wu, Pingchuan Ma, Zongjie Li, Shuai Wang

Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems

Traditional Data+AI systems utilize data-driven techniques to optimize performance, but they rely heavily on human experts to orchestrate system pipelines, enabling them to adapt to changes in data, queries, tasks, and environments. For…

Databases · Computer Science · 2025-07-03 · Zhaoyan Sun, Jiayi Wang, Xinyang Zhao, Jiachi Wang, Guoliang Li

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

The rapidly growing demand for high-quality data in Large Language Models (LLMs) has intensified the need for scalable, reliable, and semantically rich data preparation pipelines. However, current practices remain dominated by ad-hoc…

Machine Learning · Computer Science · 2025-12-19 · Hao Liang, Xiaochen Ma, Zhou Liu, Zhen Hao Wong, Zhengyang Zhao, Zimo Meng, Runming He, Chengyu Shen, Qifeng Cai, Zhaoyang Han, Meiyi Qiang, Yalin Feng, Tianyi Bai, Zewei Pan, Ziyi Guo, Yizhen Jiang, Jingwen Deng, Qijie You, Peichao Lai, Tianyu Guo, Chi Hsu Tsai, Hengyi Feng, Rui Hu, Wenkai Yu, Junbo Niu, Bohan Zeng, Ruichuan An, Lu Ma, Jihao Huang, Yaowei Zheng, Conghui He, Linpeng Tang, Bin Cui, Weinan E, Wentao Zhang

DocAgent: A Multi-Agent System for Automated Code Documentation Generation

High-quality code documentation is crucial for software development, especially in the era of AI. However, generating it automatically using Large Language Models (LLMs) remains challenging, as existing approaches often produce incomplete,…

Software Engineering · Computer Science · 2025-05-27 · Dayu Yang, Antoine Simoulin, Xin Qian, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Grey Yang

A Declarative Language for Building And Orchestrating LLM-Powered Agent Workflows

Building deployment-ready LLM agents requires complex orchestration of tools, data sources, and control flow logic, yet existing systems tightly couple agent logic to specific programming languages and deployment models. We present a…

Software Engineering · Computer Science · 2025-12-24 · Ivan Daunis

FlowETL: An Autonomous Example-Driven Pipeline for Data Engineering

The Extract, Transform, Load (ETL) workflow is fundamental for populating and maintaining data warehouses and other data stores accessed by analysts for downstream tasks. A major shortcoming of modern ETL solutions is the extensive need for…

Software Engineering · Computer Science · 2025-08-01 · Mattia Di Profio, Mingjun Zhong, Yaji Sripada, Marcel Jaspars

Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents

Recent advancements in Large Language Models (LLMs) enable AI Agents to automatically generate and execute multi-step plans to solve complex tasks. However, since the content generation process of LLMs is hardly controllable, current LLM-based…

Machine Learning · Computer Science · 2024-08-13 · Zelong Li, Wenyue Hua, Hao Wang, He Zhu, Yongfeng Zhang

MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation

Recent advancements in generative Large Language Models (LLMs) have been remarkable; however, the quality of the text generated by these models often reveals persistent issues. Evaluating the quality of text generated by these models,…

Computation and Language · Computer Science · 2024-04-16 · Yu Li, Shenyu Zhang, Rui Wu, Xiutian Huang, Yongrui Chen, Wenhao Xu, Guilin Qi, Dehai Min

DiLLS: Interactive Diagnosis of LLM-based Multi-agent Systems via Layered Summary of Agent Behaviors

Large language model (LLM)-based multi-agent systems have demonstrated impressive capabilities in handling complex tasks. However, the complexity of agentic behaviors makes these systems difficult to understand. When failures occur,…

Human-Computer Interaction · Computer Science · 2026-02-06 · Rui Sheng, Yukun Yang, Chuhan Shi, Yanna Lin, Zixin Chen, Huamin Qu, Furui Cheng