Related papers: Multi-Objective Agentic Rewrites for Unstructured …

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

Analyzing unstructured data has been a persistent challenge in data processing. Large Language Models (LLMs) have shown promise in this regard, leading to recent proposals for declarative frameworks for LLM-powered processing of…

Databases · Computer Science · 2025-04-03 · Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, Eugene Wu

Steering Semantic Data Processing With DocWrangler

Unstructured text has long been difficult to automatically analyze at scale. Large language models (LLMs) now offer a way forward by enabling semantic data processing, where familiar data processing operators (e.g., map, reduce,…

Human-Computer Interaction · Computer Science · 2025-04-22 · Shreya Shankar, Bhavya Chopra, Mawil Hasan, Stephen Lee, Björn Hartmann, Joseph M. Hellerstein, Aditya G. Parameswaran, Eugene Wu

Leveraging the Power of Large Language Models in Entity Linking via Adaptive Routing and Targeted Reasoning

Entity Linking (EL) has traditionally relied on large annotated datasets and extensive model fine-tuning. While recent few-shot methods leverage large language models (LLMs) through prompting to reduce training requirements, they often…

Computation and Language · Computer Science · 2025-11-20 · Yajie Li, Albert Galimov, Mitra Datta Ganapaneni, Pujitha Thejaswi, De Meng, Priyanshu Kumar, Saloni Potdar

DocAtlas: Multilingual Document Understanding Across 80+ Languages

Multilingual document understanding remains limited for low-resource languages due to scarce training data and model-based annotation pipelines that perpetuate existing biases. We introduce DocAtlas, a framework that constructs…

Computation and Language · Computer Science · 2026-05-22 · Ahmed Heakl, Youssef Mohamed, Abdullah Sohail, Rania Elbadry, Ahmed Nassar, Peter W. J. Staar, Fahad Shahbaz Khan, Imran Razzak, Salman Khan

A New Pipeline For Generating Instruction Dataset via RAG and Self Fine-Tuning

With the rapid development of large language models in recent years, there has been an increasing demand for domain-specific agents that can cater to the unique needs of enterprises and organizations. Unlike general models, which strive for…

Computation and Language · Computer Science · 2024-08-13 · Chih-Wei Song, Yu-Kai Lee, Yin-Te Tsai

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

The rapidly growing demand for high-quality data in Large Language Models (LLMs) has intensified the need for scalable, reliable, and semantically rich data preparation pipelines. However, current practices remain dominated by ad-hoc…

Machine Learning · Computer Science · 2025-12-19 · Hao Liang, Xiaochen Ma, Zhou Liu, Zhen Hao Wong, Zhengyang Zhao, Zimo Meng, Runming He, Chengyu Shen, Qifeng Cai, Zhaoyang Han, Meiyi Qiang, Yalin Feng, Tianyi Bai, Zewei Pan, Ziyi Guo, Yizhen Jiang, Jingwen Deng, Qijie You, Peichao Lai, Tianyu Guo, Chi Hsu Tsai, Hengyi Feng, Rui Hu, Wenkai Yu, Junbo Niu, Bohan Zeng, Ruichuan An, Lu Ma, Jihao Huang, Yaowei Zheng, Conghui He, Linpeng Tang, Bin Cui, Weinan E, Wentao Zhang

An Auditable Agent Platform For Automated Molecular Optimisation

Drug discovery frequently loses momentum when data, expertise, and tools are scattered, slowing design cycles. To shorten this loop we built a hierarchical, tool-using agent framework that automates molecular optimisation. A Principal…

Machine Learning · Computer Science · 2025-08-06 · Atabey Ünlü, Phil Rohr, Ahmet Celebi

DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery

Large language models (LLMs) integrated with autonomous agents hold significant potential for advancing scientific discovery through automated reasoning and task execution. However, applying LLM agents to drug discovery is still constrained…

Artificial Intelligence · Computer Science · 2025-07-29 · Kun Li, Zhennan Wu, Shoupeng Wang, Jia Wu, Shirui Pan, Wenbin Hu

DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation

Large Language Models (LLMs) have transformed listwise document reranking by enabling global reasoning over candidate sets, yet single models often struggle to balance fine-grained relevance scoring with holistic cross-document analysis. We…

Computation and Language · Computer Science · 2025-08-26 · Abdelrahman Abdallah, Jamshid Mozafari, Bhawna Piryani, Adam Jatowt

FlowCompile: An Optimizing Compiler for Structured LLM Workflows

Structured LLM workflows, where specialized LLM sub-agents execute according to a predefined graph, have become a powerful abstraction for solving complex tasks. Optimizing such workflows, i.e., selecting configurations for each sub-agent…

Computation and Language · Computer Science · 2026-05-14 · Junyan Li, Zhang-Wei Hong, Maohao Shen, Yang Zhang, Chuang Gan

RAZOR: Sharpening Knowledge by Cutting Bias with Unsupervised Text Rewriting

Despite the widespread use of LLMs due to their superior performance in various tasks, their high computational costs often lead potential users to opt for the pretraining-finetuning pipeline. However, biases prevalent in manually…

Computation and Language · Computer Science · 2024-12-20 · Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

AGENTIQL: An Agent-Inspired Multi-Expert Framework for Text-to-SQL Generation

LLMs have advanced text-to-SQL generation, yet monolithic architectures struggle with complex reasoning and schema diversity. We propose AGENTIQL, an agent-inspired multi-expert framework that combines a reasoning agent for question…

Computation and Language · Computer Science · 2025-10-15 · Omid Reza Heidari, Siobhan Reid, Yassine Yaakoubi

WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks

Large language model (LLM) agents are becoming competent at straightforward web tasks, such as opening an item page or submitting a form, but still struggle with objectives that require long-horizon navigation, large-scale information…

Artificial Intelligence · Computer Science · 2025-10-09 · Jingbo Yang, Bairu Hou, Wei Wei, Shiyu Chang, Yujia Bao

CODMAS: A Dialectic Multi-Agent Collaborative Framework for Structured RTL Optimization

Optimizing Register Transfer Level (RTL) code is a critical step in Electronic Design Automation (EDA) for improving power, performance, and area (PPA). We present CODMAS (Collaborative Optimization via a Dialectic Multi-Agent System), a…

Computation and Language · Computer Science · 2026-03-19 · Che-Ming Chang, Prashanth Vijayaraghavan, Ashutosh Jadhav, Charles Mackin, Vandana Mukherjee, Hsinyu Tsai, Ehsan Degan

ADO: Automatic Data Optimization for Inputs in LLM Prompts

This study explores a novel approach to enhance the performance of Large Language Models (LLMs) through the optimization of input data within prompts. While previous research has primarily focused on refining instruction components and…

Machine Learning · Computer Science · 2025-02-18 · Sam Lin, Wenyue Hua, Lingyao Li, Zhenting Wang, Yongfeng Zhang

Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs

Agents based on large language models (LLMs) have demonstrated effectiveness in solving a wide range of tasks by integrating LLMs with key modules such as planning, memory, and tool usage. Increasingly, customers are adopting LLM agents…

Artificial Intelligence · Computer Science · 2024-04-30 · Zhenlan Ji, Daoyuan Wu, Pingchuan Ma, Zongjie Li, Shuai Wang

LeJOT-AutoML: LLM-Driven Feature Engineering for Job Execution Time Prediction in Databricks Cost Optimization

Databricks job orchestration systems (e.g., LeJOT) reduce cloud costs by selecting low-priced compute configurations while meeting latency and dependency constraints. Accurate execution-time prediction under heterogeneous instance types and…

Machine Learning · Computer Science · 2026-03-10 · Lizhi Ma, Yi-Xiang Hu, Yihui Ren, Feng Wu, Xiang-Yang Li

SapientML: Synthesizing Machine Learning Pipelines by Learning from Human-Written Solutions

Automatic machine learning, or AutoML, holds the promise of truly democratizing the use of machine learning (ML), by substantially automating the work of data scientists. However, the huge combinatorial search space of candidate pipelines…

Machine Learning · Computer Science · 2022-04-21 · Ripon K. Saha, Akira Ura, Sonal Mahajan, Chenguang Zhu, Linyi Li, Yang Hu, Hiroaki Yoshida, Sarfraz Khurshid, Mukul R. Prasad

CLEAR: A Comprehensive Linguistic Evaluation of Argument Rewriting by Large Language Models

While LLMs have been extensively studied on general text generation tasks, there is less research on text rewriting, a task related to general text generation, and particularly on the behavior of models on this task. In this paper we…

Computation and Language · Computer Science · 2025-09-19 · Thomas Huber, Christina Niklaus

Abacus: A Cost-Based Optimizer for Semantic Operator Systems

LLMs enable an exciting new class of data processing applications over large collections of unstructured documents. Several new programming frameworks have enabled developers to build these applications by composing them out of semantic…

Databases · Computer Science · 2026-02-04 · Matthew Russo, Chunwei Liu, Sivaprasad Sudhir, Gerardo Vitagliano, Michael Cafarella, Tim Kraska, Samuel Madden