Related papers: Steering Semantic Data Processing With DocWrangler

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

Analyzing unstructured data has been a persistent challenge in data processing. Large Language Models (LLMs) have shown promise in this regard, leading to recent proposals for declarative frameworks for LLM-powered processing of…

Databases · Computer Science · 2025-04-03 · Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, Eugene Wu

LLM/Agent-as-Data-Analyst: A Survey

Large language models (LLMs) and agent techniques have brought a fundamental shift in the functionality and development paradigm of data analysis tasks (a.k.a LLM/Agent-as-Data-Analyst), demonstrating substantial impact across both academia…

Artificial Intelligence · Computer Science · 2025-10-28 · Zirui Tang, Weizheng Wang, Zihang Zhou, Yang Jiao, Bangrui Xu, Boyu Niu, Dayou Zhou, Xuanhe Zhou, Guoliang Li, Yeye He, Wei Zhou, Yitong Song, Cheng Tan, Xue Yang, Chunwei Liu, Bin Wang, Conghui He, Xiaoyang Wang, Fan Wu

Semantic Data Processing with Holistic Data Understanding

Semantic operators have increasingly become integrated within data systems to enable processing data using Large Language Models (LLMs). Despite significant recent effort in improving these operators, their accuracy is limited due to a…

Databases · Computer Science · 2026-04-06 · Youran Sun, Sepanta Zeighami, Bhavya Chopra, Shreya Shankar, Aditya G. Parameswaran

Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs

The evolution of Large Language Models (LLMs) has showcased remarkable capacities for logical reasoning and natural language comprehension. These capabilities can be leveraged in solutions that semantically and textually model complex…

Human-Computer Interaction · Computer Science · 2024-04-17 · Syed Mekael Wasti, Ken Q. Pu, Ali Neshati

More Data or Better Data? A Critical Analysis of Data Selection and Synthesis for Mathematical Reasoning

The reasoning capabilities of Large Language Models (LLMs) play a critical role in many downstream tasks, yet depend strongly on the quality of training data. Despite various proposed data construction methods, their practical utility in…

Computation and Language · Computer Science · 2025-10-09 · Yike Zhao, Simin Guo, Ziqing Yang, Shifan Han, Dahua Lin, Fei Tan

VectraFlow: Long-Horizon Semantic Processing over Data and Event Streams with LLMs

Monitoring continuous data for meaningful signals increasingly demands long-horizon, stateful reasoning over unstructured streams. However, today's LLM frameworks remain stateless and one-shot, and traditional Complex Event Processing (CEP)…

Databases · Computer Science · 2026-04-07 · Shu Chen, Junhan Liu, Deepti Raghavan, Ugur Cetintemel

Leveraging Large Language Models for Realizing Truly Intelligent User Interfaces

The number of published scholarly articles is growing at a significant rate, making scholarly knowledge organization increasingly important. Various approaches have been proposed to organize scholarly information, including describing…

Digital Libraries · Computer Science · 2025-01-22 · Allard Oelen, Sören Auer

Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems

Traditional Data+AI systems utilize data-driven techniques to optimize performance, but they rely heavily on human experts to orchestrate system pipelines, enabling them to adapt to changes in data, queries, tasks, and environments. For…

Databases · Computer Science · 2025-07-03 · Zhaoyan Sun, Jiayi Wang, Xinyang Zhao, Jiachi Wang, Guoliang Li

Sensecape: Enabling Multilevel Exploration and Sensemaking with Large Language Models

People are increasingly turning to large language models (LLMs) for complex information tasks like academic research or planning a move to another city. However, while they often require working in a nonlinear manner -- e.g., to arrange…

Human-Computer Interaction · Computer Science · 2023-08-31 · Sangho Suh, Bryan Min, Srishti Palani, Haijun Xia

Agentic AI: The Era of Semantic Decoding

Recent work demonstrated great promise in the idea of orchestrating collaborations between LLMs, human input, and various tools to address the inherent limitations of LLMs. We propose a novel perspective called semantic decoding, which…

Computation and Language · Computer Science · 2025-04-30 · Maxime Peyrard, Martin Josifoski, Robert West

SemPipes -- Optimizable Semantic Data Operators for Tabular Machine Learning Pipelines

Real-world machine learning on tabular data relies on complex data preparation pipelines for prediction, data integration, augmentation, and debugging. Designing these pipelines requires substantial domain expertise and engineering effort,…

Machine Learning · Computer Science · 2026-02-06 · Olga Ovcharenko, Matthias Boehm, Sebastian Schelter

DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics

Structured data offers a sophisticated mechanism for the organization of information. Existing methodologies for the text-serialization of structured data in the context of large language models fail to adequately address the heterogeneity…

Computation and Language · Computer Science · 2024-02-20 · YiQiu Guo, Yuchen Yang, Ya Zhang, Yu Wang, Yanfeng Wang

DocLLM: A layout-aware generative language model for multimodal document understanding

Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a…

Computation and Language · Computer Science · 2024-01-03 · Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu

Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks

Large language models (LLMs) have shown impressive promise in code generation, yet their progress remains limited by the shortage of large-scale datasets that are both diverse and well-aligned with human reasoning. Most existing resources…

Machine Learning · Computer Science · 2025-10-28 · Amal Abed, Ivan Lukic, Jörg K. H. Franke, Frank Hutter

On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks

Large language models (LLMs) have shown to be valuable tools for tackling process mining tasks. Existing studies report on their capability to support various data-driven process analyses and even, to some extent, that they are able to…

Databases · Computer Science · 2025-05-01 · Adrian Rebmann, Fabian David Schmidt, Goran Glavaš, Han van der Aa

StreamLink: Large-Language-Model Driven Distributed Data Engineering System

Large Language Models (LLMs) have shown remarkable proficiency in natural language understanding (NLU), opening doors for innovative applications. We introduce StreamLink - an LLM-driven distributed data system designed to improve the…

Databases · Computer Science · 2025-05-29 · Dawei Feng, Di Mei, Huiri Tan, Lei Ren, Xianying Lou, Zhangxi Tan

Leveraging Large Language Model for Intelligent Log Processing and Autonomous Debugging in Cloud AI Platforms

With the increasing complexity and rapid expansion of the scale of AI systems in cloud platforms, the log data generated during system operation is massive, unstructured, and semantically ambiguous, which brings great challenges to fault…

Artificial Intelligence · Computer Science · 2025-06-24 · Cheng Ji, Huaiying Luo

Flowco: Rethinking Data Analysis in the Age of LLMs

Conducting data analysis typically involves authoring code to transform, visualize, analyze, and interpret data. Large language models (LLMs) are now capable of generating such code for simple, routine analyses. LLMs promise to democratize…

Human-Computer Interaction · Computer Science · 2025-04-22 · Stephen N. Freund, Brooke Simon, Emery D. Berger, Eunice Jun

MicLog: Towards Accurate and Efficient LLM-based Log Parsing via Progressive Meta In-Context Learning

Log parsing converts semi-structured logs into structured templates, forming a critical foundation for downstream analysis. Traditional syntax and semantic-based parsers often struggle with semantic variations in evolving logs and data…

Software Engineering · Computer Science · 2026-01-13 · Jianbo Yu, Yixuan Li, Hai Xu, Kang Xu, Junjielong Xu, Zhijing Li, Pinjia He, Wanyuan Wang

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

The rapidly growing demand for high-quality data in Large Language Models (LLMs) has intensified the need for scalable, reliable, and semantically rich data preparation pipelines. However, current practices remain dominated by ad-hoc…

Machine Learning · Computer Science · 2025-12-19 · Hao Liang, Xiaochen Ma, Zhou Liu, Zhen Hao Wong, Zhengyang Zhao, Zimo Meng, Runming He, Chengyu Shen, Qifeng Cai, Zhaoyang Han, Meiyi Qiang, Yalin Feng, Tianyi Bai, Zewei Pan, Ziyi Guo, Yizhen Jiang, Jingwen Deng, Qijie You, Peichao Lai, Tianyu Guo, Chi Hsu Tsai, Hengyi Feng, Rui Hu, Wenkai Yu, Junbo Niu, Bohan Zeng, Ruichuan An, Lu Ma, Jihao Huang, Yaowei Zheng, Conghui He, Linpeng Tang, Bin Cui, Weinan E, Wentao Zhang