Related papers: Steering Semantic Data Processing With DocWrangler

200 papers

Analyzing unstructured data has been a persistent challenge in data processing. Large Language Models (LLMs) have shown promise in this regard, leading to recent proposals for declarative frameworks for LLM-powered processing of…

Databases · Computer Science 2025-04-03 Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, Eugene Wu

Large language models (LLMs) and agent techniques have brought a fundamental shift in the functionality and development paradigm of data analysis tasks (a.k.a. LLM/Agent-as-Data-Analyst), demonstrating substantial impact across both academia…

Semantic operators have increasingly become integrated within data systems to enable processing data using Large Language Models (LLMs). Despite significant recent effort in improving these operators, their accuracy is limited due to a…

Databases · Computer Science 2026-04-06 Youran Sun, Sepanta Zeighami, Bhavya Chopra, Shreya Shankar, Aditya G. Parameswaran

The evolution of Large Language Models (LLMs) has showcased remarkable capacities for logical reasoning and natural language comprehension. These capabilities can be leveraged in solutions that semantically and textually model complex…

Human-Computer Interaction · Computer Science 2024-04-17 Syed Mekael Wasti, Ken Q. Pu, Ali Neshati

The reasoning capabilities of Large Language Models (LLMs) play a critical role in many downstream tasks, yet depend strongly on the quality of training data. Despite various proposed data construction methods, their practical utility in…

Computation and Language · Computer Science 2025-10-09 Yike Zhao, Simin Guo, Ziqing Yang, Shifan Han, Dahua Lin, Fei Tan

Monitoring continuous data for meaningful signals increasingly demands long-horizon, stateful reasoning over unstructured streams. However, today's LLM frameworks remain stateless and one-shot, and traditional Complex Event Processing (CEP)…

Databases · Computer Science 2026-04-07 Shu Chen, Junhan Liu, Deepti Raghavan, Ugur Cetintemel

The number of published scholarly articles is growing at a significant rate, making scholarly knowledge organization increasingly important. Various approaches have been proposed to organize scholarly information, including describing…

Digital Libraries · Computer Science 2025-01-22 Allard Oelen, Sören Auer

Traditional Data+AI systems utilize data-driven techniques to optimize performance, but they rely heavily on human experts to orchestrate system pipelines, enabling them to adapt to changes in data, queries, tasks, and environments. For…

Databases · Computer Science 2025-07-03 Zhaoyan Sun, Jiayi Wang, Xinyang Zhao, Jiachi Wang, Guoliang Li

People are increasingly turning to large language models (LLMs) for complex information tasks like academic research or planning a move to another city. However, while they often require working in a nonlinear manner -- e.g., to arrange…

Human-Computer Interaction · Computer Science 2023-08-31 Sangho Suh, Bryan Min, Srishti Palani, Haijun Xia

Recent work demonstrated great promise in the idea of orchestrating collaborations between LLMs, human input, and various tools to address the inherent limitations of LLMs. We propose a novel perspective called semantic decoding, which…

Computation and Language · Computer Science 2025-04-30 Maxime Peyrard, Martin Josifoski, Robert West

Real-world machine learning on tabular data relies on complex data preparation pipelines for prediction, data integration, augmentation, and debugging. Designing these pipelines requires substantial domain expertise and engineering effort,…

Machine Learning · Computer Science 2026-02-06 Olga Ovcharenko, Matthias Boehm, Sebastian Schelter

Structured data offers a sophisticated mechanism for the organization of information. Existing methodologies for the text-serialization of structured data in the context of large language models fail to adequately address the heterogeneity…

Computation and Language · Computer Science 2024-02-20 YiQiu Guo, Yuchen Yang, Ya Zhang, Yu Wang, Yanfeng Wang

Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a…

Computation and Language · Computer Science 2024-01-03 Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu

Large language models (LLMs) have shown impressive promise in code generation, yet their progress remains limited by the shortage of large-scale datasets that are both diverse and well-aligned with human reasoning. Most existing resources…

Machine Learning · Computer Science 2025-10-28 Amal Abed, Ivan Lukic, Jörg K. H. Franke, Frank Hutter

Large language models (LLMs) have shown to be valuable tools for tackling process mining tasks. Existing studies report on their capability to support various data-driven process analyses and even, to some extent, that they are able to…

Databases · Computer Science 2025-05-01 Adrian Rebmann, Fabian David Schmidt, Goran Glavaš, Han van der Aa

Large Language Models (LLMs) have shown remarkable proficiency in natural language understanding (NLU), opening doors for innovative applications. We introduce StreamLink - an LLM-driven distributed data system designed to improve the…

Databases · Computer Science 2025-05-29 Dawei Feng, Di Mei, Huiri Tan, Lei Ren, Xianying Lou, Zhangxi Tan

With the increasing complexity and rapid expansion of the scale of AI systems in cloud platforms, the log data generated during system operation is massive, unstructured, and semantically ambiguous, which brings great challenges to fault…

Artificial Intelligence · Computer Science 2025-06-24 Cheng Ji, Huaiying Luo

Conducting data analysis typically involves authoring code to transform, visualize, analyze, and interpret data. Large language models (LLMs) are now capable of generating such code for simple, routine analyses. LLMs promise to democratize…

Human-Computer Interaction · Computer Science 2025-04-22 Stephen N. Freund, Brooke Simon, Emery D. Berger, Eunice Jun

Log parsing converts semi-structured logs into structured templates, forming a critical foundation for downstream analysis. Traditional syntax and semantic-based parsers often struggle with semantic variations in evolving logs and data…

Software Engineering · Computer Science 2026-01-13 Jianbo Yu, Yixuan Li, Hai Xu, Kang Xu, Junjielong Xu, Zhijing Li, Pinjia He, Wanyuan Wang

The rapidly growing demand for high-quality data in Large Language Models (LLMs) has intensified the need for scalable, reliable, and semantically rich data preparation pipelines. However, current practices remain dominated by ad-hoc…
