Related papers: Semantic Data Processing with Holistic Data Unders…

Steering Semantic Data Processing With DocWrangler

Unstructured text has long been difficult to automatically analyze at scale. Large language models (LLMs) now offer a way forward by enabling semantic data processing, where familiar data processing operators (e.g., map, reduce,…

Human-Computer Interaction · Computer Science 2025-04-22 Shreya Shankar, Bhavya Chopra, Mawil Hasan, Stephen Lee, Björn Hartmann, Joseph M. Hellerstein, Aditya G. Parameswaran, Eugene Wu

Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data

The rapid increase in textual information means we need more efficient methods to sift through, organize, and understand it all. While retrieval-augmented generation (RAG) models excel in accessing information from large document…

Computation and Language · Computer Science 2025-03-14 Seiji Maekawa, Hayate Iso, Nikita Bhutani

Efficient Latent Semantic Clustering for Scaling Test-Time Computation of LLMs

Scaling test-time computation--generating and analyzing multiple or sequential outputs for a single input--has become a promising strategy for improving the reliability and quality of large language models (LLMs), as evidenced by advances…

Computation and Language · Computer Science 2025-06-03 Sungjae Lee, Hoyoung Kim, Jeongyeon Hwang, Eunhyeok Park, Jungseul Ok

SemanticOpt: Towards LLM-Based Semantic Black-Box Optimization

Optimizing an experimental system can be extremely challenging when each experiment is expensive, time-consuming, or difficult to perform. Existing optimizers for expensive black-box problems, such as Bayesian optimization, are typically…

Machine Learning · Computer Science 2026-05-18 Jamison Meindl, Yunsheng Tian, Tony Cui, Veronika Thost, Zhang-Wei Hong, Jie Chen, Wojciech Matusik, Mina Konaković Luković

Implementing Semantic Join Operators Efficiently

Semantic query processing engines often support semantic joins, enabling users to match rows that satisfy conditions specified in natural language. Such join conditions can be evaluated using large language models (LLMs) that solve novel…

Databases · Computer Science 2025-10-10 Immanuel Trummer

OmniLLP: Enhancing LLM-based Log Level Prediction with Context-Aware Retrieval

Developers insert logging statements in source code to capture relevant runtime information essential for maintenance and debugging activities. Log level choice is an integral, yet tricky part of the logging activity as it controls log…

Software Engineering · Computer Science 2025-08-13 Youssef Esseddiq Ouatiti, Mohammed Sayagh, Bram Adams, Ahmed E. Hassan

DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics

Structured data offers a sophisticated mechanism for the organization of information. Existing methodologies for the text-serialization of structured data in the context of large language models fail to adequately address the heterogeneity…

Computation and Language · Computer Science 2024-02-20 YiQiu Guo, Yuchen Yang, Ya Zhang, Yu Wang, Yanfeng Wang

WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale

Large language models (LLMs) with extended context windows enable tasks requiring extensive information integration but are limited by the scarcity of high-quality, diverse datasets for long-context instruction tuning. Existing data…

Computation and Language · Computer Science 2025-02-25 Jiaxi Li, Xingxing Zhang, Xun Wang, Xiaolong Huang, Li Dong, Liang Wang, Si-Qing Chen, Wei Lu, Furu Wei

Abacus: A Cost-Based Optimizer for Semantic Operator Systems

LLMs enable an exciting new class of data processing applications over large collections of unstructured documents. Several new programming frameworks have enabled developers to build these applications by composing them out of semantic…

Databases · Computer Science 2026-02-04 Matthew Russo, Chunwei Liu, Sivaprasad Sudhir, Gerardo Vitagliano, Michael Cafarella, Tim Kraska, Samuel Madden

Semantic Operators: A Declarative Model for Rich, AI-based Data Processing

The semantic capabilities of large language models (LLMs) have the potential to enable rich analytics and reasoning over vast knowledge corpora. Unfortunately, existing systems either empirically optimize expensive LLM-powered operations…

Databases · Computer Science 2025-03-04 Liana Patel, Siddharth Jha, Melissa Pan, Harshit Gupta, Parth Asawa, Carlos Guestrin, Matei Zaharia

Agentic AI: The Era of Semantic Decoding

Recent work demonstrated great promise in the idea of orchestrating collaborations between LLMs, human input, and various tools to address the inherent limitations of LLMs. We propose a novel perspective called semantic decoding, which…

Computation and Language · Computer Science 2025-04-30 Maxime Peyrard, Martin Josifoski, Robert West

Deep Research is the New Analytics System: Towards Building the Runtime for AI-Driven Analytics

With advances in large language models (LLMs), researchers are creating new systems that can perform AI-driven analytics over large unstructured datasets. Recent work has explored executing such analytics queries using semantic operators --…

Artificial Intelligence · Computer Science 2025-09-04 Matthew Russo, Tim Kraska

SCOPE: Tree-based Self-Correcting Online Log Parsing via Syntactic-Semantic Collaboration

Log parsing is a critical step for automated log analysis in complex systems. Traditional heuristic-based methods offer high efficiency but are limited in accuracy due to overlooking semantic context. In contrast, recent LLM-based parsers…

Computation and Language · Computer Science 2026-03-31 Dongyi Fan, Suqiong Zhang, Lili He, Ming Liu, Yifan Huo

Evaluating SQL Understanding in Large Language Models

The rise of large language models (LLMs) has significantly impacted various domains, including natural language processing (NLP) and image generation, by making complex computational tasks more accessible. While LLMs demonstrate impressive…

Databases · Computer Science 2024-10-15 Ananya Rahaman, Anny Zheng, Mostafa Milani, Fei Chiang, Rachel Pottinger

StringLLM: Understanding the String Processing Capability of Large Language Models

String processing, which mainly involves the analysis and manipulation of strings, is a fundamental component of modern computing. Despite the significant advancements of large language models (LLMs) in various natural language processing…

Computation and Language · Computer Science 2025-01-28 Xilong Wang, Hao Fu, Jindong Wang, Neil Zhenqiang Gong

Towards Semantically Enhanced Data Understanding

In the field of machine learning, data understanding is the practice of getting initial insights in unknown datasets. Such knowledge-intensive tasks require a lot of documentation, which is necessary for data scientists to grasp the meaning…

Databases · Computer Science 2018-06-14 Markus Schröder, Christian Jilek, Jörn Hees, Andreas Dengel

Leveraging Large Language Model for Intelligent Log Processing and Autonomous Debugging in Cloud AI Platforms

With the increasing complexity and rapid expansion of the scale of AI systems in cloud platforms, the log data generated during system operation is massive, unstructured, and semantically ambiguous, which brings great challenges to fault…

Artificial Intelligence · Computer Science 2025-06-24 Cheng Ji, Huaiying Luo

Semantic Caching for Low-Cost LLM Serving: From Offline Learning to Online Adaptation

Large Language Models (LLMs) are revolutionizing how users interact with information systems, yet their high inference cost poses serious scalability and sustainability challenges. Caching inference responses, allowing them to be retrieved…

Machine Learning · Computer Science 2026-02-16 Xutong Liu, Baran Atalar, Xiangxiang Dai, Jinhang Zuo, Siwei Wang, John C. S. Lui, Wei Chen, Carlee Joe-Wong

Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks

The process mining community has recently recognized the potential of large language models (LLMs) for tackling various process mining tasks. Initial studies report the capability of LLMs to support process analysis and even, to some…

Computation and Language · Computer Science 2024-07-03 Adrian Rebmann, Fabian David Schmidt, Goran Glavaš, Han van der Aa

Towards a Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length Contexts

We introduce HAMLET, a holistic and automated framework for evaluating the long-context comprehension of large language models (LLMs). HAMLET structures source texts into a three-level key-fact hierarchy at root-, branch-, and leaf-levels,…

Computation and Language · Computer Science 2025-08-28 Jiaqi Deng, Yuho Lee, Nicole Hee-Yeon Kim, Hyangsuk Min, Taewon Yun, Minjeong Ban, Kim Yul, Hwanjun Song