Related papers: Streaming Model Cascades for Semantic SQL

Online Cascade Learning for Efficient Inference over Streams

Large Language Models (LLMs) have a natural role in answering complex queries about data streams, but the high computational cost of LLM inference makes them infeasible in many such tasks. We propose online cascade learning, the first…

Machine Learning · Computer Science 2024-06-19 Lunyiu Nie, Zhimin Ding, Erdong Hu, Christopher Jermaine, Swarat Chaudhuri

One SQL to Rule Them All

Real-time data analysis and management are increasingly critical for today's businesses. SQL is the de facto lingua franca for these endeavors, yet support for robust streaming analysis and management with SQL remains limited. Many…

Databases · Computer Science 2019-05-30 Edmon Begoli, Tyler Akidau, Fabian Hueske, Julian Hyde, Kathryn Knight, Kenneth Knowles

Task Cascades for Efficient Unstructured Data Processing

Modern database systems allow users to query or process unstructured text or document columns using LLM-powered functions. Users can express an operation in natural language (e.g., "identify if this review mentions billing issues"), with…

Databases · Computer Science 2026-01-12 Shreya Shankar, Sepanta Zeighami, Aditya Parameswaran

Cortex AISQL: A Production SQL Engine for Unstructured Data

Snowflake's Cortex AISQL is a production SQL engine that integrates native semantic operations directly into SQL. This integration allows users to write declarative queries that combine relational operations with semantic reasoning,…

Databases · Computer Science 2026-04-08 Paweł Liskowski, Benjamin Han, Paritosh Aggarwal, Bowei Chen, Boxin Jiang, Nitish Jindal, Zihan Li, Aaron Lin, Kyle Schmaus, Jay Tayade, Weicheng Zhao, Anupam Datta, Nathan Wiegand, Dimitris Tsirogiannis

From Static to Dynamic: A Streaming RAG Approach to Real-time Knowledge Base

Dynamic streams from news feeds, social media, sensor networks, and financial markets challenge static RAG frameworks. Full-scale indices incur high memory costs; periodic rebuilds introduce latency that undermines data freshness; naive…

Information Retrieval · Computer Science 2025-08-11 Yuzhou Zhu

Semantic SQL -- Combining and optimizing semantic predicates in SQL

In recent years, the surge in unstructured data analysis, facilitated by advancements in Machine Learning (ML), has prompted diverse approaches for handling images, text documents, and videos. Analysts, leveraging ML models, can extract…

Databases · Computer Science 2024-04-08 Akash Mittal, Anshul Bheemreddy, Huili Tao

Scalable Relational Query Processing on Big Matrix Data

The use of large-scale machine learning methods is becoming ubiquitous in many applications ranging from business intelligence to self-driving cars. These methods require a complex computation pipeline consisting of various types of…

Databases · Computer Science 2021-11-10 Yongyang Yu, Mingjie Tang, Walid G. Aref

SAGE: Semantic-Aware Shared Sampling for Efficient Diffusion

Diffusion models manifest evident benefits across diverse domains, yet their high sampling cost, requiring dozens of sequential model evaluations, remains a major limitation. Prior efforts mainly accelerate sampling via optimized solvers or…

Machine Learning · Computer Science 2025-09-22 Haoran Zhao, Tong Bai, Lei Huang, Xiaoyu Liang

On Efficient Approximate Queries over Machine Learning Models

The question of answering queries over ML predictions has been gaining attention in the database community. This question is challenging because the cost of finding high quality answers corresponds to invoking an oracle such as a human…

Databases · Computer Science 2022-11-18 Dujian Ding, Sihem Amer-Yahia, Laks VS Lakshmanan

CascadeServe: Unlocking Model Cascades for Inference Serving

Machine learning (ML) models are increasingly deployed to production, calling for efficient inference serving systems. Efficient inference serving is complicated by two challenges: (i) ML models incur high computational costs, and (ii) the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-21 Ferdi Kossmann, Ziniu Wu, Alex Turk, Nesime Tatbul, Lei Cao, Samuel Madden

SlimSeg: Slimmable Semantic Segmentation with Boundary Supervision

Accurate semantic segmentation models typically require significant computational resources, inhibiting their use in practical applications. Recent works rely on well-crafted lightweight models to achieve fast inference. However, these…

Computer Vision and Pattern Recognition · Computer Science 2023-02-20 Danna Xue, Fei Yang, Pei Wang, Luis Herranz, Jinqiu Sun, Yu Zhu, Yanning Zhang

PATSQL: Efficient Synthesis of SQL Queries from Example Tables with Quick Inference of Projected Columns

SQL is one of the most popular tools for data analysis, and it is now used by an increasing number of users without having expertise in databases. Several studies have proposed programming-by-example approaches to help such non-experts to…

Software Engineering · Computer Science 2021-08-16 Keita Takenouchi, Takashi Ishio, Joji Okada, Yuji Sakata

Fast Dual Simulation Processing of Graph Database Queries (Supplement)

Graph database query languages feature expressive, yet computationally expensive pattern matching capabilities. Answering optional query clauses in SPARQL for instance renders the query evaluation problem immediately Pspace-complete.…

Databases · Computer Science 2018-10-23 Stephan Mennicke, Jan-Christoph Kalo, Denis Nagel, Hermann Kroll, Wolf-Tilo Balke

Splitting Gaussian Process Regression for Streaming Data

Gaussian processes offer a flexible kernel method for regression. While Gaussian processes have many useful theoretical properties and have proven practically useful, they suffer from poor scaling in the number of observations. In…

Machine Learning · Statistics 2021-08-26 Nick Terry, Youngjun Choe

Semantic Agreement Enables Efficient Open-Ended LLM Cascades

Cascade systems route computational requests to smaller models when possible and defer to larger models only when necessary, offering a promising approach to balance cost and quality in LLM deployment. However, they face a fundamental…

Computation and Language · Computer Science 2025-10-29 Duncan Soiffer, Steven Kolawole, Virginia Smith

100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models

Several data warehouse and database providers have recently introduced extensions to SQL called AI Queries, enabling users to specify functions and conditions in SQL that are evaluated by LLMs, thereby broadening significantly the kinds of…

Databases · Computer Science 2026-04-16 Yeounoh Chung, Rushabh Desai, Jian He, Yu Xiao, Thibaud Hottelier, Yves-Laurent Kom Samo, Pushkar Khadilkar, Xianshun Chen, Sam Idicula, Fatma Özcan, Alon Halevy, Yannis Papakonstantinou

A Framework of Sparse Online Learning and Its Applications

The amount of data in our society has been exploding in the era of big data today. In this paper, we address several open challenges of big data stream classification, including high volume, high velocity, high dimensionality, high…

Machine Learning · Computer Science 2015-07-28 Dayong Wang, Pengcheng Wu, Peilin Zhao, Steven C. H. Hoi

Semantic Caching for Low-Cost LLM Serving: From Offline Learning to Online Adaptation

Large Language Models (LLMs) are revolutionizing how users interact with information systems, yet their high inference cost poses serious scalability and sustainability challenges. Caching inference responses, allowing them to be retrieved…

Machine Learning · Computer Science 2026-02-16 Xutong Liu, Baran Atalar, Xiangxiang Dai, Jinhang Zuo, Siwei Wang, John C. S. Lui, Wei Chen, Carlee Joe-Wong

Cascade-Aware Training of Language Models

Reducing serving cost and latency is a fundamental concern for the deployment of language models (LMs) in business applications. To address this, cascades of LMs offer an effective solution that conditionally employ smaller models for…

Computation and Language · Computer Science 2024-06-04 Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Aditya Krishna Menon, Alec Go

Streaming Hypergraph Partitioning Algorithms on Limited Memory Environments

Many well-known, real-world problems involve dynamic data which describe the relationship among the entities. Hypergraphs are powerful combinatorial structures that are frequently used to model such data. For many of today's data-centric…

Data Structures and Algorithms · Computer Science 2021-03-10 Fatih Taşyaran, Berkay Demireller, Kamer Kaya, Bora Uçar