Related papers: On Efficient Approximate Queries over Machine Lear…

Approximate Selection with Guarantees using Proxies

Due to the falling costs of data acquisition and storage, researchers and industry analysts often want to find all instances of rare events in large datasets. For instance, scientists can cheaply capture thousands of hours of video, but are…

Databases · Computer Science 2022-01-05 Daniel Kang, Edward Gan, Peter Bailis, Tatsunori Hashimoto, Matei Zaharia

100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models

Several data warehouse and database providers have recently introduced extensions to SQL called AI Queries, enabling users to specify functions and conditions in SQL that are evaluated by LLMs, thereby broadening significantly the kinds of…

Databases · Computer Science 2026-04-16 Yeounoh Chung, Rushabh Desai, Jian He, Yu Xiao, Thibaud Hottelier, Yves-Laurent Kom Samo, Pushkar Khadilkar, Xianshun Chen, Sam Idicula, Fatma Özcan, Alon Halevy, Yannis Papakonstantinou

Optimizing Machine Learning Inference Queries with Correlative Proxy Models

We consider accelerating machine learning (ML) inference queries on unstructured datasets. Expensive operators such as feature extractors and classifiers are deployed as user-defined functions (UDFs), which are not penetrable with classic…

Databases · Computer Science 2022-01-04 Zhihui Yang, Zuozhi Wang, Yicong Huang, Yao Lu, Chen Li, X. Sean Wang

Accelerating Approximate Aggregation Queries with Expensive Predicates

Researchers and industry analysts are increasingly interested in computing aggregation queries over large, unstructured datasets with selective predicates that are computed using expensive deep neural networks (DNNs). As these DNNs are…

Databases · Computer Science 2021-08-16 Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, Yi Sun, Matei Zaharia

Personalized Top-k Set Queries Over Predicted Scores

This work studies the applicability of expensive external oracles such as large language models in answering top-k queries over predicted scores. Such scores are incurred by user-defined functions to answer personalized queries over…

Databases · Computer Science 2025-02-19 Sohrab Namazi Nia, Subhodeep Ghosh, Senjuti Basu Roy, Sihem Amer-Yahia

Progressive Query Expansion for Retrieval Over Cost-constrained Data Sources

Query expansion has been employed for a long time to improve the accuracy of query retrievers. Earlier works relied on pseudo-relevance feedback (PRF) techniques, which augment a query with terms extracted from documents retrieved in a…

Information Retrieval · Computer Science 2024-06-12 Muhammad Shihab Rashid, Jannat Ara Meem, Yue Dong, Vagelis Hristidis

Probably Approximately Optimal Query Optimization

Evaluating query predicates on data samples is the only way to estimate their selectivity in certain scenarios. Finding a guaranteed optimal query plan is not a reasonable optimization goal in those cases as it might require an infinite…

Databases · Computer Science 2015-11-06 Immanuel Trummer, Christoph Koch

Accelerating Aggregation Queries on Unstructured Streams of Data

Analysts and scientists are interested in querying streams of video, audio, and text to extract quantitative insights. For example, an urban planner may wish to measure congestion by querying the live feed from a traffic camera. Prior work…

Databases · Computer Science 2023-08-21 Matthew Russo, Tatsunori Hashimoto, Daniel Kang, Yi Sun, Matei Zaharia

Robust Plan Evaluation based on Approximate Probabilistic Machine Learning

Query optimizers in RDBMSs search for execution plans expected to be optimal for given queries. They use parameter estimates, often inaccurate, and make assumptions that may not hold in practice. Consequently, they may select plans that are…

Databases · Computer Science 2025-05-27 Amin Kamali, Verena Kantere, Calisto Zuzarte, Vincent Corvinelli

Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models

Uncertainty quantification (UQ) is essential for safe deployment of generative AI models such as large language models (LLMs), especially in high stakes applications. Conformal prediction (CP) offers a principled uncertainty quantification…

Machine Learning · Computer Science 2025-06-09 Sima Noorani, Shayan Kiyani, George Pappas, Hamed Hassani

Approximation Schemes for Many-Objective Query Optimization

The goal of multi-objective query optimization (MOQO) is to find query plans that realize a good compromise between conflicting objectives such as minimizing execution time and minimizing monetary fees in a Cloud scenario. A previously…

Databases · Computer Science 2014-04-02 Immanuel Trummer, Christoph Koch

Probery: A Probability-based Incomplete Query Optimization for Big Data

Nowadays, query optimization has been highly concerned in big data management, especially in NoSQL databases. Approximate queries boost query performance by loss of accuracy, for example, sampling approaches trade off query completeness for…

Databases · Computer Science 2019-01-03 Jie Song, Yichuan Zhang, Yubin Bao, Ge Yu

ML-AQP: Query-Driven Approximate Query Processing based on Machine Learning

As more and more organizations rely on data-driven decision making, large-scale analytics become increasingly important. However, an analyst is often stuck waiting for an exact result. As such, organizations turn to Cloud providers that…

Databases · Computer Science 2020-03-17 Fotis Savva, Christos Anagnostopoulos, Peter Triantafillou

Approximate Lifted Inference with Probabilistic Databases

This paper proposes a new approach for approximate evaluation of #P-hard queries with probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each…

Databases · Computer Science 2014-12-03 Wolfgang Gatterbauer, Dan Suciu

Combined Approximations for Uniform Operational Consistent Query Answering

Operational consistent query answering (CQA) is a recent framework for CQA based on revised definitions of repairs, which are built by applying a sequence of operations (e.g., fact deletions) starting from an inconsistent database until we…

Databases · Computer Science 2025-08-25 Marco Calautti, Ester Livshits, Andreas Pieris, Markus Schneider

Combined Approximations for Uniform Operational Consistent Query Answering

Operational consistent query answering (CQA) is a recent framework for CQA based on revised definitions of repairs, which are built by applying a sequence of operations (e.g., fact deletions) starting from an inconsistent database until we…

Databases · Computer Science 2023-12-14 Marco Calautti, Ester Livshits, Andreas Pieris, Markus Schneider

On Constrained Open-World Probabilistic Databases

Increasing amounts of available data have led to a heightened need for representing large-scale probabilistic knowledge bases. One approach is to use a probabilistic database, a model with strong assumptions that allow for efficiently…

Artificial Intelligence · Computer Science 2019-04-04 Tal Friedman, Guy Van den Broeck

Reducing Uncertainty of Schema Matching via Crowdsourcing with Accuracy Rates

Schema matching is a central challenge for data integration systems. Inspired by the popularity and the success of crowdsourcing platforms, we explore the use of crowdsourcing to reduce the uncertainty of schema matching. Since…

Databases · Computer Science 2018-09-12 Chen Jason Zhang, Lei Chen, H. V. Jagadish, Mengchen Zhang, Yongxin Tong

PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve…

Computation and Language · Computer Science 2021-02-16 Patrick Lewis, Yuxiang Wu, Linqing Liu, Pasquale Minervini, Heinrich Küttler, Aleksandra Piktus, Pontus Stenetorp, Sebastian Riedel

Leveraging Approximate Caching for Faster Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) improves the reliability of large language model (LLM) answers by integrating external knowledge. However, RAG increases the end-to-end inference time since looking for relevant documents from large…

Databases · Computer Science 2025-10-28 Shai Bergman, Anne-Marie Kermarrec, Diana Petrescu, Rafael Pires, Mathis Randl, Martijn de Vos, Ji Zhang