Related papers: Tailwind: A Practical Framework for Query Accelera…

ParallelSearch: Train your LLMs to Decompose Query and Search Sub-queries in Parallel with Reinforcement Learning

Reasoning-augmented search agents such as Search-R1, trained via reinforcement learning with verifiable rewards (RLVR), demonstrate remarkable capabilities in multi-step information retrieval from external knowledge sources. These agents…

Computation and Language · Computer Science 2025-08-14 Shu Zhao, Tan Yu, Anbang Xu, Japinder Singh, Aaditya Shukla, Rama Akkiraju

RelServe: Fast LLM Inference Serving on Relational Data

The use of Large Language Models (LLMs) for querying relational data has given rise to relQuery, a workload pattern that applies templated LLM calls to structured tables. As relQuery services become more widely adopted in applications such…

Databases · Computer Science 2026-01-21 Xin Zhang, Shihong Gao, Yanyan Shen, Haoyang Li, Lei Chen

RELOAD: A Robust and Efficient Learned Query Optimizer for Database Systems

Recent advances in query optimization have shifted from traditional rule-based and cost-based techniques towards machine learning-driven approaches. Among these, reinforcement learning (RL) has attracted significant attention due to its…

Databases · Computer Science 2026-04-17 Seokwon Lee, Jaeyoung Sim, Sihyun Kim, Yuhsing Li, Yiwen Zhu, Kwanghyun Park

Making Databases Faster with LLM Evolutionary Sampling

Traditional query optimization relies on cost-based optimizers that estimate execution cost (e.g., runtime, memory, and I/O) using predefined heuristics and statistical models. Improving these heuristics requires substantial engineering…

Databases · Computer Science 2026-02-12 Mehmet Hamza Erol, Xiangpeng Hao, Federico Bianchi, Ciro Greco, Jacopo Tagliabue, James Zou

The Tensor Data Platform: Towards an AI-centric Database System

Database engines have historically absorbed many of the innovations in data processing, adding features to process graph data, XML, object oriented, and text among many others. In this paper, we make the case that it is time to do the same…

Databases · Computer Science 2022-11-21 Apurva Gandhi, Yuki Asada, Victor Fu, Advitya Gemawat, Lihao Zhang, Rathijit Sen, Carlo Curino, Jesús Camacho-Rodríguez, Matteo Interlandi

Declarative Recursive Computation on an RDBMS, or, Why You Should Use a Database For Distributed Machine Learning

A number of popular systems, most notably Google's TensorFlow, have been implemented from the ground up to support machine learning tasks. We consider how to make a very small set of changes to a modern relational database management system…

Databases · Computer Science 2019-04-26 Dimitrije Jankov, Shangyu Luo, Binhang Yuan, Zhuhua Cai, Jia Zou, Chris Jermaine, Zekai J. Gao

HRDBMS: Combining the Best of Modern and Traditional Relational Databases

HRDBMS is a novel distributed relational database that uses a hybrid model combining the best of traditional distributed relational databases and Big Data analytics platforms such as Hive. This allows HRDBMS to leverage years' worth of…

Databases · Computer Science 2019-01-28 Jason Arnold, Boris Glavic, Ioan Raicu

Query Performance Explanation through Large Language Model for HTAP Systems

In hybrid transactional and analytical processing (HTAP) systems, users often struggle to understand why query plans from one engine (OLAP or OLTP) perform significantly slower than those from another. Although optimizers provide plan…

Databases · Computer Science 2024-12-03 Haibo Xiu, Li Zhang, Tieying Zhang, Jun Yang, Jianjun Chen

GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered

Traditional query processing relies on engines that are carefully optimized and engineered by many experts. However, new techniques and user requirements evolve rapidly, and existing systems often cannot keep pace. At the same time, these…

Databases · Computer Science 2026-03-03 Jiale Lao, Immanuel Trummer

TCUDB: Accelerating Database with Tensor Processors

The emergence of novel hardware accelerators has powered the tremendous growth of machine learning in recent years. These accelerators deliver incomparable performance gains in processing high-volume matrix operators, particularly matrix…

Databases · Computer Science 2021-12-15 Yu-Ching Hu, Yuliang Li, Hung-Wei Tseng

Experimenting with recursive queries in database and logic programming systems

This paper considers the problem of reasoning on massive amounts of (possibly distributed) data. Presently, existing proposals show some limitations: (i) the quantity of data that can be handled contemporarily is limited, due to the…

Artificial Intelligence · Computer Science 2007-05-23 Giorgio Terracina, Nicola Leone, Vincenzino Lio, Claudio Panetta

StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization

Efficient multi-hop reasoning requires Large Language Models (LLMs) based agents to acquire high-value external knowledge iteratively. Previous work has explored reinforcement learning (RL) to train LLMs to perform search-based document…

Computation and Language · Computer Science 2025-05-27 Ziliang Wang, Xuhui Zheng, Kang An, Cijun Ouyang, Jialu Cai, Yuhang Wang, Yichao Wu

AI-Driven Research for Databases

As the complexity of modern workloads and hardware increasingly outpaces human research and engineering capacity, existing methods for database performance optimization struggle to keep pace. To address this gap, a new class of techniques,…

Databases · Computer Science 2026-04-09 Audrey Cheng, Harald Ng, Aaron Kabcenell, Peter Bailis, Matei Zaharia, Lin Ma, Xiao Shi, Ion Stoica

TLP: A Deep Learning-based Cost Model for Tensor Program Tuning

Tensor program tuning is a non-convex objective optimization problem, to which search-based approaches have proven to be effective. At the core of the search-based approaches lies the design of the cost model. Though deep learning-based…

Machine Learning · Computer Science 2022-11-23 Yi Zhai, Yu Zhang, Shuo Liu, Xiaomeng Chu, Jie Peng, Jianmin Ji, Yanyong Zhang

Scalable Relational Query Processing on Big Matrix Data

The use of large-scale machine learning methods is becoming ubiquitous in many applications ranging from business intelligence to self-driving cars. These methods require a complex computation pipeline consisting of various types of…

Databases · Computer Science 2021-11-10 Yongyang Yu, Mingjie Tang, Walid G. Aref

DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. Yet, open LLMs still perform poorly in such settings due to limited long-horizon…

Computation and Language · Computer Science 2025-10-15 Rui Lu, Zhenyu Hou, Zihan Wang, Hanchen Zhang, Xiao Liu, Yujiang Li, Shi Feng, Jie Tang, Yuxiao Dong

Extending Relational Query Processing with ML Inference

The broadening adoption of machine learning in the enterprise is increasing the pressure for strict governance and cost-effective performance, in particular for the common and consequential steps of model storage and inference. The RDBMS…

Databases · Computer Science 2019-11-04 Konstantinos Karanasos, Matteo Interlandi, Doris Xin, Fotis Psallidas, Rathijit Sen, Kwanghyun Park, Ivan Popivanov, Supun Nakandal, Subru Krishnan, Markus Weimer, Yuan Yu, Raghu Ramakrishnan, Carlo Curino

CHASE: A Native Relational Database for Hybrid Queries on Structured and Unstructured Data

Querying both structured and unstructured data has become a new paradigm in data analytics and recommendation. With unstructured data, such as text and videos, converted to high-dimensional vectors and queried with approximate nearest…

Databases · Computer Science 2025-01-10 Rui Ma, Kai Zhang, Zhenying He, Yinan Jing, X. Sean Wang, Zhenqiang Chen

RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis

With the rapid advancement of Large Language Models (LLMs), there is an increasing need for challenging benchmarks to evaluate their capabilities in handling complex tabular data. However, existing benchmarks are either based on outdated…

Computation and Language · Computer Science 2025-12-16 Pengzuo Wu, Yuhang Yang, Guangcheng Zhu, Chao Ye, Hong Gu, Xu Lu, Ruixuan Xiao, Bowen Bao, Yijing He, Liangyu Zha, Wentao Ye, Junbo Zhao, Haobo Wang

Dissociation and Propagation for Approximate Lifted Inference with Standard Relational Database Management Systems

Probabilistic inference over large data sets is a challenging data management problem since exact inference is generally #P-hard and is most often solved approximately with sampling-based methods today. This paper proposes an alternative…

Databases · Computer Science 2016-06-15 Wolfgang Gatterbauer, Dan Suciu