Related papers: Cortex: Harnessing Correlations to Boost Query Per…

COAX: Correlation-Aware Indexing on Multidimensional Data with Soft Functional Dependencies

Recent work proposed learned index structures, which learn the distribution of the underlying dataset to improve performance. The initial work on learned indexes has shown that by learning the cumulative distribution function of the data,…

Databases · Computer Science 2021-02-03 Ali Hadian, Behzad Ghaffari, Taiyi Wang, Thomas Heinis

Efficient Data Access Paths for Mixed Vector-Relational Search

The rapid growth of machine learning capabilities and the adoption of data processing methods using vector embeddings sparked a great interest in creating systems for vector data management. While the predominant approach of vector data…

Databases · Computer Science 2024-03-26 Viktor Sanca, Anastasia Ailamaki

MaskSearch: Querying Image Masks at Scale

Machine learning tasks over image databases often generate masks that annotate image content (e.g., saliency maps, segmentation maps, depth maps) and enable a variety of applications (e.g., determine if a model is learning spurious…

Databases · Computer Science 2024-01-09 Dong He, Jieyu Zhang, Maureen Daum, Alexander Ratner, Magdalena Balazinska

Indexes in Microsoft SQL Server

Indexes are the most apt choice for quickly retrieving records. They work by cutting down the number of disk I/Os: instead of scanning the complete table for the results, we can decrease the number of I/Os or page fetches…

Databases · Computer Science 2019-03-21 Sourav Mukherjee

Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations (Extended Version)

Database administrators construct secondary indexes on data tables to accelerate query processing in relational database management systems (RDBMSs). These indexes are built on top of the most frequently queried columns according to the…

Databases · Computer Science 2019-04-03 Yingjun Wu, Jia Yu, Yuanyuan Tian, Richard Sidle, Ronald Barber

Cracking In-Memory Database Index: A Case Study for Adaptive Radix Tree Index

Indexes provide a method to access data in databases quickly. It can improve the response speed of subsequent queries by building a complete index in advance. However, it also leads to a huge overhead of the continuous updating during…

Databases · Computer Science 2019-11-27 Gang Wu, Yidong Song, Guodong Zhao, Wei Sun, Donghong Han, Baiyou Qiao, Guoren Wang, Ye Yuan

COMPARE: Accelerating Groupwise Comparison in Relational Databases for Data Analytics

Data analysis often involves comparing subsets of data across many dimensions to find unusual trends and patterns. While such comparisons can be expressed in SQL, they tend to be complex to write, and suffer…

Databases · Computer Science 2021-07-28 Tarique Siddiqui, Surajit Chaudhuri, Vivek Narasayya

Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads

Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse. Techniques to accelerate the execution of filter expressions include clustered indexes, specialized sort orders (e.g., Z-order),…

Databases · Computer Science 2020-06-25 Jialin Ding, Vikram Nathan, Mohammad Alizadeh, Tim Kraska

Discovering Structure in High-Dimensional Data Through Correlation Explanation

We introduce a method to learn a hierarchy of successively more abstract representations of complex data based on optimizing an information-theoretic objective. Intuitively, the optimization searches for a set of latent factors that best…

Machine Learning · Computer Science 2014-11-03 Greg Ver Steeg, Aram Galstyan

Search on Secondary Attributes in Geo-Distributed Systems

In the age of big data, more and more applications need to query and analyse large volumes of continuously updated data in real-time. In response, cloud-scale storage systems can extend their interface that allows fast lookups on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-10 Dimitrios Vasilas

MINT: Multi-Vector Search Index Tuning

Vector search plays a crucial role in many real-world applications. In addition to single-vector search, multi-vector search has become important for multi-modal and multi-feature scenarios. In a multi-vector database, each row is an…

Databases · Computer Science 2026-05-05 Jiongli Zhu, Yue Wang, Bailu Ding, Philip A. Bernstein, Vivek Narasayya, Surajit Chaudhuri

Doc2Query--: When Less is More

Doc2Query -- the process of expanding the content of a document before indexing using a sequence-to-sequence model -- has emerged as a prominent technique for improving the first-stage retrieval effectiveness of search engines. However,…

Information Retrieval · Computer Science 2023-02-28 Mitko Gospodinov, Sean MacAvaney, Craig Macdonald

Using Learned Indexes to Improve Time Series Indexing Performance on Embedded Sensor Devices

Efficiently querying data on embedded sensor and IoT devices is challenging given the very limited memory and CPU resources. With the increasing volumes of collected data, it is critical to process, filter, and manipulate data on the edge…

Databases · Computer Science 2023-03-07 David Ding, Ivan Carvalho, Ramon Lawrence

Searching by index for similar sequences: the SEQR algorithm

This paper describes a method to efficiently retrieve protein database sequences similar to a query sequence, while allowing for significant numbers of mutations. We call this method SEQR for SEQuence Retrieval. This approach increases the…

Genomics · Quantitative Biology 2018-11-05 David I. Hurwitz, Lianyi Han, Lewis Y. Geer

Using Additional Indexes for Fast Full-Text Search of Phrases That Contain Frequently Used Words

Searches for phrases and word sets in large text arrays by means of additional indexes are considered. Their use may reduce the query-processing time by an order of magnitude in comparison with standard inverted files.

Information Retrieval · Computer Science 2018-11-27 A. B. Veretennikov

Data Series Indexing Gone Parallel

Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive…

Databases · Computer Science 2020-09-04 Botao Peng

Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory

Vector searches on large-scale datasets are critical to modern online services like web search and RAG, which necessitate storing the datasets and their index on secondary storage such as SSDs. In this paper, we are the first to characterize…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-08 Rongxin Cheng, Yifan Peng, Xingda Wei, Hongrui Xie, Rong Chen, Sijie Shen, Haibo Chen

Accelerating Nearest Neighbor Search on Manycore Systems

We develop methods for accelerating metric similarity search that are effective on modern hardware. Our algorithms factor into easily parallelizable components, making them simple to deploy and efficient on multicore CPUs and GPUs. Despite…

Databases · Computer Science 2016-11-15 Lawrence Cayton

Relational Memory: Native In-Memory Accesses on Rows and Columns

Analytical database systems are typically designed to use a column-first data layout to access only the desired fields. On the other hand, storing data row-first works great for accessing, inserting, or updating entire rows. Transforming…

Databases · Computer Science 2022-02-08 Shahin Roozkhosh, Denis Hoornaert, Ju Hyoung Mun, Tarikul Islam Papon, Ahmed Sanaullah, Ulrich Drepper, Renato Mancuso, Manos Athanassoulis

Efficient Neural Ranking using Forward Indexes

Neural document ranking approaches, specifically transformer models, have achieved impressive gains in ranking performance. However, query processing using such over-parameterized models is both resource and time intensive. In this paper,…

Information Retrieval · Computer Science 2022-04-05 Jurek Leonhardt, Koustav Rudra, Megha Khosla, Abhijit Anand, Avishek Anand