Related papers: Work Sharing and Offloading for Efficient Approxim…

200 papers

We consider a similarity measure between two sets $A$ and $B$ of vectors, that balances the average and maximum cosine distance between pairs of vectors, one from set $A$ and one from set $B$. As a motivation for this measure, we present…

Data Structures and Algorithms · Computer Science 2021-08-31 Michael Leybovich, Oded Shmueli

Similarity join--a widely used operation in data science--finds all pairs of items that have distance smaller than a threshold. Prior work has explored distributed computation methods to scale similarity join to large data volumes but these…

Databases · Computer Science 2025-10-13 Yanqi Chen, Xiao Yan, Alexandra Meliou, Eric Lo

The rapid growth of machine learning capabilities and the adoption of data processing methods using vector embeddings sparked a great interest in creating systems for vector data management. While the predominant approach of vector data…

Databases · Computer Science 2024-03-26 Viktor Sanca, Anastasia Ailamaki

Uniform sampling and approximate counting are fundamental primitives for modern database applications, ranging from query optimization to approximate query processing. While recent breakthroughs have established optimal sampling and…

Databases · Computer Science 2026-05-13 Xiao Hu, Jinchao Huang

In the last few years, much effort has been devoted to developing join algorithms in order to achieve worst-case optimality for join queries over relational databases. Towards this end, the database community has had considerable success in…

Databases · Computer Science 2020-03-02 Shaleen Deep, Xiao Hu, Paraschos Koutris

Vector data is prevalent across business and scientific applications, and its popularity is growing with the proliferation of learned embeddings. Vector data collections often reach billions of vectors with thousands of dimensions, thus,…

Information Retrieval · Computer Science 2025-09-09 Ilias Azizi, Karima Echihab, Themis Palpanas, Vassilis Christophides

It is crucial to provide real-time performance in many applications, such as interactive and exploratory data analysis. In these settings, users often need to view subsets of query results quickly. It is challenging to deliver such results…

Approximate Nearest Neighbor Search (ANNS) is essential for various data-intensive applications, including recommendation systems, image retrieval, and machine learning. Scaling ANNS to handle billions of high-dimensional vectors on a…

Databases · Computer Science 2025-06-18 Qian Xu, Feng Zhang, Chengxi Li, Lei Cao, Zheng Chen, Jidong Zhai, Xiaoyong Du

The vast increase in amount and complexity of digital content led to a wide interest in ad-hoc retrieval systems in recent years. Complementary, the existence of heterogeneous data sources and retrieval models stimulated the proliferation…

Computer Vision and Pattern Recognition · Computer Science 2019-07-03 Icaro Cavalcante Dourado, Ricardo da Silva Torres

The computation of distance measures between nodes in graphs is inefficient and does not scale to large graphs. We explore dense vector representations as an effective way to approximate the same information: we introduce a simple yet…

Computation and Language · Computer Science 2019-06-18 Andrey Kutuzov, Mohammad Dorgham, Oleksiy Oliynyk, Chris Biemann, Alexander Panchenko

A similarity join aims to find all similar pairs between two collections of records. Established approaches usually deal with synthetic differences like typos and abbreviations, but neglect the semantic relations between words. Such…

Information Retrieval · Computer Science 2018-10-30 Pengfei Xu, Jiaheng Lu

As an essential operation in data cleaning, the similarity join has attracted considerable attention from the database community. In this paper, we study string similarity joins with edit-distance constraints, which find similar string…

Databases · Computer Science 2011-12-01 Guoliang Li, Dong Deng, Jiannan Wang, Jianhua Feng

The join operation is a fundamental building block of parallel data processing. Unfortunately, it is very resource-intensive to compute an equi-join across massive datasets. The approximate computing paradigm allows users to trade accuracy…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-16 Do Le Quoc, Istemi Ekin Akkus, Pramod Bhatotia, Spyros Blanas, Ruichuan Chen, Christof Fetzer, Thorsten Strufe

ANNS for embedded vector representations of texts is commonly used in information retrieval, with two important information representations being sparse and dense vectors. While it has been shown that combining these representations…

Information Retrieval · Computer Science 2024-10-29 Haoyu Zhang, Jun Liu, Zhenhua Zhu, Shulin Zeng, Maojia Sheng, Tao Yang, Guohao Dai, Yu Wang

Many real-world tasks such as recommending videos with the kids tag can be reduced to finding most similar vectors associated with hard predicates. This task, filtered vector search, is challenging as prior state-of-the-art graph-based…

Databases · Computer Science 2025-07-22 Zhaoheng Li, Silu Huang, Wei Ding, Yongjoo Park, Jianjun Chen

Real-world vector embeddings are usually associated with extra labels, such as attributes and keywords. Many applications require the nearest neighbor search that contains specific labels, such as searching for product image embeddings…

Databases · Computer Science 2025-12-12 Mingyu Yang, Wenxuan Xia, Wentao Li, Raymond Chi-Wing Wong, Wei Wang

Set similarity join is a fundamental and well-studied database operator. It is usually studied in the exact setting where the goal is to compute all pairs of sets that exceed a given similarity threshold (measured e.g. as Jaccard…

Databases · Computer Science 2018-03-05 Tobias Christiani, Rasmus Pagh, Johan Sivertsen

Traditional retrieval methods have been essential for assessing document similarity but struggle with capturing semantic nuances. Despite advancements in latent semantic analysis (LSA) and deep learning, achieving comprehensive semantic…

Information Retrieval · Computer Science 2024-09-27 Solmaz Seyed Monir, Irene Lau, Shubing Yang, Dongfang Zhao

Similarity-based vector search facilitates many important applications such as search and recommendation but is limited by the memory capacity and bandwidth of a single machine due to large datasets and intensive data read. In this paper,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-10 Xiangyu Zhi, Meng Chen, Xiao Yan, Baotong Lu, Hui Li, Qianxi Zhang, Qi Chen, James Cheng

Approximate $k$ nearest neighbor (AKNN) search in high-dimensional space is a foundational problem in vector databases with widespread applications. Among the numerous AKNN indexes, Proximity Graph-based indexes achieve state-of-the-art…

Databases · Computer Science 2026-02-20 Liuchang Jing, Mingyu Yang, Lei Li, Jianbin Qin, Wei Wang