Related papers: QVCache: A Query-Aware Vector Cache

Quake: Adaptive Indexing for Vector Search

Vector search, the task of finding the k-nearest neighbors of a query vector against a database of high-dimensional vectors, underpins many machine learning applications, including retrieval-augmented generation, recommendation systems, and…

Information Retrieval · Computer Science 2025-06-10 Jason Mohoney, Devesh Sarda, Mengze Tang, Shihabur Rahman Chowdhury, Anil Pacaci, Ihab F. Ilyas, Theodoros Rekatsinas, Shivaram Venkataraman

HAKES: Scalable Vector Database for Embedding Search Service

Modern deep learning models capture the semantics of complex data by transforming them into high-dimensional embedding vectors. Emerging applications, such as retrieval-augmented generation, use approximate nearest neighbor (ANN) search in…

Databases · Computer Science 2025-10-01 Guoyu Hu, Shaofeng Cai, Tien Tuan Anh Dinh, Zhongle Xie, Cong Yue, Gang Chen, Beng Chin Ooi

GoVector: An I/O-Efficient Caching Strategy for High-Dimensional Vector Nearest Neighbor Search

Graph-based high-dimensional vector indices have become a mainstream solution for large-scale approximate nearest neighbor search (ANNS). However, their substantial memory footprint often requires storage on secondary devices, where…

Databases · Computer Science 2025-08-22 Yijie Zhou, Shengyuan Lin, Shufeng Gong, Song Yu, Shuhao Fan, Yanfeng Zhang, Ge Yu

On Storage Neural Network Augmented Approximate Nearest Neighbor Search

Large-scale approximate nearest neighbor search (ANN) has been gaining attention along with the latest machine learning research employing ANNs. If the data is too large to fit in memory, it is necessary to search for the most similar…

Machine Learning · Computer Science 2025-01-29 Taiga Ikeda, Daisuke Miyashita, Jun Deguchi

PQCache: Product Quantization-based KVCache for Long Context LLM Inference

As the field of Large Language Models (LLMs) continues to evolve, the context length in inference is steadily growing. Key-Value Cache (KVCache), the intermediate representations of tokens within LLM inference, has now become the primary…

Computation and Language · Computer Science 2025-04-01 Hailin Zhang, Xiaodong Ji, Yilin Chen, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Weipeng Chen, Bin Cui

LSM-VEC: A Large-Scale Disk-Based System for Dynamic Vector Search

Vector search underpins modern AI applications by supporting approximate nearest neighbor (ANN) queries over high-dimensional embeddings in tasks like retrieval-augmented generation (RAG), recommendation systems, and multimodal search.…

Databases · Computer Science 2026-05-19 Shurui Zhong, Dingheng Mo, Siqiang Luo

Vector and Line Quantization for Billion-scale Similarity Search on GPUs

Billion-scale high-dimensional approximate nearest neighbour (ANN) search has become an important problem for searching similar objects among the vast amount of images and videos available online. The existing ANN methods are usually…

Computer Vision and Pattern Recognition · Computer Science 2019-07-30 Wei Chen, Jincai Chen, Fuhao Zou, Yuan-Fang Li, Ping Lu, Qiang Wang, Wei Zhao

SQUASH: Serverless and Distributed Quantization-based Attributed Vector Similarity Search

Vector similarity search presents significant challenges in terms of scalability for large and high-dimensional datasets, as well as in providing native support for hybrid queries. Serverless computing and cloud functions offer attractive…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-04 Joe Oakley, Hakan Ferhatosmanoglu

vCache: Verified Semantic Prompt Caching

Semantic caches return cached responses for semantically similar prompts to reduce LLM inference latency and cost. They embed cached prompts and store them alongside their response in a vector database. Embedding similarity metrics assign a…

Machine Learning · Computer Science 2026-02-24 Luis Gaspar Schroeder, Aditya Desai, Alejandro Cuadron, Kyle Chu, Shu Liu, Mark Zhao, Stephan Krusche, Alfons Kemper, Matei Zaharia, Joseph E. Gonzalez

Vector Search for the Future: From Memory-Resident, Static Heterogeneous Storage, to Cloud-Native Architectures

Vector search (VS) has become a fundamental component in multimodal data management, enabling core functionalities such as image, video, and code retrieval. As vector data scales rapidly, VS faces growing challenges in balancing search,…

Databases · Computer Science 2026-01-06 Yitong Song, Xuanhe Zhou, Christian S. Jensen, Jianliang Xu

VecFlow: A High-Performance Vector Data Management System for Filtered-Search on GPUs

Vector search and database systems have become a keystone component in many AI applications. While much prior research has investigated how to accelerate the performance of generic vector search, emerging AI applications require running…

Databases · Computer Science 2025-06-03 Jingyi Xi, Chenghao Mo, Benjamin Karsin, Artem Chirkin, Mingqin Li, Minjia Zhang

Passing the Baton: High Throughput Distributed Disk-Based Vector Search with BatANN

Vector search underpins modern information-retrieval systems, including retrieval-augmented generation (RAG) pipelines and search engines over unstructured text and images. As datasets scale to billions of vectors, disk-based vector search…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-07 Nam Anh Dang, Ben Landrum, Ken Birman

CCD-Level and Load-Aware Thread Orchestration for In-Memory Vector ANNS on Multi-Core CPUs

Vector approximate nearest neighbor search (ANNS) underpins search engines, recommendation systems, and advertising services. Recent advances in ANNS indexes make CPU a cost-effective choice for serving million-scale, in-memory vector…

Information Retrieval · Computer Science 2026-05-12 Yuchen Huang, Baiteng Ma, Yiping Sun, Yang Shi, Xiao Chen, Xiaocheng Zhong, Zhiyong Wang, Yao Hu, Chuliang Weng

LEANN: A Low-Storage Vector Index

Embedding-based vector search underpins many important applications, such as recommendation and retrieval-augmented generation (RAG). It relies on vector indices to enable efficient search. However, these indices require storing…

Databases · Computer Science 2025-11-26 Yichuan Wang, Zhifei Li, Shu Liu, Yongji Wu, Ziming Mao, Yilong Zhao, Xiao Yan, Zhiying Xu, Yang Zhou, Ion Stoica, Sewon Min, Matei Zaharia, Joseph E. Gonzalez

OrchANN: A Unified I/O Orchestration Framework for Skewed Out-of-Core Vector Search

Approximate nearest neighbor search (ANNS) at billion scale is fundamentally an out-of-core problem: vectors and indexes live on SSD, so performance is dominated by I/O rather than compute. Under skewed semantic embeddings, existing…

Databases · Computer Science 2025-12-30 Chengying Huan, Lizheng Chen, Zhengyi Yang, Shaonan Ma, Rong Gu, Renjie Yao, Zhibin Wang, Mingxing Zhang, Fang Xi, Jie Tao, Gang Zhang, Guihai Chen, Chen Tian

CALL: Context-Aware Low-Latency Retrieval in Disk-Based Vector Databases

Embedding models capture both semantic and syntactic structures of queries, often mapping different queries to similar regions in vector space. This results in non-uniform cluster access patterns in modern disk-based vector databases. While…

Databases · Computer Science 2025-09-24 Yeonwoo Jeong, Hyunji Cho, Kyuri Park, Youngjae Kim, Sungyong Park

Optimizing SSD-Resident Graph Indexing for High-Throughput Vector Search

Graph-based approximate nearest neighbor search (ANNS) methods (e.g., HNSW) have become the de facto state of the art for their high precision and low latency. To scale beyond main memory, recent out-of-memory ANNS systems leverage SSDs to…

Databases · Computer Science 2026-02-27 Weichen Zhao, Yuncheng Lu, Yao Tian, Hao Zhang, Jiehui Li, Minghao Zhao, Yakun Li, Weining Qian

SPFresh: Incremental In-Place Update for Billion-Scale Vector Search

Approximate Nearest Neighbor Search (ANNS) is now widely used in various applications, ranging from information retrieval, question answering, and recommendation, to search for similar high-dimensional vectors. As the amount of vector data…

Information Retrieval · Computer Science 2024-10-21 Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, Peng Cheng, Mao Yang

VLCache: Computing 2% Vision Tokens and Reusing 98% for Vision-Language Inference

This paper presents VLCache, a cache reuse framework that exploits both Key-Value (KV) cache and encoder cache from prior multimodal inputs to eliminate costly recomputation when the same multimodal inputs recur. Unlike previous heuristic…

Computer Vision and Pattern Recognition · Computer Science 2025-12-19 Shengling Qin, Hao Yu, Chenxin Wu, Zheng Li, Yizhong Cao, Zhengyang Zhuge, Yuxin Zhou, Wentao Yao, Yi Zhang, Zhengheng Wang, Shuai Bai, Jianwei Zhang, Junyang Lin

MicroNN: An On-device Disk-resident Updatable Vector Database

Nearest neighbour search over dense vector collections has important applications in information retrieval, retrieval augmented generation (RAG), and content ranking. Performing efficient search over large vector collections is a well…

Databases · Computer Science 2025-04-09 Jeffrey Pound, Floris Chabert, Arjun Bhushan, Ankur Goswami, Anil Pacaci, Shihabur Rahman Chowdhury