Exploring Distributed Vector Databases Performance on HPC Platforms: A Study with Qdrant

Seth Ockerman; Amal Gueroudji; Song Young Oh; Robert Underwood; Nicholas Chia; Kyle Chard; Robert Ross; Shivaram Venkataraman

Distributed, Parallel, and Cluster Computing; Databases | 2025-10-17 (v2)

Abstract

Vector databases have rapidly grown in popularity, enabling efficient similarity search over data such as text, images, and video. They now play a central role in modern AI workflows, aiding large language models by grounding model outputs in external literature through retrieval-augmented generation. Despite their importance, little is known about the performance characteristics of vector databases in high-performance computing (HPC) systems that drive large-scale science. This work presents an empirical study of distributed vector database performance on the Polaris supercomputer in the Argonne Leadership Computing Facility. We construct a realistic biological-text workload from BV-BRC and generate embeddings from the peS2o corpus using Qwen3-Embedding-4B. We select Qdrant to evaluate insertion, index construction, and query latency with up to 32 workers. Informed by practical lessons from our experience, this work takes a first step toward characterizing vector database performance on HPC platforms to guide future research and optimization.

Keywords

data processing; computer architecture; distributed computing

Cite

@article{arxiv.2509.12384,
  title  = {Exploring Distributed Vector Databases Performance on HPC Platforms: A Study with Qdrant},
  author = {Seth Ockerman and Amal Gueroudji and Song Young Oh and Robert Underwood and Nicholas Chia and Kyle Chard and Robert Ross and Shivaram Venkataraman},
  journal= {arXiv preprint arXiv:2509.12384},
  year   = {2025}
}

Comments

To appear in the SC'25 Workshop Frontiers in Generative AI for HPC Science and Engineering: Foundations, Challenges, and Opportunities
