English

An Asynchronous Distributed-Memory Parallel Algorithm for k-mer Counting

Distributed, Parallel, and Cluster Computing 2025-05-08 v1 Genomics

Abstract

This paper describes a new asynchronous algorithm and implementation for the problem of k-mer counting (KC), which concerns quantifying the frequency of length k substrings in a DNA sequence. This operation is common to many computational biology workloads and can take up to 77% of the total runtime of de novo genome assembly. The performance and scalability of the current state-of-the-art distributed-memory KC algorithm are hampered by multiple rounds of Many-To-Many collectives. Therefore, we develop an asynchronous algorithm (DAKC) that uses fine-grained, asynchronous messages to obviate most of this global communication while utilizing network bandwidth efficiently via custom message aggregation protocols. DAKC can perform strong scaling up to 256 nodes (512 sockets / 6K cores) and can count k-mers up to 9x faster than the state-of-the-art distributed-memory algorithm, and up to 100x faster than the shared-memory alternative. We also provide an analytical model to understand the hardware resource utilization of our asynchronous KC algorithm and provide insights on the performance.

Keywords

Cite

@article{arxiv.2505.04431,
  title  = {An Asynchronous Distributed-Memory Parallel Algorithm for k-mer Counting},
  author = {Souvadra Hati and Akihiro Hayashi and Richard Vuduc},
  journal= {arXiv preprint arXiv:2505.04431},
  year   = {2025}
}

Comments

Accepted at IEEE IPDPS 2025 Conference

R2 v1 2026-06-28T23:24:30.639Z