StreamSampling.jl: Efficient Sampling from Data Streams in Julia

Software Engineering 2026-05-15 v2 Computation

Abstract

StreamSampling.jl is a Julia library that provides general and efficient methods for sampling from data streams in a single pass, even when the total number of items is unknown. In this paper, we describe the capabilities of the library and its advantages over traditional sampling procedures, such as maintaining a small, constant memory footprint and avoiding the need to fully materialize the stream in memory. Furthermore, we provide empirical benchmarks comparing online sampling methods against standard approaches, demonstrating improvements in both performance and memory usage.
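
To make the single-pass, constant-memory idea concrete, the following is a minimal sketch of classical reservoir sampling (Algorithm R) written in plain Julia. It is an illustration only, not the library's implementation or API: the function name `reservoir_sample`, the variable names, and the choice of Algorithm R are all assumptions made for this example.

```julia
using Random

# Hypothetical helper (not part of StreamSampling.jl): draws a uniform random
# sample of `k` items without replacement from an iterator of unknown length,
# visiting each item exactly once and storing at most `k` items.
function reservoir_sample(rng::AbstractRNG, iter, k::Integer)
    reservoir = Vector{eltype(iter)}()
    sizehint!(reservoir, k)
    seen = 0                          # number of items consumed so far
    for x in iter
        seen += 1
        if seen <= k
            push!(reservoir, x)       # fill the reservoir with the first k items
        else
            j = rand(rng, 1:seen)     # uniform index in 1:seen
            if j <= k
                reservoir[j] = x      # keep x with probability k / seen
            end
        end
    end
    return reservoir
end

# A lazy generator: the 10^6 values are produced one at a time and are
# never collected into an array.
stream = (i^2 for i in 1:10^6)
picked = reservoir_sample(MersenneTwister(42), stream, 5)
```

Memory use depends only on the sample size `k`, not on the length of the stream, which is the property the abstract refers to as a small, constant memory footprint. If the stream holds fewer than `k` items, this sketch simply returns all of them.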

Cite

@article{arxiv.2603.21996,
  title  = {StreamSampling.jl: Efficient Sampling from Data Streams in Julia},
  author = {Adriano Meligrana},
  journal= {arXiv preprint arXiv:2603.21996},
  year   = {2026}
}

Comments

Accepted to the Proceedings of the JuliaCon Conferences