Flow Matching for Count Data

Machine Learning 2026-05-11 v1 Machine Learning Quantitative Methods

Abstract

High-dimensional count data arise in applications such as single-cell RNA sequencing (scRNA-seq) and neural spike trains, where mapping between distributions across successive batches or time points is a critical component of data analysis. The recent success of diffusion- and flow-based deep generative models for images, video, and text motivates extending these ideas to count-valued settings. Many existing methods, however, either treat each count as a categorical state or transform counts into a continuous space, and neither approach is natural or efficient when the count range is large. We propose count-FM, a flow-matching framework for count data based on a continuous-time birth-death process with local unit jumps. Count-FM learns marginal transitions efficiently in count space through simulation-free training of conditional transition rates, allowing transport between arbitrary count-distributed source and target populations. In simulations, count-FM achieves better sample quality than representative baselines while using substantially fewer parameters. We further apply count-FM to scRNA-seq and neural spike-train data for unconditional generation, transport, and conditional generation. Across these tasks, count-FM yields improved sample quality, greater modeling efficiency, and interpretable transport paths.
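
To make the phrase "continuous-time birth-death process with local unit jumps" concrete, the following minimal Python sketch simulates one such process with the Gillespie algorithm. It is not the count-FM implementation: the function name `simulate_birth_death`, the target count, and the hand-written rate functions are hypothetical stand-ins for the learned conditional transition rates. The rates are assumed to depend only on the current count, not on time, so that the simulation is exact.

```python
import random


def simulate_birth_death(x0, birth_rate, death_rate, t_end=1.0, rng=None):
    """Simulate a birth-death process on {0, 1, 2, ...} up to time t_end.

    birth_rate(x) and death_rate(x) are hypothetical, time-homogeneous
    rate functions; in a learned model they would come from a network.
    Returns the final count and the list of (time, count) jump points.
    """
    rng = rng or random.Random()
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        b = max(birth_rate(x), 0.0)
        # A death is impossible at zero, which keeps counts non-negative.
        d = max(death_rate(x), 0.0) if x > 0 else 0.0
        total = b + d
        if total <= 0.0:
            break  # absorbing state: no further jumps
        t += rng.expovariate(total)  # exponential waiting time
        if t >= t_end:
            break
        # Local unit jump: +1 with probability b / total, otherwise -1.
        x += 1 if rng.random() < b / total else -1
        path.append((t, x))
    return x, path


if __name__ == "__main__":
    target = 10  # illustrative target count (assumption)
    x_final, path = simulate_birth_death(
        x0=3,
        birth_rate=lambda x: max(target - x, 0),
        death_rate=lambda x: max(x - target, 0),
        rng=random.Random(0),
    )
    print(x_final, len(path) - 1)  # final count and number of jumps
```

With these illustrative rates the count moves one unit at a time from the source value toward the target and never overshoots, which is the kind of step-by-step path that makes transport in count space easy to inspect.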

Cite

@article{arxiv.2605.07746,
  title  = {Flow Matching for Count Data},
  author = {Ganchao Wei and John Pearson},
  journal= {arXiv preprint arXiv:2605.07746},
  year   = {2026}
}