
Embedding Compression via Spherical Coordinates

Machine Learning 2026-03-27 v4 Computer Vision and Pattern Recognition

Abstract

We present an $\epsilon$-bounded compression method for unit-norm embeddings that achieves 1.5$\times$ compression, 25% better than the best prior lossless method. The method exploits that spherical coordinates of high-dimensional unit vectors concentrate around $\pi/2$, causing IEEE 754 exponents to collapse to a single value and high-order mantissa bits to become predictable, enabling entropy coding of both. Reconstruction error is bounded by float32 machine epsilon ($1.19 \times 10^{-7}$), making reconstructed values indistinguishable from originals at float32 precision. Evaluation across 26 configurations spanning text, image, and multi-vector embeddings confirms consistent compression improvement with zero measurable retrieval degradation on BEIR benchmarks.
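
The exponent-collapse effect described in the abstract can be illustrated with a small sketch. This is not the paper's codec. It assumes a random Gaussian direction as a stand-in for a real embedding and uses the standard hyperspherical angle convention. The function names (`random_unit_vector`, `to_spherical`, `float32_exponent`, `top_exponent_share`) are hypothetical helpers introduced only for this example. The sketch compares how often the most common float32 exponent occurs among the Cartesian coordinates versus among the spherical angles.

```python
import math
import random
import struct
from collections import Counter


def random_unit_vector(dim, rng):
    """Assumed stand-in for an embedding: a normalized Gaussian sample."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def to_spherical(x):
    """Convert a unit vector of length d into d-1 hyperspherical angles.

    The first d-2 angles lie in [0, pi]; the last lies in (-pi, pi].
    """
    d = len(x)
    # suffix[i] holds the squared norm of x[i:], so each angle is O(1) to compute.
    suffix = [0.0] * (d + 1)
    for i in range(d - 1, -1, -1):
        suffix[i] = suffix[i + 1] + x[i] * x[i]
    angles = [math.atan2(math.sqrt(suffix[i + 1]), x[i]) for i in range(d - 2)]
    angles.append(math.atan2(x[d - 1], x[d - 2]))
    return angles


def float32_exponent(value):
    """Return the 8-bit biased exponent of value rounded to IEEE 754 float32."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return (bits >> 23) & 0xFF


def top_exponent_share(values):
    """Return the most frequent float32 exponent and the fraction of values using it."""
    counts = Counter(float32_exponent(v) for v in values)
    exponent, count = counts.most_common(1)[0]
    return exponent, count / len(values)


rng = random.Random(0)
x = random_unit_vector(1024, rng)
angles = to_spherical(x)

cart_exp, cart_share = top_exponent_share(x)
ang_exp, ang_share = top_exponent_share(angles)

print(f"Cartesian: top exponent {cart_exp}, share {cart_share:.3f}")
print(f"Spherical: top exponent {ang_exp}, share {ang_share:.3f}")
```

Any float32 value in $[1, 2)$ has biased exponent 127, and $\pi/2 \approx 1.57$ sits inside that interval, so angles clustered around $\pi/2$ share one exponent while the Cartesian coordinates spread over several. Real embeddings are not Gaussian, so the exact shares will differ from this synthetic case.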


Cite

@article{arxiv.2602.00079,
  title  = {Embedding Compression via Spherical Coordinates},
  author = {Han Xiao},
  journal= {arXiv preprint arXiv:2602.00079},
  year   = {2026}
}

Comments

Accepted at ICLR 2026 Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM). 13 pages, 2 figures. Code: https://github.com/jina-ai/jzip
