Binary Embedding: Fundamental Limits and Fast Algorithm

Data Structures and Algorithms 2019-01-24 v2 Information Theory math.IT

Abstract

Binary embedding is a nonlinear dimension reduction methodology where high dimensional data are embedded into the Hamming cube while preserving the structure of the original space. Specifically, for an arbitrary set of $N$ distinct points in $\mathbb{S}^{p-1}$, our goal is to encode each point using an $m$-dimensional binary string such that we can reconstruct their geodesic distances up to $\delta$ uniform distortion. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running time $O(mp)$. We make three contributions: (1) we establish a lower bound that shows any binary embedding oblivious to the set of points requires $m = \Omega(\frac{1}{\delta^2}\log N)$ bits and a similar lower bound for non-oblivious embeddings into Hamming distance; (2) [DELETED, see comment]; (3) we also provide an analytic result about embedding a general set of points $K \subseteq \mathbb{S}^{p-1}$, possibly of infinite size. Our theoretical findings are supported through experiments on both synthetic and real data sets.
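To make the setup concrete, the following is a minimal sketch of the standard random-hyperplane embedding $x \mapsto \mathrm{sign}(Ax)$ with a dense Gaussian matrix $A$. It is the kind of $O(mp)$-time baseline the abstract refers to, not the fast algorithm from the paper. It relies on the known fact that, for a Gaussian row $a$, the signs of $\langle a, x \rangle$ and $\langle a, y \rangle$ differ with probability equal to the angle between $x$ and $y$ divided by $\pi$, so the normalized Hamming distance estimates the normalized geodesic distance. The function names and the values of `p` and `m` are illustrative assumptions.

```python
import math
import random


def random_unit_vector(p, rng):
    """Draw a point uniformly from the unit sphere S^{p-1}."""
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def binary_embed(x, A):
    """Map x to the sign pattern of Ax, stored as bits; costs O(mp) per point."""
    return [1 if sum(a * xi for a, xi in zip(row, x)) >= 0 else 0 for row in A]


def normalized_hamming(b1, b2):
    """Fraction of coordinates in which two binary codes differ."""
    return sum(u != v for u, v in zip(b1, b2)) / len(b1)


def normalized_geodesic(x, y):
    """Angle between unit vectors x and y divided by pi, so the range is [0, 1]."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot))) / math.pi


# Illustrative sizes (assumptions, not taken from the paper).
rng = random.Random(0)
p, m = 16, 2000
A = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(m)]

x = random_unit_vector(p, rng)
y = random_unit_vector(p, rng)
d_geo = normalized_geodesic(x, y)
d_ham = normalized_hamming(binary_embed(x, A), binary_embed(y, A))
print(f"geodesic: {d_geo:.3f}  hamming: {d_ham:.3f}")
```

Each bit is an independent Bernoulli trial, so the gap between the two printed values shrinks roughly like $1/\sqrt{m}$, which is consistent with the $1/\delta^2$ dependence in the lower bound.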

Cite

@article{arxiv.1502.05746,
  title  = {Binary Embedding: Fundamental Limits and Fast Algorithm},
  author = {Xinyang Yi and Constantine Caramanis and Eric Price},
  journal= {arXiv preprint arXiv:1502.05746},
  year   = {2019}
}

Comments

Note: the previous version of this paper also included a claimed fast upper bound for certain parameter regimes. The proof of this had an error, as pointed out in Dirksen and Stollenwerk (2018); the same paper also presents a correct algorithm for the setting
