Binary Embedding: Fundamental Limits and Fast Algorithm

Xinyang Yi; Constantine Caramanis; Eric Price

Data Structures and Algorithms 2019-01-24 v2 Information Theory math.IT

Abstract

Binary embedding is a nonlinear dimension reduction methodology where high-dimensional data are embedded into the Hamming cube while preserving the structure of the original space. Specifically, for an arbitrary set of $N$ distinct points in $\mathbb{S}^{p-1}$, our goal is to encode each point using an $m$-dimensional binary string such that we can reconstruct their geodesic distances up to $\delta$ uniform distortion. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running time $O(mp)$. We make three contributions: (1) we establish a lower bound showing that any binary embedding oblivious to the set of points requires $m = \Omega(\frac{1}{\delta^2}\log{N})$ bits, together with a similar lower bound for non-oblivious embeddings into Hamming distance; (2) [DELETED, see comment]; (3) we also provide an analytic result about embedding a general set of points $K \subseteq \mathbb{S}^{p-1}$, possibly of infinite size. Our theoretical findings are supported through experiments on both synthetic and real data sets.
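To make the setting concrete, here is a minimal sketch of the classical sign-of-Gaussian-projection embedding, which is a standard baseline with $O(mp)$ encoding cost per point and not the algorithm proposed in this paper. It relies on the well-known fact that for a Gaussian vector $a$, the signs of $\langle a, x\rangle$ and $\langle a, y\rangle$ differ with probability equal to the angle between $x$ and $y$ divided by $\pi$. The function names, the parameter values, and the choice to normalize geodesic distance by $\pi$ are assumptions made for illustration.

```python
import math
import random


def random_unit_vector(p, rng):
    """Draw a point uniformly from the unit sphere S^{p-1}."""
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sign_embed(x, A):
    """Map x to m bits: bit i is 1 when <a_i, x> >= 0. Costs O(m p)."""
    return [1 if sum(a * c for a, c in zip(row, x)) >= 0 else 0 for row in A]


def normalized_hamming(b1, b2):
    """Fraction of positions where the two bit strings differ."""
    return sum(u != v for u, v in zip(b1, b2)) / len(b1)


def geodesic_distance(x, y):
    """Angle between unit vectors, divided by pi so it lies in [0, 1]."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(x, y))))
    return math.acos(dot) / math.pi


rng = random.Random(0)
p, m = 32, 2000  # illustrative sizes, not taken from the paper

# Dense m x p Gaussian matrix, drawn without looking at the data (oblivious).
A = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(m)]

x = random_unit_vector(p, rng)
y = random_unit_vector(p, rng)

estimate = normalized_hamming(sign_embed(x, A), sign_embed(y, A))
exact = geodesic_distance(x, y)
print(f"Hamming estimate: {estimate:.3f}, geodesic distance: {exact:.3f}")
```

Each bit is an independent Bernoulli trial, so the Hamming estimate concentrates around the normalized geodesic distance at rate roughly $1/\sqrt{m}$, which is in line with the $1/\delta^2$ dependence in the lower bound stated above.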

Keywords

metric space; dimensionality reduction; binary analysis

Cite

@article{arxiv.1502.05746,
  title  = {Binary Embedding: Fundamental Limits and Fast Algorithm},
  author = {Xinyang Yi and Constantine Caramanis and Eric Price},
  journal= {arXiv preprint arXiv:1502.05746},
  year   = {2019}
}

Comments

Note: the previous version of this paper also included a claimed fast upper bound for certain parameter regimes. The proof of this had an error, as pointed out in Dirksen and Stollenwerk (2018); the same paper also presents a correct algorithm for the setting
