Related papers: Subspace Embeddings Under Nonlinear Transformation…
Low-distortion embeddings are critical building blocks for developing random sampling and random projection algorithms for linear algebra problems. We show that, given a matrix $A \in \R^{n \times d}$ with $n \gg d$ and a $p \in [1, 2)$,…
Recently (Elkin, Filtser, Neiman 2017) introduced the concept of a {\it terminal embedding} from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have…
Oblivious low-distortion subspace embeddings are a crucial building block for numerical linear algebra problems. We show for any real $p, 1 \leq p < \infty$, given a matrix $M \in \mathbb{R}^{n \times d}$ with $n \gg d$, with constant…
An oblivious subspace embedding is a random $m\times n$ matrix $\Pi$ such that, for any $d$-dimensional subspace, with high probability $\Pi$ preserves the norms of all vectors in that subspace within a $1\pm\epsilon$ factor. In this work,…
The sparse Johnson-Lindenstrauss transform is one of the central techniques in dimensionality reduction. It supports embedding a set of $n$ points in $\mathbb{R}^d$ into $m=O(\varepsilon^{-2} \lg n)$ dimensions while preserving all pairwise…
Dimension reduction algorithms are a crucial part of many data science pipelines, including data exploration, feature creation and selection, and denoising. Despite their wide utilization, many non-linear dimension reduction algorithms are…
Under which conditions and with which distortions can we preserve the pairwise-distances of low-complexity vectors, e.g., for structured sets such as the set of sparse vectors or the one of low-rank matrices, when these are mapped in a…
In this paper new general modewise Johnson-Lindenstrauss (JL) subspace embeddings are proposed that are both considerably faster to generate and easier to store than traditional JL embeddings when working with extremely large vectors and/or…
We study representations of data from an arbitrary metric space $\mathcal{X}$ in the space of univariate Gaussian mixtures with a transport metric (Delon and Desolneux 2020). We derive embedding guarantees for feature maps implemented by…
We show an analog to the Fast Johnson-Lindenstrauss Transform for Nearest Neighbor Preserving Embeddings in $\ell_2$. These are sparse, randomized embeddings that preserve the (approximate) nearest neighbors. The dimensionality of the…
This paper investigates theoretical properties of subsampling and hashing as tools for approximate Euclidean norm-preserving embeddings for vectors with (unknown) additive Gaussian noises. Such embeddings are sometimes called…
Simplicial Embeddings (SEM) are representations learned through self-supervised learning (SSL), wherein a representation is projected into $L$ simplices of $V$ dimensions each using a softmax operation. This procedure conditions the…
A random $m\times n$ matrix $S$ is an oblivious subspace embedding (OSE) with parameters $\epsilon>0$, $\delta\in(0,1/3)$ and $d\leq m\leq n$, if for any $d$-dimensional subspace $W\subseteq R^n$, $P\big(\,\forall_{x\in W}\…
As network data has become ubiquitous in the sciences, there has been growing interest in network models whose structure is driven by latent node-level variables in a (typically low-dimensional) latent geometric space. These "latent…
In the field of natural language processing (NLP), continuous vector representations are crucial for capturing the semantic meanings of individual words. Yet, when it comes to the representations of sets of words, the conventional…
Given metric spaces $(X,d)$ and $(Y,\rho)$ and an ordering $x_1,x_2,\ldots,x_n$ of $(X,d)$, an embedding $f: X \rightarrow Y$ is said to have a prioritized distortion $\alpha(\cdot)$, if for any pair $x_j,x'$ of distinct points in $X$, the…
Node embedding methods find latent lower-dimensional representations which are used as features in machine learning models. In the last few years, these methods have become extremely popular as a replacement for manual feature engineering.…
Traditional neural word embeddings are usually dependent on a richer diversity of vocabulary. However, the language models recline to cover major vocabularies via the word embedding parameters, in particular, for multilingual language…
We give a fast oblivious L2-embedding of $A\in \mathbb{R}^{n x d}$ to $B\in \mathbb{R}^{r x d}$ satisfying $(1-\varepsilon)\|A x\|_2^2 \le \|B x\|_2^2 <= (1+\varepsilon) \|Ax\|_2^2.$ Our embedding dimension $r$ equals $d$, a constant…
In this note we discuss a common misconception, namely that embeddings are always used to reduce the dimensionality of the item space. We show that when we measure dimensionality in terms of information entropy then the embedding of sparse…