Related papers: Dimension Estimation Using Random Connection Model…
Modern datasets are characterized by a large number of features that may conceal complex dependency structures. To deal with this type of data, dimensionality reduction techniques are essential. Numerous dimensionality reduction methods…
We propose a new method for estimating the intrinsic dimension of a dataset by applying the principle of regularized maximum likelihood to the distances between close neighbors. We propose a regularization scheme which is motivated by…
The intrinsic dimensionality refers to the ``true'' dimensionality of the data, as opposed to the dimensionality of the data representation. For example, when attributes are highly correlated, the intrinsic dimensionality can be much lower…
The real-life data have a complex and non-linear structure due to their nature. These non-linearities and the large number of features can usually cause problems such as the empty-space phenomenon and the well-known curse of dimensionality.…
Estimating the intrinsic dimensionality (ID) of data is a fundamental problem in machine learning and computer vision, providing insight into the true degrees of freedom underlying high-dimensional observations. Existing methods often rely…
Dimensionality reduction is a fundamental task in modern data science. Several projection methods specifically tailored to take into account the non-linearity of the data via local embeddings have been proposed. Such methods are often based…
Analyzing large volumes of high-dimensional data is an issue of fundamental importance in data science, molecular simulations and beyond. Several approaches work on the assumption that the important content of a dataset belongs to a…
It has long been thought that high-dimensional data encountered in many practical machine learning tasks have low-dimensional structure, i.e., the manifold hypothesis holds. A natural question, thus, is to estimate the intrinsic dimension…
An additive autoencoder for dimension reduction, which is composed of a serially performed bias estimation, linear trend estimation, and nonlinear residual estimation, is proposed and analyzed. Computational experiments confirm that an…
In the last decades the estimation of the intrinsic dimensionality of a dataset has gained considerable importance. Despite the great deal of research work devoted to this task, most of the proposed solutions prove to be unreliable when the…
We consider non-parametric estimation and inference of conditional moment models in high dimensions. We show that even when the dimension $D$ of the conditioning variable is larger than the sample size $n$, estimation and inference is…
Accurate estimation of Intrinsic Dimensionality (ID) is of crucial importance in many data mining and machine learning tasks, including dimensionality reduction, outlier detection, similarity search and subspace clustering. However, since…
It is a standard assumption that datasets in high dimension have an internal structure which means that they in fact lie on, or near, subsets of a lower dimension. In many instances it is important to understand the real dimension of the…
Most of the existing methods for estimating the local intrinsic dimension of a data distribution do not scale well to high-dimensional data. Many of them rely on a non-parametric nearest neighbors approach which suffers from the curse of…
The manifold hypothesis suggests that high-dimensional data often lie on or near a low-dimensional manifold. Estimating the dimension of this manifold is essential for leveraging its structure, yet existing work on dimension estimation is…
The concept of dimension is essential to grasp the complexity of data. A naive approach to determine the dimension of a dataset is based on the number of attributes. More sophisticated methods derive a notion of intrinsic dimension (ID)…
The amount of information available in spectro-polarimetric data is estimated. To this end, the intrinsic dimensionality of the data is inferred with the aid of a recently derived estimator based on nearest-neighbor considerations and…
The size of datasets has been increasing rapidly both in terms of number of variables and number of events. As a result, the empty space phenomenon and the curse of dimensionality complicate the extraction of useful information. But, in…
The Intrinsic Dimension (ID) is a key concept in unsupervised learning and feature selection, as it is a lower bound to the number of variables which are necessary to describe a system. However, in almost any real-world dataset the ID…
Many algorithms in machine learning and computational geometry require, as input, the intrinsic dimension of the manifold that supports the probability distribution of the data. This parameter is rarely known and therefore has to be…