Related papers: Dimension Estimation Using Random Connection Model…

200 papers

Modern datasets are characterized by a large number of features that may conceal complex dependency structures. To deal with this type of data, dimensionality reduction techniques are essential. Numerous dimensionality reduction methods…

Methodology · Statistics · 2021-06-02 · Francesco Denti, Diego Doimo, Alessandro Laio, Antonietta Mira

We propose a new method for estimating the intrinsic dimension of a dataset by applying the principle of regularized maximum likelihood to the distances between close neighbors. We propose a regularization scheme which is motivated by…

Machine Learning · Computer Science · 2012-03-19 · Mithun Das Gupta, Thomas S. Huang
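The entry above applies regularized maximum likelihood to nearest-neighbor distances. For reference, here is a minimal Python sketch of the classic, unregularized Levina-Bickel maximum-likelihood estimator that such work starts from. It does not implement the paper's regularization scheme. Assumptions: Euclidean distances, brute-force neighbor search (fine for a few thousand points), a single neighborhood size k, and a global estimate formed by averaging the per-point estimates. The function name `mle_intrinsic_dimension` and the default k = 10 are illustrative choices.

```python
import numpy as np


def mle_intrinsic_dimension(X, k=10):
    """Levina-Bickel maximum-likelihood intrinsic dimension from k-NN distances.

    Unregularized sketch: brute-force Euclidean distances, a single k,
    global estimate = mean of the per-point estimates.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= k < n:
        raise ValueError("need 2 <= k < number of points")
    # Pairwise squared distances via |a|^2 + |b|^2 - 2 a.b (O(n^2) memory).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    dist = np.sqrt(np.maximum(d2, 0.0))   # clip tiny negative round-off
    # T[:, j] holds the distance to the (j + 1)-th nearest neighbor.
    T = np.sort(dist, axis=1)[:, :k]
    # Per-point estimate: inverse of the mean of log(T_k / T_j), j = 1..k-1.
    log_ratios = np.log(T[:, -1:] / T[:, :-1])
    local_dims = 1.0 / log_ratios.mean(axis=1)
    return float(local_dims.mean())


# Usage: points on a 2-D patch, linearly embedded in R^10.
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 2)) @ rng.normal(size=(2, 10))
print(mle_intrinsic_dimension(X, k=10))
```

On this synthetic data the estimate should land roughly near 2, even though each point has 10 coordinates. The `local_dims` array is itself a per-point (local) dimension estimate, which is what the local-ID entries further down the list refine.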

The intrinsic dimensionality refers to the "true" dimensionality of the data, as opposed to the dimensionality of the data representation. For example, when attributes are highly correlated, the intrinsic dimensionality can be much lower…

Machine Learning · Statistics · 2020-11-30 · Erik Thordsen, Erich Schubert
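The example in the abstract above (highly correlated attributes giving a much lower intrinsic dimensionality) can be made concrete with a linear check: build three attributes driven by one hidden variable and count how many principal components are needed to retain 99% of the variance. This is a sketch using PCA through NumPy's SVD, not the method of the listed paper. PCA only detects linear dependence, and the 99% threshold and the function names are illustrative assumptions.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                   # center each attribute
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    var = s ** 2
    return var / var.sum()


def components_for_variance(X, threshold=0.99):
    """Smallest number of components whose cumulative variance share reaches threshold."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    return int(np.searchsorted(cumulative, threshold) + 1)


rng = np.random.default_rng(0)
t = rng.normal(size=500)                      # one hidden degree of freedom
X = np.column_stack([t, 2.0 * t, -0.5 * t])   # three perfectly correlated attributes
X = X + 0.01 * rng.normal(size=X.shape)       # plus small independent noise
print(X.shape[1], components_for_variance(X))  # 3 attributes, 1 component
```

The representation has three dimensions, but one component carries essentially all the variance. For nonlinear dependence (data on a curved manifold), a linear count like this overestimates the intrinsic dimension, which is why the nearest-neighbor estimators in this list exist.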

The real-life data have a complex and non-linear structure due to their nature. These non-linearities and the large number of features can usually cause problems such as the empty-space phenomenon and the well-known curse of dimensionality.…

Machine Learning · Computer Science · 2025-03-13 · Kadir Özçoban, Murat Manguoğlu, Emrullah Fatih Yetkin
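The abstract above mentions the empty-space phenomenon. One standard way to see it is to compare the volume of the ball inscribed in the unit hypercube (radius 1/2) with the cube's volume: the ratio is π^(d/2) (1/2)^d / Γ(d/2 + 1), and it collapses toward zero as d grows. Equivalently, almost no uniformly distributed points fall within distance 1/2 of the cube's center. The sketch below evaluates the formula in log space. The function name is illustrative.

```python
import math


def ball_to_cube_volume_ratio(d):
    """Share of the unit hypercube [0, 1]^d lying inside its inscribed ball (radius 1/2).

    Uses V_ball = pi^(d/2) * r^d / Gamma(d/2 + 1) with r = 1/2, evaluated in
    log space so that large d does not overflow.
    """
    log_v = 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1) - d * math.log(2.0)
    return math.exp(log_v)


for d in (1, 2, 3, 5, 10, 20):
    # Also the probability that a uniform point in the cube lies
    # within distance 1/2 of the cube's center.
    print(d, ball_to_cube_volume_ratio(d))
```

For d = 2 the ratio is π/4, about 0.785. By d = 10 it is already below 0.3%, so a fixed-radius neighborhood of the center is nearly empty.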

Estimating the intrinsic dimensionality (ID) of data is a fundamental problem in machine learning and computer vision, providing insight into the true degrees of freedom underlying high-dimensional observations. Existing methods often rely…

Machine Learning · Computer Science · 2026-03-12 · Eng-Jon Ong, Omer Bobrowski, Gesine Reinert, Primoz Skraba

Dimensionality reduction is a fundamental task in modern data science. Several projection methods specifically tailored to take into account the non-linearity of the data via local embeddings have been proposed. Such methods are often based…

Machine Learning · Statistics · 2026-01-28 · Antonio Di Noia, Federico Ravenda, Antonietta Mira

Analyzing large volumes of high-dimensional data is an issue of fundamental importance in data science, molecular simulations and beyond. Several approaches work on the assumption that the important content of a dataset belongs to a…

Machine Learning · Statistics · 2018-03-20 · Elena Facco, Maria d'Errico, Alex Rodriguez, Alessandro Laio
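As far as we can tell, the Facco et al. entry above is the paper that introduced the two-nearest-neighbor (TwoNN) estimator. Its core idea: if the density is roughly constant up to each point's second neighbor, the ratio μ = r2/r1 of second- to first-nearest-neighbor distance follows a Pareto law, P(μ > x) = x^(-d), whose exponent d is the intrinsic dimension. The sketch below estimates d with the closed-form maximum-likelihood estimate N / Σ log μ_i, which may differ from the fitting procedure used in the paper. Brute-force Euclidean distances and the function name `twonn_dimension` are assumptions.

```python
import numpy as np


def twonn_dimension(X):
    """TwoNN-style intrinsic dimension via a maximum-likelihood Pareto fit.

    mu_i = r2 / r1 (second- over first-nearest-neighbor distance) is modeled as
    Pareto with P(mu > x) = x^(-d); the MLE of d is N / sum(log mu_i).
    """
    X = np.asarray(X, dtype=float)
    if X.shape[0] < 3:
        raise ValueError("need at least 3 points")
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # brute force, O(N^2) memory
    np.fill_diagonal(d2, np.inf)                    # exclude self-distances
    dist = np.sqrt(np.maximum(d2, 0.0))             # clip tiny negative round-off
    r = np.sort(dist, axis=1)[:, :2]                # r1 and r2 for every point
    mu = r[:, 1] / r[:, 0]
    return float(len(mu) / np.sum(np.log(mu)))


# Usage: 3-D uniform data, linearly embedded in R^20.
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 3)) @ rng.normal(size=(3, 20))
print(twonn_dimension(X))
```

Only the two nearest neighbors are used, so the estimate depends on very local structure. Duplicate points give r1 = 0 and break the ratio, so they should be removed first.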

It has long been thought that high-dimensional data encountered in many practical machine learning tasks have low-dimensional structure, i.e., the manifold hypothesis holds. A natural question, thus, is to estimate the intrinsic dimension…

Machine Learning · Statistics · 2022-06-01 · Adam Block, Zeyu Jia, Yury Polyanskiy, Alexander Rakhlin

An additive autoencoder for dimension reduction, which is composed of a serially performed bias estimation, linear trend estimation, and nonlinear residual estimation, is proposed and analyzed. Computational experiments confirm that an…

Machine Learning · Computer Science · 2022-10-14 · Tommi Kärkkäinen, Jan Hänninen

In the last decades the estimation of the intrinsic dimensionality of a dataset has gained considerable importance. Despite the great deal of research work devoted to this task, most of the proposed solutions prove to be unreliable when the…

Machine Learning · Computer Science · 2012-06-19 · Claudio Ceruti, Simone Bassis, Alessandro Rozza, Gabriele Lombardi, Elena Casiraghi, Paola Campadelli

We consider non-parametric estimation and inference of conditional moment models in high dimensions. We show that even when the dimension $D$ of the conditioning variable is larger than the sample size $n$, estimation and inference is…

Machine Learning · Computer Science · 2019-06-19 · Khashayar Khosravi, Greg Lewis, Vasilis Syrgkanis

Accurate estimation of Intrinsic Dimensionality (ID) is of crucial importance in many data mining and machine learning tasks, including dimensionality reduction, outlier detection, similarity search and subspace clustering. However, since…

It is a standard assumption that datasets in high dimension have an internal structure which means that they in fact lie on, or near, subsets of a lower dimension. In many instances it is important to understand the real dimension of the…

Machine Learning · Statistics · 2025-07-21 · James A. D. Binnie, Paweł Dłotko, John Harvey, Jakub Malinowski, Ka Man Yim

Most of the existing methods for estimating the local intrinsic dimension of a data distribution do not scale well to high-dimensional data. Many of them rely on a non-parametric nearest neighbors approach which suffers from the curse of…

The manifold hypothesis suggests that high-dimensional data often lie on or near a low-dimensional manifold. Estimating the dimension of this manifold is essential for leveraging its structure, yet existing work on dimension estimation is…

Machine Learning · Computer Science · 2026-04-02 · Zelong Bi, Pierre Lafaye de Micheaux

The concept of dimension is essential to grasp the complexity of data. A naive approach to determine the dimension of a dataset is based on the number of attributes. More sophisticated methods derive a notion of intrinsic dimension (ID)…

Machine Learning · Computer Science · 2023-04-18 · Maximilian Stubbemann, Tom Hanika, Friedrich Martin Schneider

The amount of information available in spectro-polarimetric data is estimated. To this end, the intrinsic dimensionality of the data is inferred with the aid of a recently derived estimator based on nearest-neighbor considerations and…

The size of datasets has been increasing rapidly both in terms of number of variables and number of events. As a result, the empty space phenomenon and the curse of dimensionality complicate the extraction of useful information. But, in…

Data Analysis, Statistics and Probability · Physics · 2015-05-07 · Jean Golay, Mikhail Kanevski

The Intrinsic Dimension (ID) is a key concept in unsupervised learning and feature selection, as it is a lower bound to the number of variables which are necessary to describe a system. However, in almost any real-world dataset the ID…

Machine Learning · Statistics · 2026-04-02 · Antonio Di Noia, Iuri Macocco, Aldo Glielmo, Alessandro Laio, Antonietta Mira

Many algorithms in machine learning and computational geometry require, as input, the intrinsic dimension of the manifold that supports the probability distribution of the data. This parameter is rarely known and therefore has to be…

Statistics Theory · Mathematics · 2020-01-01 · Jisu Kim, Alessandro Rinaldo, Larry Wasserman