Related papers: Measuring the Data

A Novel Approach for Intrinsic Dimension Estimation

By their nature, real-life data have a complex, non-linear structure. These non-linearities, together with a large number of features, often cause problems such as the empty-space phenomenon and the well-known curse of dimensionality.…

Machine Learning · Computer Science · 2025-03-13 · Kadir Özçoban, Murat Manguoğlu, Emrullah Fatih Yetkin

The Shape of Data: Intrinsic Distance for Data Distributions

The ability to represent and compare machine learning models is crucial in order to quantify subtle model changes, evaluate generative models, and gather insights on neural network architectures. Existing techniques for comparing data…

Machine Learning · Statistics · 2020-02-18 · Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Alex Bronstein, Ivan Oseledets, Emmanuel Müller

Augmentation Invariant Manifold Learning

Data augmentation is a widely used technique and an essential ingredient in recent advances in self-supervised representation learning. By preserving the similarity between augmented data, the resulting data representation can improve…

Machine Learning · Statistics · 2025-01-16 · Shulei Wang

A Survey of Dimension Estimation Methods

It is a standard assumption that datasets in high dimension have an internal structure which means that they in fact lie on, or near, subsets of a lower dimension. In many instances it is important to understand the real dimension of the…

Machine Learning · Statistics · 2025-07-21 · James A. D. Binnie, Paweł Dłotko, John Harvey, Jakub Malinowski, Ka Man Yim

A Topological Approach to Inferring the Intrinsic Dimension of Convex Sensing Data

We consider a common measurement paradigm, where an unknown subset of an affine space is measured by unknown continuous quasi-convex functions. Given the measurement data, can one determine the dimension of this space? In this paper, we…

Algebraic Topology · Mathematics · 2020-07-08 · Min-Chun Wu, Vladimir Itskov

Robust estimation of the intrinsic dimension of data sets with quantum cognition machine learning

We propose a new data representation method based on Quantum Cognition Machine Learning and apply it to manifold learning, specifically to the estimation of intrinsic dimension of data sets. The idea is to learn a representation of each…

Machine Learning · Statistics · 2024-09-20 · Luca Candelori, Alexander G. Abanov, Jeffrey Berger, Cameron J. Hogan, Vahagn Kirakosyan, Kharen Musaelian, Ryan Samson, James E. T. Smith, Dario Villani, Martin T. Wells, Mengjia Xu

A Survey and Comparative Evaluation of Intrinsic Dimension Estimators under the Manifold Hypothesis

The manifold hypothesis suggests that high-dimensional data often lie on or near a low-dimensional manifold. Estimating the dimension of this manifold is essential for leveraging its structure, yet existing work on dimension estimation is…

Machine Learning · Computer Science · 2026-04-02 · Zelong Bi, Pierre Lafaye de Micheaux

Intrinsic dimension estimation of data by principal component analysis

Estimating intrinsic dimensionality of data is a classic problem in pattern recognition and statistics. Principal Component Analysis (PCA) is a powerful tool in discovering dimensionality of data sets with a linear structure; it, however,…

Computer Vision and Pattern Recognition · Computer Science · 2010-02-11 · Mingyu Fan, Nannan Gu, Hong Qiao, Bo Zhang

Intrinsic Isometric Manifold Learning with Application to Localization

Data living on manifolds appear in many applications. Often this results from an inherently latent low-dimensional system being observed through higher-dimensional measurements. We show that under certain conditions, it is possible…

Machine Learning · Statistics · 2018-07-05 · Ariel Schwartz, Ronen Talmon

Big Data and model-based survey sampling

Big Data are huge amounts of digital information that are automatically accrued or merged from several sources and rarely result from properly planned surveys. A Big Dataset is herein conceived of as a collection of information concerning a…

Computation · Statistics · 2020-02-12 · Laura Deldossi, Chiara Tommasi

Data-driven Discovery of Invariant Measures

Invariant measures encode the long-time behaviour of a dynamical system. In this work, we propose an optimization-based method to discover invariant measures directly from data gathered from a system. Our method does not require an explicit…

Dynamical Systems · Mathematics · 2025-10-09 · Jason J. Bramburger, Giovanni Fantuzzi

Extrinsic Principal Component Analysis

One develops a fast computational methodology for principal component analysis on manifolds. Instead of estimating intrinsic principal components on an object space with a Riemannian structure, one embeds the object space in a numerical…

Methodology · Statistics · 2024-10-04 · Ka Chun Wong, Vic Patrangenaru, Robert L. Paige, Mihaela Pricop Jeckstadt

An Infinite Dimensional Analysis of Kernel Principal Components

We study non-linear data-dimension reduction. We are motivated by the classical linear framework of Principal Component Analysis. In the nonlinear case, we instead introduce a new kernel Principal Component Analysis, manifold and feature space…

Functional Analysis · Mathematics · 2022-09-09 · Palle E. T. Jorgensen, Sooran Kang, Myung-Sin Song, Feng Tian

Monitoring the shape of weather, soundscapes, and dynamical systems: a new statistic for dimension-driven data analysis on large data sets

Dimensionality-reduction methods are a fundamental tool in the analysis of large data sets. These algorithms work on the assumption that the "intrinsic dimension" of the data is generally much smaller than the ambient dimension in which it…

Machine Learning · Computer Science · 2018-10-30 · Henry Kvinge, Elin Farnell, Michael Kirby, Chris Peterson

A Framework for Data-Driven Computational Mechanics Based on Nonlinear Optimization

Data-Driven Computational Mechanics is a novel computing paradigm that enables the transition from standard data-starved approaches to modern data-rich approaches. At this early stage of development, one can distinguish two mainstream…

Numerical Analysis · Mathematics · 2019-10-29 · Cristian Guillermo Gebhardt, Dominik Schillinger, Marc Christian Steinbach, Raimund Rolfes

Human-aligned Quantification of Numerical Data

Quantifying numerical data involves addressing two key challenges: first, determining whether the data can be naturally quantified, and second, identifying the numerical intervals or ranges of values that correspond to specific value…

Data Analysis, Statistics and Probability · Physics · 2025-11-21 · Anton Kolonin

Information Theory Measures via Multidimensional Gaussianization

Information theory is an outstanding framework for measuring uncertainty, dependence and relevance in data and systems. It has several desirable properties for real-world applications: it naturally deals with multivariate data, it can handle…

Machine Learning · Statistics · 2024-10-30 · Valero Laparra, J. Emmanuel Johnson, Gustau Camps-Valls, Raul Santos-Rodríguez, Jesus Malo

Dimensionality compression and expansion in Deep Neural Networks

Datasets such as images, text, or movies are embedded in high-dimensional spaces. However, in important cases such as images of objects, the statistical structure in the data constrains samples to a manifold of dramatically lower…

Machine Learning · Computer Science · 2019-10-29 · Stefano Recanatesi, Matthew Farrell, Madhu Advani, Timothy Moore, Guillaume Lajoie, Eric Shea-Brown

Manifold Diffusion Geometry: Curvature, Tangent Spaces, and Dimension

We introduce novel estimators for computing the curvature, tangent spaces, and dimension of data from manifolds, using tools from diffusion geometry. Although classical Riemannian geometry is a rich source of inspiration for geometric data…

Differential Geometry · Mathematics · 2026-02-13 · Iolo Jones

A scale-based approach to finding effective dimensionality in manifold learning

The discovery of low-dimensional manifolds in high-dimensional data is one of the main goals in manifold learning. We propose a new approach to identify the effective dimension (intrinsic dimension) of low-dimensional manifolds. The scale…

Statistics Theory · Mathematics · 2008-03-17 · Xiaohui Wang, J. S. Marron