Related papers: Intrinsic Dimensionality

What is the $\textit{intrinsic}$ dimension of your binary data? -- and how to compute it quickly

Dimensionality is an important aspect for analyzing and understanding (high-dimensional) data. In their 2006 ICDM paper, Tatti et al. answered the question for an (interpretable) dimension of binary data tables by introducing a normalized…

Machine Learning · Computer Science 2025-04-30 Tom Hanika, Tobias Hille

The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search

This paper reconsiders common benchmarking approaches to nearest neighbor search. It is shown that the concept of local intrinsic dimensionality (LID) makes it possible to choose query sets of a wide range of difficulty for real-world datasets.…

Information Retrieval · Computer Science 2019-07-18 Martin Aumüller, Matteo Ceccarello

ABID: Angle Based Intrinsic Dimensionality

The intrinsic dimensionality refers to the "true" dimensionality of the data, as opposed to the dimensionality of the data representation. For example, when attributes are highly correlated, the intrinsic dimensionality can be much lower…

Machine Learning · Statistics 2020-11-30 Erik Thordsen, Erich Schubert

Relative intrinsic dimensionality is intrinsic to learning

High dimensional data can have a surprising property: pairs of data points may be easily separated from each other, or even from arbitrary subsets, with high probability using just simple linear classifiers. However, this is more of a rule…

Machine Learning · Computer Science 2023-11-15 Oliver J. Sutton, Qinghua Zhou, Alexander N. Gorban, Ivan Y. Tyukin

This paper presents an extension and an elaboration of the theory of differential similarity, which was originally proposed in arXiv:1401.2411 [cs.LG]. The goal is to develop an algorithm for clustering and coding that combines a geometric…

Machine Learning · Computer Science 2024-05-14 L. Thorne McCarty

Intrinsic Dimension for Large-Scale Geometric Learning

The concept of dimension is essential to grasp the complexity of data. A naive approach to determine the dimension of a dataset is based on the number of attributes. More sophisticated methods derive a notion of intrinsic dimension (ID)…

Machine Learning · Computer Science 2023-04-18 Maximilian Stubbemann, Tom Hanika, Friedrich Martin Schneider

Similarity Problems in High Dimensions

The main contribution of this dissertation is the introduction of new or improved approximation algorithms and data structures for several similarity search problems. We examine the furthest neighbor query, the annulus query, distance…

Data Structures and Algorithms · Computer Science 2019-06-13 Johan von Tangen Sivertsen

Intrinsic Dimension of Geometric Data Sets

The curse of dimensionality is a phenomenon frequently observed in machine learning (ML) and knowledge discovery (KD). There is a large body of literature investigating its origin and impact, using methods from mathematics as well as from…

Artificial Intelligence · Computer Science 2022-04-22 Tom Hanika, Friedrich Martin Schneider, Gerd Stumme

A Novel Approach for Intrinsic Dimension Estimation

Real-life data have a complex and non-linear structure due to their nature. These non-linearities and the large number of features can usually cause problems such as the empty-space phenomenon and the well-known curse of dimensionality.…

Machine Learning · Computer Science 2025-03-13 Kadir Özçoban, Murat Manguoğlu, Emrullah Fatih Yetkin

Statistical depth in abstract metric spaces

The concept of depth has proved very important for multivariate and functional data analysis, as it essentially acts as a surrogate for the notion of a ranking of observations, which is absent in more than one dimension. Motivated by the rapid…

Methodology · Statistics 2021-07-30 Gery Geenens, Alicia Nieto-Reyes, Giacomo Francisci

On data depth in infinite dimensional spaces

The concept of data depth leads to a center-outward ordering of multivariate data, and it has been effectively used for developing various data analytic tools. While different notions of depth were originally developed for finite…

Methodology · Statistics 2014-02-13 Anirvan Chakraborty, Probal Chaudhuri

Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations

To gain insight into the mechanisms behind machine learning methods, it is crucial to establish connections among the features describing data points. However, these correlations often exhibit a high-dimensional and strongly nonlinear…

Machine Learning · Computer Science 2025-03-04 Lorenzo Basile, Santiago Acevedo, Luca Bortolussi, Fabio Anselmi, Alex Rodriguez

An axiomatic approach to intrinsic dimension of a dataset

We perform a deeper analysis of an axiomatic approach to the concept of intrinsic dimension of a dataset proposed by us in the IJCNN'07 paper (arXiv:cs/0703125). The main features of our approach are that a high intrinsic dimension of a…

Information Retrieval · Computer Science 2009-11-17 Vladimir Pestov

A New Similairty Measure For Spatial Personalization

Extracting the relevant information by exploiting the spatial data warehouse becomes increasingly hard. In fact, because of the enormous amount of data stored in the spatial data warehouse, the user usually does not know what part of the…

Databases · Computer Science 2012-09-11 Saida Aissa, Mohamed Salah Gouider

A geometric framework for modelling similarity search

The aim of this paper is to propose a geometric framework for modelling similarity search in large and multidimensional data spaces of general nature, which seems to be flexible enough to address such issues as analysis of complexity,…

Information Retrieval · Computer Science 2016-11-17 Vladimir Pestov

Interpreting neural computations by examining intrinsic and embedding dimensionality of neural activity

The ongoing exponential rise in recording capacity calls for new approaches for analysing and interpreting neural data. Effective dimensionality has emerged as an important property of neural activity across populations of neurons, yet…

Neurons and Cognition · Quantitative Biology 2021-08-30 Mehrdad Jazayeri, Srdjan Ostojic

The Extended Edit Distance Metric

Similarity search is an important problem in information retrieval. This similarity is based on a distance. Symbolic representation of time series has attracted many researchers recently, since it reduces the dimensionality of these high…

Information Retrieval · Computer Science 2010-06-18 Muhammad Marwan Muhammad Fuad, Pierre-François Marteau

Reproducibility and Geometric Intrinsic Dimensionality: An Investigation on Graph Neural Network Research

Difficulties in replication and reproducibility of empirical evidence in machine learning research have become a prominent topic in recent years. Ensuring that machine learning research results are sound and reliable requires…

Machine Learning · Computer Science 2024-03-20 Tobias Hille, Maximilian Stubbemann, Tom Hanika

Interdimensionality

In this speculative analysis, interdimensionality is introduced as the (co)existence of universes embedded into larger ones. These interdimensional universes may be isolated or intertwined, suggesting a variety of interdimensional intrinsic…

General Physics · Physics 2021-11-25 Karl Svozil

The Intrinsic Dimension of Images and Its Impact on Learning

It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations. This idea underlies a common intuition for the remarkable success of deep learning in…

Computer Vision and Pattern Recognition · Computer Science 2021-04-20 Phillip Pope, Chen Zhu, Ahmed Abdelkader, Micah Goldblum, Tom Goldstein