Related papers: Intrinsic Dimensionality
Dimensionality is an important aspect for analyzing and understanding (high-dimensional) data. In their 2006 ICDM paper Tatti et al. answered the question for a (interpretable) dimension of binary data tables by introducing a normalized…
This paper reconsiders common benchmarking approaches to nearest neighbor search. It is shown that the concept of local intrinsic dimensionality (LID) allows to choose query sets of a wide range of difficulty for real-world datasets.…
The intrinsic dimensionality refers to the ``true'' dimensionality of the data, as opposed to the dimensionality of the data representation. For example, when attributes are highly correlated, the intrinsic dimensionality can be much lower…
High dimensional data can have a surprising property: pairs of data points may be easily separated from each other, or even from arbitrary subsets, with high probability using just simple linear classifiers. However, this is more of a rule…
This paper presents an extension and an elaboration of the theory of differential similarity, which was originally proposed in arXiv:1401.2411 [cs.LG]. The goal is to develop an algorithm for clustering and coding that combines a geometric…
The concept of dimension is essential to grasp the complexity of data. A naive approach to determine the dimension of a dataset is based on the number of attributes. More sophisticated methods derive a notion of intrinsic dimension (ID)…
The main contribution of this dissertation is the introduction of new or improved approximation algorithms and data structures for several similarity search problems. We examine the furthest neighbor query, the annulus query, distance…
The curse of dimensionality is a phenomenon frequently observed in machine learning (ML) and knowledge discovery (KD). There is a large body of literature investigating its origin and impact, using methods from mathematics as well as from…
The real-life data have a complex and non-linear structure due to their nature. These non-linearities and the large number of features can usually cause problems such as the empty-space phenomenon and the well-known curse of dimensionality.…
The concept of depth has proved very important for multivariate and functional data analysis, as it essentially acts as a surrogate for the notion a ranking of observations which is absent in more than one dimension. Motivated by the rapid…
The concept of data depth leads to a center-outward ordering of multivariate data, and it has been effectively used for developing various data analytic tools. While different notions of depth were originally developed for finite…
To gain insight into the mechanisms behind machine learning methods, it is crucial to establish connections among the features describing data points. However, these correlations often exhibit a high-dimensional and strongly nonlinear…
We perform a deeper analysis of an axiomatic approach to the concept of intrinsic dimension of a dataset proposed by us in the IJCNN'07 paper (arXiv:cs/0703125). The main features of our approach are that a high intrinsic dimension of a…
Extracting the relevant information by exploiting the spatial data warehouse becomes increasingly hard. In fact, because of the enormous amount of data stored in the spatial data warehouse, the user, usually, don't know what part of the…
The aim of this paper is to propose a geometric framework for modelling similarity search in large and multidimensional data spaces of general nature, which seems to be flexible enough to address such issues as analysis of complexity,…
The ongoing exponential rise in recording capacity calls for new approaches for analysing and interpreting neural data. Effective dimensionality has emerged as an important property of neural activity across populations of neurons, yet…
Similarity search is an important problem in information retrieval. This similarity is based on a distance. Symbolic representation of time series has attracted many researchers recently, since it reduces the dimensionality of these high…
Difficulties in replication and reproducibility of empirical evidences in machine learning research have become a prominent topic in recent years. Ensuring that machine learning research results are sound and reliable requires…
In this speculative analysis, interdimensionality is introduced as the (co)existence of universes embedded into larger ones. These interdimensional universes may be isolated or intertwined, suggesting a variety of interdimensional intrinsic…
It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations. This idea underlies a common intuition for the remarkable success of deep learning in…