Related papers: Scikit-dimension: a Python package for intrinsic d…

Scikit-learn: Machine Learning in Python

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a…

Machine Learning · Computer Science 2018-06-06 Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay

MDIntrinsicDimension: Dimensionality-Based Analysis of Collective Motions in Macromolecules from Molecular Dynamics Trajectories

Molecular dynamics (MD) simulations provide atomistic insights into the structure, dynamics, and function of biomolecules by generating time-resolved, high-dimensional trajectories. Analyzing such data benefits from estimating the minimal…

Biomolecules · Quantitative Biology 2026-03-02 Irene Cazzaniga, Toni Giorgino

intRinsic: an R Package for Model-Based Estimation of the Intrinsic Dimension of a Dataset

This article illustrates intRinsic, an R package that implements novel state-of-the-art likelihood-based estimators of the intrinsic dimension of a dataset, an essential quantity for most dimensionality reduction techniques. In order to…

Computation · Statistics 2023-02-24 Francesco Denti

Scale-adaptive and robust intrinsic dimension estimation via optimal neighbourhood identification

The Intrinsic Dimension (ID) is a key concept in unsupervised learning and feature selection, as it is a lower bound to the number of variables which are necessary to describe a system. However, in almost any real-world dataset the ID…

Machine Learning · Statistics 2026-04-02 Antonio Di Noia, Iuri Macocco, Aldo Glielmo, Alessandro Laio, Antonietta Mira

Intrinsic dimension estimation for discrete metrics

Real-world datasets characterized by discrete features are ubiquitous: from categorical surveys to clinical questionnaires, from unweighted networks to DNA sequences. Nevertheless, the most common unsupervised dimensional reduction methods…

Machine Learning · Statistics 2023-03-14 Iuri Macocco, Aldo Glielmo, Jacopo Grilli, Alessandro Laio

direpack: A Python 3 package for state-of-the-art statistical dimension reduction methods

The direpack package aims to bring a set of modern statistical dimension reduction techniques into the Python universe as a single, consistent package. The included dimension reduction methods fall into three categories: projection…

Computation · Statistics 2020-06-03 Emmanuel Jordy Menvouta, Sven Serneels, Tim Verdonck

Intrinsic Dimension for Large-Scale Geometric Learning

The concept of dimension is essential to grasp the complexity of data. A naive approach to determine the dimension of a dataset is based on the number of attributes. More sophisticated methods derive a notion of intrinsic dimension (ID)…

Machine Learning · Computer Science 2023-04-18 Maximilian Stubbemann, Tom Hanika, Friedrich Martin Schneider

scikit-fda: A Python Package for Functional Data Analysis

The library scikit-fda is a Python package for Functional Data Analysis (FDA). It provides a comprehensive set of tools for representation, preprocessing, and exploratory analysis of functional data. The library is built upon and integrated…

Computation · Statistics 2024-09-04 Carlos Ramos-Carreño, José Luis Torrecilla, Miguel Carbajo-Berrocal, Pablo Marcos, Alberto Suárez

Rdimtools: An R package for Dimension Reduction and Intrinsic Dimension Estimation

Discovering patterns of the complex high-dimensional data is a long-standing problem. Dimension Reduction (DR) and Intrinsic Dimension Estimation (IDE) are two fundamental thematic programs that facilitate geometric understanding of the…

Machine Learning · Statistics 2022-09-13 Kisung You

Intrinsic dimension estimation for locally undersampled data

High-dimensional data are ubiquitous in contemporary science and finding methods to compress them is one of the primary goals of machine learning. Given a dataset lying in a high-dimensional space (in principle hundreds to several thousands…

Machine Learning · Computer Science 2020-03-24 Vittorio Erba, Marco Gherardi, Pietro Rotondo

Estimating the intrinsic dimension of datasets by a minimal neighborhood information

Analyzing large volumes of high-dimensional data is an issue of fundamental importance in data science, molecular simulations and beyond. Several approaches work on the assumption that the important content of a dataset belongs to a…

Machine Learning · Statistics 2018-03-20 Elena Facco, Maria d'Errico, Alex Rodriguez, Alessandro Laio

Intrinsic Dimensionality Estimation within Tight Localities: A Theoretical and Experimental Analysis

Accurate estimation of Intrinsic Dimensionality (ID) is of crucial importance in many data mining and machine learning tasks, including dimensionality reduction, outlier detection, similarity search and subspace clustering. However, since…

Machine Learning · Computer Science 2022-09-30 Laurent Amsaleg, Oussama Chelly, Michael E. Houle, Ken-ichi Kawarabayashi, Miloš Radovanović, Weeris Treeratanajaru

scikit-hubness: Hubness Reduction and Approximate Neighbor Search

This paper introduces scikit-hubness, a Python package for efficient nearest neighbor search in high-dimensional spaces. Hubness is an aspect of the curse of dimensionality, and is known to impair various learning tasks, including…

Machine Learning · Computer Science 2021-01-12 Roman Feldbauer, Thomas Rattei, Arthur Flexer

scikit-image: Image processing in Python

scikit-image is an image processing library that implements algorithms and utilities for use in research, education and industry applications. It is released under the liberal "Modified BSD" open source license, provides a well-documented…

Mathematical Software · Computer Science 2014-07-24 Stefan van der Walt, Johannes L. Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D. Warner, Neil Yager, Emmanuelle Gouillart, Tony Yu, the scikit-image contributors

spd-metrics-id: A Python Package for SPD-Aware Distance Metrics in Connectome Fingerprinting and Beyond

We present spd-metrics-id, a Python package for computing distances and divergences between symmetric positive-definite (SPD) matrices. Unlike traditional toolkits that focus on specific applications, spd-metrics-id provides a unified,…

Computation · Statistics 2025-10-07 Kaosar Uddin

A Universal Nearest-Neighbor Estimator for Intrinsic Dimensionality

Estimating the intrinsic dimensionality (ID) of data is a fundamental problem in machine learning and computer vision, providing insight into the true degrees of freedom underlying high-dimensional observations. Existing methods often rely…

Machine Learning · Computer Science 2026-03-12 Eng-Jon Ong, Omer Bobrowski, Gesine Reinert, Primoz Skraba

What is the $\textit{intrinsic}$ dimension of your binary data? -- and how to compute it quickly

Dimensionality is an important aspect for analyzing and understanding (high-dimensional) data. In their 2006 ICDM paper, Tatti et al. answered the question for an (interpretable) dimension of binary data tables by introducing a normalized…

Machine Learning · Computer Science 2025-04-30 Tom Hanika, Tobias Hille

Local intrinsic dimensionality estimators based on concentration of measure

Intrinsic dimensionality (ID) is one of the most fundamental characteristics of multi-dimensional data point clouds. Knowing ID is crucial to choose the appropriate machine learning approach as well as to understand its behavior and…

Machine Learning · Computer Science 2020-04-21 Jonathan Bac, Andrei Zinovyev

PySINDy: A comprehensive Python package for robust sparse system identification

Automated data-driven modeling, the process of directly discovering the governing equations of a system from data, is increasingly being used across the scientific community. PySINDy is a Python package that provides tools for applying the…

Systems and Control · Electrical Eng. & Systems 2022-02-01 Alan A. Kaptanoglu, Brian M. de Silva, Urban Fasel, Kadierdan Kaheman, Andy J. Goldschmidt, Jared L. Callaham, Charles B. Delahunt, Zachary G. Nicolaou, Kathleen Champion, Jean-Christophe Loiseau, J. Nathan Kutz, Steven L. Brunton

Scikit-network: Graph Analysis in Python

Scikit-network is a Python package inspired by scikit-learn for the analysis of large graphs. Graphs are represented by their adjacency matrix in the sparse CSR format of SciPy. The package provides state-of-the-art algorithms for ranking,…

Social and Information Networks · Computer Science 2020-09-17 Thomas Bonald, Nathan de Lara, Quentin Lutz, Bertrand Charpentier