Related papers: Supervised Dimensionality Reduction for Big Data

Randomized Dimensionality Reduction for Euclidean Maximization and Diversity Measures

Randomized dimensionality reduction is a widely-used algorithmic technique for speeding up large-scale Euclidean optimization problems. In this paper, we study dimension reduction for a variety of maximization problems, including…

Data Structures and Algorithms · Computer Science 2025-06-03 Jie Gao, Rajesh Jayaram, Benedikt Kolbe, Shay Sapir, Chris Schwiegelshohn, Sandeep Silwal, Erik Waingarten

Supervised Visualization for Data Exploration

Dimensionality reduction is often used as an initial step in data exploration, either as preprocessing for classification or regression or for visualization. Most dimensionality reduction techniques to date are unsupervised; they do not…

Machine Learning · Statistics 2020-06-17 Jake S. Rhodes, Adele Cutler, Guy Wolf, Kevin R. Moon

Learning Out of Leaders

This paper investigates the estimation problem in a regression-type model. To be able to deal with potential high dimensions, we provide a procedure called LOL, for Learning Out of Leaders with no optimization step. LOL is an auto-driven…

Statistics Theory · Mathematics 2011-01-24 Mathilde Mougeot, Dominique Picard, Karine Tribouley

Linear Dimensionality Reduction: Survey, Insights, and Generalizations

Linear dimensionality reduction methods are a cornerstone of analyzing high dimensional data, due to their simple geometric interpretations and typically attractive computational properties. These methods capture many data features of…

Machine Learning · Statistics 2016-03-22 John P. Cunningham, Zoubin Ghahramani

Dimension Reduction via Supervised Clustering of Regression Coefficients: A Review

The development and use of dimension reduction methods is prevalent in modern statistical literature. This paper reviews a class of dimension reduction techniques which aim to simultaneously select relevant predictors and find clusters…

Methodology · Statistics 2022-02-18 Suchit Mehrotra

Local Explanation of Dimensionality Reduction

Dimensionality reduction (DR) is a popular method for preparing and analyzing high-dimensional data. Reduced data representations are less computationally intensive and easier to manage and visualize, while retaining a significant…

Machine Learning · Computer Science 2022-05-02 Avraam Bardos, Ioannis Mollas, Nick Bassiliades, Grigorios Tsoumakas

Optimal Representation Size: High-Dimensional Analysis of Pretraining and Linear Probing

Learning to generalise from limited data is a fundamental challenge for both artificial and biological systems. A common strategy is to extract reusable structure from abundant unlabelled data, enabling efficient adaptation to new tasks…

Machine Learning · Computer Science 2026-05-20 Valentina Njaradi, Clémentine Dominé, Rachel Swanson, Marco Mondelli, Andrew Saxe

Deep Dimension Reduction for Supervised Representation Learning

The goal of supervised representation learning is to construct effective data representations for prediction. Among all the characteristics of an ideal nonparametric representation of high-dimensional complex data, sufficiency, low…

Machine Learning · Computer Science 2022-09-02 Jian Huang, Yuling Jiao, Xu Liao, Jin Liu, Zhou Yu

Large Scale High-Dimensional Reduced-Rank Linear Discriminant Analysis

Reduced-rank linear discriminant analysis (RRLDA) is a foundational method of dimension reduction for classification that has been useful in a wide range of applications. The goal is to identify an optimal subspace to project the…

Computation · Statistics 2026-02-12 Jocelyn T. Chi

Optimal Discriminant Analysis in High-Dimensional Latent Factor Models

In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower dimensional space, and base the classification on the resulting lower dimensional projections. In this…

Statistics Theory · Mathematics 2025-08-05 Xin Bing, Marten Wegkamp

We Still Don't Understand High-Dimensional Bayesian Optimization

Existing high-dimensional Bayesian optimization (BO) methods aim to overcome the curse of dimensionality by carefully encoding structural assumptions, from locality to sparsity to smoothness, into the optimization procedure. Surprisingly,…

Machine Learning · Computer Science 2026-04-10 Colin Doumont, Donney Fan, Natalie Maus, Jacob R. Gardner, Henry Moss, Geoff Pleiss

Statistical Quantile Learning for Large, Nonlinear, and Additive Latent Variable Models

The studies of large-scale, high-dimensional data in fields such as genomics and neuroscience have injected new insights into science. Yet, despite advances, they are confronting several challenges, often simultaneously: lack of…

Methodology · Statistics 2024-01-01 Julien Bodelet, Guillaume Blanc, Jiajun Shan, Graciela Muniz Terrera, Oliver Y. Chen

Linear combinations of latents in generative models: subspaces and beyond

Sampling from generative models has become a crucial tool for applications like data synthesis and augmentation. Diffusion, Flow Matching and Continuous Normalising Flows have shown effectiveness across various modalities, and rely on…

Machine Learning · Statistics 2025-11-10 Erik Bodin, Alexandru Stere, Dragos D. Margineantu, Carl Henrik Ek, Henry Moss

X-model: Improving Data Efficiency in Deep Learning with A Minimax Model

To mitigate the burden of data labeling, we aim at improving data efficiency for both classification and regression setups in deep learning. However, the current focus is on classification problems while rare attention has been paid to deep…

Machine Learning · Computer Science 2021-10-12 Ximei Wang, Xinyang Chen, Jianmin Wang, Mingsheng Long

Dynamic Partial Sufficient Dimension Reduction

Sufficient dimension reduction aims for reduction of dimensionality of a regression without loss of information by replacing the original predictor with its lower-dimensional subspace. Partial (sufficient) dimension reduction arises when…

Methodology · Statistics 2019-09-27 Lu Li, Kai Tan, Xuerong Meggie Wen, Zhou Yu

Dimensionality Reduction via Diffusion Map Improved with Supervised Linear Projection

When performing classification tasks, raw high dimensional features often contain redundant information, and lead to increased computational complexity and overfitting. In this paper, we assume the data samples lie on a single underlying…

Image and Video Processing · Electrical Eng. & Systems 2020-08-11 Bowen Jiang, Maohao Shen

Orthogonal Subsampling for Big Data Linear Regression

The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big data analysis. We propose an orthogonal…

Methodology · Statistics 2021-06-01 Lin Wang, Jake Elmstedt, Weng Kee Wong, Hongquan Xu

Large-Scale Unsupervised Object Discovery

Existing approaches to unsupervised object discovery (UOD) do not scale up to large datasets without approximations that compromise their performance. We propose a novel formulation of UOD as a ranking problem, amenable to the arsenal of…

Computer Vision and Pattern Recognition · Computer Science 2021-11-18 Huy V. Vo, Elena Sizikova, Cordelia Schmid, Patrick Pérez, Jean Ponce

Supervised Dimensionality Reduction Revisited: Why LDA on Frozen CNN Features Deserves a Second Look

Frozen pretrained image representations are widely used for transfer learning: a backbone is kept fixed, feature vectors are extracted, and a lightweight classifier is trained on top. This pipeline usually feeds the full feature vector to…

Machine Learning · Computer Science 2026-05-12 Indar Kumar, Girish Karhana, Sai Krishna Jasti, Ankit Hemant Lade

Adaptive Randomized Dimension Reduction on Massive Data

The scalability of statistical estimators is of increasing importance in modern applications. One approach to implementing scalable algorithms is to compress data into a low dimensional latent space using dimension reduction methods. In…

Machine Learning · Statistics 2015-04-14 Gregory Darnell, Stoyan Georgiev, Sayan Mukherjee, Barbara E Engelhardt