Related papers: Supervised Dimensionality Reduction for Big Data
Randomized dimensionality reduction is a widely-used algorithmic technique for speeding up large-scale Euclidean optimization problems. In this paper, we study dimension reduction for a variety of maximization problems, including…
Dimensionality reduction is often used as an initial step in data exploration, either as preprocessing for classification or regression or for visualization. Most dimensionality reduction techniques to date are unsupervised; they do not…
This paper investigates the estimation problem in a regression-type model. To be able to deal with potential high dimensions, we provide a procedure called LOL, for Learning Out of Leaders with no optimization step. LOL is an auto-driven…
Linear dimensionality reduction methods are a cornerstone of analyzing high dimensional data, due to their simple geometric interpretations and typically attractive computational properties. These methods capture many data features of…
The development and use of dimension reduction methods is prevalent in modern statistical literature. This paper reviews a class of dimension reduction techniques which aim to simultaneously select relevant predictors and find clusters…
Dimensionality reduction (DR) is a popular method for preparing and analyzing high-dimensional data. Reduced data representations are less computationally intensive and easier to manage and visualize, while retaining a significant…
Learning to generalise from limited data is a fundamental challenge for both artificial and biological systems. A common strategy is to extract reusable structure from abundant unlabelled data, enabling efficient adaptation to new tasks…
The goal of supervised representation learning is to construct effective data representations for prediction. Among all the characteristics of an ideal nonparametric representation of high-dimensional complex data, sufficiency, low…
Reduced-rank linear discriminant analysis (RRLDA) is a foundational method of dimension reduction for classification that has been useful in a wide range of applications. The goal is to identify an optimal subspace to project the…
In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower dimensional space, and base the classification on the resulting lower dimensional projections. In this…
Existing high-dimensional Bayesian optimization (BO) methods aim to overcome the curse of dimensionality by carefully encoding structural assumptions, from locality to sparsity to smoothness, into the optimization procedure. Surprisingly,…
The studies of large-scale, high-dimensional data in fields such as genomics and neuroscience have injected new insights into science. Yet, despite advances, they are confronting several challenges, often simultaneously: lack of…
Sampling from generative models has become a crucial tool for applications like data synthesis and augmentation. Diffusion, Flow Matching and Continuous Normalising Flows have shown effectiveness across various modalities, and rely on…
To mitigate the burden of data labeling, we aim at improving data efficiency for both classification and regression setups in deep learning. However, the current focus is on classification problems while rare attention has been paid to deep…
Sufficient dimension reduction aims for reduction of dimensionality of a regression without loss of information by replacing the original predictor with its lower-dimensional subspace. Partial (sufficient) dimension reduction arises when…
When performing classification tasks, raw high dimensional features often contain redundant information, and lead to increased computational complexity and overfitting. In this paper, we assume the data samples lie on a single underlying…
The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big data analysis. We propose an orthogonal…
Existing approaches to unsupervised object discovery (UOD) do not scale up to large datasets without approximations that compromise their performance. We propose a novel formulation of UOD as a ranking problem, amenable to the arsenal of…
Frozen pretrained image representations are widely used for transfer learning: a backbone is kept fixed, feature vectors are extracted, and a lightweight classifier is trained on top. This pipeline usually feeds the full feature vector to…
The scalability of statistical estimators is of increasing importance in modern applications. One approach to implementing scalable algorithms is to compress data into a low dimensional latent space using dimension reduction methods. In…