Related papers: Sparse regression and marginal testing using clust…

We consider estimation in a high-dimensional linear model with strongly correlated variables. We propose to cluster the variables first and do subsequent sparse estimation such as the Lasso for cluster-representatives or the group Lasso…

Methodology · Statistics 2015-01-14 Peter Bühlmann, Philipp Rütimann, Sara van de Geer, Cun-Hui Zhang

Inference for feature selection using the Lasso with high-dimensional data

Penalized regression models such as the Lasso have proved useful for variable selection in many fields - especially for situations with high-dimensional data where the number of predictors far exceeds the number of observations. These…

Methodology · Statistics 2014-03-19 Kasper Brink-Jensen, Claus Thorn Ekstrøm

Sparse model-based clustering of three-way data via lasso-type penalties

Mixtures of matrix Gaussian distributions provide a probabilistic framework for clustering continuous matrix-variate data, which are becoming increasingly prevalent in various fields. Despite its widespread adoption and successful…

Computation · Statistics 2023-07-21 Andrea Cappozzo, Alessandro Casa, Michael Fop

Lasso under Multi-way Clustering: Estimation and Post-selection Inference

This paper studies high-dimensional regression models with lasso when data is sampled under multi-way clustering. First, we establish convergence rates for the lasso and post-lasso estimators. Second, we propose a novel inference method…

Econometrics · Economics 2019-08-22 Harold D. Chiang, Yuya Sasaki

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation. When there is a large number of candidate features, our result sheds light on the fact that…

Machine Learning · Computer Science 2020-11-10 Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

Smooth Sparse Coding via Marginal Regression for Learning Sparse Representations

We propose and analyze a novel framework for learning sparse representations, based on two statistical techniques: kernel smoothing and marginal regression. The proposed approach provides a flexible framework for incorporating feature…

Machine Learning · Statistics 2012-10-04 Krishnakumar Balasubramanian, Kai Yu, Guy Lebanon

Fast Learning of Clusters and Topics via Sparse Posteriors

Mixture models and topic models generate each observation from a single cluster, but standard variational posteriors for each observation assign positive probability to all possible clusters. This requires dense storage and runtime costs…

Machine Learning · Statistics 2017-11-15 Michael C. Hughes, Erik B. Sudderth

Feature Adaptation for Sparse Linear Regression

Sparse linear regression is a central problem in high-dimensional statistics. We study the correlated random design setting, where the covariates are drawn from a multivariate Gaussian $N(0,\Sigma)$, and we seek an estimator with small…

Data Structures and Algorithms · Computer Science 2023-05-29 Jonathan Kelner, Frederic Koehler, Raghu Meka, Dhruv Rohatgi

Efficient Clustering of Correlated Variables and Variable Selection in High-Dimensional Linear Models

In this paper, we introduce the Adaptive Cluster Lasso (ACL) method for variable selection in high-dimensional sparse regression models with strongly correlated variables. To handle correlated variables, the concept of clustering or grouping…

Machine Learning · Statistics 2016-03-14 Niharika Gauraha, Swapan K. Parui

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…

Machine Learning · Statistics 2008-03-26 Benhuai Xie, Wei Pan, Xiaotong Shen

Reclustering: A New Method to Test the Appropriate Level of Clustering

When scholars suspect units are dependent on each other within clusters but independent of each other across clusters, they employ cluster-robust standard errors (CRSEs). Nevertheless, what to cluster over is sometimes unknown. For…

Methodology · Statistics 2025-11-12 Kentaro Fukumoto

Simple and Scalable Sparse k-means Clustering via Feature Ranking

Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters.…

Machine Learning · Statistics 2020-10-23 Zhiyue Zhang, Kenneth Lange, Jason Xu

Clustering and Feature Selection using Sparse Principal Component Analysis

In this paper, we study the application of sparse principal component analysis (PCA) to clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of…

Artificial Intelligence · Computer Science 2008-10-08 Ronny Luss, Alexandre d'Aspremont

Sparse Multivariate Linear Regression with Strongly Associated Response Variables

We propose new methods for multivariate linear regression when the regression coefficient matrix is sparse and the error covariance matrix is dense. We assume that the error covariance matrix has equicorrelation across the response…

Methodology · Statistics 2025-08-13 Daeyoung Ham, Bradley S. Price, Adam J. Rothman

Post-Selection Inference for Sparse Estimation

When the model is not known and parameter testing or interval estimation is conducted after model selection, it is necessary to consider selective inference. This paper discusses this issue in the context of sparse estimation. Firstly, we…

Methodology · Statistics 2023-10-12 Joe Suzuki

Inference in High Dimensions with the Penalized Score Test

In recent years, there has been considerable theoretical development regarding variable selection consistency of penalized regression techniques, such as the lasso. However, there has been relatively little work on quantifying the…

Methodology · Statistics 2014-05-21 Arend Voorman, Ali Shojaie, Daniela Witten

Sparse Regression: Scalable algorithms and empirical performance

In this paper, we review state-of-the-art methods for feature selection in statistics with an application-oriented eye. Indeed, sparsity is a valuable property and the profusion of research on the topic might have provided little guidance…

Methodology · Statistics 2021-11-08 Dimitris Bertsimas, Jean Pauphilet, Bart Van Parys

Sparse $\ell_1$ and $\ell_2$ Center Classifiers

The nearest-centroid classifier is a simple linear-time classifier based on computing the centroids of the data classes in the training phase, and then assigning a new datum to the class corresponding to its nearest centroid. Thanks to its…

Machine Learning · Computer Science 2019-11-26 Giuseppe C. Calafiore, Giulia Fracastoro

Selective Inference for Group-Sparse Linear Models

We develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables. Our main technical result gives the precise distribution of…

Methodology · Statistics 2016-07-28 Fan Yang, Rina Foygel Barber, Prateek Jain, John Lafferty

Sparse Classification: a scalable discrete optimization perspective

We formulate the sparse classification problem of $n$ samples with $p$ features as a binary convex optimization problem and propose a cutting-plane algorithm to solve it exactly. For sparse logistic regression and sparse SVM, our algorithm…

Optimization and Control · Mathematics 2025-01-08 Dimitris Bertsimas, Jean Pauphilet, Bart Van Parys