Related papers: Subset selection in sparse matrices

Best-Subset Selection in Generalized Linear Models: A Fast and Consistent Algorithm via Splicing Technique

In high-dimensional generalized linear models, it is crucial to identify a sparse model that adequately accounts for response variation. Although best subset selection has been widely regarded as the Holy Grail of problems of this type,…

Machine Learning · Statistics 2023-08-02 Junxian Zhu, Jin Zhu, Borui Tang, Xuanyu Chen, Hongmei Lin, Xueqin Wang

Subset selection for matrices in spectral norm

We address the subset selection problem for matrices, where the goal is to select a subset of $k$ columns from a "short-and-fat" matrix $X \in \mathbb{R}^{m \times n}$, such that the pseudoinverse of the sampled submatrix has as small…

Numerical Analysis · Mathematics 2025-07-29 Ivan Kozyrev, Alexander Osinsky

Approximately Optimal Subset Selection for Statistical Design and Modelling

We study the problem of optimal subset selection from a set of correlated random variables. In particular, we consider the associated combinatorial optimization problem of maximizing the determinant of a symmetric positive definite matrix…

Computation · Statistics 2019-07-12 Yu Wang, Nhu D. Le, James V. Zidek

Polynomial-time Method of Determining Subset Sum Solutions

Reducing the conditions under which a given set satisfies the stipulations of the subset sum proposition to a set of linear relationships, the question of whether a set satisfies subset sum may be answered in a polynomial number of steps by…

Data Structures and Algorithms · Computer Science 2017-05-16 Aubrey Alston

Lower bounds on the performance of polynomial-time algorithms for sparse linear regression

Under a standard assumption in complexity theory (NP not in P/poly), we demonstrate a gap between the minimax prediction risk for sparse linear regression that can be achieved by polynomial-time algorithms, and that achieved by optimal…

Statistics Theory · Mathematics 2014-05-22 Yuchen Zhang, Martin J. Wainwright, Michael I. Jordan

Variable Selection is Hard

Variable selection for sparse linear regression is the problem of finding, given an m x p matrix B and a target vector y, a sparse vector x such that Bx approximately equals y. Assuming a standard complexity hypothesis, we show that no…

Computational Complexity · Computer Science 2014-12-17 Dean Foster, Howard Karloff, Justin Thaler

Computing sparse multiples of polynomials

We consider the problem of finding a sparse multiple of a polynomial. Given f in F[x] of degree d over a field F, and a desired sparsity t, our goal is to determine if there exists a multiple h in F[x] of f such that h has at most t…

Symbolic Computation · Computer Science 2011-01-04 Mark Giesbrecht, Daniel S. Roche, Hrushikesh Tilak

Subset Selection for Matrices with Fixed Blocks

Subset selection for matrices is the task of extracting a column sub-matrix from a given matrix $B\in\mathbb{R}^{n\times m}$ with $m>n$ such that the pseudoinverse of the sampled matrix has as small Frobenius or spectral norm as possible.…

Data Structures and Algorithms · Computer Science 2020-03-04 Jiaxin Xie, Zhiqiang Xu

Optimality and computational barriers in variable selection under dependence

We study the optimal sample complexity of variable selection in linear regression under general design covariance, and show that subset selection is optimal while under standard complexity assumptions, efficient algorithms for this problem…

Statistics Theory · Mathematics 2025-10-07 Ming Gao, Bryon Aragam

Solving the Best Subset Selection Problem via Suboptimal Algorithms

Best subset selection in linear regression is well known to be nonconvex and computationally challenging to solve, as the number of possible subsets grows rapidly with increasing dimensionality of the problem. As a result, finding the…

Machine Learning · Statistics 2025-04-01 Vikram Singh, Min Sun

The problematic nature of potentially polynomial-time algorithms solving the subset-sum problem

The main purpose of this paper is to study the NP-complete subset-sum problem, not in the usual context of time-complexity-based classification of the algorithms (exponential/polynomial), but through a new kind of algorithmic classification…

Computational Complexity · Computer Science 2018-11-20 Antonios Syreloglou

Subset Selection for Multiple Linear Regression via Optimization

Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that trade off fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming…

Machine Learning · Statistics 2020-09-04 Young Woong Park, Diego Klabjan

Improving Group Lasso for high-dimensional categorical data

Sparse modelling or model selection with categorical data is challenging even for a moderate number of variables, because one parameter is roughly needed to encode one category or level. The Group Lasso is a well known efficient algorithm…

Methodology · Statistics 2022-11-14 Szymon Nowakowski, Piotr Pokarowski, Wojciech Rejchel, Agnieszka Sołtys

The Sparse Principal Component Analysis Problem: Optimality Conditions and Algorithms

Sparse principal component analysis addresses the problem of finding a linear combination of the variables in a given data set with a sparse coefficients vector that maximizes the variability of the data. This model enhances the ability to…

Optimization and Control · Mathematics 2017-03-09 Amir Beck, Yakov Vaisbourd

Spurious Valleys, NP-hardness, and Tractability of Sparse Matrix Factorization With Fixed Support

The problem of approximating a dense matrix by a product of sparse factors is a fundamental problem for many signal processing and machine learning tasks. It can be decomposed into two subproblems: finding the position of the non-zero…

Computational Complexity · Computer Science 2022-11-23 Quoc-Tung Le, Elisa Riccietti, Rémi Gribonval

Faster Subset Selection for Matrices and Applications

We study subset selection for matrices defined as follows: given a matrix $X \in \mathbb{R}^{n \times m}$ ($m > n$) and an oversampling parameter $k$ ($n \le k \le m$), select a subset of $k$ columns from $X$ such that the pseudo-inverse of…

Data Structures and Algorithms · Computer Science 2013-06-25 Haim Avron, Christos Boutsidis

Column subset selection is NP-complete

Let $M$ be a real $r\times c$ matrix and let $k$ be a positive integer. In the column subset selection problem (CSSP), we need to minimize the quantity $\|M-SA\|$, where $A$ can be an arbitrary $k\times c$ matrix, and $S$ runs over all…

Combinatorics · Mathematics 2017-01-12 Yaroslav Shitov

Sparse Learning for Variable Selection with Structures and Nonlinearities

In this thesis we discuss machine learning methods performing automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in the predictive models. By relying on a limited set of…

Machine Learning · Computer Science 2019-03-27 Magda Gregorova

The Sparse Principal Component of a Constant-rank Matrix

The computation of the sparse principal component of a matrix is equivalent to the identification of its principal submatrix with the largest maximum eigenvalue. Finding this optimal submatrix is what renders the problem…

Information Theory · Computer Science 2013-12-23 Megasthenis Asteris, Dimitris S. Papailiopoulos, George N. Karystinos

Sparse Matrix-based Random Projection for Classification

As a typical dimensionality reduction technique, random projection can be simply implemented with linear projection, while maintaining the pairwise distances of high-dimensional data with high probability. Considering this technique is…

Machine Learning · Computer Science 2014-10-14 Weizhi Lu, Weiyu Li, Kidiyo Kpalma, Joseph Ronsin