Related papers: Subset selection in sparse matrices
In high-dimensional generalized linear models, it is crucial to identify a sparse model that adequately accounts for response variation. Although the best subset section has been widely regarded as the Holy Grail of problems of this type,…
We address the subset selection problem for matrices, where the goal is to select a subset of $k$ columns from a "short-and-fat" matrix $X \in \mathbb{R}^{m \times n}$, such that the pseudoinverse of the sampled submatrix has as small…
We study the problem of optimal subset selection from a set of correlated random variables. In particular, we consider the associated combinatorial optimization problem of maximizing the determinant of a symmetric positive definite matrix…
Reducing the conditions under which a given set satisfies the stipulations of the subset sum proposition to a set of linear relationships, the question of whether a set satisfies subset sum may be answered in a polynomial number of steps by…
Under a standard assumption in complexity theory (NP not in P/poly), we demonstrate a gap between the minimax prediction risk for sparse linear regression that can be achieved by polynomial-time algorithms, and that achieved by optimal…
Variable selection for sparse linear regression is the problem of finding, given an m x p matrix B and a target vector y, a sparse vector x such that Bx approximately equals y. Assuming a standard complexity hypothesis, we show that no…
We consider the problem of finding a sparse multiple of a polynomial. Given f in F[x] of degree d over a field F, and a desired sparsity t, our goal is to determine if there exists a multiple h in F[x] of f such that h has at most t…
Subset selection for matrices is the task of extracting a column sub-matrix from a given matrix $B\in\mathbb{R}^{n\times m}$ with $m>n$ such that the pseudoinverse of the sampled matrix has as small Frobenius or spectral norm as possible.…
We study the optimal sample complexity of variable selection in linear regression under general design covariance, and show that subset selection is optimal while under standard complexity assumptions, efficient algorithms for this problem…
Best subset selection in linear regression is well known to be nonconvex and computationally challenging to solve, as the number of possible subsets grows rapidly with increasing dimensionality of the problem. As a result, finding the…
The main purpose of this paper is to study the NP-complete subset-sum problem, not in the usual context of time-complexity-based classification of the algorithms (exponential/polynomial), but through a new kind of algorithmic classification…
Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that tradeoff fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming…
Sparse modelling or model selection with categorical data is challenging even for a moderate number of variables, because one parameter is roughly needed to encode one category or level. The Group Lasso is a well known efficient algorithm…
Sparse principal component analysis addresses the problem of finding a linear combination of the variables in a given data set with a sparse coefficients vector that maximizes the variability of the data. This model enhances the ability to…
The problem of approximating a dense matrix by a product of sparse factors is a fundamental problem for many signal processing and machine learning tasks. It can be decomposed into two subproblems: finding the position of the non-zero…
We study subset selection for matrices defined as follows: given a matrix $\matX \in \R^{n \times m}$ ($m > n$) and an oversampling parameter $k$ ($n \le k \le m$), select a subset of $k$ columns from $\matX$ such that the pseudo-inverse of…
Let $M$ be a real $r\times c$ matrix and let $k$ be a positive integer. In the column subset selection problem (CSSP), we need to minimize the quantity $\|M-SA\|$, where $A$ can be an arbitrary $k\times c$ matrix, and $S$ runs over all…
In this thesis we discuss machine learning methods performing automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in the predictive models. By relying on a limited set of…
The computation of the sparse principal component of a matrix is equivalent to the identification of its principal submatrix with the largest maximum eigenvalue. Finding this optimal submatrix is what renders the problem…
As a typical dimensionality reduction technique, random projection can be simply implemented with linear projection, while maintaining the pairwise distances of high-dimensional data with high probability. Considering this technique is…