Related papers: SuRF: a New Method for Sparse Variable Selection, …
Feature selection that selects an informative subset of variables from data not only enhances the model interpretability and performance but also alleviates the resource demands. Recently, there has been growing attention on feature…
Within the statistical and machine learning literature, regularization techniques are often used to construct sparse (predictive) models. Most regularization strategies only work for data where all predictors are treated identically, such…
This paper proposes a sparse regression method that continuously interpolates between Forward Stepwise selection (FS) and the LASSO. When tuned appropriately, our solutions are much sparser than typical LASSO fits but, unlike FS fits,…
Compositional data, where only relative abundances are available, are common in microbiome and other high-throughput sequencing studies. Log ratios between groups of variables serve as key biomarkers in these settings. However, selecting…
We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a "pre-conditioned" response…
Sparse Representation (SR) techniques encode the test samples into a sparse linear combination of all training samples and then classify the test samples into the class with the minimum residual. The classification of SR techniques depends…
Methods for global measurement of transcript abundance such as microarrays and RNA-Seq generate datasets in which the number of measured features far exceeds the number of observations. Extracting biologically meaningful and experimentally…
In high-dimensions, many variable selection methods, such as the lasso, are often limited by excessive variability and rank deficiency of the sample covariance matrix. Covariance sparsity is a natural phenomenon in high-dimensional…
High-dimensional learning problems, where the number of features exceeds the sample size, often require sparse regularization for effective prediction and variable selection. While established for fully supervised data, these techniques…
Among the most popular variable selection procedures in high-dimensional regression, Lasso provides a solution path to rank the variables and determines a cut-off position on the path to select variables and estimate coefficients. In this…
Sparsity-inducing penalties are useful tools for variable selection and they are also effective for regression settings where the data are functions. We consider the problem of selecting not only variables but also decision boundaries in…
The sparse-group lasso performs both variable and group selection, simultaneously using the strengths of the lasso and group lasso. It has found widespread use in genetics, a field that regularly involves the analysis of high-dimensional…
Varying coefficient models have numerous applications in a wide scope of scientific areas. While enjoying nice interpretability, they also allow flexibility in modeling dynamic impacts of the covariates. But, in the new era of big data, it…
We propose a ranking and selection procedure to prioritize relevant predictors and control false discovery proportion (FDP) of variable selection. Our procedure utilizes a new ranking method built upon the de-sparsified Lasso estimator. We…
In molecular biology, advances in high-throughput technologies have made it possible to study complex multivariate phenotypes and their simultaneous associations with high-dimensional genomic and other omics data, a problem that can be…
We develop a Bayesian methodology aimed at simultaneously estimating low-rank and row-sparse matrices in a high-dimensional multiple-response linear regression model. We consider a carefully devised shrinkage prior on the matrix of…
Sparse recovery and subset selection are fundamental problems in varied communities, including signal processing, statistics and machine learning. Herein, we focus on an important greedy algorithm for these problems: Backward Stepwise…
This manuscript presents the following: (1) an improved version of the Binary Simultaneous Perturbation Stochastic Approximation (SPSA) Method for feature selection in machine learning (Aksakalli and Malekipirbazari, Pattern Recognition…
Feature selection from a large number of covariates (aka features) in a regression analysis remains a challenge in data science, especially in terms of its potential of scaling to ever-enlarging data and finding a group of scientifically…
An important problem in the analysis of high-dimensional omics data is to identify subsets of molecular variables that are associated with a phenotype of interest. This requires addressing the challenges of high dimensionality, strong…