Related papers: High-dimensional regression and variable selection…
Background: Identification of causal SNPs in most genome wide association studies relies on approaches that consider each SNP individually. However, there is a strong correlation structure among SNPs that need to be taken into account.…
Background: The development of classification methods for personalized medicine is highly dependent on the identification of predictive genetic markers. In survival analysis it is often necessary to discriminate between influential and…
The paper considers variable selection in linear regression models where the number of covariates is possibly much larger than the number of observations. High dimensionality of the data brings in many complications, such as (possibly…
Variable selection plays a fundamental role in high-dimensional data analysis. Various methods have been developed for variable selection in recent years. Well-known examples are forward stepwise regression (FSR) and least angle regression…
In genomic studies, identifying biomarkers associated with a variable of interest is a major concern in biomedical research. Regularized approaches are classically used to perform variable selection in high-dimensional linear models.…
We consider covariate adjusted regression (CAR), a regression method for situations where predictors and response are observed after being distorted by a multiplicative factor. The distorting factors are unknown functions of an observable…
We consider the high-dimensional discriminant analysis problem. For this problem, different methods have been proposed and justified by establishing exact convergence rates for the classification risk, as well as the l2 convergence results…
In a standard regression problem, we have a set of explanatory variables whose effect on some response vector is modeled. For wide binary data, such as genetic marker data, we often have two limitations. First, we have more parameters than…
This paper explores the following question: what kind of statistical guarantees can be given when doing variable selection in high-dimensional models? In particular, we look at the error rates and power of some multi-stage regression…
We propose a vector auto-regressive (VAR) model with a low-rank constraint on the transition matrix. This new model is well suited to predict high-dimensional series that are highly correlated, or that are driven by a small number of hidden…
This paper proposes a new method and algorithm for predicting multivariate responses in a regression setting. Research into classification of High Dimension Low Sample Size (HDLSS) data, in particular microarray data, has made considerable…
Among the most popular variable selection procedures in high-dimensional regression, Lasso provides a solution path to rank the variables and determines a cut-off position on the path to select variables and estimate coefficients. In this…
Compositional data, where only relative abundances are available, are common in microbiome and other high-throughput sequencing studies. Log ratios between groups of variables serve as key biomarkers in these settings. However, selecting…
High-dimensional, low sample-size (HDLSS) data problems have been a topic of immense importance for the last couple of decades. There is a vast literature that proposed a wide variety of approaches to deal with this situation, among which…
We introduce varbvs, a suite of functions written in R and MATLAB for regression analysis of large-scale data sets using Bayesian variable selection methods. We have developed numerical optimization algorithms based on variational…
Variable selection for optimal treatment regime in a clinical trial or an observational study is getting more attention. Most existing variable selection techniques focused on selecting variables that are important for prediction, therefore…
We develop estimation for potentially high-dimensional additive structural equation models. A key component of our approach is to decouple order search among the variables from feature or edge selection in a directed acyclic graph encoding…
This paper explores the homogeneity of coefficients in high-dimensional regression, which extends the sparsity concept and is more general and suitable for many applications. Homogeneity arises when one expects regression coefficients…
Variable selection is central to high-dimensional data analysis, and various algorithms have been developed. Ideally, a variable selection algorithm shall be flexible, scalable, and with theoretical guarantee, yet most existing algorithms…
In microbiome and genomic studies, the regression of compositional data has been a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for the variation in sequencing depth, the…