Related papers: Sparse Variable Selection on High Dimensional Hete…
In high dimensions, many variable selection methods, such as the lasso, are often limited by excessive variability and rank deficiency of the sample covariance matrix. Covariance sparsity is a natural phenomenon in high-dimensional…
In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered, with no particular meaning to the given order of the variables. Yet, successful learning is often…
Objective: Social-environmental data obtained from the U.S. Census is an important resource for understanding health disparities, but rarely is the full dataset utilized for analysis. A barrier to incorporating the full data is a lack of…
In this paper, we introduce the Adaptive Cluster Lasso (ACL) method for variable selection in high-dimensional sparse regression models with strongly correlated variables. To handle correlated variables, the concept of clustering or grouping…
Variable selection for high-dimensional, highly correlated data has long been a challenging problem, often yielding unstable and unreliable models. We propose a resample-aggregate framework that exploits diffusion models' ability to…
This paper studies model selection consistency for high-dimensional sparse regression when data exhibit both cross-sectional and serial dependency. Most commonly used model selection methods fail to consistently recover the true model when…
Sparse modelling or model selection with categorical data is challenging even for a moderate number of variables, because roughly one parameter is needed to encode each category or level. The Group Lasso is a well-known, efficient algorithm…
Analysis of high-dimensional data is currently a popular field of research, thanks to many applications, e.g., in genetics (DNA data in genome-wide association studies), spectrometry, or web analysis. At the same time, the type of problems that…
We consider the high-dimensional discriminant analysis problem. For this problem, different methods have been proposed and justified by establishing exact convergence rates for the classification risk, as well as the $\ell_2$ convergence results…
This paper is concerned with high-dimensional panel data models where the number of regressors can be much larger than the sample size. Under the assumption that the true parameter vector is sparse we propose a panel-Lasso estimator and…
Motivation: The high dimensionality of genomic data calls for the development of specific classification methodologies, especially to prevent over-optimistic predictions. This challenge can be tackled by compression and variable selection,…
Standard high-dimensional regression methods assume that the underlying coefficient vector is sparse. This might not be true in some cases, in particular in the presence of hidden, confounding variables. Such hidden confounding can be…
Many high-dimensional data sets suffer from hidden confounding which affects both the predictors and the response of interest. In such situations, standard regression methods or algorithms lead to biased estimates. This paper substantially…
We examine the linear regression problem in a challenging high-dimensional setting with correlated predictors where the vector of coefficients can vary from sparse to dense. In this setting, we propose a combination of probabilistic…
This paper presents an innovative approach to dimensionality reduction and feature extraction in high-dimensional datasets, with a specific application focus on wood surface defect detection. The proposed framework integrates sparse…
We propose a novel structure selection method for high dimensional (d > 100) sparse vine copulas. Current sequential greedy approaches for structure selection require calculating spanning trees in hundreds of dimensions and fitting the pair…
Decision trees are widely used classification and regression models because of their interpretability and good accuracy. Classical methods such as CART are based on greedy approaches, but growing attention has recently been devoted to…
This paper investigates high-dimensional linear regression with highly correlated covariates. In this setup, the traditional sparsity assumption on the regression coefficients often fails to hold, and consequently many model selection…
In genomic studies, identifying biomarkers associated with a variable of interest is a major concern in biomedical research. Regularized approaches are classically used to perform variable selection in high-dimensional linear models.…
Sparse linear regression is a central problem in high-dimensional statistics. We study the correlated random design setting, where the covariates are drawn from a multivariate Gaussian $N(0,\Sigma)$, and we seek an estimator with small…