Related papers: Robust Variable Selection for High-dimensional Reg…
Additive models belong to the class of structured nonparametric regression models that do not suffer from the curse of dimensionality. Finding the additive components that are nonzero when the true model is assumed to be sparse is an…
We present a robust framework to perform linear regression with missing entries in the features. By considering an elliptical data distribution, and specifically a multivariate normal model, we are able to conditionally formulate a…
We propose a robust variable selection procedure using a divergence based M-estimator combined with a penalty function. It produces robust estimates of the regression parameters and simultaneously selects the important explanatory…
This paper proposes an adaptive penalized weighted mean regression for outlier detection of high-dimensional data. In comparison to existing approaches based on the mean shift model, the proposed estimators demonstrate robustness against…
Standard approaches for variable selection in linear models are not tailored to deal properly with high-dimensional and incomplete data. Currently, methods dedicated to high-dimensional data handle missing values by ad-hoc strategies, like…
When outcomes are missing for reasons beyond an investigator's control, there are two different ways to adjust a parameter estimate for covariates that may be related both to the outcome and to missingness. One approach is to model the…
Selection bias is a common concern in epidemiologic studies. In the literature, selection bias is often viewed as a missing data problem. Popular approaches to adjust for bias due to missing data, such as inverse probability weighting, rely…
This paper deals with robust marginal estimation under a general regression model when missing data occur in the response and also in some of covariates. The target is a marginal location parameter which is given through an $M-$functional.…
The problem of identifying the most discriminating features when performing supervised learning has been extensively investigated. In particular, several methods for variable selection in model-based classification have been proposed.…
Doubly truncated data arise in many areas such as astronomy, econometrics, and medical studies. For the regression analysis with doubly truncated response variables, the existence of double truncation may bring bias for estimation as well…
The standard quantile regression model assumes a linear relationship at the quantile of interest and that all variables are observed. We relax these assumptions by considering a partial linear model while allowing for missing linear…
Missing outcome data is one of the principal threats to the validity of treatment effect estimates from randomized trials. The outcome distributions of participants with missing and observed data are often different, which increases the…
High-dimensional data subject to heavy-tailed phenomena and heterogeneity are commonly encountered in various scientific fields and bring new challenges to the classical statistical methods. In this paper, we combine the asymmetric square…
In a missing-data setting, we have a sample in which a vector of explanatory variables x_i is observed for every subject i, while scalar outcomes y_i are missing by happenstance on some individuals. In this work we propose robust estimates…
Missing data is frequently encountered in many areas of statistics. Propensity score weighting is a popular method for handling missing data. The propensity score method employs a response propensity model, but correct specification of the…
Variable selection plays a fundamental role in high-dimensional data analysis. Various methods have been developed for variable selection in recent years. Well-known examples are forward stepwise regression (FSR) and least angle regression…
This paper considers robust modeling of the survival time for cancer patients. Accurate prediction can be helpful for developing therapeutic and care strategies. We propose a unified Expectation-Maximization approach combined with the…
Penalized logistic regression is extremely useful for binary classification with large number of covariates (higher than the sample size), having several real life applications, including genomic disease classification. However, the…
Multiple imputation has become one of the standard methods in drawing inferences in many incomplete data applications. Applications of multiple imputation in relatively more complex settings, such as high-dimensional clustered data, require…
The missing data issue is ubiquitous in health studies. Variable selection in the presence of both missing covariates and outcomes is an important statistical research topic but has been less studied. Existing literature focuses on…