Related papers: Comparing methods addressing multi-collinearity wh…
While data science is battling to extract information from the enormous explosion of data, many estimators and algorithms are being developed for better prediction. Researchers and data scientists often introduce new methods and evaluate…
Linear model prediction with a large number of potential predictors is both statistically and computationally challenging. The traditional approaches are largely based on shrinkage selection/estimation methods, which are applicable even…
Collinearity and near-collinearity of predictors cause difficulties when doing regression. In these cases, variable selection becomes untenable because of mathematical issues concerning the existence and numerical stability of the…
Prediction performance does not always reflect the estimation behaviour of a method. High error in estimation may necessarily not result in high prediction error, but can lead to an unreliable prediction if test data lie in a slightly…
Multi-collinearity is a wide-spread phenomenon in modern statistical applications and when ignored, can negatively impact model selection and statistical inference. Classic tools and measures that were developed for "$n>p$" data are not…
When developing risk prediction models, shrinkage methods are recommended, especially when the sample size is limited. Several earlier studies have shown that the shrinkage of model coefficients can reduce overfitting of the prediction…
We address the problem of multicollinearity in a function-on-scalar regression model by using a prior which simultaneously selects, clusters, and smooths functional effects. Our methodology groups effects of highly correlated predictors,…
The paper analyzes how the enlarging of the sample affects to the mitigation of collinearity concluding that it may mitigate the consequences of collinearity related to statistical analysis but not necessarily the numerical instability. The…
Linear mixed effects models are highly flexible in handling a broad range of data types and are therefore widely used in applications. A key part in the analysis of data is model selection, which often aims to choose a parsimonious model…
This methodological note investigates and discuss possible selection and collider restriction bias due to predictor availability in prognostic models.
While shrinkage is essential in high-dimensional settings, its use for low-dimensional regression-based prediction has been debated. It reduces variance, often leading to improved prediction accuracy. However, it also inevitably introduces…
As predictive models -- e.g., from machine learning -- give likely outcomes, they may be used to reason on the effect of an intervention, a causal-inference task. The increasing complexity of health data has opened the door to a plethora of…
The development and use of dimension reduction methods is prevalent in modern statistical literature. This paper reviews a class of dimension reduction techniques which aim to simultaneously select relevant predictors and find clusters…
Machine learning models are often used to inform real world risk assessment tasks: predicting consumer default risk, predicting whether a person suffers from a serious illness, or predicting a person's risk to appear in court. Given…
In regression analysis, associations between continuous predictors and the outcome are often assumed to be linear. However, modeling the associations as non-linear can improve model fit. Many flexible modeling techniques, like (fractional)…
Confounding matters in almost all observational studies that focus on causality. In order to eliminate bias caused by connfounders, oftentimes a substantial number of features need to be collected in the analysis. In this case, large p…
Clinical prediction models (CPMs) are used to predict clinically relevant outcomes or events. Typically, prognostic CPMs are derived to predict the risk of a single future outcome. However, with rising emphasis on the prediction of…
Forecasting with longitudinal data has been rarely studied. Most of the available studies are for continuous response and all of them are for univariate response. In this study, we consider forecasting multivariate longitudinal binary data.…
The problem known as multicolinearity has long been recognized to fundamentally and negatively influence multiple regression. This paper does not intend to either propose a numerical assessment of the degree to which this problem exists…
In this paper we have tried to compare the various face recognition models against their classical problems. We look at the methods followed by these approaches and evaluate to what extent they are able to solve the problems. All methods…