Related papers: Minimum Sample Size for Developing a Multivariable…
Background: Clinical prediction models are increasingly used to inform healthcare decisions, but determining the minimum sample size for their development remains a critical and unresolved challenge. Inadequate sample sizes can lead to…
Over the last few decades, prediction models have become a fundamental tool in statistics, chemometrics, and related fields. However, to ensure that such models have high value, the inferences that they generate must be reliable. In this…
When developing a clinical prediction model, the sample size of the development dataset is a key consideration. Small sample sizes lead to greater concerns of overfitting, instability, poor performance and lack of fairness. Previous…
Clinical prediction models (CPMs) are used to predict clinically relevant outcomes or events. Typically, prognostic CPMs are derived to predict the risk of a single future outcome. However, with rising emphasis on the prediction of…
Calibration is a vital aspect of the performance of risk prediction models, but research in the context of ordinal outcomes is scarce. This study compared calibration measures for risk models predicting a discrete ordinal outcome, and…
Bayesian multinomial logistic regression provides a principled, interpretable approach to multiclass classification, but posterior sampling becomes increasingly expensive as the model dimension grows. Prior work has studied scalability in…
Objective: Provide guidance on sample size considerations for developing predictive models by empirically establishing the adequate sample size, which balances the competing objectives of improving model performance and reducing model…
In statistics and machine learning, logistic regression is a widely-used supervised learning technique primarily employed for binary classification tasks. When the number of observations greatly exceeds the number of predictor variables, we…
Logistic regression is a classical model for describing the probabilistic dependence of binary responses to multivariate covariates. We consider the predictive performance of the maximum likelihood estimator (MLE) for logistic regression,…
When evaluating the performance of a model for individualised risk prediction, the sample size needs to be large enough to precisely estimate the performance measures of interest. Current sample size guidance is based on precisely…
Contemporary sample size calculations for external validation of risk prediction models require users to specify fixed values of assumed model performance metrics alongside target precision levels (e.g., 95% CI widths). However, due to the…
We consider a variable selection problem for the prediction of binary outcomes. We study the best subset selection procedure by which the covariates are chosen by maximizing Manski (1975, 1985)'s maximum score objective function subject to…
Let $(Y,X_1,...,X_m)$ be a random vector. It is desired to predict $Y$ based on $(X_1,...,X_m)$. Examples of prediction methods are regression, classification using logistic regression or separating hyperplanes, and so on. We consider the…
Binomial data with unknown sizes often appear in biological and medical sciences and are usually overdispersed. All previous methods used parametric models and only considered overdispersion due to the variation of sizes. The proposed…
In the realm of contemporary data analysis, the use of massive datasets has taken on heightened significance, albeit often entailing considerable demands on computational time and memory. While a multitude of existing works offer optimal…
This paper applies the minimum message length principle to inference of linear regression models with Student-t errors. A new criterion for variable selection and parameter estimation in Student-t regression is proposed. By exploiting…
We investigate a robust penalized logistic regression algorithm based on a minimum distance criterion. Influential outliers are often associated with the explosion of parameter vector estimates, but in the context of standard logistic…
Generalized linear models, such as logistic regression, are widely used to model the association between a treatment and a binary outcome as a function of baseline covariates. However, the coefficients of a logistic regression model…
Although the log-likelihood is widely used in model selection, the log-likelihood ratio has had few applications in this area. We develop a log-likelihood ratio based method for selecting regression models by focusing on the set of models…
Logistic linear mixed model is widely used in experimental designs and genetic analysis with binary traits. Motivated by modern applications, we consider the case with many groups of random effects and each group corresponds to a variance…