Related papers: Model selection in logistic regression

Non-asymptotic model selection for linear non least-squares estimation in regression models and inverse problems

We propose to address the common problem of linear estimation in linear statistical models by using a model selection approach via penalization. Depending then on the framework in which the linear statistical model is considered namely the…

Statistics Theory · Mathematics 2009-09-11 Ikhlef Bechar

A Provably Accurate Randomized Sampling Algorithm for Logistic Regression

In statistics and machine learning, logistic regression is a widely-used supervised learning technique primarily employed for binary classification tasks. When the number of observations greatly exceeds the number of predictor variables, we…

Machine Learning · Statistics 2024-04-02 Agniva Chowdhury, Pradeep Ramuhalli

Regression Model Selection Under General Conditions

Model selection criteria are one of the most important tools in statistics. Proofs showing a model selection criterion is asymptotically optimal are tailored to the type of model (linear regression, quantile regression, penalized…

Statistics Theory · Mathematics 2025-10-17 Amaze Lusompa

Minimal penalties and the slope heuristics: a survey

Birgé and Massart proposed in 2001 the slope heuristics as a way to choose optimally from data an unknown multiplicative constant in front of a penalty. It is built upon the notion of minimal penalty, and it has been generalized since…

Statistics Theory · Mathematics 2019-10-28 Sylvain Arlot

Model Selection with the Loss Rank Principle

A key issue in statistics and machine learning is to automatically select the "right" model complexity, e.g., the number of neighbors to be averaged over in k nearest neighbor (kNN) regression or the polynomial degree in regression with…

Machine Learning · Computer Science 2010-10-04 Marcus Hutter, Minh-Ngoc Tran

Clustering and variable selection for categorical multivariate data

This article investigates unsupervised classification techniques for categorical multivariate data. The study employs multivariate multinomial mixture modeling, which is a type of model particularly applicable to multilocus genotypic data.…

Statistics Theory · Mathematics 2014-03-11 Dominique Bontemps, Wilson Toussile

Maximum Likelihood for Logistic Regression Model with Incomplete and Hybrid-Type Covariates

Logistic regression is a fundamental and widely used statistical method for modeling binary outcomes based on covariates. However, the presence of missing data, particularly in settings involving hybrid covariates (a mix of discrete and…

Methodology · Statistics 2025-06-05 Mohamed Cherifi, Xujia Zhu, Mohammed Nabil El Korso, Ammar Mesloub

Finite-sample performance of the maximum likelihood estimator in logistic regression

Logistic regression is a classical model for describing the probabilistic dependence of binary responses to multivariate covariates. We consider the predictive performance of the maximum likelihood estimator (MLE) for logistic regression,…

Statistics Theory · Mathematics 2026-02-20 Hugo Chardon, Matthieu Lerasle, Jaouad Mourtada

Gaussian Mixture Regression model with logistic weights, a penalized maximum likelihood approach

We wish to estimate conditional density using Gaussian Mixture Regression model with logistic weights and means depending on the covariate. We aim at selecting the number of components of this model as well as the other parameters by a…

Statistics Theory · Mathematics 2013-04-10 Lucie Montuelle, Erwan Le Pennec, Serge Cohen

MAP model selection in Gaussian regression

We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors might be much larger than the number of observations. From a frequentist view, the proposed procedure results in the penalized…

Statistics Theory · Mathematics 2010-09-14 Felix Abramovich, Vadim Grinshtein

High-dimensional classification by sparse logistic regression

We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic…

Statistics Theory · Mathematics 2018-11-20 Felix Abramovich, Vadim Grinshtein

Optimal model selection in density estimation

We build penalized least-squares estimators using the slope heuristic and resampling penalties. We prove oracle inequalities for the selected estimator with leading constant asymptotically equal to 1. We compare the practical performances…

Statistics Theory · Mathematics 2015-03-13 Matthieu Lerasle

Model selection and minimax estimation in generalized linear models

We consider model selection in generalized linear models (GLM) for high-dimensional data and propose a wide class of model selection criteria based on penalized maximum likelihood with a complexity penalty on the model size. We derive a…

Statistics Theory · Mathematics 2016-03-31 Felix Abramovich, Vadim Grinshtein

Estimating the logistic regression equation when the model is incorrect

Protesting mildly against the notion of an exactly correct parametric model the view is adopted that the logistic regression equation is merely an approximation to the underlying, true function. The behaviour of likelihood based estimators…

Statistics Theory · Mathematics 2026-05-27 Nils Lid Hjort

Model selection for estimation of causal parameters

A popular technique for selecting and tuning machine learning estimators is cross-validation. Cross-validation evaluates overall model fit, usually in terms of predictive accuracy. In causal inference, the optimal choice of estimator…

Methodology · Statistics 2021-07-07 Dominik Rothenhäusler

Law of the Iterated Logarithm and Model Selection Consistency for GLMs with Independent and Dependent Responses

We study the law of the iterated logarithm (LIL) for the maximum likelihood estimation of the parameters (as a convex optimization problem) in the generalized linear models with independent or weakly dependent ($\rho$-mixing, $m$-dependent)…

Statistics Theory · Mathematics 2020-04-28 Xiaowei Yang, Shuang Song, Huiming Zhang

A modern maximum-likelihood theory for high-dimensional logistic regression

Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there…

Statistics Theory · Mathematics 2022-06-08 Pragya Sur, Emmanuel J. Candes

Model selection by resampling penalization

We present a new family of model selection algorithms based on the resampling heuristics. It can be used in several frameworks, does not require any knowledge about the unknown law of the data, and may be seen as a generalization of local…

Statistics Theory · Mathematics 2007-06-13 Sylvain Arlot

Efficient and robust high-dimensional sparse logistic regression via nonlinear primal-dual hybrid gradient algorithms

Logistic regression is a widely used statistical model to describe the relationship between a binary response variable and predictor variables in data sets. It is often used in machine learning to identify important predictor variables.…

Optimization and Control · Mathematics 2021-12-30 Jérôme Darbon, Gabriel P. Langlois

The Slope Heuristics in Heteroscedastic Regression

We consider the estimation of a regression function with random design and heteroscedastic noise in a nonparametric setting. More precisely, we address the problem of characterizing the optimal penalty when the regression function is…

Statistics Theory · Mathematics 2015-06-29 Adrien Saumard