Related papers: Subset Selection for Multiple Linear Regression vi…

A Mathematical Programming Approach for Integrated Multiple Linear Regression Subset Selection and Validation

Subset selection for multiple linear regression aims to construct a regression model that minimizes errors by selecting a small number of explanatory variables. Once a model is built, various statistical tests and diagnostics are conducted…

Machine Learning · Statistics 2020-09-04 Seokhyun Chung, Young Woong Park, Taesu Cheong

Solving the Best Subset Selection Problem via Suboptimal Algorithms

Best subset selection in linear regression is well known to be nonconvex and computationally challenging to solve, as the number of possible subsets grows rapidly with increasing dimensionality of the problem. As a result, finding the…

Machine Learning · Statistics 2025-04-01 Vikram Singh, Min Sun

COMBSS: Best Subset Selection via Continuous Optimization

The problem of best subset selection in linear regression is considered with the aim to find a fixed size subset of features that best fits the response. This is particularly challenging when the total available number of features is very…

Methodology · Statistics 2023-11-28 Sarat Moka, Benoit Liquet, Houying Zhu, Samuel Muller

Parameter Selection Algorithm For Continuous Variables

In this article, we propose a new algorithm for supervised learning methods, by which one can both capture the non-linearity in data and also find the best subset model. To produce an enhanced subset of the original variables, an ideal…

Applications · Statistics 2017-01-23 Peyman Tavallali, Marianne Razavi, Sean Brady

Multi-Model Subset Selection

The two primary approaches for high-dimensional regression problems are sparse methods (e.g., best subset selection, which uses the L0-norm in the penalty) and ensemble methods (e.g., random forests). Although sparse methods typically yield…

Methodology · Statistics 2024-10-31 Anthony-Alexander Christidis, Stefan Van Aelst, Ruben Zamar

Optimal subdata selection for linear model selection

If the assumed model does not accurately capture the underlying structure of the data, a statistical method is likely to yield sub-optimal results, and so model selection is crucial in order to conduct any statistical analysis. However, in…

Methodology · Statistics 2023-06-21 Vasilis Chasiotis, Dimitris Karlis

Randomized maximum-contrast selection: subagging for large-scale regression

We introduce a very general method for sparse and large-scale variable selection. The large-scale regression setting is such that both the number of parameters and the number of samples are extremely large. The proposed method is based on…

Statistics Theory · Mathematics 2019-07-31 Jelena Bradic

On the selection of optimal subdata for big data regression based on leverage scores

The demand for computational resources in the modeling process increases with the scale of the datasets, since traditional approaches for regression involve inverting huge data matrices. The main problem lies in the large data size,…

Methodology · Statistics 2023-07-06 Vasilis Chasiotis, Dimitris Karlis

Subdata selection for big data regression: an improved approach

In the big data era researchers face a series of problems. Even standard approaches/methodologies, like linear regression, can be difficult or problematic with huge volumes of data. Traditional approaches for regression in big datasets may…

Methodology · Statistics 2024-11-13 Vasilis Chasiotis, Dimitris Karlis

A subsampling approach for large data sets when the Generalised Linear Model is potentially misspecified

Subsampling is a computationally efficient and scalable method to draw inference in large data settings based on a subset of the data rather than needing to consider the whole dataset. When employing subsampling techniques, a crucial…

Methodology · Statistics 2025-10-08 Amalan Mahendran, Helen Thompson, James M. McGree

Better subset regression

To find efficient screening methods for high dimensional linear regression models, this paper studies the relationship between model fitting and screening performance. Under a sparsity assumption, we show that a subset that includes the…

Methodology · Statistics 2013-03-20 Shifeng Xiong

Optimality and computational barriers in variable selection under dependence

We study the optimal sample complexity of variable selection in linear regression under general design covariance, and show that subset selection is optimal while under standard complexity assumptions, efficient algorithms for this problem…

Statistics Theory · Mathematics 2025-10-07 Ming Gao, Bryon Aragam

Best subset selection, persistence in high-dimensional statistical learning and optimization under $l_1$ constraint

Let $(Y,X_1,...,X_m)$ be a random vector. It is desired to predict $Y$ based on $(X_1,...,X_m)$. Examples of prediction methods are regression, classification using logistic regression or separating hyperplanes, and so on. We consider the…

Statistics Theory · Mathematics 2007-06-13 Eitan Greenshtein

Best subset selection in linear regression via bi-objective mixed integer linear programming

We study the problem of choosing the best subset of p features in linear regression given n observations. This problem naturally contains two objective functions including minimizing the amount of bias and minimizing the number of…

Methodology · Statistics 2018-04-24 Hadi Charkhgard, Ali Eshragh

Efficient Simulation Budget Allocation for Subset Selection Using Regression Metamodels

This research considers the ranking and selection (R&S) problem of selecting the optimal subset from a finite set of alternative designs. Given the total simulation budget constraint, we aim to maximize the probability of correctly…

Optimization and Control · Mathematics 2019-04-25 Fei Gao, Zhongshun Shi, Siyang Gao, Hui Xiao

Best-Subset Selection in Generalized Linear Models: A Fast and Consistent Algorithm via Splicing Technique

In high-dimensional generalized linear models, it is crucial to identify a sparse model that adequately accounts for response variation. Although the best subset selection has been widely regarded as the Holy Grail of problems of this type,…

Machine Learning · Statistics 2023-08-02 Junxian Zhu, Jin Zhu, Borui Tang, Xuanyu Chen, Hongmei Lin, Xueqin Wang

Feature and Variable Selection in Classification

The amount of information in the form of features and variables available to machine learning algorithms is ever increasing. This can lead to classifiers that are prone to overfitting in high dimensions, high dimensional models do not…

Machine Learning · Computer Science 2014-02-12 Aaron Karper

An Algorithm for Nonlinear, Nonparametric Model Choice and Prediction

We introduce an algorithm which, in the context of nonlinear regression on vector-valued explanatory variables, chooses those combinations of vector components that provide best prediction. The algorithm devotes particular attention to…

Methodology · Statistics 2014-02-03 Frédéric Ferraty, Peter Hall

Multi-resolution subsampling for large-scale linear classification

Subsampling is one of the popular methods to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim at selecting informative or representative sample points to achieve good overall information…

Methodology · Statistics 2024-07-10 Haolin Chen, Holger Dette, Jun Yu

Column Selection via Adaptive Sampling

Selecting a good column (or row) subset of massive data matrices has found many applications in data analysis and machine learning. We propose a new adaptive sampling algorithm that can be used to improve any relative-error column selection…

Data Structures and Algorithms · Computer Science 2015-10-15 Saurabh Paul, Malik Magdon-Ismail, Petros Drineas