Related papers: COMBSS: Best Subset Selection via Continuous Optim…

Solving the Best Subset Selection Problem via Suboptimal Algorithms

Best subset selection in linear regression is well known to be nonconvex and computationally challenging to solve, as the number of possible subsets grows rapidly with increasing dimensionality of the problem. As a result, finding the…

Machine Learning · Statistics · 2025-04-01 · Vikram Singh, Min Sun

Group COMBSS: Group Selection via Continuous Optimization

We present a new optimization method for the group selection problem in linear regression. In this problem, predictors are assumed to have a natural group structure and the goal is to select a small set of groups that best fits the…

Methodology · Statistics · 2024-04-23 · Anant Mathur, Sarat Moka, Benoit Liquet, Zdravko Botev

Parameter Selection Algorithm For Continuous Variables

In this article, we propose a new algorithm for supervised learning methods, by which one can both capture the non-linearity in data and find the best subset model. To produce an enhanced subset of the original variables, an ideal…

Applications · Statistics · 2017-01-23 · Peyman Tavallali, Marianne Razavi, Sean Brady

Continuous Optimization for Offline Change Point Detection and Estimation

This work explores use of novel advances in best subset selection for regression modelling via continuous optimization for offline change point detection and estimation in univariate Gaussian data sequences. The approach exploits…

Methodology · Statistics · 2024-07-08 · Hans Reimann, Sarat Moka, Georgy Sofronov

Subset Selection for Multiple Linear Regression via Optimization

Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that trade off fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming…

Machine Learning · Statistics · 2020-09-04 · Young Woong Park, Diego Klabjan

Information-Based Optimal Subdata Selection for Big Data Linear Regression

Extraordinary amounts of data are being produced in many branches of science. Proven statistical methods are no longer applicable with extraordinarily large data sets due to computational limitations. A critical step in big data analysis is…

Methodology · Statistics · 2019-06-27 · HaiYing Wang, Min Yang, John Stufken

Best subset selection in linear regression via bi-objective mixed integer linear programming

We study the problem of choosing the best subset of p features in linear regression given n observations. This problem naturally contains two objective functions including minimizing the amount of bias and minimizing the number of…

Methodology · Statistics · 2018-04-24 · Hadi Charkhgard, Ali Eshragh

Best Subset Solution Path for Linear Dimension Reduction Models using Continuous Optimization

The selection of best variables is a challenging problem in supervised and unsupervised learning, especially in high dimensional contexts where the number of variables is usually much larger than the number of observations. In this paper,…

Methodology · Statistics · 2024-04-01 · Benoit Liquet, Sarat Moka, Samuel Muller

Probabilistic Best Subset Selection via Gradient-Based Optimization

In high-dimensional statistics, variable selection recovers the latent sparse patterns from all possible covariate combinations. This paper proposes a novel optimization method to solve the exact L0-regularized regression problem, which is…

Methodology · Statistics · 2022-06-02 · Mingzhang Yin, Nhat Ho, Bowei Yan, Xiaoning Qian, Mingyuan Zhou

On best subset regression

In this paper, we discuss the variable selection method from ℓ0-norm constrained regression, which is equivalent to the problem of finding the best subset of a fixed size. Our study focuses on two aspects, consistency and computation. We…

Methodology · Statistics · 2013-03-20 · Shifeng Xiong

Robust subset selection

The best subset selection (or "best subsets") estimator is a classic tool for sparse regression, and developments in mathematical optimization over the past decade have made it more computationally tractable than ever. Notwithstanding its…

Methodology · Statistics · 2022-01-11 · Ryan Thompson

Beyond Discrete Selection: Continuous Embedding Space Optimization for Generative Feature Selection

The goal of feature selection (comprising filter, wrapper, and embedded approaches) is to find the optimal feature subset for designated downstream tasks. Nevertheless, current feature selection methods are limited by: 1) the selection…

Machine Learning · Computer Science · 2023-09-18 · Meng Xiao, Dongjie Wang, Min Wu, Pengfei Wang, Yuanchun Zhou, Yanjie Fu

BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges

Data subset selection aims to find a smaller yet informative subset of a large dataset that can approximate the full-dataset training, addressing challenges associated with training neural networks on large-scale datasets. However, existing…

Machine Learning · Computer Science · 2024-06-06 · Hoyong Choi, Nohyun Ki, Hye Won Chung

Best Subset Selection via a Modern Optimization Lens

In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving Mixed Integer Optimization (MIO) problems. We…

Methodology · Statistics · 2015-07-14 · Dimitris Bertsimas, Angela King, Rahul Mazumder

D-optimal Subsampling Design for Massive Data Linear Regression

Data reduction is a fundamental challenge of modern technology, where classical statistical methods are not applicable because of computational limitations. We consider multiple linear regression for an extraordinarily large number of…

Methodology · Statistics · 2025-05-30 · Torsten Glemser, Rainer Schwabe

Approximately Optimal Subset Selection for Statistical Design and Modelling

We study the problem of optimal subset selection from a set of correlated random variables. In particular, we consider the associated combinatorial optimization problem of maximizing the determinant of a symmetric positive definite matrix…

Computation · Statistics · 2019-07-12 · Yu Wang, Nhu D. Le, James V. Zidek

Better subset regression

To find efficient screening methods for high dimensional linear regression models, this paper studies the relationship between model fitting and screening performance. Under a sparsity assumption, we show that a subset that includes the…

Methodology · Statistics · 2013-03-20 · Shifeng Xiong

Distributionally Robust Feature Selection

We study the problem of selecting limited features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications in settings where collecting each feature is…

Machine Learning · Computer Science · 2025-10-27 · Maitreyi Swaroop, Tamar Krishnamurti, Bryan Wilder

Nearly Optimal Subdata Selection

When, in terms of the number of data points, the size of a dataset exceeds available computing resources, or when labeling is expensive, an attractive solution consists of selecting only some of the data points (subdata) for further…

Methodology · Statistics · 2026-04-28 · Min Yang, Wei Zheng, John Stufken, Ming-Chung Chang, Ting Tian, Xueqin Wang

A Statistical View of Column Subset Selection

We consider the problem of selecting a small subset of representative variables from a large dataset. In the computer science literature, this dimensionality reduction problem is typically formalized as Column Subset Selection (CSS).…

Methodology · Statistics · 2025-05-20 · Anav Sood, Trevor Hastie