Related papers: Subsampling Winner Algorithm for Feature Selection…

A Feature Selection Method that Controls the False Discovery Rate

The problem of selecting a handful of truly relevant variables in supervised machine learning algorithms is a challenging problem in terms of untestable assumptions that must hold and unavailability of theoretical assurances that selection…

Methodology · Statistics 2023-11-10 Mehdi Rostami, Olli Saarela

FDR controlling procedures with dimension reduction and their application to GWAS with linkage disequilibrium score

Genome-wide association studies (GWAS) have led to the discovery of numerous single nucleotide polymorphisms (SNPs) associated with various phenotypes and complex diseases. However, the identified genetic variants do not fully explain the…

Methodology · Statistics 2025-07-09 Dayeon Jung, Yewon Kim, Junyong Park

Stochastic Weight Averaging Revisited

Averaging neural network weights sampled by a backbone stochastic gradient descent (SGD) is a simple yet effective approach to assist the backbone SGD in finding better optima, in terms of generalization. From a statistical perspective,…

Machine Learning · Computer Science 2022-09-20 Hao Guo, Jiyong Jin, Bin Liu

Feature Selection Using Classifier in High Dimensional Data

Feature selection is frequently used as a pre-processing step to machine learning. It is a process of choosing a subset of original features so that the feature space is optimally reduced according to a certain evaluation criterion. The…

Computer Vision and Pattern Recognition · Computer Science 2014-01-07 Vijendra Singh, Shivani Pathak

Optimal Subsampling Approaches for Large Sample Linear Regression

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu, Ping Ma, Michael W. Mahoney, Bin Yu

Controlling the False Discovery Rate for Binary Feature Selection via Knockoff

Variable selection has been widely used in data analysis for the past decades, and it becomes increasingly important in the Big Data era as there are usually hundreds of variables available in a dataset. To enhance interpretability of a…

Methodology · Statistics 2020-08-17 Yuxiang Xie, Kwun Chuen Gary Chan

Statistical Inference for Sequential Feature Selection after Domain Adaptation

In high-dimensional regression, feature selection methods, such as sequential feature selection (SeqFS), are commonly used to identify relevant features. When data is limited, domain adaptation (DA) becomes crucial for transferring…

Machine Learning · Statistics 2025-01-20 Duong Tan Loc, Nguyen Thang Loi, Vo Nguyen Le Duy

Feature-Weighted Maximum Representative Subsampling

In the social sciences, it is often necessary to debias studies and surveys before valid conclusions can be drawn. Debiasing algorithms enable the computational removal of bias using sample weights. However, an issue arises when only a…

Machine Learning · Computer Science 2026-03-03 Tony Hauptmann, Stefan Kramer

Functional Principal Subspace Sampling for Large Scale Functional Data Analysis

Functional data analysis (FDA) methods have computational and theoretical appeals for some high dimensional data, but lack the scalability to modern large sample datasets. To tackle the challenge, we develop randomized algorithms for two…

Computation · Statistics 2022-04-11 Shiyuan He, Xiaomeng Yan

Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation

Feature selection (FS) has become an indispensable task in dealing with today's highly complex pattern recognition problems with massive number of features. In this study, we propose a new wrapper approach for FS based on binary…

Machine Learning · Statistics 2016-03-08 Vural Aksakalli, Milad Malekipirbazari

Statistical Inference for Feature Selection after Optimal Transport-based Domain Adaptation

Feature Selection (FS) under domain adaptation (DA) is a critical task in machine learning, especially when dealing with limited target data. However, existing methods lack the capability to guarantee the reliability of FS under DA. In this…

Machine Learning · Statistics 2024-10-22 Nguyen Thang Loi, Duong Tan Loc, Vo Nguyen Le Duy

High-dimensional Statistical Inference and Variable Selection Using Sufficient Dimension Association

Simultaneous variable selection and statistical inference is challenging in high-dimensional data analysis. Most existing post-selection inference methods require explicitly specified regression models, which are often linear, as well as…

Methodology · Statistics 2026-03-19 Shangyuan Ye, Shauna Rakshe, Ye Liang

Sample Weight Averaging for Stable Prediction

The challenge of Out-of-Distribution (OOD) generalization poses a foundational concern for the application of machine learning algorithms to risk-sensitive areas. Inspired by traditional importance weighting and propensity weighting…

Machine Learning · Computer Science 2025-02-12 Han Yu, Yue He, Renzhe Xu, Dongbai Li, Jiayin Zhang, Wenchao Zou, Peng Cui

Feature Adaptation for Sparse Linear Regression

Sparse linear regression is a central problem in high-dimensional statistics. We study the correlated random design setting, where the covariates are drawn from a multivariate Gaussian $N(0,\Sigma)$, and we seek an estimator with small…

Data Structures and Algorithms · Computer Science 2023-05-29 Jonathan Kelner, Frederic Koehler, Raghu Meka, Dhruv Rohatgi

SPSA-FSR: Simultaneous Perturbation Stochastic Approximation for Feature Selection and Ranking

This manuscript presents the following: (1) an improved version of the Binary Simultaneous Perturbation Stochastic Approximation (SPSA) Method for feature selection in machine learning (Aksakalli and Malekipirbazari, Pattern Recognition…

Machine Learning · Statistics 2018-04-17 Zeren D. Yenice, Niranjan Adhikari, Yong Kai Wong, Vural Aksakalli, Alev Taskin Gumus, Babak Abbasi

Subbagging Variable Selection for Big Data

This article introduces a subbagging (subsample aggregating) approach for variable selection in regression within the context of big data. The proposed subbagging approach not only ensures that variable selection is scalable given the…

Methodology · Statistics 2025-03-10 Xian Li, Xuan Liang, Tao Zou

Review and Evaluation of Feature Selection Algorithms in Synthetic Problems

The main purpose of Feature Subset Selection is to find a reduced subset of attributes from a data set described by a feature set. The task of a feature selection algorithm (FSA) is to provide with a computational solution motivated by a…

Artificial Intelligence · Computer Science 2015-03-17 L. A. Belanche, F. F. González

SeWA: Selective Weight Average via Probabilistic Masking

Weight averaging has become a standard technique for enhancing model performance. However, methods such as Stochastic Weight Averaging (SWA) and Latest Weight Averaging (LAWA) often require manually designed procedures to sample from the…

Machine Learning · Computer Science 2025-02-17 Peng Wang, Shengchao Hu, Zerui Tao, Guoxia Wang, Dianhai Yu, Li Shen, Quan Zheng, Dacheng Tao

Performance analysis of unsupervised feature selection methods

Feature selection (FS) is a process which attempts to select more informative features. In some cases, too many redundant or irrelevant features may overpower main features for classification. Feature selection can remedy this problem and…

Machine Learning · Computer Science 2013-06-07 A. Nisthana Parveen, H. Hannah Inbarani, E. N. Sathishkumar

Efficient subsampling for high-dimensional data

In the field of big data analytics, the search for efficient subdata selection methods that enable robust statistical inferences with minimal computational resources is of high importance. A procedure prior to subdata selection could…

Methodology · Statistics 2024-11-12 Vasilis Chasiotis, Lin Wang, Dimitris Karlis