Related papers: A MOM-based ensemble method for robustness, subsam…

Robust Multi-Model Subset Selection

Outlying observations can be challenging to handle and adversely affect subsequent analyses, especially in data with increasing dimensional complexity. Although outliers are not always undesired anomalies in the data and may possess…

Methodology · Statistics 2025-09-18 Anthony-Alexander Christidis, Gabriela Cohen-Freue

Robust machine learning by median-of-means : theory and practice

We introduce new estimators for robust machine learning based on median-of-means (MOM) estimators of the mean of real-valued random variables. These estimators achieve optimal rates of convergence under minimal assumptions on the dataset.…

Statistics Theory · Mathematics 2017-12-04 Guillaume Lecué, Matthieu Lerasle

Hyperparameter Sensitivity in Deep Outlier Detection: Analysis and a Scalable Hyper-Ensemble Solution

Outlier detection (OD) literature exhibits numerous algorithms as it applies to diverse domains. However, given a new detection task, it is unclear how to choose an algorithm to use, or how to set its hyperparameter(s) (HPs) in…

Machine Learning · Computer Science 2022-10-20 Xueying Ding, Lingxiao Zhao, Leman Akoglu

OBOE: Collaborative Filtering for AutoML Model Selection

Algorithm selection and hyperparameter tuning remain two of the most challenging tasks in machine learning. Automated machine learning (AutoML) seeks to automate these tasks to enable widespread use of machine learning by non-experts. This…

Machine Learning · Computer Science 2019-05-22 Chengrun Yang, Yuji Akimoto, Dae Won Kim, Madeleine Udell

Robust Compressed Sensing using Generative Models

The goal of compressed sensing is to estimate a high-dimensional vector from an underdetermined system of noisy linear equations. In analogy to classical compressed sensing, here we assume a generative model as a prior, that is, we assume…

Machine Learning · Statistics 2021-06-24 Ajil Jalal, Liu Liu, Alexandros G. Dimakis, Constantine Caramanis

MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning

Training deep networks and tuning hyperparameters on large datasets is computationally intensive. One of the primary research directions for efficient training is to reduce training costs by selecting well-generalizable subsets of training…

Machine Learning · Computer Science 2023-06-21 Krishnateja Killamsetty, Alexandre V. Evfimievski, Tejaswini Pedapati, Kiran Kate, Lucian Popa, Rishabh Iyer

The Integrity of Machine Learning Algorithms against Software Defect Prediction

The increased computerization in recent years has resulted in the production of a variety of different software; however, measures need to be taken to ensure that the produced software is not defective. Many researchers have worked in this…

Software Engineering · Computer Science 2023-04-06 Param Khakhar, Rahul Kumar Dubey

Robust Linear Mixed Models using Hierarchical Gamma-Divergence

Linear mixed models (LMMs) are a popular class of methods for analyzing longitudinal and clustered data. However, such models can be sensitive to outliers, and this can lead to biased inference on model parameters and inaccurate prediction…

Methodology · Statistics 2025-03-28 Shonosuke Sugasawa, Francis K. C. Hui, Alan H. Welsh

Robust Mixture Learning when Outliers Overwhelm Small Groups

We study the problem of estimating the means of well-separated mixtures when an adversary may add arbitrary outliers. While strong guarantees are available when the outlier fraction is significantly smaller than the minimum mixing weight,…

Machine Learning · Computer Science 2024-07-23 Daniil Dmitriev, Rares-Darius Buhai, Stefan Tiegel, Alexander Wolters, Gleb Novikov, Amartya Sanyal, David Steurer, Fanny Yang

Robust Classification by Pre-conditioned LASSO and Transductive Diffusion Component Analysis

Modern machine learning-based recognition approaches require large-scale datasets with a large number of labelled training images. However, such datasets are inherently difficult and costly to collect and annotate. Hence there is a great and…

Machine Learning · Computer Science 2019-12-30 Yanwei Fu, De-An Huang, Leonid Sigal

Layered Sampling for Robust Optimization Problems

In the real world, our datasets often contain outliers. Moreover, the outliers can seriously affect the final machine learning result. Most existing algorithms for handling outliers have high time complexity (e.g., quadratic or cubic…

Computational Geometry · Computer Science 2020-02-28 Hu Ding, Zixiu Wang

Markov subsampling based Huber Criterion

Subsampling is an important technique to tackle the computational challenges brought by big data. Many subsampling procedures fall within the framework of importance sampling, which assigns high sampling probabilities to the samples…

Machine Learning · Statistics 2022-03-07 Tieliang Gong, Yuxin Dong, Hong Chen, Bo Dong, Chen Li

Simultaneous Feature Selection and Outlier Detection with Optimality Guarantees

Sparse estimation methods capable of tolerating outliers have been broadly investigated in the last decade. We contribute to this research considering high-dimensional regression problems contaminated by multiple mean-shift outliers which…

Methodology · Statistics 2025-10-21 Luca Insolia, Ana Kenney, Francesca Chiaromonte, Giovanni Felici

Generalization Bounds in the Presence of Outliers: a Median-of-Means Study

In contrast to the empirical mean, the Median-of-Means (MoM) is an estimator of the mean $\theta$ of a square-integrable random variable $Z$, around which accurate nonasymptotic confidence bounds can be built, even when $Z$ does not exhibit a…

Machine Learning · Statistics 2021-02-09 Pierre Laforgue, Guillaume Staerman, Stephan Clémençon

Multi-Objective Hyperparameter Tuning and Feature Selection using Filter Ensembles

Both feature selection and hyperparameter tuning are key tasks in machine learning. Hyperparameter tuning is often useful to increase model performance, while feature selection is undertaken to attain sparse models. Sparsity may yield…

Machine Learning · Statistics 2020-02-14 Martin Binder, Julia Moosbauer, Janek Thomas, Bernd Bischl

Multi-Objective Hyperparameter Optimization in Machine Learning -- An Overview

Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and corresponding preprocessing steps often only yield optimal performance when…

Machine Learning · Computer Science 2024-06-07 Florian Karl, Tobias Pielok, Julia Moosbauer, Florian Pfisterer, Stefan Coors, Martin Binder, Lennart Schneider, Janek Thomas, Jakob Richter, Michel Lang, Eduardo C. Garrido-Merchán, Juergen Branke, Bernd Bischl

Sequential Model-Based Ensemble Optimization

One of the most tedious tasks in the application of machine learning is model selection, i.e., hyperparameter selection. Fortunately, recent progress has been made in the automation of this process, through the use of sequential model-based…

Machine Learning · Computer Science 2014-02-05 Alexandre Lacoste, Hugo Larochelle, François Laviolette, Mario Marchand

Robust Estimation in High Dimensional Generalized Linear Models

Generalized Linear Models are routinely used in data analysis. The classical procedures for estimation are based on Maximum Likelihood and it is well known that the presence of outliers can have a large impact on this estimator. Robust…

Computation · Statistics 2017-10-02 Marina Valdora, Claudio Agostinelli, Victor J. Yohai

Robust and Parallel Bayesian Model Selection

Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden required to handle large data sets that cannot be stored or processed on one machine. Another…

Machine Learning · Statistics 2018-06-26 Michael Minyi Zhang, Henry Lam, Lizhen Lin

Multi-Model Subset Selection

The two primary approaches for high-dimensional regression problems are sparse methods (e.g., best subset selection, which uses the L0-norm in the penalty) and ensemble methods (e.g., random forests). Although sparse methods typically yield…

Methodology · Statistics 2024-10-31 Anthony-Alexander Christidis, Stefan Van Aelst, Ruben Zamar