Related papers: A MOM-based ensemble method for robustness, subsam…

200 papers

Outlying observations can be challenging to handle and adversely affect subsequent analyses, especially in data with increasing dimensional complexity. Although outliers are not always undesired anomalies in the data and may possess…

Methodology · Statistics 2025-09-18 Anthony-Alexander Christidis , Gabriela Cohen-Freue

We introduce new estimators for robust machine learning based on median-of-means (MOM) estimators of the mean of real-valued random variables. These estimators achieve optimal rates of convergence under minimal assumptions on the dataset.…

Statistics Theory · Mathematics 2017-12-04 Guillaume Lecué , Matthieu Lerasle
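The MOM building block referred to above can be illustrated with a minimal sketch of the classical median-of-means estimator of a real-valued mean: split the sample into k blocks, average each block, and take the median of the block means. This is not the authors' estimator for robust learning; the function name `median_of_means`, the equal-size block split, and the optional shuffling are illustrative assumptions.

```python
import random
import statistics

def median_of_means(xs, k, seed=None):
    """Median-of-means estimate of the mean of xs using k blocks (illustrative sketch)."""
    data = list(xs)
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(xs)")
    if seed is not None:
        # Optional random assignment of points to blocks (an assumption of this sketch).
        random.Random(seed).shuffle(data)
    n = len(data)
    # Split into k contiguous blocks of (nearly) equal size; each is non-empty since k <= n.
    blocks = [data[i * n // k:(i + 1) * n // k] for i in range(k)]
    block_means = [statistics.fmean(b) for b in blocks]
    # The median ignores the few blocks that a handful of outliers can corrupt.
    return statistics.median(block_means)
```

For example, `median_of_means([1.0] * 9 + [1000.0], k=5)` returns `1.0`, because the single outlier only contaminates one of the five block means, whereas the empirical mean of the same data is `100.9`. With `k=1` the estimator reduces to the empirical mean.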

Outlier detection (OD) literature exhibits numerous algorithms as it applies to diverse domains. However, given a new detection task, it is unclear how to choose an algorithm to use, or how to set its hyperparameter(s) (HPs) in…

Machine Learning · Computer Science 2022-10-20 Xueying Ding , Lingxiao Zhao , Leman Akoglu

Algorithm selection and hyperparameter tuning remain two of the most challenging tasks in machine learning. Automated machine learning (AutoML) seeks to automate these tasks to enable widespread use of machine learning by non-experts. This…

Machine Learning · Computer Science 2019-05-22 Chengrun Yang , Yuji Akimoto , Dae Won Kim , Madeleine Udell

The goal of compressed sensing is to estimate a high dimensional vector from an underdetermined system of noisy linear equations. In analogy to classical compressed sensing, here we assume a generative model as a prior, that is, we assume…

Machine Learning · Statistics 2021-06-24 Ajil Jalal , Liu Liu , Alexandros G. Dimakis , Constantine Caramanis

Training deep networks and tuning hyperparameters on large datasets is computationally intensive. One of the primary research directions for efficient training is to reduce training costs by selecting well-generalizable subsets of training…

The increased computerization in recent years has resulted in the production of a variety of software; however, measures need to be taken to ensure that the produced software isn't defective. Many researchers have worked in this…

Software Engineering · Computer Science 2023-04-06 Param Khakhar , Rahul Kumar Dubey

Linear mixed models (LMMs) are a popular class of methods for analyzing longitudinal and clustered data. However, such models can be sensitive to outliers, and this can lead to biased inference on model parameters and inaccurate prediction…

Methodology · Statistics 2025-03-28 Shonosuke Sugasawa , Francis K. C. Hui , Alan H. Welsh

We study the problem of estimating the means of well-separated mixtures when an adversary may add arbitrary outliers. While strong guarantees are available when the outlier fraction is significantly smaller than the minimum mixing weight,…

Modern machine learning-based recognition approaches require large-scale datasets with a large number of labelled training images. However, such datasets are inherently difficult and costly to collect and annotate. Hence there is a great and…

Machine Learning · Computer Science 2019-12-30 Yanwei Fu , De-An Huang , Leonid Sigal

In the real world, our datasets often contain outliers. Moreover, the outliers can seriously affect the final machine learning result. Most existing algorithms for handling outliers have high time complexities (e.g. quadratic or cubic…

Computational Geometry · Computer Science 2020-02-28 Hu Ding , Zixiu Wang

Subsampling is an important technique to tackle the computational challenges brought by big data. Many subsampling procedures fall within the framework of importance sampling, which assigns high sampling probabilities to the samples…

Machine Learning · Statistics 2022-03-07 Tieliang Gong , Yuxin Dong , Hong Chen , Bo Dong , Chen Li
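The importance-sampling framework mentioned above can be sketched generically: draw indices with replacement according to non-uniform probabilities, then reweight each draw by the inverse of its probability so the subsample estimate stays unbiased. This is a plain inverse-probability (Hansen-Hurwitz style) estimator of a dataset mean, not the specific subsampling procedure of Gong et al.; the function name and the user-supplied `scores` are hypothetical.

```python
import random

def importance_subsample_mean(values, scores, m, seed=0):
    """Estimate mean(values) from m indices drawn with replacement,
    with probability proportional to the (positive) importance scores."""
    if len(values) != len(scores) or any(s <= 0 for s in scores):
        raise ValueError("need one positive score per value")
    n = len(values)
    total = sum(scores)
    probs = [s / total for s in scores]
    rng = random.Random(seed)
    idx = rng.choices(range(n), weights=probs, k=m)
    # Reweight each draw by 1 / (n * p_i) so the estimate is unbiased:
    # E[values[I] / (n * p_I)] = sum_i p_i * values[i] / (n * p_i) = mean(values)
    return sum(values[i] / (n * probs[i]) for i in idx) / m
```

With positive data and scores proportional to the values, e.g. `importance_subsample_mean([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0], m=5)`, every reweighted draw equals the true mean, so the result is 2.5 up to floating-point rounding; uniform scores recover ordinary uniform subsampling.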

Sparse estimation methods capable of tolerating outliers have been broadly investigated in the last decade. We contribute to this research considering high-dimensional regression problems contaminated by multiple mean-shift outliers which…

Methodology · Statistics 2025-10-21 Luca Insolia , Ana Kenney , Francesca Chiaromonte , Giovanni Felici
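For reference, the multiple mean-shift outlier model is commonly written as below, with a sparse shift vector whose non-zero entries flag outlying observations. This is a standard textbook formulation, not necessarily the exact model or estimator of Insolia et al.; the set S of flagged observations is an illustrative assumption. The second line shows why the model is useful: leaving the shift of a flagged observation unconstrained in least squares drives its residual to zero, which is equivalent to deleting that observation from the fit.

```latex
y_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \phi_i + \varepsilon_i, \qquad i = 1, \dots, n,
\qquad \phi_i \neq 0 \iff \text{observation } i \text{ is an outlier}

\min_{\boldsymbol{\beta},\ \{\phi_j\}_{j \in S}} \sum_{j=1}^{n} \bigl(y_j - \mathbf{x}_j^\top \boldsymbol{\beta} - \phi_j\bigr)^2
= \min_{\boldsymbol{\beta}} \sum_{j \notin S} \bigl(y_j - \mathbf{x}_j^\top \boldsymbol{\beta}\bigr)^2,
\qquad \phi_j = 0 \text{ for } j \notin S,
```

since for each j in S the inner minimum is attained at \phi_j = y_j - \mathbf{x}_j^\top \boldsymbol{\beta}.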

In contrast to the empirical mean, the Median-of-Means (MoM) is an estimator of the mean $\theta$ of a square integrable r.v. $Z$, around which accurate nonasymptotic confidence bounds can be built, even when $Z$ does not exhibit a…

Machine Learning · Statistics 2021-02-09 Pierre Laforgue , Guillaume Staerman , Stephan Clémençon
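The kind of nonasymptotic bound referred to above can be seen from the standard textbook argument for the classical median-of-means estimator, sketched here with non-optimal constants and under the assumptions that Z_1, ..., Z_n are i.i.d. copies of Z with variance \sigma^2 and that n = KB splits into K blocks of size B with block means \bar{Z}_k. This is not the result of Laforgue et al. Chebyshev's inequality bounds the failure probability of each block, and a block-median can only miss by more than t if at least half of the blocks do, which Hoeffding's inequality makes exponentially unlikely.

```latex
\Pr\Bigl(|\bar{Z}_k - \theta| > \tfrac{2\sigma}{\sqrt{B}}\Bigr) \le \frac{\sigma^2 / B}{4\sigma^2 / B} = \frac{1}{4}

\bigl\{|\widehat{\theta}_{\mathrm{MoM}} - \theta| > t\bigr\}
\subseteq \Bigl\{ \textstyle\sum_{k=1}^{K} \mathbb{1}\bigl[|\bar{Z}_k - \theta| > t\bigr] \ge \tfrac{K}{2} \Bigr\}

\Pr\Bigl(|\widehat{\theta}_{\mathrm{MoM}} - \theta| > 2\sigma\sqrt{K/n}\Bigr)
\le \exp\Bigl(-\frac{2\,(K/4)^2}{K}\Bigr) = e^{-K/8}
```

Here t = 2\sigma/\sqrt{B} = 2\sigma\sqrt{K/n}, and the last step applies Hoeffding's inequality to K independent indicators with mean at most 1/4. Choosing K = \lceil 8 \ln(1/\delta) \rceil (with K \le n) therefore gives a deviation of order \sigma\sqrt{\ln(1/\delta)/n} with probability at least 1 - \delta, using only a finite second moment.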

Both feature selection and hyperparameter tuning are key tasks in machine learning. Hyperparameter tuning is often useful to increase model performance, while feature selection is undertaken to attain sparse models. Sparsity may yield…

Machine Learning · Statistics 2020-02-14 Martin Binder , Julia Moosbauer , Janek Thomas , Bernd Bischl

Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and corresponding preprocessing steps often only yield optimal performance when…

One of the most tedious tasks in the application of machine learning is model selection, i.e. hyperparameter selection. Fortunately, recent progress has been made in the automation of this process, through the use of sequential model-based…

Machine Learning · Computer Science 2014-02-05 Alexandre Lacoste , Hugo Larochelle , François Laviolette , Mario Marchand

Generalized Linear Models are routinely used in data analysis. The classical procedures for estimation are based on Maximum Likelihood and it is well known that the presence of outliers can have a large impact on this estimator. Robust…

Computation · Statistics 2017-10-02 Marina Valdora , Claudio Agostinelli , Victor J. Yohai

Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden required to handle large data sets that cannot be stored or processed on one machine. Another…

Machine Learning · Statistics 2018-06-26 Michael Minyi Zhang , Henry Lam , Lizhen Lin

The two primary approaches for high-dimensional regression problems are sparse methods (e.g., best subset selection, which uses the L0-norm in the penalty) and ensemble methods (e.g., random forests). Although sparse methods typically yield…

Methodology · Statistics 2024-10-31 Anthony-Alexander Christidis , Stefan Van Aelst , Ruben Zamar