Related papers: Layered Sampling for Robust Optimization Problems

Robust and Fully-Dynamic Coreset for Continuous-and-Bounded Learning (With Outliers) Problems

In many machine learning tasks, a common approach for dealing with large-scale data is to build a small summary, e.g., a coreset, that can efficiently represent the original input. However, real-world datasets usually contain outliers…

Machine Learning · Computer Science 2022-01-24 Zixiu Wang, Yiwen Guo, Hu Ding

Randomized Greedy Algorithms and Composable Coreset for k-Center Clustering with Outliers

In this paper, we study the problem of $k$-center clustering with outliers. The problem has many important applications in the real world, but the presence of outliers can significantly increase the computational complexity. Though a…

Machine Learning · Computer Science 2023-01-10 Hu Ding, Ruomin Huang, Kai Liu, Haikuo Yu, Zixiu Wang

A Practical Algorithm for Distributed Clustering and Outlier Detection

We study the classic $k$-means/median clustering problems, which are fundamental in unsupervised learning, in the setting where data are partitioned across multiple sites, and where we are allowed to discard a small portion of the data by…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-12 Jiecao Chen, Erfan Sadeqi Azer, Qin Zhang

Outlier-Robust Convex Segmentation

We derive a convex optimization problem for the task of segmenting sequential data, which explicitly treats the presence of outliers. We describe two algorithms for solving this problem, one exact and one a novel top-down approach, and we…

Machine Learning · Computer Science 2014-11-19 Itamar Katz, Koby Crammer

The Effectiveness of Uniform Sampling for Center-Based Clustering with Outliers

Clustering has many important applications in computer science, but real-world datasets often contain outliers. Moreover, the presence of outliers can make clustering problems much more challenging. To reduce the complexities,…

Data Structures and Algorithms · Computer Science 2020-05-04 Hu Ding, Jiawei Huang, Haikuo Yu

Robust Clustering Using Outlier-Sparsity Regularization

Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the…

Machine Learning · Statistics 2015-05-27 Pedro A. Forero, Vassilis Kekatos, Georgios B. Giannakis

Robust Coreset Construction for Distributed Machine Learning

A coreset, which is a summary of the original dataset in the form of a small weighted set in the same sample space, provides a promising approach to enable machine learning over distributed data. Although viewed as a proxy of the original…

Machine Learning · Computer Science 2020-06-24 Hanlin Lu, Ming-Ju Li, Ting He, Shiqiang Wang, Vijaykrishnan Narayanan, Kevin S Chan

A Novel Sequential Coreset Method for Gradient Descent Algorithms

A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational…

Machine Learning · Computer Science 2022-10-11 Jiawei Huang, Ruomin Huang, Wenjie Liu, Nikolaos M. Freris, Hu Ding

The Power of Few: Accelerating and Enhancing Data Reweighting with Coreset Selection

As machine learning tasks continue to evolve, the trend has been to gather larger datasets and train increasingly larger models. While this has led to advancements in accuracy, it has also escalated computational costs to unsustainable…

Machine Learning · Computer Science 2024-06-03 Mohammad Jafari, Yimeng Zhang, Yihua Zhang, Sijia Liu

Outlier detection in regression: conic quadratic formulations

In many applications, when building linear regression models, it is important to account for the presence of outliers, i.e., corrupted input data points. Such problems can be formulated as mixed-integer optimization problems involving cubic…

Optimization and Control · Mathematics 2023-07-13 Andrés Gómez, José Neto

Outlier absorbing based on a Bayesian approach

Outliers are prevalent in machine learning applications and may produce misleading results. In this paper, a new method for dealing with outliers and anomalous samples is proposed. To overcome the outlier issue, the proposed…

Machine Learning · Computer Science 2016-07-05 Parsa Bagherzadeh, Hadi Sadoghi Yazdi

A Novel Geometric Approach for Outlier Recognition in High Dimension

Outlier recognition is a fundamental problem in data analysis and has attracted a great deal of attention in the past decades. However, most existing methods still suffer from several issues such as high time and space complexities or…

Computational Geometry · Computer Science 2019-04-09 Hu Ding, Mingquan Ye

Greedy Strategy Works for $k$-Center Clustering with Outliers and Coreset Construction

We study the problem of $k$-center clustering with outliers in arbitrary metrics and Euclidean space. Though a number of methods have been developed in the past decades, it is still quite challenging to design a quality-guaranteed algorithm…

Computational Geometry · Computer Science 2019-04-30 Hu Ding, Haikuo Yu, Zixiu Wang

Practical Bayesian optimization in the presence of outliers

Inference in the presence of outliers is an important field of research, as outliers are ubiquitous and may arise across a variety of problems and domains. Bayesian optimization is a method that heavily relies on probabilistic inference. This…

Machine Learning · Computer Science 2017-12-14 Ruben Martinez-Cantin, Kevin Tee, Michael McCourt

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Outliers are points that are different from or inconsistent with the rest of the data. They can be novel, new, abnormal, unusual or noisy information. Outliers are sometimes more interesting than the majority of the data. The main…

Computer Vision and Pattern Recognition · Computer Science 2014-06-20 Singh Vijendra, Pathak Shivani

A Coreset Learning Reality Check

Subsampling algorithms are a natural approach to reduce data size before fitting models on massive datasets. In recent years, several works have proposed methods for subsampling rows from a data matrix while maintaining relevant information…

Machine Learning · Computer Science 2023-01-18 Fred Lu, Edward Raff, James Holt

Introduction to Core-sets: an Updated Survey

In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering…

Machine Learning · Computer Science 2020-11-19 Dan Feldman

Stable Coresets via Posterior Sampling: Aligning Induced and Full Loss Landscapes

As deep learning models continue to scale, the growing computational demands have amplified the need for effective coreset selection techniques. Coreset selection aims to accelerate training by identifying small, representative subsets of…

Machine Learning · Computer Science 2025-11-24 Wei-Kai Chang, Rajiv Khanna

Data Summarization via Bilevel Optimization

The increasing availability of massive data sets poses a series of challenges for machine learning. Prominent among these is the need to learn models under hardware or human resource constraints. In such resource-constrained settings, a…

Machine Learning · Computer Science 2021-09-28 Zalán Borsos, Mojmír Mutný, Marco Tagliasacchi, Andreas Krause

Near-optimal Coresets for Robust Clustering

We consider robust clustering problems in $\mathbb{R}^d$, specifically $k$-clustering problems (e.g., $k$-Median and $k$-Means) with $m$ outliers, where the cost for a given center set $C \subset \mathbb{R}^d$ aggregates the distances from…

Data Structures and Algorithms · Computer Science 2022-10-20 Lingxiao Huang, Shaofeng H.-C. Jiang, Jianing Lou, Xuan Wu