English
Related papers

Related papers: Data Summarization via Bilevel Optimization

200 papers

Coresets are small data summaries that are sufficient for model training. They can be maintained online, enabling efficient handling of large data streams under resource constraints. However, existing constructions are limited to simple…

Machine Learning · Computer Science 2020-10-23 Zalán Borsos , Mojmír Mutný , Andreas Krause

The goal of coreset selection in supervised learning is to produce a weighted subset of data, so that training only on the subset achieves similar performance as training on the entire dataset. Existing methods achieved promising results in…

Machine Learning · Computer Science 2023-01-25 Xiao Zhou , Renjie Pi , Weizhong Zhang , Yong Lin , Tong Zhang

We investigate coresets - succinct, small summaries of large data sets - so that solutions found on the summary are provably competitive with solution found on the full data set. We provide an overview over the state-of-the-art in coreset…

Machine Learning · Statistics 2017-06-06 Olivier Bachem , Mario Lucic , Andreas Krause

In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering…

Machine Learning · Computer Science 2020-11-19 Dan Feldman

Coreset, which is a summary of the original dataset in the form of a small weighted set in the same sample space, provides a promising approach to enable machine learning over distributed data. Although viewed as a proxy of the original…

Machine Learning · Computer Science 2020-06-24 Hanlin Lu , Ming-Ju Li , Ting He , Shiqiang Wang , Vijaykrishnan Narayanan , Kevin S Chan

Coreset selection targets the challenge of finding a small, representative subset of a large dataset that preserves essential patterns for effective machine learning. Although several surveys have examined data reduction strategies before,…

Machine Learning · Computer Science 2026-01-30 Brian B. Moser , Arundhati S. Shanbhag , Stanislav Frolov , Federico Raue , Joachim Folz , Andreas Dengel

A coreset is a subset of the training set, using which a machine learning algorithm obtains performances similar to what it would deliver if trained over the whole original data. Coreset discovery is an active and open line of research as…

Machine Learning · Computer Science 2020-02-21 Pietro Barbiero , Giovanni Squillero , Alberto Tonda

Bayesian coresets have emerged as a promising approach for implementing scalable Bayesian inference. The Bayesian coreset problem involves selecting a (weighted) subset of the data samples, such that the posterior inference using the…

Machine Learning · Statistics 2021-03-01 Jacky Y. Zhang , Rajiv Khanna , Anastasios Kyrillidis , Oluwasanmi Koyejo

Coresets are small, weighted summaries of larger datasets, aiming at providing provable error bounds for machine learning (ML) tasks while significantly reducing the communication and computation costs. To achieve a better trade-off between…

Machine Learning · Computer Science 2022-04-15 Hanlin Lu , Changchang Liu , Shiqiang Wang , Ting He , Vijay Narayanan , Kevin S. Chan , Stephen Pasteris

A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational…

Machine Learning · Computer Science 2022-10-11 Jiawei Huang , Ruomin Huang , Wenjie Liu , Nikolaos M. Freris , Hu Ding

Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms. It strives to identify a small subset from large-scale data, so that training only on the subset practically…

Machine Learning · Computer Science 2024-03-01 Xiaobo Xia , Jiale Liu , Shaokun Zhang , Qingyun Wu , Hongxin Wei , Tongliang Liu

In real world, our datasets often contain outliers. Moreover, the outliers can seriously affect the final machine learning result. Most existing algorithms for handling outliers take high time complexities (e.g. quadratic or cubic…

Computational Geometry · Computer Science 2020-02-28 Hu Ding , Zixiu Wang

Deep Learning models have transformed various domains, including the healthcare sector, particularly biomedical image classification by learning intricate features and enabling accurate diagnostics pertaining to complex diseases. Recent…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Imran Ashraf , Mukhtar Ullah , Muhammad Faisal Nadeem , Muhammad Nouman Noor

As machine learning tasks continue to evolve, the trend has been to gather larger datasets and train increasingly larger models. While this has led to advancements in accuracy, it has also escalated computational costs to unsustainable…

Machine Learning · Computer Science 2024-06-03 Mohammad Jafari , Yimeng Zhang , Yihua Zhang , Sijia Liu

Solving different types of optimization models (including parameters fitting) for support vector machines on large-scale training data is often an expensive computational task. This paper proposes a multilevel algorithmic framework that…

Machine Learning · Statistics 2014-10-14 Talayeh Razzaghi , Ilya Safro

Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive…

Machine Learning · Statistics 2018-06-08 Olivier Bachem , Mario Lucic , Andreas Krause

Scaling clustering algorithms to massive data sets is a challenging task. Recently, several successful approaches based on data summarization methods, such as coresets and sketches, were proposed. While these techniques provide provably…

Machine Learning · Statistics 2018-02-21 Olivier Bachem , Mario Lucic , Silvio Lattanzi

Coresets have emerged as a powerful tool to summarize data by selecting a small subset of the original observations while retaining most of its information. This approach has led to significant computational speedups but the performance of…

Statistics Theory · Mathematics 2020-12-10 Paxton Turner , Jingbo Liu , Philippe Rigollet

A coreset is a small set that can approximately preserve the structure of the original input data set. Therefore we can run our algorithm on a coreset so as to reduce the total computational complexity. Conventional coreset techniques…

Machine Learning · Computer Science 2022-10-11 Jiaxiang Chen , Qingyuan Yang , Ruomin Huang , Hu Ding

Bilevel learning refers to machine learning problems that can be formulated as bilevel optimization models, where decisions are organized in a hierarchical structure. This paradigm has recently gained considerable attention in machine…

Optimization and Control · Mathematics 2026-05-05 Riccardo Grazzi , Massimiliano Pontil , Saverio Salzo , Alain Zemkoho
‹ Prev 1 2 3 10 Next ›