Related papers: Probabilistic Bilevel Coreset Selection

Data Summarization via Bilevel Optimization

The increasing availability of massive data sets poses a series of challenges for machine learning. Prominent among these is the need to learn models under hardware or human resource constraints. In such resource-constrained settings, a…

Machine Learning · Computer Science · 2021-09-28 · Zalán Borsos, Mojmír Mutný, Marco Tagliasacchi, Andreas Krause

Uncovering Coresets for Classification With Multi-Objective Evolutionary Algorithms

A coreset is a subset of the training set such that a machine learning algorithm trained on it obtains performance similar to what it would deliver if trained on the whole original data. Coreset discovery is an active and open line of research as…

Machine Learning · Computer Science · 2020-02-21 · Pietro Barbiero, Giovanni Squillero, Alberto Tonda

Coresets via Bilevel Optimization for Continual Learning and Streaming

Coresets are small data summaries that are sufficient for model training. They can be maintained online, enabling efficient handling of large data streams under resource constraints. However, existing constructions are limited to simple…

Machine Learning · Computer Science · 2020-10-23 · Zalán Borsos, Mojmír Mutný, Andreas Krause

A Coreset Selection of Coreset Selection Literature: Introduction and Recent Advances

Coreset selection targets the challenge of finding a small, representative subset of a large dataset that preserves essential patterns for effective machine learning. Although several surveys have examined data reduction strategies before,…

Machine Learning · Computer Science · 2026-01-30 · Brian B. Moser, Arundhati S. Shanbhag, Stanislav Frolov, Federico Raue, Joachim Folz, Andreas Dengel

Coreset selection based on Intra-class diversity

Deep learning models have transformed various domains, including the healthcare sector and particularly biomedical image classification, by learning intricate features and enabling accurate diagnosis of complex diseases. Recent…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-29 · Imran Ashraf, Mukhtar Ullah, Muhammad Faisal Nadeem, Muhammad Nouman Noor

DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning

Coreset selection, which aims to select a subset of the most informative training samples, is a long-standing learning problem that can benefit many downstream tasks such as data-efficient learning, continual learning, neural architecture…

Machine Learning · Computer Science · 2022-06-30 · Chengcheng Guo, Bo Zhao, Yanbing Bai

Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints

Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms. It strives to identify a small subset from large-scale data, so that training only on the subset practically…

Machine Learning · Computer Science · 2024-03-01 · Xiaobo Xia, Jiale Liu, Shaokun Zhang, Qingyun Wu, Hongxin Wei, Tongliang Liu

Bilevel learning

Bilevel learning refers to machine learning problems that can be formulated as bilevel optimization models, where decisions are organized in a hierarchical structure. This paradigm has recently gained considerable attention in machine…

Optimization and Control · Mathematics · 2026-05-05 · Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo, Alain Zemkoho

Bayesian Coresets: Revisiting the Nonconvex Optimization Perspective

Bayesian coresets have emerged as a promising approach for implementing scalable Bayesian inference. The Bayesian coreset problem involves selecting a (weighted) subset of the data samples, such that the posterior inference using the…

Machine Learning · Statistics · 2021-03-01 · Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis, Oluwasanmi Koyejo

Bilevel Continual Learning

Continual learning (CL) studies the problem of learning a sequence of tasks, one at a time, such that learning each new task does not degrade performance on the previously seen ones while exploiting previously…

Machine Learning · Computer Science · 2020-11-03 · Ammar Shaker, Francesco Alesiani, Shujian Yu, Wenzhe Yin

A Challenge in Reweighting Data with Bilevel Optimization

In many scenarios, one uses a large training set to train a model with the goal of performing well on a smaller testing set with a different distribution. Learning a weight for each data point of the training set is an appealing solution,…

Machine Learning · Statistics · 2023-10-27 · Anastasia Ivanova, Pierre Ablin

Robust Coreset Construction for Distributed Machine Learning

A coreset, which is a summary of the original dataset in the form of a small weighted set in the same sample space, provides a promising approach to enable machine learning over distributed data. Although viewed as a proxy of the original…

Machine Learning · Computer Science · 2020-06-24 · Hanlin Lu, Ming-Ju Li, Ting He, Shiqiang Wang, Vijaykrishnan Narayanan, Kevin S Chan

Deep Bilevel Learning

We present a novel regularization approach to train neural networks that achieves better generalization and lower test error than standard stochastic gradient descent. Our approach is based on the principles of cross-validation, where a validation…

Computer Vision and Pattern Recognition · Computer Science · 2018-09-06 · Simon Jenni, Paolo Favaro

Gradient-matching coresets for continual learning

We devise a coreset selection method based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset. We evaluate the method in the context of…

Machine Learning · Computer Science · 2021-12-10 · Lukas Balles, Giovanni Zappella, Cédric Archambeau

The Power of Few: Accelerating and Enhancing Data Reweighting with Coreset Selection

As machine learning tasks continue to evolve, the trend has been to gather larger datasets and train increasingly larger models. While this has led to advancements in accuracy, it has also escalated computational costs to unsustainable…

Machine Learning · Computer Science · 2024-06-03 · Mohammad Jafari, Yimeng Zhang, Yihua Zhang, Sijia Liu

A Bilevel Optimization Framework for Imbalanced Data Classification

Data rebalancing techniques, including oversampling and undersampling, are a common approach to addressing the challenges of imbalanced data. To tackle unresolved problems related to both oversampling and undersampling, we propose a new…

Machine Learning · Computer Science · 2025-07-11 · Karen Medlin, Sven Leyffer, Krishnan Raghavan

Subset Selection for Multiple Linear Regression via Optimization

Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that trades off fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming…

Machine Learning · Statistics · 2020-09-04 · Young Woong Park, Diego Klabjan

The Impact of Coreset Selection on Spurious Correlations and Group Robustness

Coreset selection methods have shown promise in reducing the training data size while maintaining model performance for data-efficient machine learning. However, as many datasets suffer from biases that cause models to learn spurious…

Machine Learning · Computer Science · 2025-10-22 · Amaya Dharmasiri, William Yang, Polina Kirichenko, Lydia Liu, Olga Russakovsky

Coreset Selection via LLM-based Concept Bottlenecks

Coreset Selection (CS) aims to identify a subset of the training dataset that achieves model performance comparable to using the entire dataset. Many state-of-the-art CS methods select coresets using scores whose computation requires…

Machine Learning · Computer Science · 2025-06-05 · Akshay Mehra, Trisha Mittal, Subhadra Gopalakrishnan, Joshua Kimball

Stable Coresets via Posterior Sampling: Aligning Induced and Full Loss Landscapes

As deep learning models continue to scale, the growing computational demands have amplified the need for effective coreset selection techniques. Coreset selection aims to accelerate training by identifying small, representative subsets of…

Machine Learning · Computer Science · 2025-11-24 · Wei-Kai Chang, Rajiv Khanna
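The gradient-matching idea summarized in the Balles, Zappella, and Archambeau entry is that the gradients induced by the coreset should match those of the full training set as closely as possible. The following minimal sketch illustrates that idea under stated assumptions. It assumes per-example gradients are already available as plain lists of floats. It then greedily selects an unweighted subset whose mean gradient minimizes the squared Euclidean distance to the full-data mean gradient. The function names are hypothetical, and the greedy, unweighted scheme is chosen for readability; it is not the paper's algorithm.

```python
from typing import List, Sequence

# Hypothetical helpers: an illustrative sketch of gradient matching,
# not the method from the paper listed above.
Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(dim)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_gradient_matching(grads: Sequence[Vector], k: int) -> List[int]:
    """Greedily pick k example indices whose mean gradient is closest
    (in squared Euclidean distance) to the mean gradient of all examples.

    Assumes `grads` holds one precomputed gradient vector per example.
    """
    if not 0 < k <= len(grads):
        raise ValueError("k must be between 1 and len(grads)")
    target = mean_vector(grads)          # full-data mean gradient
    dim = len(target)
    selected: List[int] = []
    running_sum = [0.0] * dim            # sum of gradients selected so far
    remaining = set(range(len(grads)))
    for step in range(1, k + 1):
        best_idx, best_err = -1, float("inf")
        # Try adding each remaining example; keep the one that brings the
        # coreset's mean gradient closest to the target (ties: lowest index).
        for i in sorted(remaining):
            candidate = [(running_sum[j] + grads[i][j]) / step for j in range(dim)]
            err = squared_distance(candidate, target)
            if err < best_err:
                best_idx, best_err = i, err
        selected.append(best_idx)
        remaining.remove(best_idx)
        running_sum = [running_sum[j] + grads[best_idx][j] for j in range(dim)]
    return selected


if __name__ == "__main__":
    toy_grads = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [3.0, 3.0]]
    print(greedy_gradient_matching(toy_grads, 2))
```

On the toy gradients above, the full-data mean gradient is (1.125, 1.125). The sketch first picks index 2 and then index 3. A weighted variant, in which each selected example also gets a learned weight, would match the target more closely, but it needs an inner optimization step that is omitted here.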