Related papers: Selection via Proxy: Efficient Data Selection for …

ASP: Automatic Selection of Proxy dataset for efficient AutoML

Deep neural networks have gained great success due to the increasing amounts of data, and diverse effective neural network designs. However, it also brings a heavy computing burden as the amount of training data is proportional to the…

Machine Learning · Computer Science 2023-10-19 Peng Yao, Chao Liao, Jiyuan Jia, Jianchao Tan, Bin Chen, Chengru Song, Di Zhang

Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models

Fine-tuning the pre-trained model with active learning holds promise for reducing annotation costs. However, this combination introduces significant computational costs, particularly with the growing scale of pre-trained models. Recent…

Machine Learning · Computer Science 2024-11-19 Ziting Wen, Oscar Pizarro, Stefan Williams

The Power of Proxy Data and Proxy Networks for Hyper-Parameter Optimization in Medical Image Segmentation

Deep learning models for medical image segmentation are primarily data-driven. Models trained with more data lead to improved performance and generalizability. However, training is a computationally expensive process because multiple…

Image and Video Processing · Electrical Eng. & Systems 2021-07-13 Vishwesh Nath, Dong Yang, Ali Hatamizadeh, Anas A. Abidin, Andriy Myronenko, Holger Roth, Daguang Xu

Data Efficient Contrastive Learning in Histopathology using Active Sampling

Deep learning (DL) based diagnostics systems can provide accurate and robust quantitative analysis in digital pathology. These algorithms require large amounts of annotated training data which is impractical in pathology due to the high…

Computer Vision and Pattern Recognition · Computer Science 2024-07-23 Tahsin Reasat, Asif Sushmit, David S. Smith

Using Small Proxy Datasets to Accelerate Hyperparameter Search

One of the biggest bottlenecks in a machine learning workflow is waiting for models to train. Depending on the available computing resources, it can take days to weeks to train a neural network on a large dataset with many classes such as…

Machine Learning · Computer Science 2019-06-13 Sam Shleifer, Eric Prokop

Dataset Pruning: Reducing Training Data by Examining Generalization Influence

The great success of deep learning heavily relies on increasingly larger training data, which comes at a price of huge computational and infrastructural costs. This poses crucial questions that, do all training data contribute to model's…

Machine Learning · Computer Science 2023-02-28 Shuo Yang, Zeke Xie, Hanyu Peng, Min Xu, Mingming Sun, Ping Li

Smart Cuts: Enhance Active Learning for Vulnerability Detection by Pruning Hard-to-Learn Data

Vulnerability detection is crucial for identifying security weaknesses in software systems. However, training effective machine learning models for this task is often constrained by the high cost and expertise required for data annotation.…

Cryptography and Security · Computer Science 2025-08-19 Xiang Lan, Tim Menzies, Bowen Xu

Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding

Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow. Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples. Despite their appeal, these…

Artificial Intelligence · Computer Science 2024-10-17 Talfan Evans, Shreya Pathak, Hamza Merzic, Jonathan Schwarz, Ryutaro Tanno, Olivier J. Henaff

SVP-CF: Selection via Proxy for Collaborative Filtering Data

We study the practical consequences of dataset sampling strategies on the performance of recommendation algorithms. Recommender systems are generally trained and evaluated on samples of larger datasets. Samples are often taken in a naive or…

Information Retrieval · Computer Science 2021-07-13 Noveen Sachdeva, Carole-Jean Wu, Julian McAuley

DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning

Coreset selection, which aims to select a subset of the most informative training samples, is a long-standing learning problem that can benefit many downstream tasks such as data-efficient learning, continual learning, neural architecture…

Machine Learning · Computer Science 2022-06-30 Chengcheng Guo, Bo Zhao, Yanbing Bai

Practical Active Learning with Model Selection for Small Data

Active learning is of great interest for many practical applications, especially in industry and the physical sciences, where there is a strong need to minimize the number of costly experiments necessary to train predictive models. However,…

Machine Learning · Computer Science 2021-12-23 Maryam Pardakhti, Nila Mandal, Anson W. K. Ma, Qian Yang

Accelerating Deep Learning with Dynamic Data Pruning

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…

Machine Learning · Computer Science 2021-11-25 Ravi S Raju, Kyle Daruwalla, Mikko Lipasti

On Coresets for Support Vector Machines

We present an efficient coreset construction algorithm for large-scale Support Vector Machine (SVM) training in Big Data and streaming applications. A coreset is a small, representative subset of the original data points such that a models…

Machine Learning · Computer Science 2020-02-18 Murad Tukan, Cenk Baykal, Dan Feldman, Daniela Rus

Finding High-Value Training Data Subset through Differentiable Convex Programming

Finding valuable training data points for deep neural networks has been a core research challenge with many applications. In recent years, various techniques for calculating the "value" of individual training datapoints have been proposed…

Machine Learning · Computer Science 2021-04-29 Soumi Das, Arshdeep Singh, Saptarshi Chatterjee, Suparna Bhattacharya, Sourangshu Bhattacharya

Data-Efficient Challenges in Visual Inductive Priors: A Retrospective

Deep Learning requires large amounts of data to train models that work well. In data-deficient settings, performance can be degraded. We investigate which Deep Learning methods benefit training models in a data-deficient setting, by…

Computer Vision and Pattern Recognition · Computer Science 2025-06-11 Robert-Jan Bruintjes, Attila Lengyel, Osman Semih Kayhan, Davide Zambrano, Nergis Tömen, Hadi Jamali-Rad, Jan van Gemert

Advancing Deep Active Learning & Data Subset Selection: Unifying Principles with Information-Theory Intuitions

At its core, this thesis aims to enhance the practicality of deep learning by improving the label and training efficiency of deep learning models. To this end, we investigate data subset selection techniques, specifically active learning…

Machine Learning · Computer Science 2024-03-11 Andreas Kirsch

Active Learning for Convolutional Neural Networks: A Core-Set Approach

Convolutional neural networks (CNNs) have been successfully applied to many recognition and learning tasks using a universal recipe; training a deep model on a very large dataset of supervised examples. However, this approach is rather…

Machine Learning · Statistics 2018-06-04 Ozan Sener, Silvio Savarese

Vision Backbone Efficient Selection for Image Classification in Low-Data Regimes

Transfer learning has become an essential tool in modern computer vision, allowing practitioners to leverage backbones, pretrained on large datasets, to train successful models from limited annotated data. Choosing the right backbone is…

Computer Vision and Pattern Recognition · Computer Science 2025-08-20 Joris Guerin, Shray Bansal, Amirreza Shaban, Paulo Mann, Harshvardhan Gazula

Informative Sample-Aware Proxy for Deep Metric Learning

Among various supervised deep metric learning methods, proxy-based approaches have achieved high retrieval accuracies. Proxies, which are class-representative points in an embedding space, receive updates based on proxy-sample similarities…

Computer Vision and Pattern Recognition · Computer Science 2022-11-21 Aoyu Li, Ikuro Sato, Kohta Ishikawa, Rei Kawakami, Rio Yokota

Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry. Their data curation poses the challenges of expensive human labeling, inadequate computing resources and larger experiment turn around…

Computer Vision and Pattern Recognition · Computer Science 2019-01-07 Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, Ganesh Ramakrishnan