Related papers: Data-Efficient Training by Evolved Sampling

Importance Weighted Evolution Strategies

Evolution Strategies (ES) emerged as a scalable alternative to popular Reinforcement Learning (RL) techniques, providing an almost perfect speedup when distributed across hundreds of CPU cores thanks to a reduced communication overhead.…

Machine Learning · Statistics 2018-11-13 Víctor Campos, Xavier Giro-i-Nieto, Jordi Torres

Data Agent: Learning to Select Data via End-to-End Dynamic Optimization

Dynamic data selection aims to accelerate training by prioritizing informative samples during online training. However, existing methods typically rely on task-specific handcrafted metrics or static/snapshot-based criteria to estimate…

Machine Learning · Computer Science 2026-05-14 Suorong Yang, Fangjian Su, Hai Gan, Ziqi Ye, Jie Li, Baile Xu, Furao Shen, Soujanya Poria

Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning

Enhancing the instruction-following ability of Large Language Models (LLMs) primarily demands substantial instruction-tuning datasets. However, the sheer volume of these imposes a considerable computational burden and annotation cost. To…

Computation and Language · Computer Science 2023-11-15 Shengguang Wu, Keming Lu, Benfeng Xu, Junyang Lin, Qi Su, Chang Zhou

Swift Sampler: Efficient Learning of Sampler by 10 Parameters

Data selection is essential for training deep learning models. An effective data sampler assigns proper sampling probability for training data and helps the model converge to a good local minimum with high performance. Previous studies in…

Machine Learning · Computer Science 2024-10-10 Jiawei Yao, Chuming Li, Canran Xiao

EvoSelect: Data-Efficient LLM Evolution for Targeted Task Adaptation

Adapting large language models (LLMs) to a targeted task efficiently and effectively remains a fundamental challenge. Such adaptation often requires iteratively improving the model toward a targeted task, yet collecting high-quality…

Computation and Language · Computer Science 2026-04-30 Ting-Wei Li, Sirui Chen, Jiaru Zou, Yingbing Huang, Tianxin Wei, Jingrui He, Hanghang Tong

A Data-Driven Modeling Framework of Time-Dependent Switched Dynamical Systems via Extreme Learning Machine

In this work, a data-driven modeling framework of switched dynamical systems under time-dependent switching is proposed. The learning technique utilized to model system dynamics is Extreme Learning Machine (ELM). First, a method is…

Systems and Control · Electrical Eng. & Systems 2021-01-27 Weiming Xiang

Efficient Code LLM Training via Distribution-Consistent and Diversity-Aware Data Selection

Recent advancements in large language models (LLMs) have significantly improved code generation and program comprehension, accelerating the evolution of software engineering. Current methods primarily enhance model performance by leveraging…

Computation and Language · Computer Science 2025-07-04 Weijie Lyu, Sheng-Jun Huang, Xuan Xia

EVOS: Efficient Implicit Neural Training via EVOlutionary Selector

We propose EVOlutionary Selector (EVOS), an efficient training paradigm for accelerating Implicit Neural Representation (INR). Unlike conventional INR training that feeds all samples through the neural network in each iteration, our…

Computer Vision and Pattern Recognition · Computer Science 2025-04-07 Weixiang Zhang, Shuzhao Xie, Chengwei Ren, Siyi Xie, Chen Tang, Shijia Ge, Mingzi Wang, Zhi Wang

When Dynamic Data Selection Meets Data Augmentation

Dynamic data selection aims to accelerate training with lossless performance. However, reducing training data inherently limits data diversity, potentially hindering generalization. While data augmentation is widely used to enhance…

Machine Learning · Computer Science 2025-05-13 Suorong Yang, Peng Ye, Furao Shen, Dongzhan Zhou

Selective Embedding for Deep Learning

Deep learning has revolutionized many industries by enabling models to automatically learn complex patterns from raw data, reducing dependence on manual feature engineering. However, deep learning algorithms are sensitive to input data, and…

Machine Learning · Computer Science 2025-07-21 Mert Sehri, Zehui Hua, Francisco de Assis Boldt, Patrick Dumond

ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining

Large language model pretraining is compute-intensive, yet many tokens contribute marginally to learning, resulting in inefficiency. We introduce Efficient Selective Language Modeling (ESLM), a risk-aware algorithm that improves training…

Machine Learning · Computer Science 2025-05-27 Melis Ilayda Bal, Volkan Cevher, Michael Muehlebach

Structural-Entropy-Based Sample Selection for Efficient and Effective Learning

Sample selection improves the efficiency and effectiveness of machine learning models by providing informative and representative samples. Typically, samples can be modeled as a sample graph, where nodes are samples and edges represent…

Machine Learning · Computer Science 2025-03-04 Tianchi Xie, Jiangning Zhu, Guozu Ma, Minzhi Lin, Wei Chen, Weikai Yang, Shixia Liu

Efficient Deep Representation Learning by Adaptive Latent Space Sampling

Supervised deep learning requires a large amount of training samples with annotations (e.g. label class for classification task, pixel- or voxel-wised label map for segmentation tasks), which are expensive and time-consuming to obtain.…

Computer Vision and Pattern Recognition · Computer Science 2020-04-14 Yuanhan Mo, Shuo Wang, Chengliang Dai, Rui Zhou, Zhongzhao Teng, Wenjia Bai, Yike Guo

Reduced Electron Exposure for Energy-Dispersive Spectroscopy using Dynamic Sampling

Analytical electron microscopy and spectroscopy of biological specimens, polymers, and other beam sensitive materials has been a challenging area due to irradiation damage. There is a pressing need to develop novel imaging and spectroscopic…

Machine Learning · Computer Science 2017-07-14 Yan Zhang, G. M. Dilshan Godaliyadda, Nicola Ferrier, Emine B. Gulsoy, Charles A. Bouman, Charudatta Phatak

SwiftLearn: A Data-Efficient Training Method of Deep Learning Models using Importance Sampling

In this paper, we present SwiftLearn, a data-efficient approach to accelerate training of deep learning models using a subset of data samples selected during the warm-up stages of training. This subset is selected based on an importance…

Machine Learning · Computer Science 2023-11-28 Habib Hajimolahoseini, Omar Mohamed Awad, Walid Ahmed, Austin Wen, Saina Asani, Mohammad Hassanpour, Farnoosh Javadi, Mehdi Ahmadi, Foozhan Ataiefard, Kangling Liu, Yang Liu

Adaptive Data Dropout: Towards Self-Regulated Learning in Deep Neural Networks

Deep neural networks are typically trained by uniformly sampling large datasets across epochs, despite evidence that not all samples contribute equally throughout learning. Recent work shows that progressively reducing the amount of…

Machine Learning · Computer Science 2026-04-15 Amar Gahir, Varshil Patel, Shreyank N Gowda

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (e.g., BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.…

Computation and Language · Computer Science 2023-08-07 Chenglong Wang, Hang Zhou, Yimin Hu, Yifu Huo, Bei Li, Tongran Liu, Tong Xiao, Jingbo Zhu

Efficient Video Sampling: Pruning Temporally Redundant Tokens for Faster VLM Inference

Vision-language models (VLMs) have recently expanded from static image understanding to video reasoning, but their scalability is fundamentally limited by the quadratic cost of processing dense frame sequences. Long videos often exceed the…

Computer Vision and Pattern Recognition · Computer Science 2025-10-17 Natan Bagrov, Eugene Khvedchenia, Borys Tymchenko, Shay Aharon, Lior Kadoch, Tomer Keren, Ofri Masad, Yonatan Geifman, Ran Zilberstein, Tuomas Rintamaki, Matthieu Le, Andrew Tao

Active Learning with Expected Error Reduction

Active learning has been studied extensively as a method for efficient data collection. Among the many approaches in the literature, Expected Error Reduction (EER) (Roy and McCallum) has been shown to be an effective method for active learning:…

Machine Learning · Computer Science 2022-11-18 Stephen Mussmann, Julia Reisler, Daniel Tsai, Ehsan Mousavi, Shayne O'Brien, Moises Goldszmidt

Training Green AI Models Using Elite Samples

The substantial increase in AI model training has considerable environmental implications, mandating more energy-efficient and sustainable AI practices. On the one hand, data-centric approaches show great potential towards training…

Machine Learning · Computer Science 2024-02-20 Mohammed Alswaitti, Roberto Verdecchia, Grégoire Danoy, Pascal Bouvry, Johnatan Pecero