
Related papers: Diversified Batch Selection for Training Accelerat…

200 papers

With the rising number of machine learning competitions, the world has witnessed an exciting race for the best algorithms. However, the involved data selection process may fundamentally suffer from evidence ambiguity and concept drift…

Machine Learning · Computer Science 2020-06-15 Hoang D. Nguyen, Xuan-Son Vu, Quoc-Tuan Truong, Duc-Trong Le

The boom of DL technology leads to massive DL models built and shared, which facilitates the acquisition and reuse of DL models. For a given task, we encounter multiple DL models available with the same functionality, which are considered…

Software Engineering · Computer Science 2021-03-10 Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, Baowen Xu

The remarkable success of contrastive-learning-based multimodal models has been greatly driven by training on ever-larger datasets with expensive compute consumption. Sample selection as an alternative efficient paradigm plays an important…

Computer Vision and Pattern Recognition · Computer Science 2025-07-18 Zihua Zhao, Feng Hong, Mengxi Chen, Pengyi Chen, Benyuan Liu, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Diverse outputs in text generation are necessary for effective exploration in complex reasoning tasks, such as code generation and mathematical problem solving. Such Pass@$k$ problems benefit from distinct candidates covering the solution…

Computation and Language · Computer Science 2026-03-06 Sean Lamont, Christian Walder, Paul Montague, Amir Dezfouli, Michael Norrish

Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry and pose the challenges of not having adequate computing resources and of high costs involved in human labeling efforts. Training data…

Computer Vision and Pattern Recognition · Computer Science 2018-05-30 Vishal Kaushal, Anurag Sahoo, Khoshrav Doctor, Narasimha Raju, Suyash Shetty, Pankaj Singh, Rishabh Iyer, Ganesh Ramakrishnan

Deep neural network models have demonstrated their effectiveness in classifying multi-label data from various domains. Typically, they employ a training mode that combines mini-batches with optimizers, where each sample is randomly selected…

Machine Learning · Computer Science 2024-03-28 Ao Zhou, Bin Liu, Jin Wang, Grigorios Tsoumakas

Model selection is a strategy aimed at creating accurate and robust models. A key challenge in designing these algorithms is identifying the optimal model for classifying any particular input sample. This paper addresses this challenge and…

Machine Learning · Computer Science 2023-05-22 James Kotary, Vincenzo Di Vito, Ferdinando Fioretto

Deep learning model effectiveness in classification tasks is often challenged by the quality and quantity of training data whenever they are affected by strong spurious correlations between specific attributes and target labels. This…

We study the problem of reducing the amount of labeled training data required to train supervised classification models. We approach it by leveraging Active Learning, through sequential selection of examples which benefit the model most.…

Machine Learning · Computer Science 2019-01-18 Fedor Zhdanov

Modern deep models are trained on large real-world datasets, where data quality varies and redundancy is common. Data-centric approaches such as dataset pruning have shown promise in improving training efficiency and model performance.…

Machine Learning · Computer Science 2025-07-18 Suorong Yang, Peijia Li, Yujie Liu, Zhiming Xu, Peng Ye, Wanli Ouyang, Furao Shen, Dongzhan Zhou

Mislabeled, duplicated, or biased data in real-world scenarios can lead to prolonged training and even hinder model convergence. Traditional solutions prioritizing easy or hard samples lack the flexibility to handle such a variety…

Machine Learning · Computer Science 2023-11-08 Zhijie Deng, Peng Cui, Jun Zhu

The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources. Batch active learning, which adaptively issues batched…

Dynamic data selection accelerates training by sampling a changing subset of the dataset while preserving accuracy. We rethink two core notions underlying sample evaluation: representativeness and diversity. Instead of local geometric…

Artificial Intelligence · Computer Science 2026-03-06 Yuzhe Zhou, Zhenglin Hua, Haiyun Guo, Yuheng Jia

Data collection and labeling is one of the main challenges in employing machine learning algorithms in a variety of real-world applications with limited data. While active learning methods attempt to tackle this issue by labeling only the…

Machine Learning · Computer Science 2019-06-20 Erdem Bıyık, Kenneth Wang, Nima Anari, Dorsa Sadigh

Modern deep architectures often rely on large-scale datasets, but training on these datasets incurs high computational and storage overhead. Real-world datasets often contain substantial redundancies, prompting the need for more…

Machine Learning · Computer Science 2025-06-27 Suorong Yang, Peijia Li, Furao Shen, Jian Zhao

Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry. Their data curation poses the challenges of expensive human labeling, inadequate computing resources and larger experiment turn around…

Computer Vision and Pattern Recognition · Computer Science 2019-01-07 Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, Ganesh Ramakrishnan

Finetuning large language models on instruction data is crucial for enhancing pre-trained knowledge and improving instruction-following capabilities. As instruction datasets proliferate, selecting optimal data for effective training becomes…

Computation and Language · Computer Science 2024-09-18 Simon Yu, Liangyu Chen, Sara Ahmadian, Marzieh Fadaee

The constitutive behavior of materials is modeled through relationships between stress, strain, and possibly additional internal variables. This results in relatively high-dimensional feature spaces for machine learning models rendering the…

Computational Physics · Physics 2026-05-20 Ronak Shoghi, Lukas Morand, Dirk Helm, Alexander Hartmaier

In continual instruction tuning (CIT) scenarios, where new instruction tuning data continuously arrive in an online streaming manner, training delays from large-scale data significantly hinder real-time adaptation. Data selection can…

Computer Vision and Pattern Recognition · Computer Science 2025-10-10 Minjae Lee, Minhyuk Seo, Tingyu Qu, Tinne Tuytelaars, Jonghyun Choi

Learning to sample from intractable distributions over discrete sets without relying on corresponding training data is a central problem in a wide range of fields, including Combinatorial Optimization. Currently, popular deep learning-based…

Machine Learning · Computer Science 2025-08-25 Sebastian Sanokowski, Sepp Hochreiter, Sebastian Lehner