English
Related papers

Related papers: GLISTER: Generalization based Data Subset Selectio…

200 papers

Large-scale datasets have been pivotal to the advancements of deep learning models in recent years, but training on such large datasets invariably incurs substantial storage and computational overhead. Meanwhile, real-world datasets often…

Computer Vision and Pattern Recognition · Computer Science 2025-06-23 Suorong Yang , Peng Ye , Wanli Ouyang , Dongzhan Zhou , Furao Shen

Data Augmentation (DA) is known to improve the generalizability of deep neural networks. Most existing DA techniques naively add a certain number of augmented samples without considering the quality and the added computational cost of these…

Machine Learning · Computer Science 2022-03-18 Ehsan Kamalloo , Mehdi Rezagholizadeh , Ali Ghodsi

Deep learning models have been used to support analytics beyond simple aggregation, where deeper and wider models have been shown to yield great results. These models consume a huge amount of memory and computational operations. However,…

Machine Learning · Computer Science 2021-04-22 Shaofeng Cai , Gang Chen , Beng Chin Ooi , Jinyang Gao

We introduce a novel validation framework to measure the true robustness of learning models for real-world applications by creating source-inclusive and source-exclusive partitions in a dataset via clustering. We develop a robustness metric…

Machine Learning · Computer Science 2017-04-04 Ozsel Kilinc , Ismail Uysal

Deep learning models have demonstrated outstanding performance in several problems, but their training process tends to require immense amounts of computational and human resources for training and labeling, constraining the types of…

Machine Learning · Computer Science 2019-04-29 Toan Tran , Thanh-Toan Do , Ian Reid , Gustavo Carneiro

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…

Machine Learning · Computer Science 2021-11-25 Ravi S Raju , Kyle Daruwalla , Mikko Lipasti

In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data. Noisy labels occur frequently in data sets…

Machine Learning · Computer Science 2018-11-16 Matthew Klawonn , Eric Heim , James Hendler

Meta-learning methods typically learn tasks under the assumption that all tasks are equally important. However, this assumption is often not valid. In real-world applications, tasks can vary both in their importance during different…

Machine Learning · Computer Science 2024-05-14 Donglin Zhan , James Anderson

Finetuning large language models on instruction data is crucial for enhancing pre-trained knowledge and improving instruction-following capabilities. As instruction datasets proliferate, selecting optimal data for effective training becomes…

Computation and Language · Computer Science 2024-09-18 Simon Yu , Liangyu Chen , Sara Ahmadian , Marzieh Fadaee

Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and…

Machine Learning · Computer Science 2017-06-13 Kaifeng Lv , Shunhua Jiang , Jian Li

Data-efficient learning has garnered significant attention, especially given the current trend of large multi-modal models. Recently, dataset distillation has become an effective approach by synthesizing data samples that are essential for…

Machine Learning · Computer Science 2024-08-08 Yue Xu , Yong-Lu Li , Kaitong Cui , Ziyu Wang , Cewu Lu , Yu-Wing Tai , Chi-Keung Tang

While language models have shown remarkable performance across diverse tasks, they still encounter challenges in complex reasoning scenarios. Recent research suggests that language models trained on linearized search traces toward…

Artificial Intelligence · Computer Science 2025-10-28 Seungyong Moon , Bumsoo Park , Hyun Oh Song

Clustering is an essential data mining tool for analyzing and grouping similar objects. In big data applications, however, many clustering algorithms are infeasible due to their high memory requirements and/or unfavorable runtime…

Data Structures and Algorithms · Computer Science 2026-01-27 Gregor Ulm , Simon Smith , Adrian Nilsson , Emil Gustavsson , Mats Jirstrand

Clustering is a fundamental task in data mining and machine learning, particularly for analyzing large-scale data. In this paper, we introduce Clust-Splitter, an efficient algorithm based on nonsmooth optimization, designed to solve the…

Machine Learning · Computer Science 2026-03-19 Jenni Lampainen , Kaisa Joki , Napsu Karmitsa , Marko M. Mäkelä

A critical problem in deep learning is that systems learn inappropriate biases, resulting in their inability to perform well on minority groups. This has led to the creation of multiple algorithms that endeavor to mitigate bias. However, it…

Machine Learning · Computer Science 2024-04-24 Robik Shrestha , Kushal Kafle , Christopher Kanan

The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters. Although some work has explored…

Machine Learning · Computer Science 2022-03-15 Cameron R. Wolfe , Jingkang Yang , Arindam Chowdhury , Chen Dun , Artun Bayer , Santiago Segarra , Anastasios Kyrillidis

Using huge training datasets can be costly and inconvenient. This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks. Inspired by recent ideas, we suggest…

Machine Learning · Computer Science 2022-03-17 Dmitry Medvedev , Alexander D'yakonov

Dataset distillation (DD) has emerged as a widely adopted technique for crafting a synthetic dataset that captures the essential information of a training dataset, facilitating the training of accurate neural models. Its applications span…

Machine Learning · Computer Science 2025-02-04 Saeed Vahidian , Mingyu Wang , Jianyang Gu , Vyacheslav Kungurtsev , Wei Jiang , Yiran Chen

To improve the efficiency and sustainability of learning deep models, we propose CREST, the first scalable framework with rigorous theoretical guarantees to identify the most valuable examples for training non-convex models, particularly…

Machine Learning · Computer Science 2023-06-05 Yu Yang , Hao Kang , Baharan Mirzasoleiman

In many practical applications of learning algorithms, unlabeled data is cheap and abundant whereas labeled data is expensive. Active learning algorithms developed to achieve better performance with lower cost. Usually Representativeness…

Machine Learning · Computer Science 2016-08-26 Hossein Ghafarian , Hadi Sadoghi Yazdi
‹ Prev 1 2 3 10 Next ›