Related papers: Exploring Multilingual Text Data Distillation

200 papers

Deep learning techniques have achieved great success in many fields, while at the same time deep learning models are getting more complex and expensive to compute. It severely hinders the wide applications of these models. In order to…

Computation and Language · Computer Science 2021-04-20 Yongqi Li, Wenjie Li

Dataset distillation aims to compress a training dataset by creating a small number of informative synthetic samples such that neural networks trained on them perform as well as those trained on the original training dataset. Current text…

Computation and Language · Computer Science 2024-04-02 Aru Maekawa, Satoshi Kosugi, Kotaro Funakoshi, Manabu Okumura

Deep learning technology has developed unprecedentedly in the last decade and has become the primary choice in many application domains. This progress is mainly attributed to a systematic collaboration in which rapidly growing computing…

Machine Learning · Computer Science 2023-12-27 Shiye Lei, Dacheng Tao

In the vision domain, dataset distillation arises as a technique to condense a large dataset into a smaller synthetic one that exhibits a similar result in the training process. While image data presents an extensive literature of…

The popularity of deep learning has led to the curation of a vast number of massive and multifarious datasets. Despite having close-to-human performance on individual tasks, training parameter-hungry models on large datasets poses…

Machine Learning · Computer Science 2023-09-27 Noveen Sachdeva, Julian McAuley

Recent advances in multimodal learning have achieved remarkable success across diverse vision-language tasks. However, such progress heavily relies on large-scale image-text datasets, making training costly and inefficient. Prior efforts in…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Junhyeok Choi, Sangwoo Mo, Minwoo Chae

Dataset distillation, a training-aware data compression technique, has recently attracted increasing attention as an effective tool for mitigating costs of optimization and data storage. However, progress remains largely empirical.…

Machine Learning · Computer Science 2026-03-31 Yuri Kinoshita, Naoki Nishikawa, Taro Toyoizumi

Dataset distillation aims to distill the knowledge of a large-scale real dataset into small yet informative synthetic data such that a model trained on it performs as well as a model trained on the full dataset. Despite recent progress,…

Computer Vision and Pattern Recognition · Computer Science 2024-11-21 Ahmad Sajedi, Samir Khaki, Lucy Z. Liu, Ehsan Amjadian, Yuri A. Lawryshyn, Konstantinos N. Plataniotis

The extensive amounts of data required for training deep neural networks pose significant challenges on storage and transmission fronts. Dataset distillation has emerged as a promising technique to condense the information of massive…

Computer Vision and Pattern Recognition · Computer Science 2024-03-13 Ali Abbasi, Ashkan Shahbazi, Hamed Pirsiavash, Soheil Kolouri

In the realm of large language models (LLMs), as the size of large models increases, it also brings higher training costs. There is an urgent need to minimize the data size in LLM training. Compared with data selection method, the data…

Computation and Language · Computer Science 2025-04-25 Rong Yao, Hailin Hu, Yifei Fu, Hanting Chen, Wenyi Fang, Fanyi Du, Kai Han, Yunhe Wang

Dataset distillation, a pragmatic approach in machine learning, aims to create a smaller synthetic dataset from a larger existing dataset. However, existing distillation methods primarily adopt a model-based paradigm, where the synthetic…

Machine Learning · Computer Science 2024-02-21 Binglin Zhou, Linhao Zhong, Wentao Chen

Dataset Distillation aims to distill an entire dataset's knowledge into a few synthetic images. The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model…

Computer Vision and Pattern Recognition · Computer Science 2023-05-05 George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu

Dataset distillation is attracting more attention in machine learning as training sets continue to grow and the cost of training state-of-the-art models becomes increasingly high. By synthesizing datasets with high information density,…

Dataset distillation is a method for reducing dataset sizes by learning a small number of synthetic samples containing all the information of a large dataset. This has several benefits like speeding up model training, reducing energy…

Machine Learning · Computer Science 2022-06-10 Ilia Sucholutsky, Matthias Schonlau

Data distillation is the problem of reducing the volume of training data while keeping only the necessary information. With this paper, we deeper explore the new data distillation algorithm, previously designed for image data. Our experiments…

Machine Learning · Computer Science 2020-10-21 Dmitry Medvedev, Alexander D'yakonov

Recent years have witnessed the remarkable success of deep learning in remote sensing image interpretation, driven by the availability of large-scale benchmark datasets. However, this reliance on massive training data also brings two major…

Computer Vision and Pattern Recognition · Computer Science 2026-01-23 Yonghao Xu, Pedram Ghamisi, Qihao Weng

Dataset distillation (DD) condenses large datasets into compact yet informative substitutes, preserving performance comparable to the original dataset while reducing storage, transmission costs, and computational consumption. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Yawen Zou, Guang Li, Duo Su, Zi Wang, Jun Yu, Chao Zhang

Multilingual machine translation, which translates multiple languages with a single model, has attracted much attention due to its efficiency of offline training and online serving. However, traditional multilingual translation usually…

Computation and Language · Computer Science 2019-05-01 Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu

Dataset distillation has emerged as a strategy to overcome the hurdles associated with large datasets by learning a compact set of synthetic data that retains essential information from the original dataset. While distilled data can be used…

Machine Learning · Computer Science 2024-07-23 William Yang, Ye Zhu, Zhiwei Deng, Olga Russakovsky

Over the past year, the emergence of transfer learning with large-scale language models (LM) has led to dramatic performance improvements across a broad range of natural language understanding tasks. However, the size and memory footprint…

Computation and Language · Computer Science 2020-02-04 Luke Melas-Kyriazi, George Han, Celine Liang