Related papers: Does Data Augmentation Improve Generalization in N…

Data Augmentation as Feature Manipulation

Data augmentation is a cornerstone of the machine learning pipeline, yet its theoretical underpinnings remain unclear. Is it merely a way to artificially augment the data set size? Or is it about encouraging the model to satisfy certain…

Machine Learning · Computer Science 2022-09-22 Ruoqi Shen, Sébastien Bubeck, Suriya Gunasekar

Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data

Data augmentation has been widely applied as an effective methodology to improve generalization in particular when training deep neural networks. Recently, researchers proposed a few intensive data augmentation techniques, which indeed…

Machine Learning · Computer Science 2019-11-22 Zhuoxun He, Lingxi Xie, Xin Chen, Ya Zhang, Yanfeng Wang, Qi Tian

Towards Understanding Why Data Augmentation Improves Generalization

Data augmentation is a cornerstone technique in deep learning, widely used to improve model generalization. Traditional methods like random cropping and color jittering, as well as advanced techniques such as CutOut, Mixup, and CutMix, have…

Computer Vision and Pattern Recognition · Computer Science 2025-02-14 Jingyang Li, Jiachun Pan, Kim-Chuan Toh, Pan Zhou

The Effectiveness of Data Augmentation in Image Classification using Deep Learning

In this paper, we explore and compare multiple solutions to the problem of data augmentation in image classification. Previous work has demonstrated the effectiveness of data augmentation through simple techniques, such as cropping,…

Computer Vision and Pattern Recognition · Computer Science 2017-12-14 Luis Perez, Jason Wang

Improved Mixed-Example Data Augmentation

In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these…

Computer Vision and Pattern Recognition · Computer Science 2019-01-23 Cecilia Summers, Michael J. Dinneen

Lexical Generalization Improves with Larger Models and Longer Training

While fine-tuned language models perform well on many tasks, they were also shown to rely on superficial surface features such as lexical overlap. Excessive utilization of such heuristics can lead to failure on challenging inputs. We…

Computation and Language · Computer Science 2022-10-26 Elron Bandel, Yoav Goldberg, Yanai Elazar

Untapped Potential of Data Augmentation: A Domain Generalization Viewpoint

Data augmentation is a popular pre-processing trick to improve generalization accuracy. It is believed that by processing augmented inputs in tandem with the original ones, the model learns a more robust set of features which are shared…

Machine Learning · Computer Science 2020-07-10 Vihari Piratla, Shiv Shankar

Generalization Gap in Data Augmentation: Insights from Illumination

In the field of computer vision, data augmentation is widely used to enrich the feature complexity of training datasets with deep learning techniques. However, regarding the generalization capabilities of models, the difference in…

Computer Vision and Pattern Recognition · Computer Science 2024-08-22 Jianqiang Xiao, Weiwen Guo, Junfeng Liu, Mengze Li

Boosting Model Resilience via Implicit Adversarial Data Augmentation

Data augmentation plays a pivotal role in enhancing and diversifying training data. Nonetheless, consistently improving model performance in varied learning scenarios, especially those with inherent data biases, remains challenging. To…

Machine Learning · Computer Science 2024-06-04 Xiaoling Zhou, Wei Ye, Zhemg Lee, Rui Xie, Shikun Zhang

Improving Deep Learning using Generic Data Augmentation

Deep artificial neural networks require a large corpus of training data in order to effectively learn, where collection of such training data is often expensive and laborious. Data augmentation overcomes this issue by artificially inflating…

Machine Learning · Computer Science 2017-08-22 Luke Taylor, Geoff Nitschke

Data Augmentation for Neural NLP

Data scarcity is a problem that occurs in languages and tasks where we do not have large amounts of labeled data but want to use state-of-the-art models. Such models are often deep learning models that require a significant amount of data…

Computation and Language · Computer Science 2023-02-23 Domagoj Pluščec, Jan Šnajder

Feature Augmentations for High-Dimensional Learning

High-dimensional measurements are often correlated which motivates their approximation by factor models. This holds also true when features are engineered via low-dimensional interactions or kernel tricks. This often results in over…

Applications · Statistics 2025-09-03 Xiaonan Zhu, Bingyan Wang, Jianqing Fan

Further advantages of data augmentation on convolutional neural networks

Data augmentation is a popular technique largely used to enhance the training of convolutional neural networks. Although many of its benefits are well known by deep learning researchers and practitioners, its implicit regularization…

Computer Vision and Pattern Recognition · Computer Science 2019-06-27 Alex Hernández-García, Peter König

Affinity and Diversity: Quantifying Mechanisms of Data Augmentation

Though data augmentation has become a standard component of deep neural network training, the underlying mechanism behind the effectiveness of these techniques remains poorly understood. In practice, augmentation policies are often chosen…

Machine Learning · Computer Science 2020-06-08 Raphael Gontijo-Lopes, Sylvia J. Smullin, Ekin D. Cubuk, Ethan Dyer

Deep Learning on a Healthy Data Diet: Finding Important Examples for Fairness

Data-driven predictive solutions predominant in commercial applications tend to suffer from biases and stereotypes, which raises equity concerns. Prediction models may discover, use, or amplify spurious correlations based on gender or other…

Computation and Language · Computer Science 2022-11-28 Abdelrahman Zayed, Prasanna Parthasarathi, Goncalo Mordido, Hamid Palangi, Samira Shabanian, Sarath Chandar

TeachAugment: Data Augmentation Optimization Using Teacher Knowledge

Optimization of image transformation functions for the purpose of data augmentation has been intensively studied. In particular, adversarial data augmentation strategies, which search augmentation maximizing task loss, show significant…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Teppei Suzuki

Learning Data Augmentation Strategies for Object Detection

Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown to significantly improve image classification, its potential has not been thoroughly investigated for object detection.…

Computer Vision and Pattern Recognition · Computer Science 2019-06-27 Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le

Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures

Existing NLP datasets contain various biases, and models tend to quickly learn those biases, which in turn limits their robustness. Existing approaches to improve robustness against dataset biases mostly focus on changing the training…

Computation and Language · Computer Science 2020-10-26 Nafise Sadat Moosavi, Marcel de Boer, Prasetya Ajie Utama, Iryna Gurevych

On the Benefits of Invariance in Neural Networks

Many real world data analysis problems exhibit invariant structure, and models that take advantage of this structure have shown impressive empirical performance, particularly in deep learning. While the literature contains a variety of…

Machine Learning · Computer Science 2020-05-04 Clare Lyle, Mark van der Wilk, Marta Kwiatkowska, Yarin Gal, Benjamin Bloem-Reddy

An Empirical Survey of Data Augmentation for Limited Data Learning in NLP

NLP has achieved great progress in the past decade through the use of neural models and large labeled datasets. The dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks where significant…

Computation and Language · Computer Science 2021-06-15 Jiaao Chen, Derek Tam, Colin Raffel, Mohit Bansal, Diyi Yang