Related papers: MIXCODE: Enhancing Code Classification by Mixup-Ba…

Boosting Source Code Learning with Text-Oriented Data Augmentation: An Empirical Study

Recent studies have demonstrated remarkable advancements in source code learning, which applies deep neural networks (DNNs) to tackle various software engineering tasks. Similar to other DNN-based domains, source code learning also requires…

Software Engineering · Computer Science 2025-02-07 Zeming Dong, Qiang Hu, Yuejun Guo, Zhenya Zhang, Maxime Cordy, Mike Papadakis, Yves Le Traon, Jianjun Zhao

On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks

Mixup (Zhang et al., 2017) is a recently proposed method for training deep neural networks where additional samples are generated during training by convexly combining random pairs of images and their associated labels. While simple to…

Machine Learning · Statistics 2020-01-08 Sunil Thulasidasan, Gopinath Chennupati, Jeff Bilmes, Tanmoy Bhattacharya, Sarah Michalak

A Survey on Mixup Augmentations and Beyond

As Deep Neural Networks have achieved thrilling breakthroughs in the past decade, data augmentations have garnered increasing attention as regularization techniques when massive labeled data are unavailable. Among existing augmentations,…

Machine Learning · Computer Science 2025-04-24 Xin Jin, Hongyu Zhu, Siyuan Li, Zedong Wang, Zicheng Liu, Juanxi Tian, Chang Yu, Huafeng Qin, Stan Z. Li

Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks

Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels. It has shown strong effectiveness in image classification by interpolating images at the pixel level. Inspired by this…

Computation and Language · Computer Science 2020-11-12 Lichao Sun, Congying Xia, Wenpeng Yin, Tingting Liang, Philip S. Yu, Lifang He

MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization…

Machine Learning · Computer Science 2023-10-17 Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi

Improved Mixed-Example Data Augmentation

In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these…

Computer Vision and Pattern Recognition · Computer Science 2019-01-23 Cecilia Summers, Michael J. Dinneen

Source Code Data Augmentation for Deep Learning: A Survey

The increasingly popular adoption of deep learning models in many critical source code tasks motivates the development of data augmentation (DA) techniques to enhance training data and improve various capabilities (e.g., robustness and…

Computation and Language · Computer Science 2023-11-14 Terry Yue Zhuo, Zhou Yang, Zhensu Sun, Yufei Wang, Li Li, Xiaoning Du, Zhenchang Xing, David Lo

A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability

Data augmentation (DA) is indispensable in modern machine learning and deep neural networks. The basic idea of DA is to construct new training data to improve the model's generalization by adding slightly disturbed versions of existing data…

Machine Learning · Computer Science 2024-06-05 Chengtai Cao, Fan Zhou, Yurou Dai, Jianping Wang, Kunpeng Zhang

A Study on Mixup-Inspired Augmentation Methods for Software Vulnerability Detection

Various deep learning (DL) methods have recently been utilized to detect software vulnerabilities. Real-world software vulnerability datasets are rare and hard to acquire, as there is no simple metric for classifying vulnerability. Such…

Software Engineering · Computer Science 2025-04-29 Seyed Shayan Daneshvar, Da Tan, Shaowei Wang, Carson Leung

MixAugment & Mixup: Augmentation Methods for Facial Expression Recognition

Automatic Facial Expression Recognition (FER) has attracted increasing attention in the last 20 years since facial expressions play a central role in human communication. Most FER methodologies utilize Deep Neural Networks (DNNs) that are…

Computer Vision and Pattern Recognition · Computer Science 2022-05-10 Andreas Psaroudakis, Dimitrios Kollias

DP-Mix: Mixup-based Data Augmentation for Differentially Private Learning

Data augmentation techniques, such as simple image transformations and combinations, are highly effective at improving the generalization of computer vision models, especially when training data is limited. However, such techniques are…

Machine Learning · Computer Science 2023-11-03 Wenxuan Bao, Francesco Pittaluga, Vijay Kumar B G, Vincent Bindschaedler

Diversity-oriented Data Augmentation with Large Language Models

Data augmentation is an essential technique in natural language processing (NLP) for enriching training datasets by generating diverse samples. This process is crucial for improving the robustness and generalization capabilities of NLP…

Computation and Language · Computer Science 2025-10-16 Zaitian Wang, Jinghan Zhang, Xinhao Zhang, Kunpeng Liu, Pengfei Wang, Yuanchun Zhou

Empirical Study of Mix-based Data Augmentation Methods in Physiological Time Series Data

Data augmentation is a common practice to help generalization in the procedure of deep model training. In the context of physiological time series classification, previous research has primarily focused on label-invariant data augmentation…

Machine Learning · Computer Science 2023-09-19 Peikun Guo, Huiyuan Yang, Akane Sano

MixUp Training Leads to Reduced Overfitting and Improved Calibration for the Transformer Architecture

MixUp is a computer vision data augmentation technique that uses convex interpolations of input data and their labels to enhance model generalization during training. However, the application of MixUp to the natural language understanding…

Computation and Language · Computer Science 2021-02-24 Wancong Zhang, Ieshan Vaidya

High-quality data augmentation for code comment classification

Code comments serve a crucial role in software development for documenting functionality, clarifying design choices, and assisting with issue tracking. They capture developers' insights about the surrounding source code, serving as an…

Software Engineering · Computer Science 2026-01-28 Thomas Borsani, Andrea Rosani, Giuseppe Di Fatta

Data Augmentation by Selecting Mixed Classes Considering Distance Between Classes

Data augmentation is an essential technique for improving recognition accuracy in object recognition using deep learning. Methods that generate mixed data from multiple data sets, such as mixup, can acquire new diversity that is not…

Computer Vision and Pattern Recognition · Computer Science 2022-09-13 Shungo Fujii, Yasunori Ishii, Kazuki Kozuka, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi

Analyzing Effects of Mixed Sample Data Augmentation on Model Interpretability

Mixed sample data augmentation strategies are actively used when training deep neural networks (DNNs). Recent studies suggest that they are effective at various tasks. However, the impact of mixed sample data augmentation on model…

Machine Learning · Computer Science 2025-06-18 Soyoun Won, Sung-Ho Bae, Seong Tae Kim

GenCode: A Generic Data Augmentation Framework for Boosting Deep Learning-Based Code Understanding

Pre-trained code models lead the era of code intelligence, with multiple models designed with impressive performance. However, one important problem, data augmentation for code data that automatically helps developers prepare training data…

Software Engineering · Computer Science 2026-01-29 Zeming Dong, Qiang Hu, Xiaofei Xie, Maxime Cordy, Mike Papadakis, Yves Le Traon, Jianjun Zhao

Exploring Data Augmentations on Self-/Semi-/Fully- Supervised Pre-trained Models

Data augmentation has become a standard component of vision pre-trained models to capture the invariance between augmented views. In practice, augmentation techniques that mask regions of a sample with zero/mean values or patches from other…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Shentong Mo, Zhun Sun, Chao Li

Data Augmentation Approaches in Natural Language Processing: A Survey

As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in…

Computation and Language · Computer Science 2022-06-28 Bohan Li, Yutai Hou, Wanxiang Che