Related papers: Data Augmentation for Sample Efficient and Robust …

Supervised Contrastive Learning Approach for Contextual Ranking

Contextual ranking models have delivered impressive performance improvements over classical models in the document ranking task. However, these highly over-parameterized models tend to be data-hungry and require large amounts of data even…

Information Retrieval · Computer Science · 2022-07-08 · Abhijit Anand, Jurek Leonhardt, Koustav Rudra, Avishek Anand

Unsupervised Document Embedding via Contrastive Augmentation

We present a contrastive learning approach with data augmentation techniques to learn document representations in an unsupervised manner. Inspired by recent contrastive self-supervised learning algorithms used for image and NLP pretraining,…

Computation and Language · Computer Science · 2021-03-29 · Dongsheng Luo, Wei Cheng, Jingchao Ni, Wenchao Yu, Xuchao Zhang, Bo Zong, Yanchi Liu, Zhengzhang Chen, Dongjin Song, Haifeng Chen, Xiang Zhang

Boosting Model Resilience via Implicit Adversarial Data Augmentation

Data augmentation plays a pivotal role in enhancing and diversifying training data. Nonetheless, consistently improving model performance in varied learning scenarios, especially those with inherent data biases, remains challenging. To…

Machine Learning · Computer Science · 2024-06-04 · Xiaoling Zhou, Wei Ye, Zhemg Lee, Rui Xie, Shikun Zhang

Diversity-oriented Data Augmentation with Large Language Models

Data augmentation is an essential technique in natural language processing (NLP) for enriching training datasets by generating diverse samples. This process is crucial for improving the robustness and generalization capabilities of NLP…

Computation and Language · Computer Science · 2025-10-16 · Zaitian Wang, Jinghan Zhang, Xinhao Zhang, Kunpeng Liu, Pengfei Wang, Yuanchun Zhou

SelectAugment: Hierarchical Deterministic Sample Selection for Data Augmentation

Data augmentation (DA) has been widely investigated to facilitate model optimization in many tasks. However, in most cases, data augmentation is randomly performed for each training sample with a certain probability, which might incur…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-07 · Shiqi Lin, Zhizheng Zhang, Xin Li, Wenjun Zeng, Zhibo Chen

Data Augmentation Strategies for Improving Sequential Recommender Systems

Sequential recommender systems have recently achieved significant performance improvements with the exploitation of deep learning (DL) based methods. However, although various DL-based methods have been introduced, most of them only focus…

Information Retrieval · Computer Science · 2022-03-29 · Joo-yeong Song, Bongwon Suh

Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities

The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could…

Computation and Language · Computer Science · 2025-02-03 · Yaping Chai, Haoran Xie, Joe S. Qin

When Dynamic Data Selection Meets Data Augmentation

Dynamic data selection aims to accelerate training with lossless performance. However, reducing training data inherently limits data diversity, potentially hindering generalization. While data augmentation is widely used to enhance…

Machine Learning · Computer Science · 2025-05-13 · Suorong Yang, Peng Ye, Furao Shen, Dongzhan Zhou

Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval

Compact dual-encoder models are widely used for retrieval owing to their efficiency and scalability. However, such models often underperform compared to their Large Language Model (LLM)-based retrieval counterparts, likely due to their…

Information Retrieval · Computer Science · 2025-09-23 · Pranjal A. Chitale, Bishal Santra, Yashoteja Prabhu, Amit Sharma

Data Augmentation for Meta-Learning

Conventional image classifiers are trained by randomly sampling mini-batches of images. To achieve state-of-the-art performance, practitioners use sophisticated data augmentation schemes to expand the amount of training data available for…

Machine Learning · Computer Science · 2021-06-23 · Renkun Ni, Micah Goldblum, Amr Sharaf, Kezhi Kong, Tom Goldstein

Research Trends and Applications of Data Augmentation Algorithms

In the Machine Learning research community, there is a consensus regarding the relationship between model complexity and the required amount of data and computation power. In real-world applications, these computational requirements are not…

Machine Learning · Computer Science · 2022-08-03 · Joao Fonseca, Fernando Bacao

Learning to Compose Domain-Specific Transformations for Data Augmentation

Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels. While it is often easy for domain experts to specify individual…

Machine Learning · Statistics · 2018-12-10 · Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared Dunnmon, Christopher Ré

Stronger is not better: Better Augmentations in Contrastive Learning for Medical Image Segmentation

Self-supervised contrastive learning is among the recent representation learning methods that have shown performance gains in several downstream tasks including semantic segmentation. This paper evaluates strong data augmentation, one of…

Image and Video Processing · Electrical Eng. & Systems · 2025-12-11 · Azeez Idris, Abdurahman Ali Mohammed, Samuel Fanijo

Style Augmentation: Data Augmentation via Style Randomization

We introduce style augmentation, a new form of data augmentation based on random style transfer, for improving the robustness of convolutional neural networks (CNN) over both classification and regression based tasks. During training, our…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-15 · Philip T. Jackson, Amir Atapour-Abarghouei, Stephen Bonner, Toby Breckon, Boguslaw Obara

Reducing and Exploiting Data Augmentation Noise through Meta Reweighting Contrastive Learning for Text Classification

Data augmentation has shown its effectiveness in resolving the data-hungry problem and improving models' generalization ability. However, the quality of augmented data can be varied, especially compared with the raw/original data. To boost…

Computation and Language · Computer Science · 2024-09-27 · Guanyi Mou, Yichuan Li, Kyumin Lee

Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning

Contrastive learning enables learning useful audio and speech representations without ground-truth labels by maximizing the similarity between latent representations of similar signal segments. In this framework various data augmentation…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-04-11 · Salah Zaiem, Titouan Parcollet, Slim Essid

Improved Mixed-Example Data Augmentation

In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these…

Computer Vision and Pattern Recognition · Computer Science · 2019-01-23 · Cecilia Summers, Michael J. Dinneen

Augmented Relevance Datasets with Fine-Tuned Small LLMs

Building high-quality datasets and labeling query-document relevance are essential yet resource-intensive tasks, requiring detailed guidelines and substantial effort from human annotators. This paper explores the use of small, fine-tuned…

Information Retrieval · Computer Science · 2025-04-15 · Quentin Fitte-Rey, Matyas Amrouche, Romain Deveaud

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

Data augmentation has been demonstrated as an effective strategy for improving model generalization and data efficiency. However, due to the discrete nature of natural language, designing label-preserving transformations for text data tends…

Computation and Language · Computer Science · 2020-10-20 · Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen

Filtering with Confidence: When Data Augmentation Meets Conformal Prediction

With promising empirical performance across a wide range of applications, synthetic data augmentation appears a viable solution to data scarcity and the demands of increasingly data-intensive models. Its effectiveness lies in expanding the…

Machine Learning · Computer Science · 2026-02-02 · Zixuan Wu, So Won Jeong, Yating Liu, Yeo Jin Jung, Claire Donnat