Related papers: Data Augmentation for Sample Efficient and Robust …

200 papers

Contextual ranking models have delivered impressive performance improvements over classical models in the document ranking task. However, these highly over-parameterized models tend to be data-hungry and require large amounts of data even…

Information Retrieval · Computer Science 2022-07-08 Abhijit Anand, Jurek Leonhardt, Koustav Rudra, Avishek Anand

We present a contrasting learning approach with data augmentation techniques to learn document representations in an unsupervised manner. Inspired by recent contrastive self-supervised learning algorithms used for image and NLP pretraining,…

Computation and Language · Computer Science 2021-03-29 Dongsheng Luo, Wei Cheng, Jingchao Ni, Wenchao Yu, Xuchao Zhang, Bo Zong, Yanchi Liu, Zhengzhang Chen, Dongjin Song, Haifeng Chen, Xiang Zhang

Data augmentation plays a pivotal role in enhancing and diversifying training data. Nonetheless, consistently improving model performance in varied learning scenarios, especially those with inherent data biases, remains challenging. To…

Machine Learning · Computer Science 2024-06-04 Xiaoling Zhou, Wei Ye, Zhemg Lee, Rui Xie, Shikun Zhang

Data augmentation is an essential technique in natural language processing (NLP) for enriching training datasets by generating diverse samples. This process is crucial for improving the robustness and generalization capabilities of NLP…

Computation and Language · Computer Science 2025-10-16 Zaitian Wang, Jinghan Zhang, Xinhao Zhang, Kunpeng Liu, Pengfei Wang, Yuanchun Zhou

Data augmentation (DA) has been widely investigated to facilitate model optimization in many tasks. However, in most cases, data augmentation is randomly performed for each training sample with a certain probability, which might incur…

Computer Vision and Pattern Recognition · Computer Science 2021-12-07 Shiqi Lin, Zhizheng Zhang, Xin Li, Wenjun Zeng, Zhibo Chen

Sequential recommender systems have recently achieved significant performance improvements with the exploitation of deep learning (DL) based methods. However, although various DL-based methods have been introduced, most of them only focus…

Information Retrieval · Computer Science 2022-03-29 Joo-yeong Song, Bongwon Suh

The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could…

Computation and Language · Computer Science 2025-02-03 Yaping Chai, Haoran Xie, Joe S. Qin

Dynamic data selection aims to accelerate training with lossless performance. However, reducing training data inherently limits data diversity, potentially hindering generalization. While data augmentation is widely used to enhance…

Machine Learning · Computer Science 2025-05-13 Suorong Yang, Peng Ye, Furao Shen, Dongzhan Zhou

Compact dual-encoder models are widely used for retrieval owing to their efficiency and scalability. However, such models often underperform compared to their Large Language Model (LLM)-based retrieval counterparts, likely due to their…

Information Retrieval · Computer Science 2025-09-23 Pranjal A. Chitale, Bishal Santra, Yashoteja Prabhu, Amit Sharma

Conventional image classifiers are trained by randomly sampling mini-batches of images. To achieve state-of-the-art performance, practitioners use sophisticated data augmentation schemes to expand the amount of training data available for…

Machine Learning · Computer Science 2021-06-23 Renkun Ni, Micah Goldblum, Amr Sharaf, Kezhi Kong, Tom Goldstein

In the Machine Learning research community, there is a consensus regarding the relationship between model complexity and the required amount of data and computation power. In real world applications, these computational requirements are not…

Machine Learning · Computer Science 2022-08-03 Joao Fonseca, Fernando Bacao

Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels. While it is often easy for domain experts to specify individual…

Machine Learning · Statistics 2018-12-10 Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared Dunnmon, Christopher Ré

Self-supervised contrastive learning is among the recent representation learning methods that have shown performance gains in several downstream tasks including semantic segmentation. This paper evaluates strong data augmentation, one of…

Image and Video Processing · Electrical Eng. & Systems 2025-12-11 Azeez Idris, Abdurahman Ali Mohammed, Samuel Fanijo

We introduce style augmentation, a new form of data augmentation based on random style transfer, for improving the robustness of convolutional neural networks (CNN) over both classification and regression based tasks. During training, our…

Computer Vision and Pattern Recognition · Computer Science 2019-04-15 Philip T. Jackson, Amir Atapour-Abarghouei, Stephen Bonner, Toby Breckon, Boguslaw Obara

Data augmentation has shown its effectiveness in resolving the data-hungry problem and improving model's generalization ability. However, the quality of augmented data can be varied, especially compared with the raw/original data. To boost…

Computation and Language · Computer Science 2024-09-27 Guanyi Mou, Yichuan Li, Kyumin Lee

Contrastive learning enables learning useful audio and speech representations without ground-truth labels by maximizing the similarity between latent representations of similar signal segments. In this framework various data augmentation…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-11 Salah Zaiem, Titouan Parcollet, Slim Essid

In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these…

Computer Vision and Pattern Recognition · Computer Science 2019-01-23 Cecilia Summers, Michael J. Dinneen

Building high-quality datasets and labeling query-document relevance are essential yet resource-intensive tasks, requiring detailed guidelines and substantial effort from human annotators. This paper explores the use of small, fine-tuned…

Information Retrieval · Computer Science 2025-04-15 Quentin Fitte-Rey, Matyas Amrouche, Romain Deveaud

Data augmentation has been demonstrated as an effective strategy for improving model generalization and data efficiency. However, due to the discrete nature of natural language, designing label-preserving transformations for text data tends…

Computation and Language · Computer Science 2020-10-20 Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen

With promising empirical performance across a wide range of applications, synthetic data augmentation appears a viable solution to data scarcity and the demands of increasingly data-intensive models. Its effectiveness lies in expanding the…

Machine Learning · Computer Science 2026-02-02 Zixuan Wu, So Won Jeong, Yating Liu, Yeo Jin Jung, Claire Donnat