Related papers: Good-Enough Example Extrapolation

Neural Data Augmentation via Example Extrapolation

In many applications of machine learning, certain categories of examples may be underrepresented in the training data, causing systems to underperform on such "few-shot" cases at test time. A common remedy is to perform data augmentation,…

Computation and Language · Computer Science 2021-02-03 Kenton Lee, Kelvin Guu, Luheng He, Tim Dozat, Hyung Won Chung

GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation

Large-scale language models such as GPT-3 are excellent few-shot learners, allowing them to be controlled via natural text prompts. Recent studies report that prompt-based direct classification eliminates the need for fine-tuning but lacks…

Computation and Language · Computer Science 2021-11-19 Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, Woomyeong Park

Good-Enough Compositional Data Augmentation

We propose a simple data augmentation protocol aimed at providing a compositional inductive bias in conditional and unconditional sequence models. Under this protocol, synthetic training examples are constructed by taking real training…

Computation and Language · Computer Science 2020-05-20 Jacob Andreas

Improved Mixed-Example Data Augmentation

In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these…

Computer Vision and Pattern Recognition · Computer Science 2019-01-23 Cecilia Summers, Michael J. Dinneen

Engression: Extrapolation through the Lens of Distributional Regression

Distributional regression aims to estimate the full conditional distribution of a target variable, given covariates. Popular methods include linear and tree-ensemble based quantile regression. We propose a neural network-based…

Methodology · Statistics 2024-07-08 Xinwei Shen, Nicolai Meinshausen

DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification

This paper proposes a simple yet effective interpolation-based data augmentation approach termed DoubleMix, to improve the robustness of models in text classification. DoubleMix first leverages a couple of simple augmentation operations to…

Computation and Language · Computer Science 2022-09-13 Hui Chen, Wei Han, Diyi Yang, Soujanya Poria

Deep Learning on a Healthy Data Diet: Finding Important Examples for Fairness

Data-driven predictive solutions predominant in commercial applications tend to suffer from biases and stereotypes, which raises equity concerns. Prediction models may discover, use, or amplify spurious correlations based on gender or other…

Computation and Language · Computer Science 2022-11-28 Abdelrahman Zayed, Prasanna Parthasarathi, Goncalo Mordido, Hamid Palangi, Samira Shabanian, Sarath Chandar

Model averaging for robust extrapolation in evidence synthesis

Extrapolation from a source to a target, e.g., from adults to children, is a promising approach to utilizing external information when data are sparse. In the context of meta-analysis, one is commonly faced with a small number of studies,…

Methodology · Statistics 2019-01-21 Christian Röver, Simon Wandel, Tim Friede

Well-classified Examples are Underestimated in Classification with Deep Neural Networks

The conventional wisdom behind learning deep classification models is to focus on badly classified examples and ignore well-classified examples that are far from the decision boundary. For instance, when training with cross-entropy loss,…

Machine Learning · Computer Science 2023-03-17 Guangxiang Zhao, Wenkai Yang, Xuancheng Ren, Lei Li, Yunfang Wu, Xu Sun

Deep Extrapolation for Attribute-Enhanced Generation

Attribute extrapolation in sample generation is challenging for deep neural networks operating beyond the training distribution. We formulate a new task for extrapolation in sequence generation, focusing on natural language and proteins,…

Machine Learning · Computer Science 2021-10-27 Alvin Chan, Ali Madani, Ben Krause, Nikhil Naik

How to choose "Good" Samples for Text Data Augmentation

Deep learning-based text classification models need abundant labeled data to obtain competitive performance. Unfortunately, annotating a large corpus is time-consuming and laborious. To tackle this, multiple studies try to use data…

Computation and Language · Computer Science 2023-02-03 Xiaotian Lin, Nankai Lin, Yingwen Fu, Ziyu Yang, Shengyi Jiang

Data Extrapolation for Text-to-image Generation on Small Datasets

Text-to-image generation requires a large amount of training data to synthesize high-quality images. For augmenting training data, previous methods rely on data interpolations like cropping, flipping, and mixing up, which fail to introduce…

Computer Vision and Pattern Recognition · Computer Science 2024-10-03 Senmao Ye, Fei Liu

Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification

Data augmentation techniques are widely used for enhancing the performance of machine learning models by tackling class imbalance issues and data sparsity. State-of-the-art generative language models have been shown to provide significant…

Computation and Language · Computer Science 2023-01-10 Aleksandra Edwards, Asahi Ushio, Jose Camacho-Collados, Hélène de Ribaupierre, Alun Preece

Reprint: a randomized extrapolation based on principal components for data augmentation

Data scarcity and data imbalance have attracted a lot of attention in many fields. Data augmentation, explored as an effective approach to tackle them, can improve the robustness and efficiency of classification models by generating new…

Computation and Language · Computer Science 2024-12-11 Le Li, Jiale Wei, Pai Peng, Qiyuan Chen, Benjamin Guedj, Bo Cai

Extrapolation Guarantees for Perturbation Modeling Under the Additive Latent Shift Assumption

We consider the problem of modeling the effects of perturbations like gene knockouts on measurements such as single-cell RNA counts. Given data for some perturbations, we aim to predict the distribution of measurements for new combinations…

Machine Learning · Statistics 2026-05-18 Julius von Kügelgen, Jakob Ketterer, Michael Vollenweider, Michael Scholkemper, Xinwei Shen, Nicolai Meinshausen, Jonas Peters

Adaptive Feature Interpolation for Low-Shot Image Generation

Training of generative models, especially Generative Adversarial Networks, can easily diverge in low-data settings. To mitigate this issue, we propose a novel implicit data augmentation approach which facilitates stable training and synthesizes…

Computer Vision and Pattern Recognition · Computer Science 2022-07-15 Mengyu Dai, Haibin Hang, Xiaoyang Guo

Distributional Data Augmentation Methods for Low Resource Language

Text augmentation is a technique for constructing synthetic data from an under-resourced corpus to improve predictive performance. Synthetic data generation is common in numerous domains. However, recently text augmentation has emerged in…

Computation and Language · Computer Science 2023-09-12 Mosleh Mahamud, Zed Lee, Isak Samsten

Fine-tuning with Very Large Dropout

It is impossible today to pretend that the practice of machine learning is always compatible with the idea that training and testing data follow the same distribution. Several authors have recently used ensemble techniques to show how…

Machine Learning · Computer Science 2025-03-03 Jianyu Zhang, Léon Bottou

SEDGE: Structural Extrapolated Data Generation

This paper aims to address the challenge of data generation beyond the training data and proposes a framework for Structural Extrapolated Data GEneration (SEDGE) based on suitable assumptions on the underlying data-generating process. We…

Machine Learning · Computer Science 2026-05-15 Kun Zhang, Jiaqi Sun, Yiqing Li, Ignavier Ng, Namrata Deka, Shaoan Xie

Distribution augmentation for low-resource expressive text-to-speech

This paper presents a novel data augmentation technique for text-to-speech (TTS) that allows generating new (text, audio) training examples without requiring any additional data. Our goal is to increase diversity of text conditionings…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-22 Mateusz Lajszczak, Animesh Prasad, Arent van Korlaar, Bajibabu Bollepalli, Antonio Bonafonte, Arnaud Joly, Marco Nicolis, Alexis Moinet, Thomas Drugman, Trevor Wood, Elena Sokolova