Related papers: Good-Enough Example Extrapolation

200 papers

In many applications of machine learning, certain categories of examples may be underrepresented in the training data, causing systems to underperform on such "few-shot" cases at test time. A common remedy is to perform data augmentation,…

Computation and Language · Computer Science 2021-02-03 Kenton Lee, Kelvin Guu, Luheng He, Tim Dozat, Hyung Won Chung

Large-scale language models such as GPT-3 are excellent few-shot learners, allowing them to be controlled via natural text prompts. Recent studies report that prompt-based direct classification eliminates the need for fine-tuning but lacks…

Computation and Language · Computer Science 2021-11-19 Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, Woomyeong Park

We propose a simple data augmentation protocol aimed at providing a compositional inductive bias in conditional and unconditional sequence models. Under this protocol, synthetic training examples are constructed by taking real training…

Computation and Language · Computer Science 2020-05-20 Jacob Andreas

In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these…

Computer Vision and Pattern Recognition · Computer Science 2019-01-23 Cecilia Summers, Michael J. Dinneen

Distributional regression aims to estimate the full conditional distribution of a target variable, given covariates. Popular methods include linear and tree-ensemble based quantile regression. We propose a neural network-based…

Methodology · Statistics 2024-07-08 Xinwei Shen, Nicolai Meinshausen

This paper proposes a simple yet effective interpolation-based data augmentation approach termed DoubleMix, to improve the robustness of models in text classification. DoubleMix first leverages a couple of simple augmentation operations to…

Computation and Language · Computer Science 2022-09-13 Hui Chen, Wei Han, Diyi Yang, Soujanya Poria

Data-driven predictive solutions predominant in commercial applications tend to suffer from biases and stereotypes, which raises equity concerns. Prediction models may discover, use, or amplify spurious correlations based on gender or other…

Computation and Language · Computer Science 2022-11-28 Abdelrahman Zayed, Prasanna Parthasarathi, Goncalo Mordido, Hamid Palangi, Samira Shabanian, Sarath Chandar

Extrapolation from a source to a target, e.g., from adults to children, is a promising approach to utilizing external information when data are sparse. In the context of meta-analysis, one is commonly faced with a small number of studies,…

Methodology · Statistics 2019-01-21 Christian Röver, Simon Wandel, Tim Friede

The conventional wisdom behind learning deep classification models is to focus on poorly classified examples and ignore well-classified examples that are far from the decision boundary. For instance, when training with cross-entropy loss,…

Machine Learning · Computer Science 2023-03-17 Guangxiang Zhao, Wenkai Yang, Xuancheng Ren, Lei Li, Yunfang Wu, Xu Sun

Attribute extrapolation in sample generation is challenging for deep neural networks operating beyond the training distribution. We formulate a new task for extrapolation in sequence generation, focusing on natural language and proteins,…

Machine Learning · Computer Science 2021-10-27 Alvin Chan, Ali Madani, Ben Krause, Nikhil Naik

Deep learning-based text classification models need abundant labeled data to achieve competitive performance. Unfortunately, annotating a large corpus is time-consuming and laborious. To tackle this, multiple studies have tried to use data…

Computation and Language · Computer Science 2023-02-03 Xiaotian Lin, Nankai Lin, Yingwen Fu, Ziyu Yang, Shengyi Jiang

Text-to-image generation requires a large amount of training data to synthesize high-quality images. To augment training data, previous methods rely on data interpolations such as cropping, flipping, and mixing up, which fail to introduce…

Computer Vision and Pattern Recognition · Computer Science 2024-10-03 Senmao Ye, Fei Liu

Data augmentation techniques are widely used for enhancing the performance of machine learning models by tackling class imbalance issues and data sparsity. State-of-the-art generative language models have been shown to provide significant…

Computation and Language · Computer Science 2023-01-10 Aleksandra Edwards, Asahi Ushio, Jose Camacho-Collados, Hélène de Ribaupierre, Alun Preece

Data scarcity and data imbalance have attracted a lot of attention in many fields. Data augmentation, explored as an effective approach to tackle them, can improve the robustness and efficiency of classification models by generating new…

Computation and Language · Computer Science 2024-12-11 Le Li, Jiale Wei, Pai Peng, Qiyuan Chen, Benjamin Guedj, Bo Cai

We consider the problem of modeling the effects of perturbations like gene knockouts on measurements such as single-cell RNA counts. Given data for some perturbations, we aim to predict the distribution of measurements for new combinations…

Training generative models, especially Generative Adversarial Networks, can easily diverge in low-data settings. To mitigate this issue, we propose a novel implicit data augmentation approach that facilitates stable training and synthesizes…

Computer Vision and Pattern Recognition · Computer Science 2022-07-15 Mengyu Dai, Haibin Hang, Xiaoyang Guo

Text augmentation is a technique for constructing synthetic data from an under-resourced corpus to improve predictive performance. Synthetic data generation is common in numerous domains. However, text augmentation has emerged only recently in…

Computation and Language · Computer Science 2023-09-12 Mosleh Mahamud, Zed Lee, Isak Samsten

It is impossible today to pretend that the practice of machine learning is always compatible with the idea that training and testing data follow the same distribution. Several authors have recently used ensemble techniques to show how…

Machine Learning · Computer Science 2025-03-03 Jianyu Zhang, Léon Bottou

This paper aims to address the challenge of data generation beyond the training data and proposes a framework for Structural Extrapolated Data GEneration (SEDGE) based on suitable assumptions on the underlying data-generating process. We…

Machine Learning · Computer Science 2026-05-15 Kun Zhang, Jiaqi Sun, Yiqing Li, Ignavier Ng, Namrata Deka, Shaoan Xie

This paper presents a novel data augmentation technique for text-to-speech (TTS) that makes it possible to generate new (text, audio) training examples without requiring any additional data. Our goal is to increase the diversity of text conditionings…