Related papers: Improving short text classification through global…

Selective Text Augmentation with Word Roles for Low-Resource Text Classification

Data augmentation techniques are widely used in text classification tasks to improve the performance of classifiers, especially in low-resource scenarios. Most previous methods conduct text augmentation without considering the different…

Computation and Language · Computer Science · 2022-09-07 · Biyang Guo, Songqiao Han, Hailiang Huang

Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers

In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve…

Computation and Language · Computer Science · 2022-07-25 · Markus Bayer, Marc-André Kaufhold, Björn Buchhold, Marcel Keller, Jörg Dallmeyer, Christian Reuter

Augmenting Data with Mixup for Sentence Classification: An Empirical Study

Mixup, a recently proposed data augmentation method that linearly interpolates the inputs and modeling targets of random samples, has demonstrated its capability to significantly improve the predictive accuracy of the state-of-the-art…

Computation and Language · Computer Science · 2019-05-23 · Hongyu Guo, Yongyi Mao, Richong Zhang
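The interpolation described in the Mixup abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes fixed-size sentence embeddings and one-hot labels stored as NumPy arrays, draws the mixing weight from a Beta(α, α) distribution as in the original Mixup formulation, and uses a hypothetical function name `mixup_batch` with an illustrative default `alpha`.

```python
import numpy as np


def mixup_batch(embeddings, labels, alpha=0.2, rng=None):
    """Mix each example with a randomly chosen partner from the same batch.

    embeddings: float array of shape (batch, dim), e.g. sentence embeddings (assumed input)
    labels:     one-hot float array of shape (batch, num_classes)
    alpha:      Beta-distribution parameter (hypothetical default)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw a single interpolation weight lambda ~ Beta(alpha, alpha) for the batch.
    lam = rng.beta(alpha, alpha)
    # Pair every example with a partner given by a random permutation
    # (an example may occasionally be paired with itself).
    perm = rng.permutation(len(embeddings))
    mixed_x = lam * embeddings + (1.0 - lam) * embeddings[perm]
    # Targets are interpolated with the same weight as the inputs.
    mixed_y = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_x, mixed_y, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))   # 4 toy "sentence embeddings" of size 8
    y = np.eye(3)[[0, 1, 2, 1]]   # one-hot labels for 3 classes
    mx, my, lam = mixup_batch(x, y, alpha=0.2, rng=rng)
    print(mx.shape, my.shape)
    print(my.sum(axis=1))         # each mixed target still sums to 1
```

Because the labels are mixed with the same λ as the inputs, the resulting targets are soft distributions, so the training loss must accept soft labels (for example, cross-entropy against a probability vector rather than a class index).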

A Survey on Data Augmentation for Text Classification

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization…

Computation and Language · Computer Science · 2022-09-09 · Markus Bayer, Marc-André Kaufhold, Christian Reuter

BootAug: Boosting Text Augmentation via Hybrid Instance Filtering Framework

Text augmentation is an effective technique for addressing the problem of insufficient data in natural language processing. However, existing text augmentation methods tend to focus on few-shot scenarios and usually perform poorly on large…

Computation and Language · Computer Science · 2024-04-02 · Heng Yang, Ke Li

Short-Text Classification Using Unsupervised Keyword Expansion

Short-text classification, like all data science, struggles to achieve high performance using limited data. As a solution, a short sentence may be expanded with new and relevant feature words to form an artificially enlarged dataset, and…

Computation and Language · Computer Science · 2019-09-18 · Duncan Cameron-Steinke

Soft Contextual Data Augmentation for Neural Machine Translation

While data augmentation is an important trick to boost the accuracy of deep learning methods in computer vision tasks, its study in natural language tasks is still very limited. In this paper, we present a novel data augmentation method for…

Computation and Language · Computer Science · 2019-05-28 · Jinhua Zhu, Fei Gao, Lijun Wu, Yingce Xia, Tao Qin, Wengang Zhou, Xueqi Cheng, Tie-Yan Liu

An Exploration of Multimodality and Data Augmentation for Dementia Classification

Dementia is a progressive neurological disorder that profoundly affects the daily lives of older adults, impairing abilities such as verbal communication and cognitive function. Early diagnosis is essential for enhancing both lifespan and…

Computational Engineering, Finance, and Science · Computer Science · 2023-11-07 · Kaiying Lin, Peter Washington

What Have Been Learned & What Should Be Learned? An Empirical Study of How to Selectively Augment Text for Classification

Text augmentation techniques are widely used in text classification problems to improve the performance of classifiers, especially in low-resource scenarios. Whilst lots of creative text augmentation methods have been designed, they augment…

Computation and Language · Computer Science · 2021-09-02 · Biyang Guo, Songqiao Han, Hailiang Huang

Distributional Data Augmentation Methods for Low Resource Language

Text augmentation is a technique for constructing synthetic data from an under-resourced corpus to improve predictive performance. Synthetic data generation is common in numerous domains. However, recently text augmentation has emerged in…

Computation and Language · Computer Science · 2023-09-12 · Mosleh Mahamud, Zed Lee, Isak Samsten

Affect Enriched Word Embeddings for News Information Retrieval

Distributed representations of words have been shown to be useful for improving the effectiveness of IR systems in many sub-tasks such as query expansion, retrieval and ranking. Algorithms like word2vec, GloVe and others are also key factors in many…

Information Retrieval · Computer Science · 2019-09-05 · Tommaso Teofili, Niyati Chhaya

Textual Augmentation Techniques Applied to Low Resource Machine Translation: Case of Swahili

In this work we investigate the impact of applying textual data augmentation tasks to low resource machine translation. There has been recent interest in investigating approaches for training systems for languages with limited resources and…

Computation and Language · Computer Science · 2023-06-14 · Catherine Gitau, Vukosi Marivate

TextAug: Test time Text Augmentation for Multimodal Person Re-identification

Multimodal person re-identification is gaining popularity in the research community due to its effectiveness compared with its unimodal counterparts. However, the bottleneck for multimodal deep learning is the need for a large volume of…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-05 · Mulham Fawakherji, Eduard Vazquez, Pasquale Giampa, Binod Bhattarai

GenAug: Data Augmentation for Finetuning Text Generators

In this paper, we investigate data augmentation for text generation, which we call GenAug. Text generation and language modeling are important tasks within natural language processing, and are especially challenging for low-data regimes. We…

Computation and Language · Computer Science · 2020-10-13 · Steven Y. Feng, Varun Gangal, Dongyeop Kang, Teruko Mitamura, Eduard Hovy

GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation

Large-scale language models such as GPT-3 are excellent few-shot learners, allowing them to be controlled via natural text prompts. Recent studies report that prompt-based direct classification eliminates the need for fine-tuning but lacks…

Computation and Language · Computer Science · 2021-11-19 · Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, Woomyeong Park

Exploring Data Augmentation Methods on Social Media Corpora

Data augmentation has proven widely effective in computer vision. In Natural Language Processing (NLP), data augmentation remains an area of active research. There is no widely accepted augmentation technique that works well across tasks and…

Computation and Language · Computer Science · 2023-03-07 · Isabel Garcia Pietri, Kineret Stanley

RankAug: Augmented data ranking for text classification

Research on data generation and augmentation has focused mainly on enhancing generation models, leaving a notable gap in the exploration and refinement of methods for evaluating synthetic data. There are several text similarity…

Computation and Language · Computer Science · 2023-11-09 · Tiasa Singha Roy, Priyam Basu

End-to-end Learning for Short Text Expansion

Effectively making sense of short texts is a critical task for many real world applications such as search engines, social media services, and recommender systems. The task is particularly challenging as a short text contains very sparse…

Computation and Language · Computer Science · 2017-09-04 · Jian Tang, Yue Wang, Kai Zheng, Qiaozhu Mei

Utility of General and Specific Word Embeddings for Classifying Translational Stages of Research

Conventional text classification models make a bag-of-words assumption reducing text into word occurrence counts per document. Recent algorithms such as word2vec are capable of learning semantic meaning and similarity between words in an…

Computation and Language · Computer Science · 2018-07-11 · Vincent Major, Alisa Surkis, Yindalon Aphinyanaphongs

DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification

This paper proposes a simple yet effective interpolation-based data augmentation approach termed DoubleMix, to improve the robustness of models in text classification. DoubleMix first leverages a couple of simple augmentation operations to…

Computation and Language · Computer Science · 2022-09-13 · Hui Chen, Wei Han, Diyi Yang, Soujanya Poria