Related papers: Data Augmentation Method Utilizing Template Senten…

Data Augmentation Techniques for Process Extraction from Scientific Publications

We present data augmentation techniques for process extraction tasks in scientific publications. We cast the process extraction task as a sequence labeling task where we identify all the entities in a sentence and label them according to…

Computation and Language · Computer Science 2025-04-16 Yuni Susanti

Discovering Patterns of Definitions and Methods from Scientific Documents

The difficulties of automatic extraction of definitions and methods from scientific documents lie in two aspects: (1) the complexity and diversity of natural language texts, which requests an analysis method to support the discovery of…

Computation and Language · Computer Science 2023-07-06 Yutian Sun, Hai Zhuge

Variational Template Machine for Data-to-Text Generation

How to generate descriptions from structured data organized in tables? Existing approaches using neural encoder-decoder models often suffer from lacking diversity. We claim that an open set of templates is crucial for enriching the phrase…

Computation and Language · Computer Science 2020-02-14 Rong Ye, Wenxian Shi, Hao Zhou, Zhongyu Wei, Lei Li

An Augmentation Strategy for Visually Rich Documents

Many business workflows require extracting important fields from form-like documents (e.g. bank statements, bills of lading, purchase orders, etc.). Recent techniques for automating this task work well only when trained with large datasets.…

Computation and Language · Computer Science 2022-12-23 Jing Xie, James B. Wendt, Yichao Zhou, Seth Ebner, Sandeep Tata

Leveraging Data Augmentation for Process Information Extraction

Business Process Modeling projects often require formal process models as a central component. High costs associated with the creation of such formal process models motivated many different fields of research aimed at automated generation…

Computation and Language · Computer Science 2024-04-12 Julian Neuberger, Leonie Doll, Benedict Engelmann, Lars Ackermann, Stefan Jablonski

Self-Compositional Data Augmentation for Scientific Keyphrase Generation

State-of-the-art models for keyphrase generation require large amounts of training data to achieve good performance. However, obtaining keyphrase-labeled documents can be challenging and costly. To address this issue, we present a…

Computation and Language · Computer Science 2024-11-07 Mael Houbre, Florian Boudin, Beatrice Daille, Akiko Aizawa

A Survey on Data Augmentation for Text Classification

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization…

Computation and Language · Computer Science 2022-09-09 Markus Bayer, Marc-André Kaufhold, Christian Reuter

Selective Sampling of Effective Example Sentence Sets for Word Sense Disambiguation

This paper proposes an efficient example selection method for example-based word sense disambiguation systems. To construct a practical size database, a considerable overhead for manual sense disambiguation is required. Our method is…

cmp-lg · Computer Science 2008-02-03 Atsushi Fujii, Kentaro Inui, Takenobu Tokunaga, Hozumi Tanaka

VCDM: Leveraging Variational Bi-encoding and Deep Contextualized Word Representations for Improved Definition Modeling

In this paper, we tackle the task of definition modeling, where the goal is to learn to generate definitions of words and phrases. Existing approaches for this task are discriminative, combining distributional and lexical semantics in an…

Computation and Language · Computer Science 2020-10-08 Machel Reid, Edison Marrese-Taylor, Yutaka Matsuo

Transformers as Neural Augmentors: Class Conditional Sentence Generation via Variational Bayes

Data augmentation methods for Natural Language Processing tasks are explored in recent years, however they are limited and it is hard to capture the diversity on sentence level. Besides, it is not always possible to perform data…

Computation and Language · Computer Science 2022-05-20 M. Şafak Bilici, Mehmet Fatih Amasyali

Discovering Explanatory Sentences in Legal Case Decisions Using Pre-trained Language Models

Legal texts routinely use concepts that are difficult to understand. Lawyers elaborate on the meaning of such concepts by, among other things, carefully investigating how have they been used in past. Finding text snippets that mention a…

Computation and Language · Computer Science 2021-12-15 Jaromir Savelka, Kevin D. Ashley

Soft Contextual Data Augmentation for Neural Machine Translation

While data augmentation is an important trick to boost the accuracy of deep learning methods in computer vision tasks, its study in natural language tasks is still very limited. In this paper, we present a novel data augmentation method for…

Computation and Language · Computer Science 2019-05-28 Jinhua Zhu, Fei Gao, Lijun Wu, Yingce Xia, Tao Qin, Wengang Zhou, Xueqi Cheng, Tie-Yan Liu

Syntax-aware Data Augmentation for Neural Machine Translation

Data augmentation is an effective performance enhancement in neural machine translation (NMT) by generating additional bilingual data. In this paper, we propose a novel data augmentation enhancement strategy for neural machine translation.…

Computation and Language · Computer Science 2020-04-30 Sufeng Duan, Hai Zhao, Dongdong Zhang, Rui Wang

A New Sentence Extraction Strategy for Unsupervised Extractive Summarization Methods

In recent years, text summarization methods have attracted much attention again thanks to the researches on neural network models. Most of the current text summarization methods based on neural network models are supervised methods which…

Computation and Language · Computer Science 2024-01-25 Dehao Tao, Yingzhu Xiong, Zhongliang Yang, Yongfeng Huang

Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers

In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve…

Computation and Language · Computer Science 2022-07-25 Markus Bayer, Marc-André Kaufhold, Björn Buchhold, Marcel Keller, Jörg Dallmeyer, Christian Reuter

Improving Term Extraction with Terminological Resources

Studies of different term extractors on a corpus of the biomedical domain revealed decreasing performances when applied to highly technical texts. The difficulty or impossibility of customising them to new domains is an additional…

Computation and Language · Computer Science 2007-05-23 Sophie Aubin, Thierry Hamon

Extracting Definienda in Mathematical Scholarly Articles with Transformers

We consider automatically identifying the defined term within a mathematical definition from the text of an academic article. Inspired by the development of transformer-based natural language processing applications, we pose the problem as…

Artificial Intelligence · Computer Science 2023-11-22 Shufan Jiang, Pierre Senellart

SDA: Improving Text Generation with Self Data Augmentation

Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision. However, less work has been done in the context of text, partially due to its discrete nature and the complexity of…

Computation and Language · Computer Science 2021-01-12 Ping Yu, Ruiyi Zhang, Yang Zhao, Yizhe Zhang, Chunyuan Li, Changyou Chen

Bag of Tricks for Training Data Extraction from Language Models

With the advance of language models, privacy protection is receiving more attention. Training data extraction is therefore of great importance, as it can serve as a potential tool to assess privacy leakage. However, due to the difficulty of…

Computation and Language · Computer Science 2023-06-02 Weichen Yu, Tianyu Pang, Qian Liu, Chao Du, Bingyi Kang, Yan Huang, Min Lin, Shuicheng Yan

Guidance-Based Prompt Data Augmentation in Specialized Domains for Named Entity Recognition

While the abundance of rich and vast datasets across numerous fields has facilitated the advancement of natural language processing, sectors in need of specialized data types continue to struggle with the challenge of finding quality data.…

Computation and Language · Computer Science 2026-02-06 Hyeonseok Kang, Hyein Seo, Jeesu Jung, Sangkeun Jung, Du-Seong Chang, Riwoo Chung