Related papers: Improving Grammatical Error Correction via Context…

200 papers

Synthetic data generation is widely known to boost the accuracy of neural grammatical error correction (GEC) systems, but existing methods often lack diversity or are too simplistic to generate the broad range of grammatical errors made by…

Computation and Language · Computer Science 2021-05-28 Felix Stahlberg, Shankar Kumar

Due to the lack of parallel data for the Grammatical Error Correction (GEC) task, models based on the Sequence-to-Sequence framework cannot be adequately trained to obtain higher performance. We propose two data synthesis methods which can…

Computation and Language · Computer Science 2021-12-28 Liner Yang, Chencheng Wang, Yun Chen, Yongping Du, Erhong Yang

Grammar Error Correction (GEC) mainly relies on the availability of large amounts of high-quality synthetic parallel data consisting of grammatically correct and erroneous sentence pairs. The quality of the synthetic data is evaluated on how well the…

Computation and Language · Computer Science 2022-11-01 Vanya Bannihatti Kumar

Data sparsity is a well-known problem for grammatical error correction (GEC). Generating synthetic training data is one widely proposed solution to this problem, and has allowed models to achieve state-of-the-art (SOTA) performance in…

Computation and Language · Computer Science 2022-08-23 Chowdhury Rafeed Rahman

While there exist strong benchmark datasets for grammatical error correction (GEC), high-quality annotated spoken datasets for Spoken GEC (SGEC) are still under-resourced. In this paper, we propose a fully automated method to generate…

Computation and Language · Computer Science 2025-07-28 Penny Karanasou, Mengjie Qian, Stefano Bannò, Mark J. F. Gales, Kate M. Knill

We propose a novel data augmentation for labeled sentences called contextual augmentation. We assume an invariance that sentences are natural even if the words in the sentences are replaced with other words with paradigmatic relations. We…

Computation and Language · Computer Science 2018-05-17 Sosuke Kobayashi

In this paper, we explore the artificial generation of typographical errors based on real-world statistics. We first draw on a small set of annotated data to compute spelling error statistics. These are then invoked to introduce errors into…

Computation and Language · Computer Science 2020-05-05 Kshitij Shah, Gerard de Melo

Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision. However, less work has been done in the context of text, partially due to its discrete nature and the complexity of…

Computation and Language · Computer Science 2021-01-12 Ping Yu, Ruiyi Zhang, Yang Zhao, Yizhe Zhang, Chunyuan Li, Changyou Chen

Progress in neural grammatical error correction (GEC) is hindered by the lack of annotated training data. Sufficient amounts of high-quality manually annotated data are not available, so recent research has relied on generating synthetic…

Computation and Language · Computer Science 2023-11-21 Andrey Bout, Alexander Podolskiy, Sergey Nikolenko, Irina Piontkovskaya

Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic…

Computation and Language · Computer Science 2018-10-02 Sudhanshu Kasewa, Pontus Stenetorp, Sebastian Riedel

Data augmentation through pseudo-data generation has proven effective in mitigating data scarcity in the field of Grammatical Error Correction (GEC). Various augmentation strategies have been widely explored, most of…

Computation and Language · Computer Science 2023-10-19 Jingheng Ye, Yinghui Li, Yangning Li, Hai-Tao Zheng

State-of-the-art models for keyphrase generation require large amounts of training data to achieve good performance. However, obtaining keyphrase-labeled documents can be challenging and costly. To address this issue, we present a…

Computation and Language · Computer Science 2024-11-07 Mael Houbre, Florian Boudin, Beatrice Daille, Akiko Aizawa

We perform neural machine translation of sentence fragments in order to create large amounts of training data for English grammatical error correction. Our method aims at simulating mistakes made by second language learners, and produces a…

Computation and Language · Computer Science 2021-04-21 Eetu Sjöblom, Mathias Creutz, Teemu Vahtola

The challenges facing speech recognition systems, such as variations in pronunciations, adverse audio conditions, and the scarcity of labeled data, emphasize the necessity for a post-processing step that corrects recurring errors. Previous…

Computation and Language · Computer Science 2023-10-18 Tomer Wullach, Shlomo E. Chazan

Collecting and annotating datasets for pixel-level semantic segmentation tasks are highly labor-intensive. Data augmentation provides a viable solution by enhancing model generalization without additional real-world data collection.…

Computer Vision and Pattern Recognition · Computer Science 2026-03-20 Huy Che, Dinh-Duy Phan, Duc-Khai Lam

In this paper, we investigate data augmentation for text generation, which we call GenAug. Text generation and language modeling are important tasks within natural language processing, and are especially challenging for low-data regimes. We…

Computation and Language · Computer Science 2020-10-13 Steven Y. Feng, Varun Gangal, Dongyeop Kang, Teruko Mitamura, Eduard Hovy

We propose a training-free approach to improve sentence embeddings leveraging test-time compute by applying generative text models for data augmentation at inference time. Unlike conventional data augmentation that utilises synthetic…

Computation and Language · Computer Science 2025-09-09 Manuel Frank, Haithem Afli

A common and effective means for improving language model capabilities involves fine-tuning a "student" language model's parameters on generations from a more proficient "teacher" model. Termed "synthetic data", these generations are…

We propose a simple data augmentation protocol aimed at providing a compositional inductive bias in conditional and unconditional sequence models. Under this protocol, synthetic training examples are constructed by taking real training…

Computation and Language · Computer Science 2020-05-20 Jacob Andreas

Data augmentation techniques have been widely used to improve machine learning performance as they enhance the generalization capability of models. In this work, to generate high quality synthetic data for low-resource tagging tasks, we…

Computation and Language · Computer Science 2020-11-04 Bosheng Ding, Linlin Liu, Lidong Bing, Canasai Kruengkrai, Thien Hai Nguyen, Shafiq Joty, Luo Si, Chunyan Miao