Related papers: Controllable Data Synthesis Method for Grammatical…

200 papers

Synthetic data generation is widely known to boost the accuracy of neural grammatical error correction (GEC) systems, but existing methods often lack diversity or are too simplistic to generate the broad range of grammatical errors made by…

Computation and Language · Computer Science 2021-05-28 Felix Stahlberg, Shankar Kumar

Grammar Error Correction (GEC) relies mainly on the availability of large amounts of high-quality synthetic parallel data consisting of grammatically correct and erroneous sentence pairs. The quality of the synthetic data is evaluated on how well the…

Computation and Language · Computer Science 2022-11-01 Vanya Bannihatti Kumar

Data sparsity is a well-known problem for grammatical error correction (GEC). Generating synthetic training data is one widely proposed solution to this problem, and has allowed models to achieve state-of-the-art (SOTA) performance in…

Computation and Language · Computer Science 2022-08-23 Chowdhury Rafeed Rahman

Nowadays, data augmentation through synthetic data has been widely used in the field of Grammatical Error Correction (GEC) to alleviate the problem of data scarcity. However, these synthetic data are mainly used in the pre-training phase…

Computation and Language · Computer Science 2024-06-26 Yixuan Wang, Baoxin Wang, Yijun Liu, Qingfu Zhu, Dayong Wu, Wanxiang Che

We propose a novel data synthesis method to generate diverse error-corrected sentence pairs for improving grammatical error correction, which is based on a pair of machine translation models of different qualities (i.e., poor and good). The…

Computation and Language · Computer Science 2020-11-03 Wangchunshu Zhou, Tao Ge, Chang Mu, Ke Xu, Furu Wei, Ming Zhou

Synthetic data construction of Grammatical Error Correction (GEC) for non-English languages relies heavily on human-designed and language-specific rules, which produce limited error-corrected patterns. In this paper, we propose a generic…

Computation and Language · Computer Science 2022-01-27 Xin Sun, Tao Ge, Shuming Ma, Jingjing Li, Furu Wei, Houfeng Wang

We perform neural machine translation of sentence fragments in order to create large amounts of training data for English grammatical error correction. Our method aims at simulating mistakes made by second language learners, and produces a…

Computation and Language · Computer Science 2021-04-21 Eetu Sjöblom, Mathias Creutz, Teemu Vahtola

Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic…

Computation and Language · Computer Science 2018-10-02 Sudhanshu Kasewa, Pontus Stenetorp, Sebastian Riedel

Model collapse in synthetic data indicates that iterative training on self-generated data leads to a gradual decline in performance. With the proliferation of AI models, synthetic data will fundamentally reshape the web data ecosystem.…

Computation and Language · Computer Science 2025-05-29 Xuekai Zhu, Daixuan Cheng, Hengli Li, Kaiyan Zhang, Ermo Hua, Xingtai Lv, Ning Ding, Zhouhan Lin, Zilong Zheng, Bowen Zhou

This paper presents a simple recipe to train state-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second…

Computation and Language · Computer Science 2022-08-10 Sascha Rothe, Jonathan Mallinson, Eric Malmi, Sebastian Krause, Aliaksei Severyn

Progress in neural grammatical error correction (GEC) is hindered by the lack of annotated training data. Sufficient amounts of high-quality manually annotated data are not available, so recent research has relied on generating synthetic…

Computation and Language · Computer Science 2023-11-21 Andrey Bout, Alexander Podolskiy, Sergey Nikolenko, Irina Piontkovskaya

Grammatical Error Detection (GED) methods rely heavily on human annotated error corpora. However, these annotations are unavailable in many low-resource languages. In this paper, we investigate GED in this context. Leveraging the zero-shot…

Computation and Language · Computer Science 2024-07-17 Gaetan Lopez Latouche, Marc-André Carbonneau, Ben Swanson

Grammatical Error Correction (GEC) aims to automatically detect and correct grammatical errors. In this respect, dominant models are trained by one-iteration learning while performing multiple iterations of corrections during inference.…

Computation and Language · Computer Science 2022-03-18 Shaopeng Lai, Qingyu Zhou, Jiali Zeng, Zhongli Li, Chao Li, Yunbo Cao, Jinsong Su

The data-centric AI approach aims to enhance model performance without modifying the model and has been shown to impact model performance positively. While recent attention has been given to data-centric AI based on synthetic data, due to…

Computation and Language · Computer Science 2023-06-27 Chanjun Park, Seonmin Koo, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

Synthetic data is a standard component in training large language models, yet systematic comparisons across design dimensions, including rephrasing strategy, generator model, and source data, remain absent. We conduct extensive controlled…

Grammatical Error Detection and Correction (GEC) tools have proven useful for native speakers and second language learners. Developing such tools requires a large amount of parallel, annotated data, which is unavailable for most languages.…

Computation and Language · Computer Science 2023-09-21 Atakan Kara, Farrin Marouf Sofian, Andrew Bond, Gözde Gül Şahin

Imbalanced data, where the positive samples represent only a small proportion compared to the negative samples, make it challenging for classification problems to balance the false positive and false negative rates. A common approach to…

Machine Learning · Statistics 2026-02-17 Pengfei Lyu, Zhengchi Ma, Linjun Zhang, Anru R. Zhang

In this paper, we propose three methods for generating synthetic samples to train and evaluate multimodal large language models capable of processing both text and speech inputs. Addressing the scarcity of samples containing both…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-21 Vahid Noroozi, Zhehuai Chen, Somshubra Majumdar, Steve Huang, Jagadeesh Balam, Boris Ginsburg

Grammatical error correction (GEC) is the task of detecting and correcting grammatical errors in texts written by second language learners. The statistical machine translation (SMT) approach to GEC, in which sentences written by second…

Computation and Language · Computer Science 2016-06-02 Duc Tam Hoang, Shamil Chollampatt, Hwee Tou Ng

In recent years, sequence-to-sequence models have been very effective for end-to-end grammatical error correction (GEC). As creating a human-annotated parallel corpus for GEC is expensive and time-consuming, there has been work on artificial…

Computation and Language · Computer Science 2019-07-23 Phu Mon Htut, Joel Tetreault