Related papers: Towards Generating Automatic Anaphora Annotations

200 papers

Annotation of political discourse is resource-intensive, but recent developments in NLP promise to automate complex annotation tasks. Fine-tuned transformer-based models outperform human annotators in some annotation tasks, but they require…

Computation and Language · Computer Science 2024-08-27 Sebastian Haunss, André Blessing

Data annotation is an essential step for constructing new datasets. However, the conventional approach of data annotation through crowdsourcing is both time-consuming and expensive. In addition, the complexity of this process increases when…

Computation and Language · Computer Science 2024-02-09 Juhwan Choi, Eunju Lee, Kyohoon Jin, YoungBin Kim

Recent work on multilingual AMR-to-text generation has exclusively focused on data augmentation strategies that utilize silver AMR. However, this assumes a high quality of generated AMRs, potentially limiting the transferability to the…

Computation and Language · Computer Science 2021-09-09 Leonardo F. R. Ribeiro, Jonas Pfeiffer, Yue Zhang, Iryna Gurevych

Data annotated by humans is a source of knowledge by describing the peculiarities of the problem and therefore fueling the decision process of the trained model. Unfortunately, the annotation process for subjective natural language…

Computation and Language · Computer Science 2023-12-14 Kamil Kanclerz, Julita Bielaniewicz, Marcin Gruza, Jan Kocon, Stanisław Woźniak, Przemysław Kazienko

Obtaining text datasets with semantic annotations is an effortful process, yet crucial for supervised training in natural language processing (NLP). In general, developing and applying new NLP pipelines in domain-specific contexts for…

Computation and Language · Computer Science 2022-09-01 Johann Frei, Frank Kramer

The NLP community has long advocated for the construction of multi-annotator datasets to better capture the nuances of language interpretation, subjectivity, and ambiguity. This paper conducts a retrospective study to show how performance…

Computation and Language · Computer Science 2023-10-24 Pritam Kadasi, Mayank Singh

While state-of-the-art NLP models have been achieving excellent performance on a wide range of tasks in recent years, important questions are being raised about their robustness and their underlying sensitivity to systematic biases that…

Computation and Language · Computer Science 2022-03-25 Linyi Yang, Jiazheng Li, Pádraig Cunningham, Yue Zhang, Barry Smyth, Ruihai Dong

Prevalent supervised learning methods in natural language processing (NLP) are notoriously data-hungry, which demand large amounts of high-quality annotated data. In practice, acquiring such data is a costly endeavor. Recently, the superior…

Computation and Language · Computer Science 2023-11-01 Ruoyu Zhang, Yanzeng Li, Yongliang Ma, Ming Zhou, Lei Zou

The acquisition of grammar has been a central question to adjudicate between theories of language acquisition. In order to conduct faster, more reproducible, and larger-scale corpus studies on grammaticality in child-caregiver…

Computation and Language · Computer Science 2024-03-22 Mitja Nikolaus, Abhishek Agrawal, Petros Kaklamanis, Alex Warstadt, Abdellah Fourtassi

Annotated data plays a critical role in Natural Language Processing (NLP) in training models and evaluating their performance. Given recent developments in Large Language Models (LLMs), models such as ChatGPT demonstrate zero-shot…

Computation and Language · Computer Science 2024-03-18 Minzhi Li, Taiwei Shi, Caleb Ziems, Min-Yen Kan, Nancy F. Chen, Zhengyuan Liu, Diyi Yang

Data annotation and synthesis generally refers to the labeling or generating of raw data with relevant information, which could be used for improving the efficacy of machine learning models. The process, however, is labor-intensive and…

Computation and Language · Computer Science 2024-12-04 Zhen Tan, Dawei Li, Song Wang, Alimohammad Beigi, Bohan Jiang, Amrita Bhattacharjee, Mansooreh Karami, Jundong Li, Lu Cheng, Huan Liu

Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour potentially involving many people, stages, and tools. This chapter outlines the process of creating end-to-end…

Computation and Language · Computer Science 2016-02-19 Mark A. Finlayson, Tomaž Erjavec

In support of open and reproducible research, there has been a rapidly increasing number of datasets made available for research. As the availability of datasets increases, it becomes more important to have quality metadata for discovering…

Computation and Language · Computer Science 2023-10-18 Shiwei Zhang, Mingfang Wu, Xiuzhen Zhang

Language documentation is inherently a time-intensive process; transcription, glossing, and corpus management consume a significant portion of documentary linguists' work. Advances in natural language processing can help to accelerate this…

Computation and Language · Computer Science 2018-12-14 Graham Neubig, Patrick Littell, Chian-Yu Chen, Jean Lee, Zirui Li, Yu-Hsiang Lin, Yuyan Zhang

Annotated datasets are commonly used in the training and evaluation of tasks involving natural language and vision (image description generation, action recognition and visual question answering). However, many of the existing datasets…

Computer Vision and Pattern Recognition · Computer Science 2016-07-12 Gitit Kehat, James Pustejovsky

Crowdsourcing has been the prevalent paradigm for creating natural language understanding datasets in recent years. A common crowdsourcing practice is to recruit a small number of high-quality workers, and have them massively generate…

Computation and Language · Computer Science 2019-08-29 Mor Geva, Yoav Goldberg, Jonathan Berant

Many complex discourse-level tasks can aid domain experts in their work but require costly expert annotations for data creation. To speed up and ease annotations, we investigate the viability of automatically generated annotation…

Training a spoken language understanding system, as the one in Alexa, typically requires a large human-annotated corpus of data. Manual annotations are expensive and time consuming. In Alexa Skill Kit (ASK) user experience with the skill…

Computation and Language · Computer Science 2020-06-30 Alex Sokolov, Denis Filimonov

In this paper, we explore Annotation Artifacts - the phenomena wherein large pre-trained NLP models achieve high performance on benchmark datasets but do not actually "solve" the underlying task and instead rely on some dataset artifacts…

Computation and Language · Computer Science 2023-02-10 Armaan Singh Bhullar

When tasked with supporting multiple languages for a given problem, two approaches have arisen: training a model for each language with the annotation budget divided equally among them, and training on a high-resource language followed by…

Computation and Language · Computer Science 2022-04-05 Joel Ruben Antony Moniz, Barun Patra, Matthew R. Gormley