Related papers: Towards Generating Automatic Anaphora Annotations

Revisiting the Exit from Nuclear Energy in Germany with NLP

Annotation of political discourse is resource-intensive, but recent developments in NLP promise to automate complex annotation tasks. Fine-tuned transformer-based models outperform human annotators in some annotation tasks, but they require…

Computation and Language · Computer Science 2024-08-27 Sebastian Haunss, André Blessing

GPTs Are Multilingual Annotators for Sequence Generation Tasks

Data annotation is an essential step for constructing new datasets. However, the conventional approach of data annotation through crowdsourcing is both time-consuming and expensive. In addition, the complexity of this process increases when…

Computation and Language · Computer Science 2024-02-09 Juhwan Choi, Eunju Lee, Kyohoon Jin, YoungBin Kim

Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation

Recent work on multilingual AMR-to-text generation has exclusively focused on data augmentation strategies that utilize silver AMR. However, this assumes a high quality of generated AMRs, potentially limiting the transferability to the…

Computation and Language · Computer Science 2021-09-09 Leonardo F. R. Ribeiro, Jonas Pfeiffer, Yue Zhang, Iryna Gurevych

Towards Model-Based Data Acquisition for Subjective Multi-Task NLP Problems

Data annotated by humans is a source of knowledge by describing the peculiarities of the problem and therefore fueling the decision process of the trained model. Unfortunately, the annotation process for subjective natural language…

Computation and Language · Computer Science 2023-12-14 Kamil Kanclerz, Julita Bielaniewicz, Marcin Gruza, Jan Kocon, Stanisław Woźniak, Przemysław Kazienko

Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP

Obtaining text datasets with semantic annotations is an effortful process, yet crucial for supervised training in natural language processing (NLP). In general, developing and applying new NLP pipelines in domain-specific contexts for…

Computation and Language · Computer Science 2022-09-01 Johann Frei, Frank Kramer

Unveiling the Multi-Annotation Process: Examining the Influence of Annotation Quantity and Instance Difficulty on Model Performance

The NLP community has long advocated for the construction of multi-annotator datasets to better capture the nuances of language interpretation, subjectivity, and ambiguity. This paper conducts a retrospective study to show how performance…

Computation and Language · Computer Science 2023-10-24 Pritam Kadasi, Mayank Singh

Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis

While state-of-the-art NLP models have been achieving excellent performance on a wide range of tasks in recent years, important questions are being raised about their robustness and their underlying sensitivity to systematic biases that…

Computation and Language · Computer Science 2022-03-25 Linyi Yang, Jiazheng Li, Pádraig Cunningham, Yue Zhang, Barry Smyth, Ruihai Dong

LLMaAA: Making Large Language Models as Active Annotators

Prevalent supervised learning methods in natural language processing (NLP) are notoriously data-hungry, which demand large amounts of high-quality annotated data. In practice, acquiring such data is a costly endeavor. Recently, the superior…

Computation and Language · Computer Science 2023-11-01 Ruoyu Zhang, Yanzeng Li, Yongliang Ma, Ming Zhou, Lei Zou

Automatic Annotation of Grammaticality in Child-Caregiver Conversations

The acquisition of grammar has been a central question to adjudicate between theories of language acquisition. In order to conduct faster, more reproducible, and larger-scale corpus studies on grammaticality in child-caregiver…

Computation and Language · Computer Science 2024-03-22 Mitja Nikolaus, Abhishek Agrawal, Petros Kaklamanis, Alex Warstadt, Abdellah Fourtassi

CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation

Annotated data plays a critical role in Natural Language Processing (NLP) in training models and evaluating their performance. Given recent developments in Large Language Models (LLMs), models such as ChatGPT demonstrate zero-shot…

Computation and Language · Computer Science 2024-03-18 Minzhi Li, Taiwei Shi, Caleb Ziems, Min-Yen Kan, Nancy F. Chen, Zhengyuan Liu, Diyi Yang

Large Language Models for Data Annotation and Synthesis: A Survey

Data annotation and synthesis generally refers to the labeling or generating of raw data with relevant information, which could be used for improving the efficacy of machine learning models. The process, however, is labor-intensive and…

Computation and Language · Computer Science 2024-12-04 Zhen Tan, Dawei Li, Song Wang, Alimohammad Beigi, Bohan Jiang, Amrita Bhattacharjee, Mansooreh Karami, Jundong Li, Lu Cheng, Huan Liu

Overview of Annotation Creation: Processes & Tools

Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour potentially involving many people, stages, and tools. This chapter outlines the process of creating end-to-end…

Computation and Language · Computer Science 2016-02-19 Mark A. Finlayson, Tomaž Erjavec

Utilising a Large Language Model to Annotate Subject Metadata: A Case Study in an Australian National Research Data Catalogue

In support of open and reproducible research, there has been a rapidly increasing number of datasets made available for research. As the availability of datasets increases, it becomes more important to have quality metadata for discovering…

Computation and Language · Computer Science 2023-10-18 Shiwei Zhang, Mingfang Wu, Xiuzhen Zhang

Towards a General-Purpose Linguistic Annotation Backend

Language documentation is inherently a time-intensive process; transcription, glossing, and corpus management consume a significant portion of documentary linguists' work. Advances in natural language processing can help to accelerate this…

Computation and Language · Computer Science 2018-12-14 Graham Neubig, Patrick Littell, Chian-Yu Chen, Jean Lee, Zirui Li, Yu-Hsiang Lin, Yuyan Zhang

Annotation Methodologies for Vision and Language Dataset Creation

Annotated datasets are commonly used in the training and evaluation of tasks involving natural language and vision (image description generation, action recognition and visual question answering). However, many of the existing datasets…

Computer Vision and Pattern Recognition · Computer Science 2016-07-12 Gitit Kehat, James Pustejovsky

Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets

Crowdsourcing has been the prevalent paradigm for creating natural language understanding datasets in recent years. A common crowdsourcing practice is to recruit a small number of high-quality workers, and have them massively generate…

Computation and Language · Computer Science 2019-08-29 Mor Geva, Yoav Goldberg, Jonathan Berant

Analysis of Automatic Annotation Suggestions for Hard Discourse-Level Tasks in Expert Domains

Many complex discourse-level tasks can aid domain experts in their work but require costly expert annotations for data creation. To speed up and ease annotations, we investigate the viability of automatically generated annotation…

Computation and Language · Computer Science 2019-06-07 Claudia Schulz, Christian M. Meyer, Jan Kiesewetter, Michael Sailer, Elisabeth Bauer, Martin R. Fischer, Frank Fischer, Iryna Gurevych

Neural Machine Translation For Paraphrase Generation

Training a spoken language understanding system, as the one in Alexa, typically requires a large human-annotated corpus of data. Manual annotations are expensive and time consuming. In Alexa Skill Kit (ASK) user experience with the skill…

Computation and Language · Computer Science 2020-06-30 Alex Sokolov, Denis Filimonov

Augmenting NLP data to counter Annotation Artifacts for NLI Tasks

In this paper, we explore Annotation Artifacts - the phenomena wherein large pre-trained NLP models achieve high performance on benchmark datasets but do not actually "solve" the underlying task and instead rely on some dataset artifacts…

Computation and Language · Computer Science 2023-02-10 Armaan Singh Bhullar

On Efficiently Acquiring Annotations for Multilingual Models

When tasked with supporting multiple languages for a given problem, two approaches have arisen: training a model for each language with the annotation budget divided equally among them, and training on a high-resource language followed by…

Computation and Language · Computer Science 2022-04-05 Joel Ruben Antony Moniz, Barun Patra, Matthew R. Gormley