Related papers: COMMENTATOR: A Code-mixed Multilingual Text Annota…

200 papers

We introduce COMI-LINGUA, the largest manually annotated Hindi-English code-mixed dataset, comprising 125K+ high-quality instances across five core NLP tasks: Matrix Language Identification, Token-level Language Identification,…

Computation and Language · Computer Science 2025-09-18 Rajvee Sheth, Himanshu Beniwal, Mayank Singh

Code comments provide important information for understanding the source code. They can help developers understand the overall purpose of a function or class, as well as identify bugs and technical debt. However, an overabundance of…

Computation and Language · Computer Science 2024-08-12 Nam Le Hai, Nghi D. Q. Bui

Producing the required amounts of training data for machine learning and NLP tasks often involves human annotators doing very repetitive and monotonous work. In this paper, we present and evaluate our novel annotation framework DALPHI,…

Information Retrieval · Computer Science 2018-08-20 Robert Greinacher, Franziska Horn

Ensuring annotator quality in training and evaluation data is a key piece of machine learning in NLP. Tasks such as sentiment analysis and offensive speech detection are intrinsically subjective, creating a challenging scenario for…

Computation and Language · Computer Science 2024-09-23 Sujan Dutta, Deepak Pandita, Tharindu Cyril Weerasooriya, Marcos Zampieri, Christopher M. Homan, Ashiqur R. KhudaBukhsh

The NLP community has witnessed steep progress in a variety of tasks across the realms of monolingual and multilingual language processing recently. These successes, in conjunction with the proliferating mixed language interactions on…

Computation and Language · Computer Science 2021-06-14 Sai Muralidhar Jayanthi, Kavya Nerella, Khyathi Raghavi Chandu, Alan W Black

With the growing prevalence of large language models, it is increasingly common to annotate datasets for machine learning using pools of crowd raters. However, these raters often work in isolation as individual crowdworkers. In this work,…

Computers and Society · Computer Science 2024-08-05 Sonja Schmer-Galunder, Ruta Wheelock, Scott Friedman, Alyssa Chvasta, Zaria Jalan, Emily Saltz

The advent of large language models (LLMs) has ushered in a new era in automated code translation across programming languages. Since most code-specific LLMs are pretrained on well-commented code from large repositories like GitHub, it is…

Software Engineering · Computer Science 2026-01-26 Monika Gupta, Ajay Meena, Anamitra Roy Choudhury, Vijay Arya, Srikanta Bedathur

Recent studies emphasize the need of document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments. In this…

Computation and Language · Computer Science 2021-04-22 Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann, Tom Kocmi

The mixing of two or more languages is called Code-Mixing (CM). CM is a social norm in multilingual societies. Neural Language Models (NLMs) like transformers have been effective on many NLP tasks. However, NLM for CM is an under-explored…

Computation and Language · Computer Science 2023-10-20 Mohsin Ali, Kandukuri Sai Teja, Neeharika Gupta, Parth Patwa, Anubhab Chatterjee, Vinija Jain, Aman Chadha, Amitava Das

In the context of text classification, the financial burden of annotation exercises for creating training data is a critical issue. Active learning techniques, particularly those rooted in uncertainty sampling, offer a cost-effective…

Computation and Language · Computer Science 2024-06-19 Hamidreza Rouzegar, Masoud Makrehchi

The use of multilingualism in the new generation is widespread in the form of code-mixed data on social media, and therefore a robust translation system is required for catering to the monolingual users, as well as for easier comprehension…

Computation and Language · Computer Science 2019-11-25 Sainik Kumar Mahata, Soumil Mandal, Dipankar Das, Sivaji Bandyopadhyay

Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items. Examples include generating entailment graphs, hierarchical cross-document coreference resolution, annotating event and subevent…

Computation and Language · Computer Science 2023-11-21 Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan

Multilingualism refers to the high degree of proficiency in two or more languages in the written and oral communication modes. It often results in language mixing, a.k.a. code-mixing, when a multilingual speaker switches between multiple…

Computation and Language · Computer Science 2021-06-16 Vivek Srivastava, Mayank Singh

Pairwise preferences over model responses are widely collected to evaluate and provide feedback to large language models (LLMs). Given two alternative model responses to the same input, a human or AI annotator selects the "better" response.…

Computation and Language · Computer Science 2025-07-24 Arduin Findeis, Floris Weers, Guoli Yin, Ke Ye, Ruoming Pang, Tom Gunter

Annotating speaker attributes from text is inherently ambiguous, particularly in multilingual settings where demographic and social cues are implicit and culturally variable. We propose a human-large language model (LLM) collaborative…

Computation and Language · Computer Science 2026-05-26 Lingyu Gao, Will Monroe, David Smith, Meghan Jemison, Jackie Lee

Data annotation remains a significant bottleneck in the Humanities and Social Sciences, particularly for complex semantic tasks such as metaphor identification. While Large Language Models (LLMs) show promise, a significant gap remains…

Computation and Language · Computer Science 2026-02-06 Bingru Li

Traditional image annotation tasks rely heavily on human effort for object selection and label assignment, making the process time-consuming and prone to decreased efficiency as annotators experience fatigue after extensive work. This paper…

Computer Vision and Pattern Recognition · Computer Science 2025-03-17 He Zhang, Xinyi Fu, John M. Carroll

The NLP community has long advocated for the construction of multi-annotator datasets to better capture the nuances of language interpretation, subjectivity, and ambiguity. This paper conducts a retrospective study to show how performance…

Computation and Language · Computer Science 2023-10-24 Pritam Kadasi, Mayank Singh

Data annotation refers to the labeling or tagging of textual data with relevant information. A large body of works have reported positive results on leveraging LLMs as an alternative to human annotators. However, existing studies focus on…

Computation and Language · Computer Science 2024-10-07 Yu-Min Tseng, Wei-Lin Chen, Chung-Chi Chen, Hsin-Hsi Chen

We describe models focused at the understudied problem of translating between monolingual and code-mixed language pairs. More specifically, we offer a wide range of models that convert monolingual English text into Hinglish (code-mixed…

Computation and Language · Computer Science 2021-05-20 Ganesh Jawahar, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan