Related papers: Toxicity Classification in Ukrainian

200 papers

The scarcity of labeled training data often prohibits the internationalization of NLP models to multiple languages. Recent developments in cross-lingual understanding (XLU) have made progress in this area, trying to bridge the language…

Computation and Language · Computer Science 2019-09-23 Guokun Lai, Barlas Oguz, Yiming Yang, Veselin Stoyanov

In this work, we present an annotation framework that demonstrates how a multilingual LLM pretrained on a large corpus can be used as a teacher model to distill the expert knowledge needed for tagging medical texts in Polish. This work is…

Computation and Language · Computer Science 2026-05-19 Franciszek Górski, Andrzej Czyżewski

Formality is one of the important characteristics of text documents. The automatic detection of the formality level of a text is potentially beneficial for various natural language processing tasks. Before, two large-scale datasets were…

Computation and Language · Computer Science 2023-09-11 Daryna Dementieva, Nikolay Babakov, Alexander Panchenko

The text generated on social media platforms is essentially mixed-language text. Mixing languages in any form creates a considerable amount of difficulty for language processing systems. Moreover, the advancements in language…

Information Retrieval · Computer Science 2018-10-09 Mohd Zeeshan Ansari, Tanvir Ahmad, Md Arshad Ali

Interpretability is a topic that has been in the spotlight for the past few years. Most existing interpretability techniques produce interpretations in the form of rules or feature importance. These interpretations, while informative, may…

Computation and Language · Computer Science 2024-10-15 Nikolaos Mylonas, Nikolaos Stylianou, Theodora Tsikrika, Stefanos Vrochidis, Ioannis Kompatsiaris

Currently, there are more than a dozen Russian-language corpora for sentiment analysis, differing in the source of the texts, domain, size, number and ratio of sentiment classes, and annotation method. This work examines publicly available…

Computation and Language · Computer Science 2021-06-29 Evgeny Kotelnikov

Expert-layman text style transfer technologies have the potential to improve communication between members of scientific communities and the general public. High-quality information produced by experts is often filled with difficult jargon…

Computation and Language · Computer Science 2021-12-21 Wenda Xu, Michael Saxon, Misha Sra, William Yang Wang

Natural language tasks like Named Entity Recognition (NER) in the clinical domain on non-English texts can be very time-consuming and expensive due to the lack of annotated data. Cross-lingual transfer (CLT) is a way to circumvent this…

Computation and Language · Computer Science 2023-06-08 Xavier Fontaine, Félix Gaschi, Parisa Rastin, Yannick Toussaint

This study reviewed the use of Large Language Models (LLMs) in healthcare, focusing on their training corpora, customization techniques, and evaluation metrics. A systematic search of studies from 2021 to 2024 identified 61 articles. Four…

Computation and Language · Computer Science 2025-02-18 Shuqi Yang, Mingrui Jing, Shuai Wang, Jiaxin Kou, Manfei Shi, Weijie Xing, Yan Hu, Zheng Zhu

Discourse understanding is essential for many NLP tasks, yet most existing work remains constrained by framework-dependent discourse representations. This work investigates whether large language models (LLMs) capture discourse knowledge…

Computation and Language · Computer Science 2025-06-05 Florian Eichin, Yang Janet Liu, Barbara Plank, Michael A. Hedderich

The advancement of Large Language Models (LLMs) has transformed natural language processing; however, their safety mechanisms remain under-explored in low-resource, multilingual settings. Here, we aim to bridge this gap. In particular, we…

Computation and Language · Computer Science 2025-09-24 Yujia Hu, Ming Shan Hee, Preslav Nakov, Roy Ka-Wei Lee

Content moderation typically combines the efforts of human moderators and machine learning models. However, these systems often rely on data where significant disagreement occurs during moderation, reflecting the subjective nature of…

Computation and Language · Computer Science 2025-09-01 Guillermo Villate-Castillo, Javier Del Ser, Borja Sanz

In this work, we introduce our solution for the Multilingual Text Detoxification Task in the PAN-2025 competition for the ylmmcl team: a robust multilingual text detoxification pipeline that integrates lexicon-guided tagging, a fine-tuned…

Computation and Language · Computer Science 2025-07-28 Nicole Lai-Lopez, Lusha Wang, Su Yuan, Liza Zhang

Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is…

Information Retrieval · Computer Science 2014-06-11 Robert A. Bridges, Corinne L. Jones, Michael D. Iannacone, Kelly M. Testa, John R. Goodall

The use of machine learning (ML)-based language models (LMs) to monitor content online is on the rise. For toxic text identification, task-specific fine-tuning of these models is performed using datasets labeled by annotators who provide…

Computation and Language · Computer Science 2021-12-08 Kofi Arhin, Ioana Baldini, Dennis Wei, Karthikeyan Natesan Ramamurthy, Moninder Singh

Language interference is common in today's multilingual societies, where more and more languages are in contact, and it ultimately leads to the creation of hybrid languages. These, together with doubts on their right to be officially…

Computation and Language · Computer Science 2019-12-19 Nataliya Sira, Giorgio Maria Di Nunzio, Viviana Nosilia

Large pre-trained language models are often trained on large volumes of internet data, some of which may contain toxic or abusive language. Consequently, language models encode toxic information, which makes the real-world usage of these…

Computation and Language · Computer Science 2021-12-16 Andrew Wang, Mohit Sudhakar, Yangfeng Ji

With adversarial or otherwise normal prompts, existing large language models (LLMs) can be pushed to generate toxic discourses. One way to reduce the risk of LLMs generating undesired discourses is to alter the training of the LLM. This can…

Computation and Language · Computer Science 2023-02-28 Meng Cao, Mehdi Fatemi, Jackie Chi Kit Cheung, Samira Shabanian

Large language models (LLMs) are increasingly deployed as analytical tools across multilingual contexts, yet their outputs may carry systematic biases conditioned by the language of the prompt. This study presents an experimental comparison…

Computers and Society · Computer Science 2026-02-03 Oleg Smirnov

Multilingual topic models enable crosslingual tasks by extracting consistent topics from multilingual corpora. Most models require parallel or comparable training corpora, which limits their ability to generalize. In this paper, we first…

Computation and Language · Computer Science 2018-06-13 Shudong Hao, Michael J. Paul