Related papers: Toxicity Classification in Ukrainian

200 papers

Language models (LMs) can reproduce (or amplify) toxic language seen during training, which poses a risk to their practical application. In this paper, we conduct extensive experiments to study this phenomenon. We analyze the impact of…

Computation and Language · Computer Science 2022-03-08 Canwen Xu, Zexue He, Zhankui He, Julian McAuley

Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently introduced debiasing methods for text…

Computation and Language · Computer Science 2021-02-02 Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Noah A. Smith, Yejin Choi

Large language models (LLMs) have exhibited considerable cross-lingual generalization abilities, whereby they implicitly transfer knowledge across languages. However, the transfer is not equally successful for all languages, especially for…

Computation and Language · Computer Science 2023-12-25 Ningyu Xu, Qi Zhang, Jingting Ye, Menghan Zhang, Xuanjing Huang

Crosslingual word embeddings represent lexical items from different languages in the same vector space, enabling transfer of NLP tools. However, previous attempts had expensive resource requirements, difficulty incorporating monolingual…

Computation and Language · Computer Science 2016-07-01 Long Duong, Hiroshi Kanayama, Tengfei Ma, Steven Bird, Trevor Cohn

Given the dynamic nature of toxic language use, automated methods for detecting toxic spans are likely to encounter distributional shift. To explore this phenomenon, we evaluate three approaches for detecting toxic spans under cross-domain…

Computation and Language · Computer Science 2023-06-19 Stefan F. Schouten, Baran Barbarestani, Wondimagegnhue Tufa, Piek Vossen, Ilia Markov

Legal NLP benchmarks overwhelmingly evaluate a single language or aggregate tasks that differ fundamentally across jurisdictions, making cross-lingual comparison impossible. We introduce Multi-Legal-Bench, the first cross-jurisdictional…

Computation and Language · Computer Science 2026-05-29 Volodymyr Ovcharov

Providing better language tools for low-resource and endangered languages is imperative for equitable growth. Recent progress with massively multilingual pretrained models has proven surprisingly effective at performing zero-shot transfer…

Computation and Language · Computer Science 2022-11-10 Louis Clouâtre, Prasanna Parthasarathi, Amal Zouaq, Sarath Chandar

Large Language Models (LLMs) have recently exploded in popularity, often matching or outperforming human abilities on many tasks. One of the key factors in training LLMs is the availability and curation of high-quality data. Data quality is…

Computation and Language · Computer Science 2025-11-04 Vlad Negoita, Mihai Masala, Traian Rebedea

Understanding toxicity in user conversations is undoubtedly an important problem. Addressing "covert" or implicit cases of toxicity is particularly hard and requires context. Very few previous studies have analysed the influence of…

Computation and Language · Computer Science 2022-10-19 Atijit Anuchitanukul, Julia Ive, Lucia Specia

In many multilingual text classification problems, the documents in different languages often share the same set of categories. To reduce the labeling cost of training a classification model for each individual language, it is important to…

Computation and Language · Computer Science 2012-07-03 Yuhong Guo, Min Xiao

In the era of increasingly sophisticated natural language processing (NLP) systems, large language models (LLMs) have demonstrated remarkable potential for diverse applications, including tasks requiring nuanced textual understanding and…

Computation and Language · Computer Science 2025-05-16 Poli Apollinaire Nemkova, Solomon Ubani, Mark V. Albert

While in real life everyone behaves themselves at least to some extent, it is much more difficult to expect people to behave themselves on the internet, because there are few checks or consequences for posting something toxic to others.…

Computation and Language · Computer Science 2021-12-14 Kehan Wang, Jiaxi Yang, Hongjun Wu

Detecting which parts of a sentence contribute to that sentence's toxicity -- rather than providing a sentence-level verdict of hatefulness -- would increase the interpretability of models and allow human moderators to better understand the…

Computation and Language · Computer Science 2021-04-13 Alireza Salemi, Nazanin Sabri, Emad Kebriaei, Behnam Bahrak, Azadeh Shakery

The prevalence and impact of toxic discussions online have made content moderation crucial. Automated systems can play a vital role in identifying toxicity, and reducing the reliance on human moderation. Nevertheless, identifying toxic…

Artificial Intelligence · Computer Science 2023-11-02 Senjuti Dutta, Sid Mittal, Sherol Chen, Deepak Ramachandran, Ravi Rajakumar, Ian Kivlichan, Sunny Mak, Alena Butryna, Praveen Paritosh

Cross-lingual transfer of word embeddings aims to establish the semantic mappings among words in different languages by learning the transformation functions over the corresponding word embedding spaces. Successfully solving this problem…

Computation and Language · Computer Science 2018-09-12 Ruochen Xu, Yiming Yang, Naoki Otani, Yuexin Wu

Toxicity identification in online multimodal environments remains a challenging task due to the complexity of contextual connections across modalities (e.g., textual and visual). In this paper, we propose a novel framework that integrates…

Machine Learning · Computer Science 2026-02-18 Rahul Garg, Trilok Padhi, Hemang Jain, Ugur Kursuncu, Ponnurangam Kumaraguru

Cross-lingual model transfer is a compelling and popular method for predicting annotations in a low-resource language, whereby parallel corpora provide a bridge to a high-resource language and its associated annotated corpora. However,…

Computation and Language · Computer Science 2017-05-02 Meng Fang, Trevor Cohn

Many efforts of research are devoted to semantic role labeling (SRL) which is crucial for natural language understanding. Supervised approaches have achieved impressing performances when large-scale corpora are available for resource-rich…

Computation and Language · Computer Science 2020-05-08 Hao Fei, Meishan Zhang, Donghong Ji

Text classification is a very classic NLP task, but it has two prominent shortcomings: On the one hand, text classification is deeply domain-dependent. That is, a classifier trained on the corpus of one domain may not perform so well in…

Computation and Language · Computer Science 2022-10-28 Zilin Yuan, Yinghui Li, Yangning Li, Rui Xie, Wei Wu, Hai-Tao Zheng

To achieve equitable performance across languages, large language models (LLMs) must be able to abstract knowledge beyond the language in which it was learnt. However, the current literature lacks reliable ways to measure LLMs' capability…
