Related papers: Toxicity Classification in Ukrainian

200 papers

The rapid adoption of LLMs in both research and industry highlights the challenges of deploying them safely and reveals a gap in the systematic evaluation of toxicity benchmarks. As organizations increasingly rely on these benchmarks to…

Artificial Intelligence · Computer Science 2026-05-12 Regina Gugg, Selina Niederländer, Andreas Stöckl, Martin Flechl

Toxic comment classification has become an active research field with many recently proposed approaches. However, while these approaches address some of the task's challenges, others still remain unsolved and directions for further research…

Computation and Language · Computer Science 2018-09-21 Betty van Aken, Julian Risch, Ralf Krestel, Alexander Löser

This paper provides an overview of a text mining tool, StyloMetrix, developed initially for the Polish language and later extended to English and, more recently, to Ukrainian. StyloMetrix is built upon various metrics crafted manually by…

Computation and Language · Computer Science 2023-05-24 Daria Stetsenko, Inez Okulska

Online toxic content has grown into a pervasive phenomenon, intensifying during times of crisis, elections, and social unrest. A significant amount of research has been focused on detecting or analyzing toxic content using machine-learning…

Computation and Language · Computer Science 2025-09-19 Gautam Kishore Shahi, Tim A. Majchrzak

An algorithm for creating parallel text corpora is presented. The algorithm is based on the use of "key words" in text documents and on means for their automated translation. Key words were singled out using Russian…

Computation and Language · Computer Science 2008-07-03 D. V. Lande, V. V. Zhygalo

Large language models (LMs) generate remarkably fluent text and can be efficiently adapted across NLP tasks. Measuring and guaranteeing the quality of generated text in terms of safety is imperative for deploying LMs in the real world; to…

For multilingual factual knowledge assessment of LLMs, benchmarks such as MLAMA use template translations that do not take into account the grammatical and semantic information of the named entities inserted in the sentence. This leads to…

Computation and Language · Computer Science 2025-10-20 Kirill Semenov, Rico Sennrich

Large language models (LLMs) are increasingly exposed to data contamination, i.e., performance gains driven by prior exposure to test datasets rather than by generalization. However, in the context of tabular data, this problem is largely…

Computation and Language · Computer Science 2026-03-31 Matteo Silvestri, Fabiano Veglianti, Flavio Giorgi, Fabrizio Silvestri, Gabriele Tolomei

The spread of toxic content online is an important problem that has adverse effects on user experience online and in our society at large. Motivated by the importance and impact of the problem, research focuses on developing solutions to…

Computation and Language · Computer Science 2023-08-11 Xinlei He, Savvas Zannettou, Yun Shen, Yang Zhang

Text detoxification is the task of transferring the style of text from toxic to neutral. While there are approaches yielding promising results in a monolingual setup, e.g., (Dale et al., 2021; Hallinan et al., 2022), cross-lingual transfer for…

Computation and Language · Computer Science 2023-11-27 Daryna Dementieva, Daniil Moskovskiy, David Dale, Alexander Panchenko

Recent generative large language models (LLMs) show remarkable performance in non-English languages, but when prompted in those languages they tend to express higher harmful social biases and toxicity levels. Prior work has shown that…

Computation and Language · Computer Science 2025-06-03 Vera Neplenbroek, Arianna Bisazza, Raquel Fernández

This paper presents one of the top-performing solutions to the UNLP 2025 Shared Task on Detecting Manipulation in Social Media. The task focuses on detecting and classifying rhetorical and stylistic manipulation techniques used to influence…

Computation and Language · Computer Science 2025-06-02 Kateryna Akhynko, Oleksandr Kosovan, Mykola Trokhymovych

The rapid growth in user-generated content on social media has resulted in a significant rise in demand for automated content moderation. Various methods and frameworks have been proposed for the tasks of hate speech detection and toxic…

Computation and Language · Computer Science 2024-09-27 Elizaveta Korotkova, Isaac Chung

Folktales are linguistically very rich and culturally significant in understanding the source language. Historically, only human translation has been used for translating folklore. Therefore, the number of translated texts is very sparse,…

Computation and Language · Computer Science 2024-10-15 Olena Burda-Lassen

Large language models (LLMs) are known to exhibit biases in downstream tasks, especially when dealing with sensitive topics such as political discourse, gender identity, ethnic relations, or national stereotypes. Although significant…

Computation and Language · Computer Science 2025-08-18 Martin Pavlíček, Tomáš Filip, Petr Sosík

Linguistic acceptability (LA) attracts the attention of the research community due to its many uses, such as testing the grammatical knowledge of language models and filtering implausible texts with acceptability classifiers. However, the…

Computation and Language · Computer Science 2023-10-04 Vladislav Mikhailov, Tatiana Shamardina, Max Ryabinin, Alena Pestova, Ivan Smurov, Ekaterina Artemova

This paper introduces "Czech Text Document Corpus v 2.0", a collection of text documents for automatic document classification in the Czech language. It is composed of text documents provided by the Czech News Agency and is freely available…

Computation and Language · Computer Science 2018-02-01 Pavel Král, Ladislav Lenc

The opacity in developing large language models (LLMs) is raising growing concerns about the potential contamination of public benchmarks in the pre-training data. Existing contamination detection methods are typically based on the text…

Computation and Language · Computer Science 2024-10-31 Feng Yao, Yufan Zhuang, Zihao Sun, Sunan Xu, Animesh Kumar, Jingbo Shang

With the ongoing growth in the number of digital articles in a wider set of languages and the expanding use of different languages, we need annotation methods that enable browsing multilingual corpora. Multilingual probabilistic topic models…

Computation and Language · Computer Science 2021-01-11 Carlos Badenes-Olmedo, Jose-Luis Redondo García, Oscar Corcho

There has been little systematic study on how dialectal differences affect toxicity detection by modern LLMs. Furthermore, although using LLMs as evaluators ("LLM-as-a-judge") is a growing research area, their sensitivity to dialectal…

Computation and Language · Computer Science 2024-11-19 Fahim Faisal, Md Mushfiqur Rahman, Antonios Anastasopoulos