Related papers: RuSentEval: Linguistic Source, Encoder Force!

RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark

In this paper, we introduce an advanced Russian general language understanding evaluation benchmark -- RussianGLUE. Recent advances in the field of universal language models and transformers require the development of a methodology for…

Computation and Language · Computer Science 2023-10-04 Tatiana Shavrina, Alena Fenogenova, Anton Emelyanov, Denis Shevelev, Ekaterina Artemova, Valentin Malykh, Vladislav Mikhailov, Maria Tikhonova, Andrey Chertok, Andrey Evlampiev

IndicSentEval: How Effectively do Multilingual Transformer Models encode Linguistic Properties for Indic Languages?

Transformer-based models have revolutionized the field of natural language processing. To understand why they perform so well and to assess their reliability, several studies have focused on questions such as: Which linguistic properties…

Computation and Language · Computer Science 2025-11-04 Akhilesh Aravapalli, Mounika Marreddy, Radhika Mamidi, Manish Gupta, Subba Reddy Oota

Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language

The paper introduces methods of adaptation of multilingual masked language models for a specific language. Pre-trained bidirectional language models show state-of-the-art performance on a wide range of tasks including reading comprehension,…

Computation and Language · Computer Science 2019-05-20 Yuri Kuratov, Mikhail Arkhipov

A Family of Pretrained Transformer Language Models for Russian

Transformer language models (LMs) are fundamental to NLP research methodologies and applications in various languages. However, developing such models specifically for the Russian language has received little attention. This paper…

Computation and Language · Computer Science 2024-08-05 Dmitry Zmitrovich, Alexander Abramov, Andrey Kalmykov, Maria Tikhonova, Ekaterina Taktasheva, Danil Astafurov, Mark Baushenko, Artem Snegirev, Vitalii Kadulin, Sergey Markov, Tatiana Shavrina, Vladislav Mikhailov, Alena Fenogenova

Comparison of parameters of vowel sounds of russian and english languages

In multilingual speech recognition systems, a situation can often arise when the language is not known in advance, but the signal has already been received and is being processed. For such cases, some generalized model is needed that will…

Sound · Computer Science 2024-01-29 V. I. Fedoseev, A. A. Konev, A. Yu. Yakimuk

Linguistic Interpretability of Transformer-based Language Models: a systematic review

Language models based on the Transformer architecture achieve excellent results in many language-related tasks, such as text classification or sentiment analysis. However, despite the architecture of these models being well-defined, little…

Computation and Language · Computer Science 2025-04-14 Miguel López-Otal, Jorge Gracia, Jordi Bernad, Carlos Bobed, Lucía Pitarch-Ballesteros, Emma Anglés-Herrero

Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties

Large Language Models (LLMs) are predominantly evaluated on Standard American English (SAE), often overlooking the diversity of global English varieties. This narrow focus may raise fairness concerns as degraded performance on non-standard…

Computation and Language · Computer Science 2025-10-10 Jiyoung Lee, Seungho Kim, Jieun Han, Jun-Min Lee, Kitaek Kim, Alice Oh, Edward Choi

RUSSE: The First Workshop on Russian Semantic Similarity

The paper gives an overview of the Russian Semantic Similarity Evaluation (RUSSE) shared task held in conjunction with the Dialogue 2015 conference. There exist a lot of comparative studies on semantic similarity, yet no analysis of such…

Computation and Language · Computer Science 2018-03-16 Alexander Panchenko, Natalia Loukachevitch, Dmitry Ustalov, Denis Paperno, Christian Meyer, Natalia Konstantinova

Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English

People communicate in more than 7,000 languages around the world, with around 780 languages spoken in India alone. Despite this linguistic diversity, research on Sentiment Analysis has predominantly focused on English text data, resulting…

Computation and Language · Computer Science 2024-09-04 Aekansh Kathunia, Mohammad Kaif, Nalin Arora, N Narotam

Problems of Non-equivalent Words in Technical Translation

Translating words which do not have equivalent in target language is not easy and finding proper equivalent of those words are very important to render correctly and understandably, the article defines some thoughts and ideas of scientists…

Computation and Language · Computer Science 2023-11-22 Mohammad Ibrahim Qani

Probing Multilingual Sentence Representations With X-Probe

This paper extends the task of probing sentence representations for linguistic insight in a multilingual domain. In doing so, we make two contributions: first, we provide datasets for multilingual probing, derived from Wikipedia, in five…

Computation and Language · Computer Science 2019-06-13 Vinit Ravishankar, Lilja Øvrelid, Erik Velldal

RuMedBench: A Russian Medical Language Understanding Benchmark

The paper describes the open Russian medical language understanding benchmark covering several task types (classification, question answering, natural language inference, named entity recognition) on a number of novel text sets. Given the…

Computation and Language · Computer Science 2022-07-14 Pavel Blinov, Arina Reshetnikova, Aleksandr Nesterov, Galina Zubkova, Vladimir Kokh

Transformers for Headline Selection for Russian News Clusters

In this paper, we explore various multilingual and Russian pre-trained transformer-based models for the Dialogue Evaluation 2021 shared task on headline selection. Our experiments show that the combined approach is superior to individual…

Computation and Language · Computer Science 2021-06-22 Pavel Voropaev, Olga Sopilnyak

Probing Speech Emotion Recognition Transformers for Linguistic Knowledge

Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets. These models are typically pre-trained in…

Computation and Language · Computer Science 2023-03-14 Andreas Triantafyllopoulos, Johannes Wagner, Hagen Wierstorf, Maximilian Schmitt, Uwe Reichel, Florian Eyben, Felix Burkhardt, Björn W. Schuller

TransWiC at SemEval-2021 Task 2: Transformer-based Multilingual and Cross-lingual Word-in-Context Disambiguation

Identifying whether a word carries the same meaning or different meaning in two contexts is an important research area in natural language processing which plays a significant role in many applications such as question answering, document…

Computation and Language · Computer Science 2021-04-13 Hansi Hettiarachchi, Tharindu Ranasinghe

Monolingual and Cross-Lingual Knowledge Transfer for Topic Classification

This article investigates the knowledge transfer from the RuQTopics dataset. This Russian topical dataset combines a large sample number (361,560 single-label, 170,930 multi-label) with extensive class coverage (76 classes). We have…

Computation and Language · Computer Science 2023-07-06 Dmitry Karpov, Mikhail Burtsev

The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design

Embedding models play a crucial role in Natural Language Processing (NLP) by creating text embeddings used in various tasks such as information retrieval and assessing semantic text similarity. This paper focuses on research related to…

Computation and Language · Computer Science 2025-02-04 Artem Snegirev, Maria Tikhonova, Anna Maksimova, Alena Fenogenova, Alexander Abramov

Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures

Generating coherent, grammatically correct, and meaningful text is very challenging, however, it is crucial to many modern NLP systems. So far, research has mostly focused on English language, for other languages both standardized datasets,…

Computation and Language · Computer Science 2020-05-07 Zein Shaheen, Gerhard Wohlgenannt, Bassel Zaity, Dmitry Mouromtsev, Vadim Pak

Is neural language acquisition similar to natural? A chronological probing study

The probing methodology allows one to obtain a partial representation of linguistic phenomena stored in the inner layers of the neural network, using external classifiers and statistical analysis. Pre-trained transformer-based language…

Computation and Language · Computer Science 2022-07-04 Ekaterina Voloshina, Oleg Serikov, Tatiana Shavrina

Morph Call: Probing Morphosyntactic Content of Multilingual Transformers

The outstanding performance of transformer-based language models on a great variety of NLP and NLU tasks has stimulated interest in exploring their inner workings. Recent research has focused primarily on higher-level and complex linguistic…

Computation and Language · Computer Science 2021-05-06 Vladislav Mikhailov, Oleg Serikov, Ekaterina Artemova