Related papers: Evaluating Transformer-Based Multilingual Text Cla…

An Empirical Study of Factors Affecting Language-Independent Models

Scaling existing applications and solutions to multiple human languages has traditionally proven to be difficult, mainly due to the language-dependent nature of preprocessing and feature engineering techniques employed in traditional…

Computation and Language · Computer Science · 2020-01-01 · Xiaotong Liu, Yingbei Tong, Anbang Xu, Rama Akkiraju

Survey on the Use of Typological Information in Natural Language Processing

In recent years linguistic typology, which classifies the world's languages according to their functional and structural properties, has been widely used to support multilingual NLP. While the growing importance of typological information…

Computation and Language · Computer Science · 2016-10-12 · Helen O'Horan, Yevgeni Berzak, Ivan Vulić, Roi Reichart, Anna Korhonen

LINSPECTOR: Multilingual Probing Tasks for Word Representations

Despite an ever growing number of word representation models introduced for a large number of languages, there is a lack of a standardized technique to provide insights into what is captured by these models. Such insights would help the…

Computation and Language · Computer Science · 2019-12-12 · Gözde Gül Şahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych

A Principled Framework for Evaluating on Typologically Diverse Languages

Beyond individual languages, multilingual natural language processing (NLP) research increasingly aims to develop models that perform well across languages generally. However, evaluating these systems on all the world's languages is…

Computation and Language · Computer Science · 2025-09-09 · Esther Ploeger, Wessel Poelman, Andreas Holck Høeg-Petersen, Anders Schlichtkrull, Miryam de Lhoneux, Johannes Bjerva

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

Linguistic typology aims to capture structural and semantic variation across the world's languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that…

Computation and Language · Computer Science · 2020-10-28 · Edoardo Maria Ponti, Helen O'Horan, Yevgeni Berzak, Ivan Vulić, Roi Reichart, Thierry Poibeau, Ekaterina Shutova, Anna Korhonen

Rank over Class: The Untapped Potential of Ranking in Natural Language Processing

Text classification has long been a staple within Natural Language Processing (NLP) with applications spanning across diverse areas such as sentiment analysis, recommender systems and spam detection. With such a powerful solution, it is…

Computation and Language · Computer Science · 2021-12-06 · Amir Atapour-Abarghouei, Stephen Bonner, Andrew Stephen McGough

The Text Classification Pipeline: Starting Shallow going Deeper

Text classification stands as a cornerstone within the realm of Natural Language Processing (NLP), particularly when viewed through computer science and engineering. The past decade has seen deep learning revolutionize text classification,…

Computation and Language · Computer Science · 2025-04-23 · Marco Siino, Ilenia Tinnirello, Marco La Cascia

Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation

The performance of multilingual pretrained models is highly dependent on the availability of monolingual or parallel text present in a target language. Thus, the majority of the world's languages cannot benefit from recent progress in NLP…

Computation and Language · Computer Science · 2022-04-07 · Xinyi Wang, Sebastian Ruder, Graham Neubig

Systematic Inequalities in Language Technology Performance across the World's Languages

Natural language processing (NLP) systems have become a central technology in communication, education, medicine, artificial intelligence, and many other domains of research and development. While the performance of NLP methods has grown…

Computation and Language · Computer Science · 2021-10-14 · Damián Blasi, Antonios Anastasopoulos, Graham Neubig

Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages

Language models based on the Transformer architecture have achieved state-of-the-art performance on a wide range of NLP tasks such as text classification, question-answering, and token classification. However, this performance is usually…

Computation and Language · Computer Science · 2020-11-05 · Kushal Jain, Adwait Deshpande, Kumar Shridhar, Felix Laumann, Ayushman Dash

Assessing the Impact of Typological Features on Multilingual Machine Translation in the Age of Large Language Models

Despite major advances in multilingual modeling, large quality disparities persist across languages. Besides the obvious impact of uneven training resources, typological properties have also been proposed to determine the intrinsic…

Computation and Language · Computer Science · 2026-02-04 · Vitalii Hirak, Jaap Jumelet, Arianna Bisazza

Are All Languages Equally Hard to Language-Model?

For general modeling methods applied to diverse languages, a natural question is: how well should we expect our models to work on languages with differing typological profiles? In this work, we develop an evaluation framework for fair…

Computation and Language · Computer Science · 2020-02-26 · Ryan Cotterell, Sabrina J. Mielke, Jason Eisner, Brian Roark

Multilingual Text Representation

Modern NLP breakthrough includes large multilingual models capable of performing tasks across more than 100 languages. State-of-the-art language models came a long way, starting from the simple one-hot representation of words capable of…

Computation and Language · Computer Science · 2023-09-06 · Fahim Faisal

Comparative study on Judgment Text Classification for Transformer Based Models

This work involves the usage of various NLP models to predict the winner of a particular judgment by the means of text extraction and summarization from a judgment document. These documents are useful when it comes to legal proceedings. One…

Computation and Language · Computer Science · 2023-06-06 · Stanley Kingston, Prassanth, Shrinivas A, Balamurugan MS, Manoj Kumar Rajagopal

A Morphology-Based Investigation of Positional Encodings

Contemporary deep learning models effectively handle languages with diverse morphology despite not being directly integrated into them. Morphology and word order are closely linked, with the latter incorporated into transformer-based models…

Computation and Language · Computer Science · 2024-05-31 · Poulami Ghosh, Shikhar Vashishth, Raj Dabre, Pushpak Bhattacharyya

Happiness is Sharing a Vocabulary: A Study of Transliteration Methods

Transliteration has emerged as a promising means to bridge the gap between various languages in multilingual NLP, showing promising results especially for languages using non-Latin scripts. We investigate the degree to which shared script,…

Computation and Language · Computer Science · 2026-03-25 · Haeji Jung, Jinju Kim, Kyungjin Kim, Youjeong Roh, David R. Mortensen

What is "Typological Diversity" in NLP?

The NLP research community has devoted increased attention to languages beyond English, resulting in considerable improvements for multilingual NLP. However, these improvements only apply to a small subset of the world's languages. Aiming…

Computation and Language · Computer Science · 2024-10-03 · Esther Ploeger, Wessel Poelman, Miryam de Lhoneux, Johannes Bjerva

Multilingual Gradient Word-Order Typology from Universal Dependencies

While information from the field of linguistic typology has the potential to improve performance on NLP tasks, reliable typological data is a prerequisite. Existing typological databases, including WALS and Grambank, suffer from…

Computation and Language · Computer Science · 2024-02-05 · Emi Baylor, Esther Ploeger, Johannes Bjerva

Toward Culturally Grounded Natural Language Processing

Multilingual NLP is often treated as a route to global inclusion, but linguistic coverage and cultural competence frequently diverge. This paper synthesizes over 50 papers spanning multilingual performance inequality, cross-lingual…

Computation and Language · Computer Science · 2026-05-05 · Sina Bagheri Nezhad

Bangla Text Classification using Transformers

Text classification has been one of the earliest problems in NLP. Over time the scope of application areas has broadened and the difficulty of dealing with new areas (e.g., noisy social media content) has increased. The problem-solving…

Computation and Language · Computer Science · 2020-11-10 · Tanvirul Alam, Akib Khan, Firoj Alam