Related papers: Learning to Recognize Dialect Features

Disentangling Dialect from Social Bias via Multitask Learning to Improve Fairness

Dialects introduce syntactic and lexical variations in language that occur in regional or social groups. Most NLP methods are not sensitive to such variations. This may lead to unfair behavior of the methods, conveying negative bias towards…

Computation and Language · Computer Science 2024-06-17 Maximilian Spliethöver, Sai Nikhil Menon, Henning Wachsmuth

Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers

Identifying linguistic differences between dialects of a language often requires expert knowledge and meticulous human analysis. This is largely due to the complexity and nuance involved in studying various dialects. We present a novel…

Computation and Language · Computer Science 2024-03-26 Roy Xie, Orevaoghene Ahia, Yulia Tsvetkov, Antonios Anastasopoulos

Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum

There is increasing interest in looking at dialects in NLP. However, most work to date still treats dialects as discrete categories. For instance, evaluative work in variation-oriented NLP for English often works with Indian English or…

Computation and Language · Computer Science 2025-11-19 Ryan Soh-Eun Shim, Barbara Plank

Feature Selection on Noisy Twitter Short Text Messages for Language Identification

The task of written language identification typically involves detecting the languages present in a sample of text. Moreover, a sequence of text may not belong to a single inherent language but may also be a mixture of text written in…

Computation and Language · Computer Science 2020-07-14 Mohd Zeeshan Ansari, Tanvir Ahmad, Ana Fatima

Natural Language Processing for Dialects of a Language: A Survey

State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of…

Computation and Language · Computer Science 2024-12-10 Aditya Joshi, Raj Dabre, Diptesh Kanojia, Zhuang Li, Haolan Zhan, Gholamreza Haffari, Doris Dippold

ILID: Native Script Language Identification for Indian Languages

The language identification task is a crucial fundamental step in NLP. Often it serves as a pre-processing step for widely used NLP applications such as multilingual machine translation, information retrieval, question and answering, and…

Computation and Language · Computer Science 2026-01-08 Yash Ingle, Pruthwik Mishra

Speaker Recognition in Bengali Language from Nonlinear Features

At present Automatic Speaker Recognition system is a very important issue due to its diverse applications. Hence, it becomes absolutely necessary to obtain models that take into consideration the speaking style of a person, vocal tract…

Sound · Computer Science 2020-04-20 Uddalok Sarkar, Soumyadeep Pal, Sayan Nag, Chirayata Bhattacharya, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh

A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects

Discriminating between closely-related language varieties is considered a challenging and important task. This paper describes our submission to the DSL 2016 shared-task, which included two sub-tasks: one on discriminating similar languages…

Computation and Language · Computer Science 2016-09-27 Yonatan Belinkov, James Glass

INDIC DIALECT: A Multi Task Benchmark to Evaluate and Translate in Indian Language Dialects

Recent NLP advances focus primarily on standardized languages, leaving most low-resource dialects under-served especially in Indian scenarios. In India, the issue is particularly important: despite Hindi being the third most spoken language…

Computation and Language · Computer Science 2026-01-16 Tarun Sharma, Manikandan Ravikiran, Sourava Kumar Behera, Pramit Bhattacharya, Arnab Bhattacharya, Rohit Saluja

Spoken Language Identification Using Hybrid Feature Extraction Methods

This paper introduces and motivates the use of hybrid robust feature extraction technique for spoken language identification (LID) system. The speech recognizers use a parametric form of a signal to get the most important distinguishable…

Sound · Computer Science 2010-03-31 Pawan Kumar, Astik Biswas, A. N. Mishra, Mahesh Chandra

An Improved Feature Descriptor for Recognition of Handwritten Bangla Alphabet

Appropriate feature set for representation of pattern classes is one of the most important aspects of handwritten character recognition. The effectiveness of features depends on the discriminating power of the features chosen to represent…

Computer Vision and Pattern Recognition · Computer Science 2015-01-23 Nibaran Das, Subhadip Basu, Ram Sarkar, Mahantapas Kundu, Mita Nasipuri, Dipak Kumar Basu

A Dual-Decoder Conformer for Multilingual Speech Recognition

Transformer-based models have recently become very popular for sequence-to-sequence applications such as machine translation and speech recognition. This work proposes a dual-decoder transformer model for low-resource multilingual speech…

Computation and Language · Computer Science 2021-09-09 Krishna D N

DialectGram: Detecting Dialectal Variation at Multiple Geographic Resolutions

Several computational models have been developed to detect and analyze dialect variation in recent years. Most of these models assume a predefined set of geographical regions over which they detect and analyze dialectal variation. However,…

Computation and Language · Computer Science 2019-10-17 Hang Jiang, Haoshen Hong, Yuxing Chen, Vivek Kulkarni

Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages

Cognates are variants of the same lexical form across different languages; for example 'fonema' in Spanish and 'phoneme' in English are cognates, both of which mean 'a unit of sound'. The task of automatic detection of cognates among any…

Computation and Language · Computer Science 2021-12-17 Diptesh Kanojia, Raj Dabre, Shubham Dewangan, Pushpak Bhattacharyya, Gholamreza Haffari, Malhar Kulkarni

Modeling Global Syntactic Variation in English Using Dialect Classification

This paper evaluates global-scale dialect identification for 14 national varieties of English as a means for studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method…

Computation and Language · Computer Science 2019-04-12 Jonathan Dunn

Learning Language Representations for Typology Prediction

One central mystery of neural NLP is what neural models "know" about their subject matter. When a neural machine translation system learns to translate from one language to another, does it learn the syntax or semantics of the languages?…

Computation and Language · Computer Science 2017-08-01 Chaitanya Malaviya, Graham Neubig, Patrick Littell

A Simple Joint Model for Improved Contextual Neural Lemmatization

English verbs have multiple forms. For instance, talk may also appear as talks, talked or talking, depending on the context. The NLP task of lemmatization seeks to map these diverse forms back to a canonical one, known as the lemma. We…

Computation and Language · Computer Science 2024-05-29 Chaitanya Malaviya, Shijie Wu, Ryan Cotterell

Adversarial Multitask Learning for Joint Multi-Feature and Multi-Dialect Morphological Modeling

Morphological tagging is challenging for morphologically rich languages due to the large target space and the need for more training data to minimize model sparsity. Dialectal variants of morphologically rich languages suffer more as they…

Computation and Language · Computer Science 2019-10-29 Nasser Zalmout, Nizar Habash

Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model

Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding separate components of a typical system, namely acoustic (AM), pronunciation (PM) and language (LM) models into a single…

Audio and Speech Processing · Electrical Eng. & Systems 2017-12-06 Bo Li, Tara N. Sainath, Khe Chai Sim, Michiel Bacchiani, Eugene Weinstein, Patrick Nguyen, Zhifeng Chen, Yonghui Wu, Kanishka Rao

Deep Discriminative Feature Learning for Accent Recognition

Accent recognition with a deep learning framework is similar to deep speaker identification: both are expected to give the input speech an identifiable representation. Compared with the individual-level features learned by speaker…

Sound · Computer Science 2021-08-26 Wei Wang, Chao Zhang, Xiaopei Wu