Related papers: Code Switching Language Model Using Monolingual Tr…

Training a code-switching language model with monolingual data

A lack of code-switching data complicates the training of code-switching (CS) language models. We propose an approach to train such CS language models on monolingual data only. By constraining and normalizing the output projection matrix in…

Computation and Language · Computer Science · 2020-05-22 · Shun-Po Chuang, Tzu-Wei Sung, Hung-Yi Lee

Code-switched Language Models Using Dual RNNs and Same-Source Pretraining

This work focuses on building language models (LMs) for code-switched text. We propose two techniques that significantly improve these LMs: 1) A novel recurrent neural network unit with dual components that focus on each language in the…

Computation and Language · Computer Science · 2018-09-07 · Saurabh Garg, Tanmay Parekh, Preethi Jyothi

Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training

We focus on the problem of language modeling for code-switched language, in the context of automatic speech recognition (ASR). Language modeling for code-switched language is challenging for (at least) three reasons: (1) lack of available…

Computation and Language · Computer Science · 2019-11-12 · Hila Gonen, Yoav Goldberg

Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences

Training code-switched language models is difficult due to lack of data and complexity in the grammatical structure. Linguistic constraint theories have been used for decades to generate artificial code-switching sentences to cope with this…

Computation and Language · Computer Science · 2019-09-19 · Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung

The Effect of Alignment Objectives on Code-Switching Translation

One of the things that need to change in machine translation is the models' ability to translate code-switching content, especially with the rise of social media and user-generated content. In this paper, we propose a…

Computation and Language · Computer Science · 2023-09-12 · Mohamed Anwar

Semi-supervised Learning for Code-Switching ASR with Large Language Model Filter

The code-switching (CS) phenomenon occurs when words or phrases from different languages are alternated in a single sentence. Due to data scarcity, building an effective CS Automatic Speech Recognition (ASR) system remains challenging. In this…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-09-23 · Yu Xi, Wen Ding, Kai Yu, Junjie Lai

Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching

Code-switching (CS), a ubiquitous phenomenon due to the ease of communication it offers in multilingual communities, still remains an understudied problem in language processing. The primary reasons behind this are: (1) minimal efforts in…

Computation and Language · Computer Science · 2021-11-03 · Parul Chopra, Sai Krishna Rallabandi, Alan W Black, Khyathi Raghavi Chandu

Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models

Code-switching (CS) is common in daily conversations where more than one language is used within a sentence. The difficulties of CS speech recognition lie in alternating languages and the lack of transcribed data. Therefore, this paper uses…

Computation and Language · Computer Science · 2021-10-08 · Liang-Hsuan Tseng, Yu-Kuan Fu, Heng-Jui Chang, Hung-yi Lee

Conditioning LLMs to Generate Code-Switched Text

Code-switching (CS) is still a critical challenge in Natural Language Processing (NLP), due to the limited availability of large-scale, diverse CS datasets for robust training and evaluation. Despite recent advances, the capabilities and…

Computation and Language · Computer Science · 2026-03-09 · Maite Heredia, Gorka Labaka, Jeremy Barnes, Aitor Soroa

Enhancing Multilingual Language Models for Code-Switched Input Data

Code-switching, or alternating between languages within a single conversation, presents challenges for multilingual language models on NLP tasks. This research investigates if pre-training Multilingual BERT (mBERT) on code-switched datasets…

Computation and Language · Computer Science · 2025-03-12 · Katherine Xie, Nitya Babbar, Vicky Chen, Yoanna Turura

From English to Code-Switching: Transfer Learning with Strong Morphological Clues

Linguistic code-switching (CS) is still an understudied phenomenon in natural language processing. The NLP community has mostly focused on monolingual and multilingual scenarios, but little attention has been given to CS in particular.…

Computation and Language · Computer Science · 2020-05-05 · Gustavo Aguilar, Thamar Solorio

Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition

Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages. While today's neural end-to-end (E2E) models deliver state-of-the-art performance on the task of automatic speech…

Computation and Language · Computer Science · 2023-07-04 · Enes Yavuz Ugan, Christian Huber, Juan Hussain, Alexander Waibel

Cross-lingual Data Transformation and Combination for Text Classification

Text classification is a fundamental task for text data mining. In order to train a generalizable model, a large volume of text must be collected. To address data insufficiency, cross-lingual data may occasionally be necessary.…

Information Retrieval · Computer Science · 2019-06-25 · Jun Jiang, Shumao Pang, Xia Zhao, Liwei Wang, Andrew Wen, Hongfang Liu, Qianjin Feng

Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition

The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that speech recognition (ASR) systems handle mixed language. Designing a CS-ASR system has many challenges, mainly due to data scarcity, grammatical…

Computation and Language · Computer Science · 2023-01-12 · Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur

Code-switching Language Modeling With Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English

Code-switching (CS) is a widespread phenomenon among bilingual and multilingual societies. The lack of CS resources hinders the performance of many NLP tasks. In this work, we explore the potential use of bilingual word embeddings for…

Computation and Language · Computer Science · 2019-09-25 · Injy Hamed, Moritz Zhu, Mohamed Elmahdy, Slim Abdennadher, Ngoc Thang Vu

Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition

Recently, there has been significant progress made in Automatic Speech Recognition (ASR) of code-switched speech, leading to gains in accuracy on code-switched datasets in many language pairs. Code-switched speech co-occurs with monolingual…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-06-02 · Sanket Shah, Basil Abraham, Gurunath Reddy M, Sunayana Sitaram, Vikas Joshi

Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-switching Speech Recognition

In this paper, we conduct data selection analysis in building an English-Mandarin code-switching (CS) speech recognition (CSSR) system, which is aimed at a real CSSR contest in China. The overall training sets have three subsets, i.e., a…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-09-15 · Haobo Zhang, Haihua Xu, Van Tung Pham, Hao Huang, Eng Siong Chng

Joint Training for Neural Machine Translation Models with Monolingual Data

Monolingual data have been demonstrated to be helpful in improving translation quality of both statistical machine translation (SMT) systems and neural machine translation (NMT) systems, especially in resource-poor or domain adaptation…

Computation and Language · Computer Science · 2018-03-02 · Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, Enhong Chen

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

The lack of code-switch training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching…

Computation and Language · Computer Science · 2021-01-14 · Yerbolat Khassanov, Haihua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma

Learn to Code-Switch: Data Augmentation using Copy Mechanism on Language Modeling

Building large-scale datasets for training code-switching language models is challenging and very expensive. To alleviate this problem, using parallel corpora has been a major workaround. However, existing solutions use linguistic constraints…

Computation and Language · Computer Science · 2018-10-31 · Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung