Related papers: A Deep Generative Model for Code-Switched Text

An Overview on Controllable Text Generation via Variational Auto-Encoders

Recent advances in neural-based generative modeling have reignited the hopes of having computer systems capable of conversing with humans and able to understand natural language. The employment of deep neural architectures has been largely…

Computation and Language · Computer Science 2022-11-16 Haoqin Tu, Yitong Li

Code-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation

Code-switching is about dealing with alternative languages in speech or text. It is partially speaker-dependent and domain-related, so completely explaining the phenomenon by linguistic rules is challenging. Compared to most monolingual tasks,…

Computation and Language · Computer Science 2019-06-20 Ching-Ting Chang, Shun-Po Chuang, Hung-Yi Lee

Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models

Variational autoencoders (VAEs) have received much attention recently as an end-to-end architecture for text generation with latent variables. In this paper, we investigate several multi-level structures to learn a VAE model to generate…

Computation and Language · Computer Science 2019-06-21 Dinghan Shen, Asli Celikyilmaz, Yizhe Zhang, Liqun Chen, Xin Wang, Jianfeng Gao, Lawrence Carin

Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks

This paper presents our latest effort on improving Code-switching language models that suffer from data scarcity. We investigate methods to augment Code-switching training text data by artificially generating them. Concretely, we propose a…

Computation and Language · Computer Science 2021-12-14 Chia-Yu Li, Ngoc Thang Vu

From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text

Generating code-switched text is a problem of growing interest, especially given the scarcity of corpora containing large volumes of real code-switched text. In this work, we adapt a state-of-the-art neural machine translation model to…

Computation and Language · Computer Science 2021-07-15 Ishan Tarunesh, Syamantak Kumar, Preethi Jyothi

Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences

Training code-switched language models is difficult due to lack of data and complexity in the grammatical structure. Linguistic constraint theories have been used for decades to generate artificial code-switching sentences to cope with this…

Computation and Language · Computer Science 2019-09-19 Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung

Code-Mixed Probes Show How Pre-Trained Models Generalise On Code-Switched Text

Code-switching is a prevalent linguistic phenomenon in which multilingual individuals seamlessly alternate between languages. Despite its widespread use online and recent research trends in this area, research in code-switching presents…

Computation and Language · Computer Science 2024-05-08 Frances A. Laureano De Leon, Harish Tayyar Madabushi, Mark Lee

Style Variation as a Vantage Point for Code-Switching

Code-Switching (CS) is a common phenomenon observed in several bilingual and multilingual communities, thereby attaining prevalence in digital and social media platforms. This increasing prominence demands the need to model CS languages for…

Computation and Language · Computer Science 2020-05-04 Khyathi Raghavi Chandu, Alan W Black

Deep Latent-Variable Models for Text Generation

Text generation aims to produce human-like natural language output for down-stream tasks. It covers a wide range of applications like machine translation, document summarization, dialogue generation and so on. Recently deep neural…

Computation and Language · Computer Science 2022-03-07 Xiaoyu Shen

A Survey of Code-switched Speech and Language Processing

Code-switching, the alternation of languages within a conversation or utterance, is a common communicative phenomenon that occurs in multilingual communities across the world. This survey reviews computational approaches for code-switched…

Computation and Language · Computer Science 2020-07-24 Sunayana Sitaram, Khyathi Raghavi Chandu, Sai Krishna Rallabandi, Alan W Black

Deconvolutional Latent-Variable Model for Text Sequence Matching

A latent-variable model is introduced for text matching, inferring sentence representations by jointly optimizing generative and discriminative objectives. To alleviate typical optimization challenges in latent-variable models for text, we…

Computation and Language · Computer Science 2017-11-23 Dinghan Shen, Yizhe Zhang, Ricardo Henao, Qinliang Su, Lawrence Carin

Text Modeling with Syntax-Aware Variational Autoencoders

Syntactic information contains structures and rules about how text sentences are arranged. Incorporating syntax into text modeling methods can potentially benefit both representation learning and generation. Variational autoencoders (VAEs)…

Computation and Language · Computer Science 2019-08-28 Yijun Xiao, William Yang Wang

Can Large Language Models Understand, Reason About, and Generate Code-Switched Text?

Code-switching is a pervasive phenomenon in multilingual communication, yet the robustness of large language models (LLMs) in mixed-language settings remains insufficiently understood. In this work, we present a comprehensive evaluation of…

Computation and Language · Computer Science 2026-01-13 Genta Indra Winata, David Anugraha, Patrick Amadeus Irawan, Anirban Das, Haneul Yoo, Paresh Dashore, Shreyas Kulkarni, Ruochen Zhang, Haruki Sakajo, Frederikus Hudi, Anaelia Ovalle, Syrielle Montariol, Felix Gaschi, Michael Anugraha, Rutuj Ravindra Puranik, Zawad Hayat Ahmed, Adril Putra Merin, Emmanuele Chersoni

End-to-End Code Switching Language Models for Automatic Speech Recognition

In this paper, we particularly work on the code-switched text, one of the most common occurrences in the bilingual communities across the world. Due to the discrepancies in the extraction of code-switched text from an Automated Speech…

Computation and Language · Computer Science 2020-06-17 Ahan M. R., Shreyas Sunil Kulkarni

Towards Code-switched Classification Exploiting Constituent Language Resources

Code-switching is a commonly observed communicative phenomenon denoting a shift from one language to another within the same speech exchange. The analysis of code-switched data often becomes an assiduous task, owing to the limited…

Computation and Language · Computer Science 2020-11-04 Tanvi Dadu, Kartikey Pant

Grammar Variational Autoencoder

Deep generative models have been wildly successful at learning coherent latent representations for continuous data such as video and audio. However, generative modeling of discrete data such as arithmetic expressions and molecular…

Machine Learning · Statistics 2017-03-07 Matt J. Kusner, Brooks Paige, José Miguel Hernández-Lobato

A Bilingual Generative Transformer for Semantic Sentence Embedding

Semantic sentence embedding models encode natural language sentences into vectors, such that closeness in embedding space indicates closeness in the semantics between the sentences. Bilingual data offers a useful signal for learning such…

Computation and Language · Computer Science 2020-11-20 John Wieting, Graham Neubig, Taylor Berg-Kirkpatrick

Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition

Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages. While today's neural end-to-end (E2E) models deliver state-of-the-art performances on the task of automatic speech…

Computation and Language · Computer Science 2023-07-04 Enes Yavuz Ugan, Christian Huber, Juan Hussain, Alexander Waibel

Natural Language Generation with Neural Variational Models

In this thesis, we explore the use of deep neural networks for generation of natural language. Specifically, we implement two sequence-to-sequence neural variational models - variational autoencoders (VAE) and variational encoder-decoders…

Computation and Language · Computer Science 2018-08-29 Hareesh Bahuleyan

DIVERS-Bench: Evaluating Language Identification Across Domain Shifts and Code-Switching

Language Identification (LID) is a core task in multilingual NLP, yet current systems often overfit to clean, monolingual data. This work introduces DIVERS-BENCH, a comprehensive evaluation of state-of-the-art LID models across diverse…

Computation and Language · Computer Science 2025-09-23 Jessica Ojo, Zina Kamel, David Ifeoluwa Adelani