Related papers: AlloVera: A Multilingual Allophone Database

Differentiable Allophone Graphs for Language-Universal Speech Recognition

Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages. While speech annotations at the language-specific phoneme or surface levels are readily…

Computation and Language · Computer Science 2021-07-27 Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, Shinji Watanabe

Universal Phone Recognition with a Multilingual Allophone System

Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages. Multilingual acoustic models, however, generally ignore the difference between phonemes (sounds that can…

Computation and Language · Computer Science 2020-02-28 Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W Black, Florian Metze

Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes

This paper proposes Allophant, a multilingual phoneme recognizer. It requires only a phoneme inventory for cross-lingual transfer to a target language, allowing for low-resource recognition. The architecture combines a compositional phone…

Computation and Language · Computer Science 2023-08-17 Kevin Glocker, Aaricia Herygers, Munir Georges

Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties

Models pre-trained on multiple languages have shown significant promise for improving speech recognition, particularly for low-resource languages. In this work, we focus on phoneme recognition using Allosaurus, a method for multilingual…

Computation and Language · Computer Science 2021-04-06 Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, David Mortensen, Michael R. Marlo, Graham Neubig

Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment

Allophony refers to the variation in the phonetic realization of a phoneme based on its phonetic environment. Modeling allophones is crucial for atypical pronunciation assessment, which involves distinguishing atypical from typical…

Computation and Language · Computer Science 2025-03-25 Kwanghee Choi, Eunjung Yeo, Kalvin Chang, Shinji Watanabe, David Mortensen

Universal Automatic Phonetic Transcription into the International Phonetic Alphabet

This paper presents a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA). Transcription of spoken languages into IPA is an essential yet time-consuming process in language…

Computation and Language · Computer Science 2023-08-09 Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang

DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units

We introduce DiscoPhon, a multilingual benchmark for evaluating unsupervised phoneme discovery from discrete speech units. DiscoPhon covers 6 dev and 6 test languages, chosen to span a wide range of phonemic contrasts. Given only 10 hours…

Computation and Language · Computer Science 2026-03-20 Maxime Poli, Manel Khentout, Angelo Ortiz Tandazo, Ewan Dunbar, Emmanuel Chemla, Emmanuel Dupoux

WorldSpeech: A Multilingual Speech Corpus from Around the World

Automatic speech recognition (ASR) performs well for high-resource languages with abundant paired audio-transcript data, but its accuracy degrades sharply for most languages due to limited publicly available aligned data. To this end, we…

Computation and Language · Computer Science 2026-05-12 Antonis Asonitis, Luca A. Lanzendörfer, Frédéric Berdoz, Roger Wattenhofer

That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

Only a handful of the world's languages are abundant with the resources that enable practical applications of speech processing technologies. One of the methods to overcome this problem is to use the resources existing in other languages to…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

MauBERT: Universal Phonetic Inductive Biases for Few-Shot Acoustic Units Discovery

This paper introduces MauBERT, a multilingual extension of HuBERT that leverages articulatory features for robust cross-lingual phonetic representation learning. We continue HuBERT pre-training with supervision based on a…

Computation and Language · Computer Science 2025-12-23 Angelo Ortiz Tandazo, Manel Khentout, Youssef Benchekroun, Thomas Hueber, Emmanuel Dupoux

The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language

In this project, we demonstrate that phoneme-based models for speech processing can achieve strong crosslinguistic generalizability to unseen languages. We curated the IPAPACK, a massively multilingual speech corpus with phonemic…

Computation and Language · Computer Science 2024-04-03 Jian Zhu, Changbing Yang, Farhan Samir, Jahurul Islam

Universal Neural Vocoding with Parallel WaveNet

We present a universal neural vocoder based on Parallel WaveNet, with an additional conditioning network called Audio Encoder. Our universal vocoder offers real-time high-quality speech synthesis on a wide range of use cases. We tested it…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-16 Yunlong Jiao, Adam Gabrys, Georgi Tinchev, Bartosz Putrycz, Daniel Korzekwa, Viacheslav Klimkov

Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints

We propose a first step toward multilingual end-to-end automatic speech recognition (ASR) by integrating knowledge about speech articulators. The key idea is to leverage a rich set of fundamental units that can be defined "universally"…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-19 Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee

Language-universal phonetic encoder for low-resource speech recognition

Multilingual training is effective in improving low-resource ASR, which may partially be explained by phonetic representation sharing between languages. In end-to-end (E2E) ASR systems, graphemes are often used as basic modeling units,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-22 Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

IPA-CHILDES & G2P+: Feature-Rich Resources for Cross-Lingual Phonology and Phonemic Language Modeling

In this paper, we introduce two resources: (i) G2P+, a tool for converting orthographic datasets to a consistent phonemic representation; and (ii) IPA CHILDES, a phonemic dataset of child-centered speech across 31 languages. Prior tools for…

Computation and Language · Computer Science 2025-06-13 Zébulon Goriely, Paula Buttery

Multilingual Dysarthric Speech Assessment Using Universal Phone Recognition and Language-Specific Phonemic Contrast Modeling

The growing prevalence of neurological disorders associated with dysarthria motivates the need for automated intelligibility assessment methods that are applicable across languages. However, most existing approaches are either limited to a…

Computation and Language · Computer Science 2026-02-12 Eunjung Yeo, Julie M. Liss, Visar Berisha, David R. Mortensen

PolyIPA -- Multilingual Phoneme-to-Grapheme Conversion Model

This paper presents PolyIPA, a novel multilingual phoneme-to-grapheme conversion model designed for multilingual name transliteration, onomastic research, and information retrieval. The model leverages two helper models developed for data…

Computation and Language · Computer Science 2024-12-13 Davor Lauc

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain…

Sound · Computer Science 2022-01-31 Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak

Common Phone: A Multilingual Dataset for Robust Acoustic Modelling

Current state of the art acoustic models can easily comprise more than 100 million parameters. This growing complexity demands larger training datasets to maintain a decent generalization of the final decision function. An ideal dataset is…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-01 Philipp Klumpp, Tomás Arias-Vergara, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave

ViSpeR: Multilingual Audio-Visual Speech Recognition

This work presents an extensive and detailed study on Audio-Visual Speech Recognition (AVSR) for five widely spoken languages: Chinese, Spanish, English, Arabic, and French. We have collected large-scale datasets for each language except…

Computation and Language · Computer Science 2024-06-04 Sanath Narayan, Yasser Abdelaziz Dahou Djilali, Ankit Singh, Eustache Le Bihan, Hakim Hacid