Related papers: AlloVera: A Multilingual Allophone Database

200 papers

Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages. While speech annotations at the language-specific phoneme or surface levels are readily…

Computation and Language · Computer Science 2021-07-27 Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, Shinji Watanabe

Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages. Multilingual acoustic models, however, generally ignore the difference between phonemes (sounds that can…

This paper proposes Allophant, a multilingual phoneme recognizer. It requires only a phoneme inventory for cross-lingual transfer to a target language, allowing for low-resource recognition. The architecture combines a compositional phone…

Computation and Language · Computer Science 2023-08-17 Kevin Glocker, Aaricia Herygers, Munir Georges

Models pre-trained on multiple languages have shown significant promise for improving speech recognition, particularly for low-resource languages. In this work, we focus on phoneme recognition using Allosaurus, a method for multilingual…

Computation and Language · Computer Science 2021-04-06 Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, David Mortensen, Michael R. Marlo, Graham Neubig

Allophony refers to the variation in the phonetic realization of a phoneme based on its phonetic environment. Modeling allophones is crucial for atypical pronunciation assessment, which involves distinguishing atypical from typical…

Computation and Language · Computer Science 2025-03-25 Kwanghee Choi, Eunjung Yeo, Kalvin Chang, Shinji Watanabe, David Mortensen

This paper presents a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA). Transcription of spoken languages into IPA is an essential yet time-consuming process in language…

Computation and Language · Computer Science 2023-08-09 Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang

We introduce DiscoPhon, a multilingual benchmark for evaluating unsupervised phoneme discovery from discrete speech units. DiscoPhon covers 6 dev and 6 test languages, chosen to span a wide range of phonemic contrasts. Given only 10 hours…

Computation and Language · Computer Science 2026-03-20 Maxime Poli, Manel Khentout, Angelo Ortiz Tandazo, Ewan Dunbar, Emmanuel Chemla, Emmanuel Dupoux

Automatic speech recognition (ASR) performs well for high-resource languages with abundant paired audio-transcript data, but its accuracy degrades sharply for most languages due to limited publicly available aligned data. To this end, we…

Computation and Language · Computer Science 2026-05-12 Antonis Asonitis, Luca A. Lanzendörfer, Frédéric Berdoz, Roger Wattenhofer

Only a handful of the world's languages are abundant with the resources that enable practical applications of speech processing technologies. One of the methods to overcome this problem is to use the resources existing in other languages to…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

This paper introduces MauBERT, a multilingual extension of HuBERT that leverages articulatory features for robust cross-lingual phonetic representation learning. We continue HuBERT pre-training with supervision based on a…

Computation and Language · Computer Science 2025-12-23 Angelo Ortiz Tandazo, Manel Khentout, Youssef Benchekroun, Thomas Hueber, Emmanuel Dupoux

In this project, we demonstrate that phoneme-based models for speech processing can achieve strong crosslinguistic generalizability to unseen languages. We curated the IPAPACK, a massively multilingual speech corpus with phonemic…

Computation and Language · Computer Science 2024-04-03 Jian Zhu, Changbing Yang, Farhan Samir, Jahurul Islam

We present a universal neural vocoder based on Parallel WaveNet, with an additional conditioning network called Audio Encoder. Our universal vocoder offers real-time high-quality speech synthesis on a wide range of use cases. We tested it…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-16 Yunlong Jiao, Adam Gabrys, Georgi Tinchev, Bartosz Putrycz, Daniel Korzekwa, Viacheslav Klimkov

We propose a first step toward multilingual end-to-end automatic speech recognition (ASR) by integrating knowledge about speech articulators. The key idea is to leverage a rich set of fundamental units that can be defined "universally"…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-19 Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee

Multilingual training is effective in improving low-resource ASR, which may partially be explained by phonetic representation sharing between languages. In end-to-end (E2E) ASR systems, graphemes are often used as basic modeling units,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-22 Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

In this paper, we introduce two resources: (i) G2P+, a tool for converting orthographic datasets to a consistent phonemic representation; and (ii) IPA CHILDES, a phonemic dataset of child-centered speech across 31 languages. Prior tools for…

Computation and Language · Computer Science 2025-06-13 Zébulon Goriely, Paula Buttery

The growing prevalence of neurological disorders associated with dysarthria motivates the need for automated intelligibility assessment methods that are applicable across languages. However, most existing approaches are either limited to a…

Computation and Language · Computer Science 2026-02-12 Eunjung Yeo, Julie M. Liss, Visar Berisha, David R. Mortensen

This paper presents PolyIPA, a novel multilingual phoneme-to-grapheme conversion model designed for multilingual name transliteration, onomastic research, and information retrieval. The model leverages two helper models developed for data…

Computation and Language · Computer Science 2024-12-13 Davor Lauc

The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain…

Current state-of-the-art acoustic models can easily comprise more than 100 million parameters. This growing complexity demands larger training datasets to maintain a decent generalization of the final decision function. An ideal dataset is…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-01 Philipp Klumpp, Tomás Arias-Vergara, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave

This work presents an extensive and detailed study on Audio-Visual Speech Recognition (AVSR) for five widely spoken languages: Chinese, Spanish, English, Arabic, and French. We have collected large-scale datasets for each language except…

Computation and Language · Computer Science 2024-06-04 Sanath Narayan, Yasser Abdelaziz Dahou Djilali, Ankit Singh, Eustache Le Bihan, Hakim Hacid