Related papers: Deciphering Undersegmented Ancient Scripts Using P…

The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language

In this project, we demonstrate that phoneme-based models for speech processing can achieve strong crosslinguistic generalizability to unseen languages. We curated the IPAPACK, a massively multilingual speech corpora with phonemic…

Computation and Language · Computer Science 2024-04-03 Jian Zhu, Changbing Yang, Farhan Samir, Jahurul Islam

Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B

In this paper we propose a novel neural approach for automatic decipherment of lost languages. To compensate for the lack of strong supervision signal, our model design is informed by patterns in language change documented in historical…

Computation and Language · Computer Science 2019-06-18 Jiaming Luo, Yuan Cao, Regina Barzilay

A New Framework for Fast Automated Phonological Reconstruction Using Trimmed Alignments and Sound Correspondence Patterns

Computational approaches in historical linguistics have been increasingly applied during the past decade and many new methods that implement parts of the traditional comparative method have been proposed. Despite these increased efforts,…

Computation and Language · Computer Science 2022-04-12 Johann-Mattis List, Robert Forkel, Nathan W. Hill

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain…

Sound · Computer Science 2022-01-31 Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak

Isolated-Word Confusion Metrics and the PGPfone Alphabet

Although the confusion of individual phonemes and features have been studied and analyzed since (Miller and Nicely, 1955), there has been little work done on extending this to a predictive theory of word-level confusions. The PGPfone…

cmp-lg · Computer Science 2008-02-03 Patrick Juola

Phonetic-and-Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval

Word embedding or Word2Vec has been successful in offering semantics for text words learned from the context of words. Audio Word2Vec was shown to offer phonetic structures for spoken words (signal segments for words) learned from signals…

Computation and Language · Computer Science 2019-01-23 Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-yi Lee, Lin-shan Lee

Phonetic and Visual Priors for Decipherment of Informal Romanization

Informal romanization is an idiosyncratic process used by humans in informal digital communication to encode non-Latin script languages into Latin character sets found on common keyboards. Character substitution choices differ between users…

Computation and Language · Computer Science 2020-05-07 Maria Ryskina, Matthew R. Gormley, Taylor Berg-Kirkpatrick

ISPA: Inter-Species Phonetic Alphabet for Transcribing Animal Sounds

Traditionally, bioacoustics has relied on spectrograms and continuous, per-frame audio representations for the analysis of animal sounds, also serving as input to machine learning models. Meanwhile, the International Phonetic Alphabet (IPA)…

Sound · Computer Science 2024-02-07 Masato Hagiwara, Marius Miron, Jen-Yu Liu

Curation of a Palaeohispanic Dataset for Machine Learning

Palaeohispanic languages are those spoken in the Iberian Peninsula before the arrival of the Romans in the 3rd Century B.C. Their study was really put on motion after Gómez Moreno deciphered the Iberian Levantine script, one of the…

Computation and Language · Computer Science 2026-04-16 Gonzalo Martínez-Fernández, Jose F Quesada, Agustín Riscos-Núñez, Francisco José Salguero-Lamillar

Phonetic Ambiguity: Approaches, Touchstones, Pitfalls and New Approaches

Phonetic ambiguity and confusibility are bugbears for any form of bottom-up or data-driven approach to language processing. The question of when an input is "close enough" to a target word pervades the entire problem spaces of speech…

cmp-lg · Computer Science 2016-08-31 Patrick Juola

Language-universal phonetic encoder for low-resource speech recognition

Multilingual training is effective in improving low-resource ASR, which may partially be explained by phonetic representation sharing between languages. In end-to-end (E2E) ASR systems, graphemes are often used as basic modeling units,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-22 Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

Probabilistic Modelling of Morphologically Rich Languages

This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often…

Computation and Language · Computer Science 2015-08-19 Jan A. Botha

IPA-CLIP: Integrating Phonetic Priors into Vision and Language Pretraining

Recently, large-scale Vision and Language (V&L) pretraining has become the standard backbone of many multimedia systems. While it has shown remarkable performance even in unseen situations, it often performs in ways not intuitive to…

Multimedia · Computer Science 2023-03-07 Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Keisuke Doman, Yasutomo Kawanishi, Ichiro Ide

Universal Automatic Phonetic Transcription into the International Phonetic Alphabet

This paper presents a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA). Transcription of spoken languages into IPA is an essential yet time-consuming process in language…

Computation and Language · Computer Science 2023-08-09 Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang

Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network

ASR has been shown to achieve great performance recently. However, most of them rely on massive paired data, which is not feasible for low-resource languages worldwide. This paper investigates how to learn directly from unpaired phone…

Sound · Computer Science 2022-08-01 Da-rong Liu, Po-chun Hsu, Yi-chen Chen, Sung-feng Huang, Shun-po Chuang, Da-yi Wu, Hung-yi Lee

Decoding Probing: Revealing Internal Linguistic Structures in Neural Language Models using Minimal Pairs

Inspired by cognitive neuroscience studies, we introduce a novel 'decoding probing' method that uses minimal pairs benchmark (BLiMP) to probe internal linguistic characteristics in neural language models layer by layer. By treating the…

Computation and Language · Computer Science 2024-03-27 Linyang He, Peili Chen, Ercong Nie, Yuanning Li, Jonathan R. Brennan

A phonetic model of non-native spoken word processing

Non-native speakers show difficulties with spoken word processing. Many studies attribute these difficulties to imprecise phonological encoding of words in the lexical memory. We test an alternative hypothesis: that some of these…

Computation and Language · Computer Science 2021-03-12 Yevgen Matusevych, Herman Kamper, Thomas Schatz, Naomi H. Feldman, Sharon Goldwater

Evaluating computational models of infant phonetic learning across languages

In the first year of life, infants' speech perception becomes attuned to the sounds of their native language. Many accounts of this early phonetic learning exist, but computational models predicting the attunement patterns observed in…

Computation and Language · Computer Science 2020-08-10 Yevgen Matusevych, Thomas Schatz, Herman Kamper, Naomi H. Feldman, Sharon Goldwater

Reasoning Over the Glyphs: Evaluation of LLM's Decipherment of Rare Scripts

We explore the capabilities of LVLMs and LLMs in deciphering rare scripts not encoded in Unicode. We introduce a novel approach to construct a multimodal dataset of linguistic puzzles involving such scripts, utilizing a tokenization method…

Computation and Language · Computer Science 2025-01-30 Yu-Fei Shih, Zheng-Lin Lin, Shu-Kai Hsieh

Phoneme Boundary Detection using Learnable Segmental Features

Phoneme boundary detection plays an essential first step for a variety of speech processing applications such as speaker diarization, speech science, keyword spotting, etc. In this work, we propose a neural architecture coupled with a…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-18 Felix Kreuk, Yaniv Sheena, Joseph Keshet, Yossi Adi