Related papers: Lexical Access for Speech Understanding using Mini…

Segmenting speech without a lexicon: The roles of phonotactics and speech source

Infants face the difficult problem of segmenting continuous speech into words without the benefit of a fully developed lexicon. Several sources of information in speech might help infants solve this problem, including prosody, semantic…

cmp-lg · Computer Science 2008-02-03 Timothy Andrew Cartwright, Michael R. Brent

Speech perception: a model of word recognition

We present a model of speech perception which takes into account effects of correlations between sounds. Words in this model correspond to the attractors of a suitably chosen descent dynamics. The resulting lexicon is rich in short words,…

Statistical Mechanics · Physics 2025-02-28 Jean-Marc Luck, Anita Mehta

Lexicon Learning for Few-Shot Neural Sequence Modeling

Sequence-to-sequence transduction is the core problem in language processing applications as diverse as semantic parsing, machine translation, and instruction following. The neural network models that provide the dominant solution to these…

Computation and Language · Computer Science 2021-06-09 Ekin Akyürek, Jacob Andreas

Acquiring a Lexicon from Unsegmented Speech

We present work-in-progress on the machine acquisition of a lexicon from sentences that are each an unsegmented phone sequence paired with a primitive representation of meaning. A simple exploratory algorithm is described, along with the…

cmp-lg · Computer Science 2008-02-03 Carl de Marcken

Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

Identifying multiple speakers without knowing where a speaker's voice is in a recording is a challenging task. In this paper, a hierarchical attention network is proposed to solve a weakly labelled speaker identification problem. The use of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi, Qiang Huang, Thomas Hain

Lexical Acquisition via Constraint Solving

This paper describes a method to automatically acquire the syntactic and semantic classifications of unknown words. Our method reduces the search space of the lexical acquisition problem by utilizing both the left and the right context of…

cmp-lg · Computer Science 2016-08-31 Ted Pedersen, Weidong Chen

Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech

The difficulty of acquiring abundant, high-quality data, especially in multi-lingual contexts, has sparked interest in addressing low-resource scenarios. Moreover, current literature relies on fixed expressions from language IDs, which…

Sound · Computer Science 2024-09-30 Youngjae Kim, Yejin Jeon, Gary Geunbae Lee

Minimal Effective Theory for Phonotactic Memory: Capturing Local Correlations due to Errors in Speech

Spoken language evolves constrained by the economy of speech, which depends on factors such as the structure of the human mouth. This gives rise to local phonetic correlations in spoken words. Here we demonstrate that these local…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-07 Paul Myles Eugenio

Speech Codec Probing from Semantic and Phonetic Perspectives

Speech tokenizers are essential for connecting speech to large language models (LLMs) in multimodal systems. These tokenizers are expected to preserve both semantic and acoustic information for downstream understanding and generation.…

Audio and Speech Processing · Electrical Eng. & Systems 2026-03-12 Xuan Shi, Chang Zeng, Tiantian Feng, Shih-Heng Wang, Jianbo Ma, Shrikanth Narayanan

Processing Self Corrections in a speech to speech system

Speech repairs occur often in spontaneous spoken dialogues. The ability to detect and correct those repairs is necessary for any spoken language system. We present a framework to detect and correct speech repairs where all relevant levels…

Computation and Language · Computer Science 2007-05-23 Joerg Spilker, Martin Klarner, Guenther Goerz

From Audio to Semantics: Approaches to end-to-end spoken language understanding

Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-16 Parisa Haghani, Arun Narayanan, Michiel Bacchiani, Galen Chuang, Neeraj Gaur, Pedro Moreno, Rohit Prabhavalkar, Zhongdi Qu, Austin Waters

Hearings and mishearings: decrypting the spoken word

We propose a model of the speech perception of individual words in the presence of mishearings. This phenomenological approach is based on concepts used in linguistics, and provides a formalism that is universal across languages. We put…

Computation and Language · Computer Science 2020-10-19 Anita Mehta, Jean-Marc Luck

Decoding visemes: improving machine lipreading

Machine lipreading (MLR) is speech recognition from visual cues and a niche research problem in speech processing and computer vision. Current challenges fall into two groups: the content of the video, such as rate of speech; or the…

Computer Vision and Pattern Recognition · Computer Science 2018-05-09 Helen L Bear

High Performance Sequence-to-Sequence Model for Streaming Speech Recognition

Recently sequence-to-sequence models have started to achieve state-of-the-art performance on standard speech recognition tasks when processing audio data in batch mode, i.e., the complete audio data is available when starting processing.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-28 Thai-Son Nguyen, Ngoc-Quan Pham, Sebastian Stueker, Alex Waibel

A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR

Recent works have shown promising results in connecting speech encoders to large language models (LLMs) for speech recognition. However, several limitations persist, including limited fine-tuning options, a lack of mechanisms to enforce…

Machine Learning · Computer Science 2024-06-26 Van Tung Pham, Yist Lin, Tao Han, Wei Li, Jun Zhang, Lu Lu, Yuxuan Wang

Communicating Sound Through Natural Language

Natural language is widely used to describe, prompt, and control audio systems, but rarely serves as the representation carrying audio itself. We introduce lexical acoustic coding (LAC), a framework in which pre-trained LLM sender and…

Machine Learning · Computer Science 2026-05-12 Emanuele Rossi, Emanuele Rodolà

Audio-Linguistic Embeddings for Spoken Sentences

We propose spoken sentence embeddings which capture both acoustic and linguistic content. While existing works operate at the character, phoneme, or word level, our method learns long-term dependencies by modeling speech at the sentence…

Sound · Computer Science 2019-02-22 Albert Haque, Michelle Guo, Prateek Verma, Li Fei-Fei

Enhancing Pre-trained Language Model with Lexical Simplification

For both human readers and pre-trained language models (PrLMs), lexical diversity may lead to confusion and inaccuracy when understanding the underlying semantic meanings of given sentences. By substituting complex words with simple…

Computation and Language · Computer Science 2021-01-01 Rongzhou Bao, Jiayi Wang, Zhuosheng Zhang, Hai Zhao

Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations

Transformer-based speech language models (SLMs) have significantly improved neural speech recognition and understanding. While existing research has examined how well SLMs encode shallow acoustic and phonetic features, the extent to which…

Computation and Language · Computer Science 2025-09-22 Linyang He, Qiaolin Wang, Xilin Jiang, Nima Mesgarani

Lexical Access Model for Italian -- Modeling human speech processing: identification of words in running speech toward lexical access based on the detection of landmarks and other acoustic cues to features

Modelling the process that a listener actuates in deriving the words intended by a speaker requires setting a hypothesis on how lexical items are stored in memory. This work aims at developing a system that imitates humans when identifying…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-07 Maria-Gabriella Di Benedetto, Stefanie Shattuck-Hufnagel, Jeung-Yoon Choi, Luca De Nardis, Javier Arango, Ian Chan, Alec DeCaprio