Related papers: Employing Subsequence Matching in Audio Data Proce…

Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency

Subsequence matching has appeared to be an ideal approach for solving many problems related to the fields of data mining and similarity retrieval. It has been shown that almost any data class (audio, image, biometrics, signals) is or can be…

Multimedia · Computer Science 2012-06-13 David Novak, Petr Volny, Pavel Zezula

Synthesizer Sound Matching Using Audio Spectrogram Transformers

Systems for synthesizer sound matching, which automatically set the parameters of a synthesizer to emulate an input sound, have the potential to make the process of synthesizer programming faster and easier for novice and experienced…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-24 Fred Bruford, Frederik Blang, Shahan Nercessian

Matching Pursuits with Random Sequential Subdictionaries

Matching pursuits are a class of greedy algorithms commonly used in signal processing, for solving the sparse approximation problem. They rely on an atom selection step that requires the calculation of numerous projections, which can be…

Data Structures and Algorithms · Computer Science 2012-04-06 Manuel Moussallam, Laurent Daudet, Gaël Richard

Context Biasing for Pronunciation-Orthography Mismatch in Automatic Speech Recognition

Neural sequence-to-sequence systems deliver state-of-the-art performance for automatic speech recognition. When using appropriate modeling units, e.g., byte-pair encoding, these systems are in principle open vocabulary systems. In practice,…

Computation and Language · Computer Science 2026-03-05 Christian Huber, Alexander Waibel

High-precision Voice Search Query Correction via Retrievable Speech-text Embedings

Automatic speech recognition (ASR) systems can suffer from poor recall for various reasons, such as noisy audio, lack of sufficient training data, etc. Previous work has shown that recall can be improved by retrieving rewrite candidates…

Computation and Language · Computer Science 2024-01-10 Christopher Li, Gary Wang, Kyle Kastner, Heng Su, Allen Chen, Andrew Rosenberg, Zhehuai Chen, Zelin Wu, Leonid Velikovich, Pat Rondon, Diamantino Caseiro, Petar Aleksic

Joint Speech Recognition and Speaker Diarization via Sequence Transduction

Speech applications dealing with conversations require not only recognizing the spoken words, but also determining who spoke when. The task of assigning words to speakers is typically addressed by merging the outputs of two separate…

Computation and Language · Computer Science 2019-07-12 Laurent El Shafey, Hagen Soltau, Izhak Shafran

SEAL: Speech Embedding Alignment Learning for Speech Large Language Model with Retrieval-Augmented Generation

Embedding-based retrieval models have made significant strides in retrieval-augmented generation (RAG) techniques for text and multimodal large language models (LLMs) applications. However, when it comes to speech large language models…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-11 Chunyu Sun, Bingyu Liu, Zhichao Cui, Junhan Shi, Anbin Qi, Tian-hao Zhang, Dinghao Zhou, Lewei Lu

A New Framework for Fast Automated Phonological Reconstruction Using Trimmed Alignments and Sound Correspondence Patterns

Computational approaches in historical linguistics have been increasingly applied during the past decade and many new methods that implement parts of the traditional comparative method have been proposed. Despite these increased efforts,…

Computation and Language · Computer Science 2022-04-12 Johann-Mattis List, Robert Forkel, Nathan W. Hill

Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers

Modeling the errors of a speech recognizer can help simulate errorful recognized speech data from plain text, which has proven useful for tasks like discriminative language modeling, improving robustness of NLP systems, where limited or…

Artificial Intelligence · Computer Science 2024-08-22 Prashant Serai, Peidong Wang, Eric Fosler-Lussier

Hybrid phonetic-neural model for correction in speech recognition systems

Automatic speech recognition (ASR) is a relevant area in multiple settings because it provides a natural communication mechanism between applications and users. ASRs often fail in environments that use language specific to particular…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-16 Rafael Viana-Cámara, Mario Campos-Soberanis, Diego Campos-Sobrino

Getting Started with Neural Models for Semantic Matching in Web Search

The vocabulary mismatch problem is a long-standing problem in information retrieval. Semantic matching holds the promise of solving the problem. Recent advances in language technology have given rise to unsupervised neural models for…

Information Retrieval · Computer Science 2016-11-11 Kezban Dilek Onal, Ismail Sengor Altingovde, Pinar Karagoz, Maarten de Rijke

Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks

While there has been a substantial amount of work in speaker diarization recently, there are few efforts in jointly employing lexical and acoustic information for speaker segmentation. Towards that, we investigate a speaker diarization system…

Audio and Speech Processing · Electrical Eng. & Systems 2018-05-29 Tae Jin Park, Panayiotis Georgiou

Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has Transformer-based encoder-decoder architecture which "translates" ASR model output into grammatically and…

Computation and Language · Computer Science 2019-10-24 Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg

Self-Supervised Visual Acoustic Matching

Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment. Existing methods assume access to paired training data, where the audio is observed in both source and target…

Multimedia · Computer Science 2023-11-27 Arjun Somayazulu, Changan Chen, Kristen Grauman

Spectrum Correction: Acoustic Scene Classification with Mismatched Recording Devices

Machine learning algorithms, when trained on audio recordings from a limited set of devices, may not generalize well to samples recorded using other devices with different frequency responses. In this work, a relatively straightforward…

Sound · Computer Science 2021-05-26 Michał Kośmider

Goodness-of-pronunciation without phoneme time alignment

In speech evaluation, an Automatic Speech Recognition (ASR) model often computes time boundaries and phoneme posteriors for input features. However, limited data for ASR training hinders expansion of speech evaluation to low-resource…

Computation and Language · Computer Science 2026-03-27 Jeremy H. M. Wong, Nancy F. Chen

Bridging Biological Hearing and Neuromorphic Computing: End-to-End Time-Domain Audio Signal Processing with Reservoir Computing

Despite the advancements in cutting-edge technologies, audio signal processing continues to pose challenges and lacks the precision of a human speech processing system. To address these challenges, we propose a novel approach to simplify…

Sound · Computer Science 2026-03-26 Rinku Sebastian, Simon O'Keefe, Martin Trefzer

Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture

Synthetic voice and splicing audio clips have been generated to spoof Internet users and artificial intelligence (AI) technologies such as voice authentication. Existing research work treats spoofing countermeasures as a binary…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-30 Lei Wang, Benedict Yeoh, Jun Wah Ng

Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization

Sound event localization frameworks based on deep neural networks have shown increased robustness with respect to reverberation and noise in comparison to classical parametric approaches. In particular, recurrent architectures that…

Sound · Computer Science 2021-03-02 Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa

On the Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering

Interacting with a speech interface to query a Question Answering (QA) system is becoming increasingly popular. Typically, QA systems rely on passage retrieval to select candidate contexts and reading comprehension to extract the final…

Computation and Language · Computer Science 2022-09-28 Georgios Sidiropoulos, Svitlana Vakulenko, Evangelos Kanoulas