Related papers: Employing Subsequence Matching in Audio Data Proce…

200 papers

Subsequence matching has appeared to be an ideal approach for solving many problems related to the fields of data mining and similarity retrieval. It has been shown that almost any data class (audio, image, biometrics, signals) is or can be…

Multimedia · Computer Science 2012-06-13 David Novak, Petr Volny, Pavel Zezula

Systems for synthesizer sound matching, which automatically set the parameters of a synthesizer to emulate an input sound, have the potential to make the process of synthesizer programming faster and easier for novice and experienced…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-24 Fred Bruford, Frederik Blang, Shahan Nercessian

Matching pursuits are a class of greedy algorithms commonly used in signal processing, for solving the sparse approximation problem. They rely on an atom selection step that requires the calculation of numerous projections, which can be…

Data Structures and Algorithms · Computer Science 2012-04-06 Manuel Moussallam, Laurent Daudet, Gaël Richard

Neural sequence-to-sequence systems deliver state-of-the-art performance for automatic speech recognition. When using appropriate modeling units, e.g., byte-pair encoding, these systems are in principle open vocabulary systems. In practice,…

Computation and Language · Computer Science 2026-03-05 Christian Huber, Alexander Waibel

Automatic speech recognition (ASR) systems can suffer from poor recall for various reasons, such as noisy audio, lack of sufficient training data, etc. Previous work has shown that recall can be improved by retrieving rewrite candidates…

Speech applications dealing with conversations require not only recognizing the spoken words, but also determining who spoke when. The task of assigning words to speakers is typically addressed by merging the outputs of two separate…

Computation and Language · Computer Science 2019-07-12 Laurent El Shafey, Hagen Soltau, Izhak Shafran

Embedding-based retrieval models have made significant strides in retrieval-augmented generation (RAG) techniques for text and multimodal large language models (LLMs) applications. However, when it comes to speech large language models…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-11 Chunyu Sun, Bingyu Liu, Zhichao Cui, Junhan Shi, Anbin Qi, Tian-hao Zhang, Dinghao Zhou, Lewei Lu

Computational approaches in historical linguistics have been increasingly applied during the past decade and many new methods that implement parts of the traditional comparative method have been proposed. Despite these increased efforts,…

Computation and Language · Computer Science 2022-04-12 Johann-Mattis List, Robert Forkel, Nathan W. Hill

Modeling the errors of a speech recognizer can help simulate errorful recognized speech data from plain text, which has proven useful for tasks like discriminative language modeling, improving robustness of NLP systems, where limited or…

Artificial Intelligence · Computer Science 2024-08-22 Prashant Serai, Peidong Wang, Eric Fosler-Lussier

Automatic speech recognition (ASR) is a relevant area in multiple settings because it provides a natural communication mechanism between applications and users. ASRs often fail in environments that use language specific to particular…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-16 Rafael Viana-Cámara, Mario Campos-Soberanis, Diego Campos-Sobrino

The vocabulary mismatch problem is a long-standing problem in information retrieval. Semantic matching holds the promise of solving the problem. Recent advances in language technology have given rise to unsupervised neural models for…

Information Retrieval · Computer Science 2016-11-11 Kezban Dilek Onal, Ismail Sengor Altingovde, Pinar Karagoz, Maarten de Rijke

While there has been a substantial amount of work in speaker diarization recently, there are few efforts in jointly employing lexical and acoustic information for speaker segmentation. Towards that, we investigate a speaker diarization system…

Audio and Speech Processing · Electrical Eng. & Systems 2018-05-29 Tae Jin Park, Panayiotis Georgiou

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has a Transformer-based encoder-decoder architecture which "translates" ASR model output into grammatically and…

Computation and Language · Computer Science 2019-10-24 Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg

Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment. Existing methods assume access to paired training data, where the audio is observed in both source and target…

Multimedia · Computer Science 2023-11-27 Arjun Somayazulu, Changan Chen, Kristen Grauman

Machine learning algorithms, when trained on audio recordings from a limited set of devices, may not generalize well to samples recorded using other devices with different frequency responses. In this work, a relatively straightforward…

Sound · Computer Science 2021-05-26 Michał Kośmider

In speech evaluation, an Automatic Speech Recognition (ASR) model often computes time boundaries and phoneme posteriors for input features. However, limited data for ASR training hinders expansion of speech evaluation to low-resource…

Computation and Language · Computer Science 2026-03-27 Jeremy H. M. Wong, Nancy F. Chen

Despite the advancements in cutting-edge technologies, audio signal processing continues to pose challenges and lacks the precision of a human speech processing system. To address these challenges, we propose a novel approach to simplify…

Sound · Computer Science 2026-03-26 Rinku Sebastian, Simon O'Keefe, Martin Trefzer

Synthetic voice and splicing audio clips have been generated to spoof Internet users and artificial intelligence (AI) technologies such as voice authentication. Existing research work treats spoofing countermeasures as a binary…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-30 Lei Wang, Benedict Yeoh, Jun Wah Ng

Sound event localization frameworks based on deep neural networks have shown increased robustness with respect to reverberation and noise in comparison to classical parametric approaches. In particular, recurrent architectures that…

Interacting with a speech interface to query a Question Answering (QA) system is becoming increasingly popular. Typically, QA systems rely on passage retrieval to select candidate contexts and reading comprehension to extract the final…

Computation and Language · Computer Science 2022-09-28 Georgios Sidiropoulos, Svitlana Vakulenko, Evangelos Kanoulas