Related papers: CNN-based Spoken Term Detection and Localization w…

200 papers

A statistical model for segmentation and word discovery in continuous speech is presented. An incremental unsupervised learning algorithm to infer word boundaries based on this model is described. Results of empirical tests showing that the…

Computation and Language · Computer Science 2007-05-23 Anand Venkataraman

In this paper, we propose a deep convolutional neural network-based acoustic word embedding system on code-switching query by example spoken term detection. Different from previous configurations, we combine audio data in two languages for…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-26 Murong Ma, Haiwei Wu, Xuyang Wang, Lin Yang, Junjie Wang, Ming Li

Recent studies have been revisiting whole words as the basic modelling unit in speech recognition and query applications, instead of phonetic units. Such whole-word segmental systems rely on a function that maps a variable-length speech…

Computation and Language · Computer Science 2016-01-11 Herman Kamper, Weiran Wang, Karen Livescu

End-to-end acoustic-to-word speech recognition models have recently gained popularity because they are easy to train, scale well to large amounts of training data, and do not require a lexicon. In addition, word models may also be easier to…

Computation and Language · Computer Science 2019-02-20 Shruti Palaskar, Vikas Raunak, Florian Metze

In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition. The embeddings summarize the information of the mouth region that is relevant to the problem of word recognition, while…

Computer Vision and Pattern Recognition · Computer Science 2017-11-01 Themos Stafylakis, Georgios Tzimiropoulos

(Part of the abstract) In this thesis, we investigate the use of unsupervised spoken term discovery in tackling this problem. Unsupervised spoken term discovery aims to discover topic-related terminologies in a speech without knowing the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-12-01 Man-Ling Sung

We propose to learn acoustic word embeddings with temporal context for query-by-example (QbE) speech search. The temporal context includes the leading and trailing word sequences of a word. We assume that there exist spoken word pairs in…

Computation and Language · Computer Science 2018-06-19 Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li

This paper proposes a Sub-band Convolutional Neural Network for spoken term classification. Convolutional neural networks (CNNs) have proven to be very effective in acoustic applications such as spoken term classification, keyword spotting,…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-03 Chieh-Chi Kao, Ming Sun, Yixin Gao, Shiv Vitaladevuni, Chao Wang

This paper is motivated by the automation of neuropsychological tests involving discourse analysis in the retellings of narratives by patients with potential cognitive impairment. In this scenario the task of sentence boundary detection in…

Computation and Language · Computer Science 2017-08-17 Marcos V. Treviso, Christopher D. Shulby, Sandra M. Aluisio

Recent developments in deep learning with application to language modeling have led to success in tasks of text processing, summarizing and machine translation. However, deploying huge language models for mobile devices such as on-device…

Computation and Language · Computer Science 2017-07-07 Seunghak Yu, Nilesh Kulkarni, Haejun Lee, Jihie Kim

This paper addresses a relatively new task: prediction of ASR performance on unseen broadcast programs. In a previous paper, we presented an ASR performance prediction system using CNNs that encode both text (ASR transcript) and speech, in…

Computation and Language · Computer Science 2018-08-29 Zied Elloumi, Laurent Besacier, Olivier Galibert, Benjamin Lecouteux

We present a probabilistic language model for time-stamped text data which tracks the semantic evolution of individual words over time. The model represents words and contexts by latent trajectories in an embedding space. At each moment in…

Machine Learning · Statistics 2017-07-19 Robert Bamler, Stephan Mandt

The paper describes a novel approach to Spoken Term Detection (STD) in large spoken archives using deep LSTM networks. The work is based on the previous approach of using Siamese neural networks for STD and naturally extends it to directly…

Computation and Language · Computer Science 2022-10-24 Jan Švec, Luboš Šmídl, Josef V. Psutka, Aleš Pražák

In this paper we introduce a method to detect words or phrases in a given sequence of alphabets without knowing the lexicon. Our linear time unsupervised algorithm relies entirely on statistical relationships among alphabets in the input…

Computation and Language · Computer Science 2013-12-31 Tamal Chowdhury, Rabindra Rakshit, Arko Banerjee

Speech recognition has become an important task in the development of machine learning and artificial intelligence. In this study, we explore the important task of keyword spotting using speech recognition machine learning and deep learning…

Sound · Computer Science 2023-12-12 Sumedha Rai, Tong Li, Bella Lyu

We propose an algorithm to denoise speakers from a single microphone in the presence of non-stationary and dynamic noise. Our approach is inspired by the recent success of neural network models separating speakers from other speakers and…

Sound · Computer Science 2018-05-01 Jeff Hetherly, Paul Gamble, Maria Barrios, Cory Stephenson, Karl Ni

A statistical model for segmentation and word discovery in child directed speech is presented. An incremental unsupervised learning algorithm to infer word boundaries based on this model is described and results of empirical tests showing…

Computation and Language · Computer Science 2007-05-23 Anand Venkataraman

In this paper, we propose a context-aware keyword spotting model employing a character-level recurrent neural network (RNN) for spoken term detection in continuous speech. The RNN is end-to-end trained with connectionist temporal…

Computation and Language · Computer Science 2015-12-31 Kyuyeon Hwang, Minjae Lee, Wonyong Sung

Embedding audio signal segments into vectors with fixed dimensionality is attractive because all following processing will be easier and more efficient, for example modeling, classifying or indexing. Audio Word2Vec previously proposed was…

Computation and Language · Computer Science 2018-11-08 Sung-Feng Huang, Yi-Chen Chen, Hung-yi Lee, Lin-shan Lee

This study addresses the problem of identifying the meaning of unknown words or entities in a discourse with respect to the word embedding approaches used in neural language models. We proposed a method for on-the-fly construction and…

Computation and Language · Computer Science 2017-10-18 Sosuke Kobayashi, Naoaki Okazaki, Kentaro Inui