Related papers: A Subsequence-Histogram Method for Generic Vocabul…

Finding consensus in speech recognition: word error minimization and other applications of confusion networks

We describe a new framework for distilling information from word lattices to improve the accuracy of speech recognition and obtain a more perspicuous representation of a set of alternative hypotheses. In the standard MAP decoding approach…

Computation and Language · Computer Science 2022-02-28 L. Mangu, E. Brill, A. Stolcke

Deep Dictionary Learning: A PARametric NETwork Approach

Deep dictionary learning seeks multiple dictionaries at different image scales to capture complementary coherent characteristics. We propose a method for learning a hierarchy of synthesis dictionaries with an image classification goal. The…

Computer Vision and Pattern Recognition · Computer Science 2019-09-04 Shahin Mahdizadehaghdam, Ashkan Panahi, Hamid Krim, Liyi Dai

A Novel Word Sense Disambiguation Approach Using WordNet Knowledge Graph

Various applications in computational linguistics and artificial intelligence rely on high-performing word sense disambiguation techniques to solve challenging tasks such as information retrieval, machine translation, question answering,…

Computation and Language · Computer Science 2021-01-11 Mohannad AlMousa, Rachid Benlamri, Richard Khoury

Image Retrieval using Histogram Factorization and Contextual Similarity Learning

Image retrieval has been a top topic in the field of both computer vision and machine learning for a long time. Content based image retrieval, which tries to retrieve images from a database visually similar to a query image, has attracted…

Computer Vision and Pattern Recognition · Computer Science 2013-04-10 Liu Liang

Consensus Sequence Segmentation

In this paper we introduce a method to detect words or phrases in a given sequence of alphabets without knowing the lexicon. Our linear time unsupervised algorithm relies entirely on statistical relationships among alphabets in the input…

Computation and Language · Computer Science 2013-12-31 Tamal Chowdhury, Rabindra Rakshit, Arko Banerjee

On Learning Sparsely Used Dictionaries from Incomplete Samples

Most existing algorithms for dictionary learning assume that all entries of the (high-dimensional) input data are fully observed. However, in several practical applications (such as hyper-spectral imaging or blood glucose monitoring), only…

Machine Learning · Statistics 2018-04-26 Thanh V. Nguyen, Akshay Soni, Chinmay Hegde

PADDLE: Proximal Algorithm for Dual Dictionaries LEarning

Recently, considerable research efforts have been devoted to the design of methods to learn from data overcomplete dictionaries for sparse coding. However, learned dictionaries require the solution of an optimization problem for coding new…

Machine Learning · Computer Science 2010-11-17 Curzio Basso, Matteo Santoro, Alessandro Verri, Silvia Villa

Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling

Most end-to-end speech recognition systems model text directly as a sequence of characters or sub-words. Current approaches to sub-word extraction only consider character sequence frequencies, which at times produce inferior sub-word…

Computation and Language · Computer Science 2019-02-22 Hainan Xu, Shuoyang Ding, Shinji Watanabe

Palindromic Decompositions with Gaps and Errors

Identifying palindromes in sequences has been an interesting line of research in combinatorics on words and also in computational biology, after the discovery of the relation of palindromes in the DNA sequence with the HIV virus. Efficient…

Data Structures and Algorithms · Computer Science 2017-03-28 Michał Adamczyk, Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Jakub Radoszewski

Statistical limits of dictionary learning: random matrix theory and the spectral replica method

We consider increasingly complex models of matrix denoising and dictionary learning in the Bayes-optimal setting, in the challenging regime where the matrices to infer have a rank growing linearly with the system size. This is in contrast…

Information Theory · Computer Science 2022-09-14 Jean Barbier, Nicolas Macris

An Efficient, Probabilistically Sound Algorithm for Segmentation and Word Discovery

This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text.…

Computation and Language · Computer Science 2007-05-23 Michael R. Brent

Sparse Dictionary-Based Solution of Dynamic Inverse Problems

In ill-posed dynamic inverse problems expected spatial features and temporal correlation between frames can be leveraged to improve the quality of the computed solution, in particular when the available data are limited and the…

Numerical Analysis · Mathematics 2026-02-24 Aidan Mason-Mackay, Daniela Calvetti, Erkki Somersalo, Antti Aarnio, Mikko Kettunen, Ekaterina Paasonen, Olli Gröhn, Ville Kolehmainen

A fast patch-dictionary method for whole image recovery

Various algorithms have been proposed for dictionary learning. Among those for image processing, many use image patches to form dictionaries. This paper focuses on whole-image recovery from corrupted linear measurements. We address the open…

Computer Vision and Pattern Recognition · Computer Science 2014-08-19 Yangyang Xu, Wotao Yin

Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system; hybrid systems are usually constructed to recognize a fixed set of words and can rarely include all the words that will be encountered during…

Computation and Language · Computer Science 2020-03-23 Nikolay Malkovsky, Vladimir Bataev, Dmitrii Sviridkin, Natalia Kizhaeva, Aleksandr Laptev, Ildar Valiev, Oleg Petrov

In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and…

Computation and Language · Computer Science 2007-05-23 Ido Dagan, Lillian Lee, Fernando C. N. Pereira

Speech perception: a model of word recognition

We present a model of speech perception which takes into account effects of correlations between sounds. Words in this model correspond to the attractors of a suitably chosen descent dynamics. The resulting lexicon is rich in short words,…

Statistical Mechanics · Physics 2025-02-28 Jean-Marc Luck, Anita Mehta

First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs

We present a method to perform first-pass large vocabulary continuous speech recognition using only a neural network and language model. Deep neural network acoustic models are now commonplace in HMM-based speech recognition systems, but…

Computation and Language · Computer Science 2014-12-09 Awni Y. Hannun, Andrew L. Maas, Daniel Jurafsky, Andrew Y. Ng

Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval

The ad-hoc retrieval task is to rank related documents given a query and a document collection. A series of deep learning based approaches have been proposed to solve such problem and gained lots of attention. However, we argue that they…

Information Retrieval · Computer Science 2021-02-23 Xueli Yu, Weizhi Xu, Zeyu Cui, Shu Wu, Liang Wang

A Clustering Approach to Learn Sparsely-Used Overcomplete Dictionaries

We consider the problem of learning overcomplete dictionaries in the context of sparse coding, where each sample selects a sparse subset of dictionary elements. Our main result is a strategy to approximately recover the unknown dictionary…

Machine Learning · Statistics 2014-07-08 Alekh Agarwal, Animashree Anandkumar, Praneeth Netrapalli

SubGram: Extending Skip-gram Word Representation with Substrings

Skip-gram (word2vec) is a recent method for creating vector representations of words ("distributed word representations") using a neural network. The representation gained popularity in various areas of natural language processing, because…

Computation and Language · Computer Science 2020-07-09 Tom Kocmi, Ondřej Bojar