Related papers: Context tree selection and linguistic rhythm retri…

Approximate group context tree

We study a variable length Markov chain model associated with a group of stationary processes that share the same context tree but each process has potentially different conditional probabilities. We propose a new model selection and…

Methodology · Statistics 2016-01-01 Alexandre Belloni, Roberto I. Oliveira

Phonetically rich corpus construction for a low-resourced language

Speech technologies rely on capturing a speaker's voice variability while obtaining comprehensive language information. Textual prompts and sentence selection methods have been proposed in the literature to comprise such adequate phonetic…

Computation and Language · Computer Science 2024-02-09 Marcellus Amadeus, William Alberto Cruz Castañeda, Wilmer Lobato, Niasche Aquino

Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks

Languages have long been described according to their perceived rhythmic attributes. The associated typologies are of interest in psycholinguistics as they partly predict newborns' abilities to discriminate between languages and provide…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-29 François Deloche, Laurent Bonnasse-Gahot, Judit Gervain

An Efficient, Probabilistically Sound Algorithm for Segmentation and Word Discovery

This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text.…

Computation and Language · Computer Science 2007-05-23 Michael R. Brent

Context Tree Selection: A Unifying View

The present paper investigates non-asymptotic properties of two popular procedures of context tree (or Variable Length Markov Chains) estimation: Rissanen's algorithm Context and the Penalized Maximum Likelihood criterion. First showing how…

Statistics Theory · Mathematics 2011-06-30 Aurélien Garivier, Florencia Leonardi

Joint estimation of intersecting context tree models

We study a problem of model selection for data produced by two different context tree sources. Motivated by linguistic questions, we consider the case where the probabilistic context trees corresponding to the two sources are finite and…

Statistics Theory · Mathematics 2013-08-12 Antonio Galves, Aurélien Garivier, Elisabeth Gassiat

Combining a Context Aware Neural Network with a Denoising Autoencoder for Measuring String Similarities

Measuring similarities between strings is central for many established and fast growing research areas including information retrieval, biology, and natural language processing. The traditional approach for string similarity measurements is…

Information Retrieval · Computer Science 2018-08-20 Mehdi Ben Lazreg, Morten Goodwin

Nonparametric statistical inference for the context tree of a stationary ergodic process

We consider the problem of estimating the context tree of a stationary ergodic process with finite alphabet without imposing additional conditions on the process. As a starting point we introduce a Hamming metric in the space of irreducible…

Statistics Theory · Mathematics 2015-08-21 Sandro Gallo, Florencia Leonardi

Strong correlations between text quality and complex networks features

Concepts of complex networks have been used to obtain metrics that were correlated to text quality established by scores assigned by human judges. Texts produced by high-school students in Portuguese were represented as scale-free networks…

Physics and Society · Physics 2009-11-11 Lucas Antiqueira, Maria das Gracas V. Nunes, Osvaldo N. Oliveira, Luciano da F. Costa

Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines

Rhetoric, both spoken and written, involves not only content but also style. One common stylistic tool is parallelism: the juxtaposition of phrases which have the same sequence of linguistic (e.g., phonological,…

Computation and Language · Computer Science 2023-12-04 Stephen Bothwell, Justin DeBenedetto, Theresa Crnkovich, Hildegund Müller, David Chiang

A Practical Method for Solving Contextual Bandit Problems Using Decision Trees

Many efficient algorithms with strong theoretical guarantees have been proposed for the contextual multi-armed bandit problem. However, applying these algorithms in practice can be difficult because they require domain expertise to build…

Machine Learning · Computer Science 2018-10-23 Adam N. Elmachtoub, Ryan McNellis, Sechan Oh, Marek Petrik

A statistical learning algorithm for word segmentation

In natural speech, the speaker does not pause between words, yet a human listener somehow perceives this continuous stream of phonemes as a series of distinct words. The detection of boundaries between spoken words is an instance of a…

Computation and Language · Computer Science 2011-06-28 Jerry R. Van Aken

Language Detection For Short Text Messages In Social Media

With the constant growth of the World Wide Web and the number of documents in different languages accordingly, the need for reliable language detection tools has increased as well. Platforms such as Twitter with predominantly short texts…

Computation and Language · Computer Science 2016-08-31 Ivana Balazevic, Mikio Braun, Klaus-Robert Müller

Audio Retrieval with Natural Language Queries: A Benchmark Study

The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the goal is to retrieve the audio content from a pool of candidates that best matches a given written description and vice versa. Text-audio retrieval…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-11 A. Sophia Koepke, Andreea-Maria Oncescu, João F. Henriques, Zeynep Akata, Samuel Albanie

Context Tree Estimation in Variable Length Hidden Markov Models

We address the issue of context tree estimation in variable length hidden Markov models. We propose an estimator of the context tree of the hidden Markov process which needs no prior upper bound on the depth of the context tree. We prove…

Information Theory · Computer Science 2011-09-15 Thierry Dumont

Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis

Tokenising continuous speech into sequences of discrete tokens and modelling them with language models (LMs) has led to significant success in text-to-speech (TTS) synthesis. Although these models can generate speech with high quality and…

Sound · Computer Science 2024-08-30 Zehai Tu, Guangyan Zhang, Yiting Lu, Adaeze Adigwe, Simon King, Yiwen Guo

Supervised Metric Learning to Rank for Retrieval via Contextual Similarity Optimization

There is extensive interest in metric learning methods for image retrieval. Many metric learning loss functions focus on learning a correct ranking of training samples, but strongly overfit semantically inconsistent labels and require a…

Machine Learning · Computer Science 2023-06-05 Christopher Liao, Theodoros Tsiligkaridis, Brian Kulis

An automated approach to mitigate transcription errors in braille texts for the Portuguese language

The quota system in Brazil made it possible to include blind students in higher education. Teachers' lack of knowledge about the braille system can represent a barrier between them and students who use it for writing and reading.…

Computer Vision and Pattern Recognition · Computer Science 2021-03-09 André Roberto Ortoncelli, Marlon Marcon, Franciele Beal

An Efficient Bayes Coding Algorithm for the Non-Stationary Source in Which Context Tree Model Varies from Interval to Interval

The context tree source is a source model in which the occurrence probability of symbols is determined from a finite past sequence, and is a broader class of sources that includes i.i.d. and Markov sources. The proposed source model in this…

Information Theory · Computer Science 2021-05-14 Koshi Shimada, Shota Saito, Toshiyasu Matsushima

Spaces, Trees and Colors: The Algorithmic Landscape of Document Retrieval on Sequences

Document retrieval is one of the best established information retrieval activities since the sixties, pervading all search engines. Its aim is to obtain, from a collection of text documents, those most relevant to a pattern query. Current…

Information Retrieval · Computer Science 2013-10-01 Gonzalo Navarro