Related papers: A Fast Algorithm for Computing Prefix Probabilitie…

An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities

We describe an extension of Earley's parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by…

cmp-lg · Computer Science 2008-02-03 Andreas Stolcke

On Combinatorial Generation of Prefix Normal Words

A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present an…

Data Structures and Algorithms · Computer Science 2014-06-23 Péter Burcsi, Gabriele Fici, Zsuzsanna Lipták, Frank Ruskey, Joe Sawada

Optimal Prefix Free Codes With Partial Sorting

We describe an algorithm computing an optimal prefix free code for $n$ unsorted positive weights in time within $O(n(1+\lg \alpha))\subseteq O(n\lg n)$, where the alternation $\alpha\in[1..n-1]$ measures the amount of sorting required by…

Data Structures and Algorithms · Computer Science 2016-02-02 Jérémy Barbay

Optimal Prefix Free Code in Linear Time

We describe an algorithm computing an optimal prefix free code from $N$ unsorted positive integer weights in time linear in the number of machine words holding those weights. This algorithm takes advantage of common non-algebraic…

Data Structures and Algorithms · Computer Science 2017-03-02 Jérémy Barbay

PCFGs Can Do Better: Inducing Probabilistic Context-Free Grammars with Many Symbols

Probabilistic context-free grammars (PCFGs) with neural parameterization have been shown to be effective in unsupervised phrase-structure grammar induction. However, due to the cubic computational complexity of PCFG representation and…

Computation and Language · Computer Science 2021-04-29 Songlin Yang, Yanpeng Zhao, Kewei Tu

Parsing Inside-Out

The inside-outside probabilities are typically used for reestimating Probabilistic Context Free Grammars (PCFGs), just as the forward-backward probabilities are typically used for reestimating HMMs. I show several novel uses, including…

cmp-lg · Computer Science 2007-05-23 Joshua Goodman

Optimal Prefix Codes with Fewer Distinct Codeword Lengths are Faster to Construct

A new method for constructing minimum-redundancy binary prefix codes is described. Our method does not explicitly build a Huffman tree; instead it uses a property of optimal prefix codes to compute the codeword lengths corresponding to the…

Data Structures and Algorithms · Computer Science 2016-09-30 Ahmed Belal, Amr Elmasry

Bubble-Flip -- A New Generation Algorithm for Prefix Normal Words

We present a new recursive generation algorithm for prefix normal words. These are binary strings with the property that no substring has more 1s than the prefix of the same length. The new algorithm uses two operations on binary strings,…

Data Structures and Algorithms · Computer Science 2024-04-16 Ferdinando Cicalese, Zsuzsanna Lipták, Massimiliano Rossi

On the Computation of Distances for Probabilistic Context-Free Grammars

Probabilistic context-free grammars (PCFGs) are used to define distributions over strings, and are powerful modelling tools in a number of areas, including natural language processing, software engineering, model checking, bio-informatics,…

Formal Languages and Automata Theory · Computer Science 2014-07-08 Colin de la Higuera, James Scicluna, Mark-Jan Nederhof

Prefix Parsing is Just Parsing

Prefix parsing asks whether an input prefix can be extended to a complete string generated by a given grammar. In the weighted setting, it also provides prefix probabilities, which are central to context-free language modeling,…

Computation and Language · Computer Science 2026-05-05 Clemente Pasti, Andreas Opedal, Timothy J. O'Donnell, Ryan Cotterell, Tim Vieira

Longest Common Prefixes with $k$-Errors and Applications

Although real-world text datasets, such as DNA sequences, are far from being uniformly random, average-case string searching algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we…

Data Structures and Algorithms · Computer Science 2018-01-16 Lorraine A. K. Ayad, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis

A Note on the Longest Common Compatible Prefix Problem for Partial Words

For a partial word $w$ the longest common compatible prefix of two positions $i,j$, denoted $lccp(i,j)$, is the largest $k$ such that $w[i,i+k-1]\uparrow w[j,j+k-1]$, where $\uparrow$ is the compatibility relation of partial words (it is…

Data Structures and Algorithms · Computer Science 2013-12-10 Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, Jakub Radoszewski, Wojciech Rytter, Bartosz Szreder, Tomasz Waleń

Stochastic Context-Free Grammars, Regular Languages, and Newton's Method

We study the problem of computing the probability that a given stochastic context-free grammar (SCFG), G, generates a string in a given regular language L(D) (given by a DFA, D). This basic problem has a number of applications in…

Formal Languages and Automata Theory · Computer Science 2013-02-27 Kousha Etessami, Alistair Stewart, Mihalis Yannakakis

Efficient Semiring-Weighted Earley Parsing

This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups. Our presentation includes a known worst-case runtime improvement from Earley's $O…

Computation and Language · Computer Science 2023-07-07 Andreas Opedal, Ran Zmigrod, Tim Vieira, Ryan Cotterell, Jason Eisner

Reserved-Length Prefix Coding

Huffman coding finds an optimal prefix code for a given probability mass function. Consider situations in which one wishes to find an optimal code with the restriction that all codewords have lengths that lie in a user-specified set of…

Information Theory · Computer Science 2008-01-03 Michael B. Baer

An $O(k \log{n})$ algorithm for prefix based ranked autocomplete

Many search engines such as Google, Bing & Yahoo! show search suggestions when users enter search phrases on their interfaces. These suggestions are meant to assist the user in finding what she wants quickly and also suggesting common…

Data Structures and Algorithms · Computer Science 2021-11-01 Dhruv Matani

Precise n-gram Probabilities from Stochastic Context-free Grammars

We present an algorithm for computing n-gram probabilities from stochastic context-free grammars, a procedure that can alleviate some of the standard problems associated with n-grams (estimation from sparse data, lack of linguistic…

cmp-lg · Computer Science 2022-02-28 Andreas Stolcke, Jonathan Segal

Prefix Probabilities from Stochastic Tree Adjoining Grammars

Language models for speech recognition typically use a probability model of the form Pr(a_n | a_1, a_2, ..., a_{n-1}). Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the…

Computation and Language · Computer Science 2007-05-23 Mark-Jan Nederhof, Anoop Sarkar, Giorgio Satta

Fast Context-Free Grammar Parsing Requires Fast Boolean Matrix Multiplication

In 1975, Valiant showed that Boolean matrix multiplication can be used for parsing context-free grammars (CFGs), yielding the asymptotically fastest (although not practical) CFG parsing algorithm known. We prove a dual result: any CFG parser…

Computation and Language · Computer Science 2007-05-23 Lillian Lee

#CFG and #DNNF admit FPRAS

We provide the first fully polynomial-time randomized approximation scheme for the following two counting problems: 1. Given a Context Free Grammar $G$ over alphabet $\Sigma$, count the number of words of length exactly $n$ generated by…

Data Structures and Algorithms · Computer Science 2026-05-18 Kuldeep S. Meel, Alexis de Colnet