Related papers: A Fast Algorithm for Computing Prefix Probabilitie…
We describe an extension of Earley's parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by…
A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present an…
We describe an algorithm computing an optimal prefix free code for $n$ unsorted positive weights in time within $O(n(1+\lg \alpha))\subseteq O(n\lg n)$, where the alternation $\alpha\in[1..n-1]$ measures the amount of sorting required by…
We describe an algorithm computing an optimal prefix free code from $N$ unsorted positive integer weights in time linear in the number of machine words holding those weights. This algorithm takes advantage of common non-algebraic…
Probabilistic context-free grammars (PCFGs) with neural parameterization have been shown to be effective in unsupervised phrase-structure grammar induction. However, due to the cubic computational complexity of PCFG representation and…
The inside-outside probabilities are typically used for reestimating Probabilistic Context Free Grammars (PCFGs), just as the forward-backward probabilities are typically used for reestimating HMMs. I show several novel uses, including…
A new method for constructing minimum-redundancy binary prefix codes is described. Our method does not explicitly build a Huffman tree; instead it uses a property of optimal prefix codes to compute the codeword lengths corresponding to the…
We present a new recursive generation algorithm for prefix normal words. These are binary strings with the property that no substring has more 1s than the prefix of the same length. The new algorithm uses two operations on binary strings,…
Probabilistic context-free grammars (PCFGs) are used to define distributions over strings, and are powerful modelling tools in a number of areas, including natural language processing, software engineering, model checking, bio-informatics,…
Prefix parsing asks whether an input prefix can be extended to a complete string generated by a given grammar. In the weighted setting, it also provides prefix probabilities, which are central to context-free language modeling,…
Although real-world text datasets, such as DNA sequences, are far from being uniformly random, average-case string searching algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we…
For a partial word $w$ the longest common compatible prefix of two positions $i,j$, denoted $lccp(i,j)$, is the largest $k$ such that $w[i,i+k-1]\uparrow w[j,j+k-1]$, where $\uparrow$ is the compatibility relation of partial words (it is…
We study the problem of computing the probability that a given stochastic context-free grammar (SCFG), G, generates a string in a given regular language L(D) (given by a DFA, D). This basic problem has a number of applications in…
This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups. Our presentation includes a known worst-case runtime improvement from Earley's $O…
Huffman coding finds an optimal prefix code for a given probability mass function. Consider situations in which one wishes to find an optimal code with the restriction that all codewords have lengths that lie in a user-specified set of…
Many search engines such as Google, Bing & Yahoo! show search suggestions when users enter search phrases on their interfaces. These suggestions are meant to assist the user in finding what she wants quickly and also suggesting common…
We present an algorithm for computing n-gram probabilities from stochastic context-free grammars, a procedure that can alleviate some of the standard problems associated with n-grams (estimation from sparse data, lack of linguistic…
Language models for speech recognition typically use a probability model of the form Pr(a_n | a_1, a_2, ..., a_{n-1}). Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the…
In 1975, Valiant showed that Boolean matrix multiplication can be used for parsing context-free grammars (CFGs), yielding the asympotically fastest (although not practical) CFG parsing algorithm known. We prove a dual result: any CFG parser…
We provide the first fully polynomial-time randomized approximation scheme for the following two counting problems: 1. Given a Context Free Grammar $G$ over alphabet $\Sigma$, count the number of words of length exactly $n$ generated by…