Related papers: Tree compression using string grammars
We introduce forest straight-line programs (FSLPs) as a compressed representation of unranked ordered node-labelled trees. FSLPs are based on the operations of forest algebra and generalize tree straight-line programs. We compare the…
In grammar-based compression a string is represented by a context-free grammar, also called a straight-line program (SLP), that generates only that string. We refine a recent balancing result stating that one can transform an SLP of size…
We study the problem of enumerating the answers to a query formulated in monadic second order logic (MSO) over an unranked forest F that is compressed by a straight-line program (SLP) D. Our main result states that this can be done after…
In this work we introduce a new linear time compression algorithm, called "Re-pair for Trees", which compresses ranked ordered trees using linear straight-line context-free tree grammars. Such grammars generalize straight-line context-free…
Extractive compression is a challenging natural language processing problem. This work contributes by formulating neural extractive compression as a parse tree transduction problem, rather than a sequence transduction task. Motivated by…
A grammar-compressed ranked tree is represented with a linear space overhead so that a single traversal step, i.e., the move to the parent or the i-th child, can be carried out in constant time. Moreover, we extend our data structure such…
Here we study the complexity of string problems as a function of the size of a program that generates input. We consider straight-line programs (SLP), since all algorithms on SLP-generated strings could be applied to processing…
We explore an extension to straight-line programs (SLPs) that outperforms, for some text families, the measure $\delta$ based on substring complexity, a lower bound for most measures and compressors exploiting repetitiveness (which are…
We present an algorithm for computing the Lyndon factorization of a string that is given in grammar compressed form, namely, a Straight Line Program (SLP). The algorithm runs in $O(n^4 + mn^3h)$ time and $O(n^2)$ space, where $m$ is the…
We consider the problem of evaluating regular spanners over compressed documents, i.e., we wish to solve evaluation tasks directly on the compressed data, without decompression. As compressed forms of the documents we use straight-line…
In this paper, a fully compressed pattern matching problem is studied. The compression is represented by straight-line programs (SLPs), i.e. a context-free grammars generating exactly one string; the term fully means that both the pattern…
Random access to highly compressed strings -- represented by straight-line programs or Lempel-Ziv parses, for example -- is a well-studied topic. Random access to such strings in strongly sublogarithmic time is impossible in the worst case,…
Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to…
Efficient methods for storing and querying are critical for scaling high-order n-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact and can be easily…
LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to…
We present a compressed representation of tries based on top tree compression [ICALP 2013] that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries. Namely, we show how to…
This paper presents a tree-to-tree transduction method for sentence compression. Our model is based on synchronous tree substitution grammar, a formalism that allows local distortion of the tree topology and can thus naturally capture…
Offering rich contexts to Large Language Models (LLMs) has shown to boost the performance in various tasks, but the resulting longer prompt would increase the computational cost and might exceed the input limit of LLMs. Recently, some…
We present an efficient algorithm for calculating $q$-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP $\mathcal{T}$ of size $n$ that represents string $T$, the algorithm…
We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data structure of $O(g)$ words of space that can be built from a…