Related papers: On optimally partitioning a text to improve its co…
The random access problem for compressed strings is to build a data structure that efficiently supports accessing the character in position $i$ of a string given in compressed form. Given a grammar of size $n$ compressing a string of size…
The goal of this thesis is to study the compression problems arising in distributed computing systematically. In the first part of the thesis, we study gradient compression for distributed first-order optimization. We begin by establishing…
The most fundamental problem considered in algorithms for text processing is pattern matching: given a pattern $p$ of length $m$ and a text $t$ of length $n$, does $p$ occur in $t$? Multiple versions of this basic question have been…
We show that if DTIME[2^O(n)] is not included in DSPACE[2^o(n)], then, for every set B in PSPACE/poly, all strings x in B of length n can be represented by a string compressed(x) of length at most log(|B^{=n}|)+O(log n), such that a…
We study the problem of multiway number partition optimization, which has a myriad of applications in the decision, learning and optimization literature. Even though the original multiway partitioning problem is NP-hard and requires…
Consider an input text string T[1,N] drawn from an unbounded alphabet. We study partial computation in suffix-based problems for Data Compression and Text Indexing such as (I) retrieve any segment of K<=N consecutive symbols from the…
We present a compressed representation of tries based on top tree compression [ICALP 2013] that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries. Namely, we show how to…
We consider the problem of {\em restructuring} compressed texts without explicit decompression. We present algorithms which allow conversions from compressed representations of a string $T$ produced by any grammar-based compression…
Compressed indexing is a powerful technique that enables efficient querying over data stored in compressed form, significantly reducing memory usage and often accelerating computation. While extensive progress has been made for…
We study the optimization version of the set partition problem (where the difference between the partition sums are minimized), which has numerous applications in decision theory literature. While the set partitioning problem is NP-hard and…
Sublinear time quantum algorithms have been established for many fundamental problems on strings. This work demonstrates that new, faster quantum algorithms can be designed when the string is highly compressible. We focus on two popular and…
Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed…
We show that if DTIME[2^{O(n)}] is not included in DSPACE[2^{o(n)}], then, for every set B in PSPACE, all strings x in B of length n can be represented by a string compressed(x) of length at most log (|B^{=n}|) + O(log n), such that a…
The compression of highly repetitive strings (i.e., strings with many repetitions) has been a central research topic in string processing, and quite a few compression methods for these strings have been proposed thus far. Among them, an…
Suppose there is a large file which should be transmitted (or stored) and there are several (say, m) admissible data-compressors. It seems natural to try all the compressors and then choose the best, i.e. the one that gives the shortest…
Many signal processing problems can be solved by maximizing the fitness of a segmented model over all possible partitions of the data interval. This letter describes a simple but powerful algorithm that searches the exponentially large…
A popular approach to sentence compression is to formulate the task as a constrained optimization problem and solve it with integer linear programming (ILP) tools. Unfortunately, dependence on ILP may make the compressor prohibitively slow,…
In distribution compression, one aims to accurately summarize a probability distribution $\mathbb{P}$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov…
The ability to find short representations, i.e. to compress data, is crucial for many intelligent systems. We present a theory of incremental compression showing that arbitrary data strings, that can be described by a set of features, can…
For storing a word or the whole text segment, we need a huge storage space. Typically a character requires 1 Byte for storing it in memory. Compression of the memory is very important for data management. In case of memory requirement…