Related papers: Fast direct access to variable length codes
For discrete memoryless multiple-access channels, we propose a general definition of variable length codes with a measure of the transmission rates at the receiver side. This gives a receiver perspective on the multiple-access channel…
Random access to highly compressed strings -- represented by straight-line programs or Lempel-Ziv parses, for example -- is a well-studied topic. Random access to such strings in strongly sublogarithmic time is impossible in the worst case,…
The variable-length Reverse Multi-Delimiter (RMD) codes are known to represent sequences of unbounded and unordered integers. When applied to data compression, they combine a good compression ratio with fast decoding. In this paper, we…
The random access problem for compressed strings is to build a data structure that efficiently supports accessing the character in position $i$ of a string given in compressed form. Given a grammar of size $n$ compressing a string of size…
Grammar-based compression is a popular and powerful approach to compressing repetitive texts but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as…
We present a new variable-length computation-friendly encoding scheme, named SFDC (Succinct Format with Direct aCcesibility), that supports direct and fast accessibility to any element of the compressed sequence and achieves compression…
We address the problem of constructing a fast lossless code in the case when the source alphabet is large. The main idea of the new scheme may be described as follows. We group letters with small probabilities in subsets (acting as super…
We investigate inference of variable-length codes in other domains of computer science, such as noisy information transmission or information retrieval-storage: in such topics, traditionally mostly constant-length codewords act. The study…
We present a new data structure called the \emph{Compressed Random Access Memory} (CRAM) that can store a dynamic string $T$ of characters, e.g., representing the memory of a computer, in compressed form while achieving asymptotically…
This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by…
Compressed indexing is a powerful technique that enables efficient querying over data stored in compressed form, significantly reducing memory usage and often accelerating computation. While extensive progress has been made for…
A distributed storage system (DSS) needs to be efficiently accessible and repairable. Recently, considerable effort has been made towards the latter, while the former is usually not considered, since a trivial solution exists in the form of…
Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access…
Grammar-based compression is a widely-accepted model of string compression that allows for efficient and direct manipulations on the compressed data. Most, if not all, such manipulations rely on the primitive \emph{random access} queries, a…
We consider universal variable-to-fixed length compression of memoryless sources with a fidelity criterion. We design a dictionary codebook over the reproduction alphabet which is used to parse the source stream. Once a source subsequence…
Relative compression, where a set of similar strings are compressed with respect to a reference string, is a very effective method of compressing DNA datasets containing multiple similar sequences. Relative compression is fast to perform…
Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to…
Remote Direct Memory Access (RDMA) is a technology that allows direct memory access from the memory of one computer into that of another without involving either one's operating system. This enables high-throughput, low-latency networking,…
Recent developments in Language Models (LMs) have shown their effectiveness in NLP tasks, particularly in knowledge-intensive tasks. However, the mechanisms underlying knowledge storage and memory access within their parameters remain…
For general memoryless systems, the typical information theoretic solution - when exists - has a "single-letter" form. This reflects the fact that optimum performance can be approached by a random code (or a random binning scheme),…