Related papers: Bidirectional Text Compression in External Memory
Simple and fast decoding is one of the main advantages of LZ77-type text encoding used in many popular file compressors such as gzip and 7zip. With the recent introduction of external memory algorithms for Lempel-Ziv factorization there is…
For decades, computing the LZ factorization (or LZ77 parsing) of a string has been a requisite and computationally intensive step in many diverse applications, including text indexing and data compression. Many algorithms for LZ77 parsing…
Lossless data compression has been widely studied in computer science. One of the most widely used lossless data compressions is Lempel-Zip(LZ) 77 parsing, which achieves a high compression ratio. Bidirectional (a.k.a. macro) parsing is a…
We consider the problem of {\em restructuring} compressed texts without explicit decompression. We present algorithms which allow conversions from compressed representations of a string $T$ produced by any grammar-based compression…
Large bilingual parallel texts (also known as bitexts) are usually stored in a compressed form, and previous work has shown that they can be more efficiently compressed if the fact that the two texts are mutual translations is exploited.…
The paper introduces a new lossless, highly robust compression algorithm that similar with LZW algorithm, yet the algorithm discards dictionary processing and uses irregular sequences with massive, random information instead. Then the paper…
The majority of online content is written in languages other than English, and is most commonly encoded in UTF-8, the world's dominant Unicode character encoding. Traditional compression algorithms typically operate on individual bytes.…
We develop a new approach to tackle communication constraints in a distributed learning problem with a central server. We propose and analyze a new algorithm that performs bidirectional compression and achieves the same convergence rate as…
We show how to compress string dictionaries using the Lempel-Ziv (LZ78) data compression algorithm. Our approach is validated experimentally on dictionaries of up to 1.5 GB of uncompressed text. We achieve compression ratios often…
At the present scenario of the internet, there exist many optimization techniques to improve the Web speed but almost expensive in terms of bandwidth. So after a long investigation on different techniques to compress the data without any…
The paper introduces a new technique for compressing Binary Decision Diagrams in those cases where random access is not required. Using this technique, compression and decompression can be done in linear time in the size of the BDD and…
Today there are many universal compression algorithms, but in most cases is for specific data better using specific algorithm - JPEG for images, MPEG for movies, etc. For textual documents there are special methods based on PPM algorithm or…
The advent of massive datasets (and the consequent design of high-performing distributed storage systems) have reignited the interest of the scientific and engineering community towards the design of lossless data compressors which achieve…
In this paper we describe algorithms for computing the BWT and for building (compressed) indexes in external memory. The innovative feature of our algorithms is that they are lightweight in the sense that, for an input of size $n$, they use…
We initiate the study of differentially private data-compression schemes motivated by the insecurity of the popular "Compress-Then-Encrypt" framework. Data compression is a useful tool which exploits redundancy in data to reduce…
We present a new semi-external algorithm that builds the Burrows--Wheeler transform variant of Bauer et al. (a.k.a., BCR BWT) in linear expected time. Our method uses compression techniques to reduce computational costs when the input is…
Data compression has been widely applied in many data processing areas. Compression methods use variable-size codes with the shorter codes assigned to symbols or groups of symbols that appear in the data frequently. Fibonacci coding, as a…
For storing a word or the whole text segment, we need a huge storage space. Typically a character requires 1 Byte for storing it in memory. Compression of the memory is very important for data management. In case of memory requirement…
Recent years has witnessed dramatic progress of neural machine translation (NMT), however, the method of manually guiding the translation procedure remains to be better explored. Previous works proposed to handle such problem through…
Existing distribution compression methods reduce the number of observations in a dataset by minimising the Maximum Mean Discrepancy (MMD) between original and compressed sets, but modern datasets are often large in both sample size and…