Related papers: Bidirectional Text Compression in External Memory

200 papers

Simple and fast decoding is one of the main advantages of LZ77-type text encoding used in many popular file compressors such as gzip and 7zip. With the recent introduction of external memory algorithms for Lempel-Ziv factorization there is…

Data Structures and Algorithms · Computer Science 2020-12-11 Djamal Belazzougui, Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi

For decades, computing the LZ factorization (or LZ77 parsing) of a string has been a requisite and computationally intensive step in many diverse applications, including text indexing and data compression. Many algorithms for LZ77 parsing…

Data Structures and Algorithms · Computer Science 2020-12-11 Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi

Lossless data compression has been widely studied in computer science. One of the most widely used lossless data compression methods is Lempel-Ziv 77 (LZ77) parsing, which achieves a high compression ratio. Bidirectional (a.k.a. macro) parsing is a…

Data Structures and Algorithms · Computer Science 2018-12-12 Takaaki Nishimoto, Yasuo Tabei

We consider the problem of restructuring compressed texts without explicit decompression. We present algorithms which allow conversions from compressed representations of a string $T$ produced by any grammar-based compression…

Data Structures and Algorithms · Computer Science 2011-07-15 Keisuke Goto, Shirou Maruyama, Shunsuke Inenaga, Hideo Bannai, Hiroshi Sakamoto, Masayuki Takeda

Large bilingual parallel texts (also known as bitexts) are usually stored in a compressed form, and previous work has shown that they can be more efficiently compressed if the fact that the two texts are mutual translations is exploited.…

Computation and Language · Computer Science 2014-01-23 Felipe Sánchez-Martínez, Rafael C. Carrasco, Miguel A. Martínez-Prieto, Joaquin Adiego

The paper introduces a new lossless, highly robust compression algorithm that is similar to the LZW algorithm, yet discards dictionary processing and uses irregular sequences with massive, random information instead. Then the paper…

Signal Processing · Electrical Eng. & Systems 2020-06-24 Rui Zhu

The majority of online content is written in languages other than English, and is most commonly encoded in UTF-8, the world's dominant Unicode character encoding. Traditional compression algorithms typically operate on individual bytes.…

Information Theory · Computer Science 2017-01-17 Adam Gleave, Christian Steinruecken

We develop a new approach to tackle communication constraints in a distributed learning problem with a central server. We propose and analyze a new algorithm that performs bidirectional compression and achieves the same convergence rate as…

Machine Learning · Computer Science 2022-06-17 Constantin Philippenko, Aymeric Dieuleveut

We show how to compress string dictionaries using the Lempel-Ziv (LZ78) data compression algorithm. Our approach is validated experimentally on dictionaries of up to 1.5 GB of uncompressed text. We achieve compression ratios often…

Data Structures and Algorithms · Computer Science 2013-05-06 Julian Arz, Johannes Fischer

In the present state of the internet, there exist many optimization techniques to improve Web speed, but most are expensive in terms of bandwidth. So after a long investigation of different techniques to compress the data without any…

Information Theory · Computer Science 2014-05-20 Hemant Kumar Saini, Satpal Singh Kushwaha, C. Rama Krishna

The paper introduces a new technique for compressing Binary Decision Diagrams in those cases where random access is not required. Using this technique, compression and decompression can be done in linear time in the size of the BDD and…

Artificial Intelligence · Computer Science 2008-12-18 Esben Rune Hansen, S. Srinivasa Rao, Peter Tiedemann

Today there are many universal compression algorithms, but in most cases it is better to use a specific algorithm for specific data: JPEG for images, MPEG for movies, etc. For textual documents there are special methods based on the PPM algorithm or…

Information Theory · Computer Science 2008-12-18 Jan Platos, Jiri Dvorsky

The advent of massive datasets (and the consequent design of high-performing distributed storage systems) has reignited the interest of the scientific and engineering community in the design of lossless data compressors which achieve…

Information Theory · Computer Science 2013-07-16 Andrea Farruggia, Paolo Ferragina, Antonio Frangioni, Rossano Venturini

In this paper we describe algorithms for computing the BWT and for building (compressed) indexes in external memory. The innovative feature of our algorithms is that they are lightweight in the sense that, for an input of size $n$, they use…

Data Structures and Algorithms · Computer Science 2009-09-25 Paolo Ferragina, Travis Gagie, Giovanni Manzini

We initiate the study of differentially private data-compression schemes motivated by the insecurity of the popular "Compress-Then-Encrypt" framework. Data compression is a useful tool which exploits redundancy in data to reduce…

Computational Complexity · Computer Science 2025-09-25 Jeremiah Blocki, Seunghoon Lee, Brayan Sebastián Yepes Garcia

We present a new semi-external algorithm that builds the Burrows-Wheeler transform variant of Bauer et al. (a.k.a. BCR BWT) in linear expected time. Our method uses compression techniques to reduce computational costs when the input is…

Data Structures and Algorithms · Computer Science 2023-08-15 Diego Díaz-Domínguez, Gonzalo Navarro

Data compression has been widely applied in many data processing areas. Compression methods use variable-size codes with the shorter codes assigned to symbols or groups of symbols that appear in the data frequently. Fibonacci coding, as a…

Performance · Computer Science 2007-12-19 R. Baca, V. Snasel, J. Platos, M. Kratky, E. El-Qawasmeh

Storing a word or a whole text segment requires a large amount of storage space. Typically a character requires 1 byte of memory. Compression of the memory is very important for data management. In case of memory requirement…

Information Theory · Computer Science 2010-09-28 Md. Abul Kalam Azad, Rezwana Sharmeen, Shabbir Ahmad, S. M. Kamruzzaman

Recent years have witnessed dramatic progress in neural machine translation (NMT); however, the method of manually guiding the translation procedure remains to be better explored. Previous works proposed to handle this problem through…

Computation and Language · Computer Science 2019-02-01 Ya Li, Xinyu Liu, Dan Liu, Xueqiang Zhang, Junhua Liu

Existing distribution compression methods reduce the number of observations in a dataset by minimising the Maximum Mean Discrepancy (MMD) between original and compressed sets, but modern datasets are often large in both sample size and…

Machine Learning · Statistics 2026-01-28 Dominic Broadbent, Nick Whiteley, Robert Allison, Tom Lovett