Related papers: Compression with the tudocomp Framework

200 papers

We show how to compress string dictionaries using the Lempel-Ziv (LZ78) data compression algorithm. Our approach is validated experimentally on dictionaries of up to 1.5 GB of uncompressed text. We achieve compression ratios often…

Data Structures and Algorithms · Computer Science 2013-05-06 Julian Arz , Johannes Fischer
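
The entry above builds on LZ78. The sketch below is a minimal, textbook LZ78 factorization in Python. It is not the authors' string-dictionary method, and the function names `lz78_encode` and `lz78_decode` are hypothetical. Each phrase is emitted as a pair (id of the longest previously seen phrase that prefixes the remaining input, next character), and phrase ids are kept in a trie.

```python
def lz78_encode(text: str) -> list[tuple[int, str]]:
    """Split text into LZ78 phrases, returned as (prefix_id, char) pairs.

    prefix_id 0 is the empty phrase; the k-th emitted pair defines phrase k.
    """
    trie: dict[tuple[int, str], int] = {}  # (parent phrase id, char) -> phrase id
    output: list[tuple[int, str]] = []
    node = 0  # current trie node (0 = root / empty phrase)
    for ch in text:
        if (node, ch) in trie:
            node = trie[(node, ch)]  # keep extending the current match
        else:
            output.append((node, ch))  # longest known phrase + one new char
            trie[(node, ch)] = len(output)  # the new phrase gets the next id
            node = 0
    if node != 0:
        # Input ended inside a known phrase: emit it with an empty char.
        output.append((node, ""))
    return output


def lz78_decode(pairs: list[tuple[int, str]]) -> str:
    """Rebuild the text by expanding each phrase from its prefix phrase."""
    phrases = [""]  # phrase 0 is the empty string
    for prefix_id, ch in pairs:
        phrases.append(phrases[prefix_id] + ch)
    return "".join(phrases[1:])
```

For example, `lz78_encode("abababa")` yields the phrases a, b, ab, aba, encoded as `[(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]`.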

The pressing need for efficient compression schemes for XML documents has recently been focused on stack computation [6, 9], and in particular calls for a formulation of information-lossless stack or pushdown compressors that allows a formal…

Information Theory · Computer Science 2007-09-17 Pilar Albert , Elvira Mayordomo , Philippe Moser , Sylvain Perifel

The pressing need for efficient compression schemes for XML documents has recently been focused on stack computation, and in particular calls for a formulation of information-lossless stack or pushdown compressors that allows a formal…

Computational Complexity · Computer Science 2009-03-25 Elvira Mayordomo , Philippe Moser , Sylvain Perifel

We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE)…

Data Structures and Algorithms · Computer Science 2007-06-11 Sofya Raskhodnikova , Dana Ron , Ronitt Rubinfeld , Adam Smith
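
The entry above measures compressibility under run-length encoding, where the size of the output grows with the number of maximal runs of equal symbols. The following is only a minimal Python sketch of RLE itself. The sublinear-time approximation studied in the paper is not reproduced, and the names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Run-length encode text as (symbol, run length) pairs."""
    return [(sym, len(list(group))) for sym, group in groupby(text)]


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (symbol, run length) pairs back into the original text."""
    return "".join(sym * count for sym, count in runs)
```

Here `len(rle_encode(s))` is the number of runs, the quantity that governs RLE output size. For example, `rle_encode("aaabccdddd")` gives `[('a', 3), ('b', 1), ('c', 2), ('d', 4)]`, which is four runs.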

We study the approximate string matching and regular expression matching problem for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that…

Data Structures and Algorithms · Computer Science 2007-05-23 Philip Bille , Rolf Fagerberg , Inge Li Goertz

We propose a novel, lightweight supervised dictionary learning framework for text classification based on data compression and representation. This two-phase algorithm initially employs the Lempel-Ziv-Welch (LZW) algorithm to construct a…

Computation and Language · Computer Science 2024-05-06 Li Wan , Tansu Alpcan , Margreta Kuijper , Emanuele Viterbo
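
The framework above starts by running the Lempel-Ziv-Welch (LZW) algorithm to build a dictionary. The sketch below is a minimal, textbook LZW encoder in Python that returns both the code sequence and the dictionary it learns. The paper's later dictionary-learning and classification steps are not shown, and the function name `lzw_encode` is hypothetical. Initializing the dictionary from the distinct characters of the input is also an assumption: byte-oriented LZW usually starts from all 256 byte values.

```python
def lzw_encode(text: str) -> tuple[list[int], dict[str, int]]:
    """Encode text with LZW; returns (codes, final dictionary)."""
    # Assumption: start from the distinct characters of the input, sorted.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    codes: list[int] = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # keep extending the longest known match
        else:
            codes.append(dictionary[current])  # emit code of longest match
            dictionary[candidate] = len(dictionary)  # learn a new substring
            current = ch
    if current:
        codes.append(dictionary[current])  # flush the final match
    return codes, dictionary
```

For "ababab", the encoder emits `[0, 1, 2, 2]` and learns the entries "ab", "ba" and "aba" on top of the single characters.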

This article gives a self-contained analysis of the performance of the Lempel-Ziv compression algorithm on (hidden) Markovian sources. Specifically we include a full proof of the assertion that the compression rate approaches the entropy…

Information Theory · Computer Science 2019-10-03 Madhu Sudan , David Xiang

Lempel-Ziv is an easy-to-compute member of a wide family of so-called macro schemes; it restricts pointers to go in one direction only. Optimal bidirectional macro schemes are NP-complete to find, but they may provide much better…

Data Structures and Algorithms · Computer Science 2020-03-06 Luís M. S. Russo , Ana D. Correia , Gonzalo Navarro , Alexandre P. Francisco
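
To make "pointers go in one direction only" concrete, the sketch below is a deliberately naive greedy LZ77-style factorization in Python. It is far from linear time and does not implement the bidirectional macro schemes studied in the paper. Every factor is either a fresh character or a copy `(src, length)` whose source starts strictly before the factor's own position, although the copy may overlap the factor itself. The names `lz77_factorize`, `lz77_decode` and `Factor` are hypothetical.

```python
from typing import Union

# A factor is a literal character or a backward copy (source position, length).
Factor = Union[str, tuple[int, int]]


def lz77_factorize(text: str) -> list[Factor]:
    """Greedy LZ77 parse: at each position take the longest earlier match."""
    factors: list[Factor] = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        for src in range(i):  # only earlier positions: pointers go one way
            length = 0
            while i + length < n and text[src + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        if best_len == 0:
            factors.append(text[i])  # first occurrence of this character
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


def lz77_decode(factors: list[Factor]) -> str:
    """Expand factors; copying char by char handles overlapping sources."""
    out: list[str] = []
    for f in factors:
        if isinstance(f, str):
            out.append(f)
        else:
            src, length = f
            for k in range(length):
                out.append(out[src + k])  # src + k < len(out) since src < start
    return "".join(out)
```

For "abababa", the parse is `['a', 'b', (0, 5)]`. The last factor copies five characters starting at position 0 and overlaps itself, yet still points only backwards.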

Lempel-Ziv-Double (LZD) is a variation of the LZ78 compression scheme that achieves better compression on repetitive datasets. Nevertheless, prior research has identified computational inefficiencies and a weakness in its compressibility…

Data Structures and Algorithms · Computer Science 2025-05-05 Linus Götz , Dominik Köppl

This paper delves into recent hardware implementations of the Lempel-Ziv 4 (LZ4) algorithm, highlighting two key factors that limit the throughput of single-kernel compressors. Firstly, the actual parallelism exhibited in single-kernel…

Hardware Architecture · Computer Science 2024-09-20 Tao Chen , Suwen Song , Zhongfeng Wang

The compression-complexity trade-off of lossy compression algorithms that are based on a random codebook or a random database is examined. Motivated, in part, by recent results of Gupta-Verdú-Weissman (GVW) and their underlying…

Information Theory · Computer Science 2009-04-23 Chris Gioran , Ioannis Kontoyiannis

Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that…

Information Retrieval · Computer Science 2016-05-25 Francisco Claude , Antonio Fariña , Miguel A. Martínez-Prieto , Gonzalo Navarro

Compression techniques that support fast random access are a core component of any information system. Current state-of-the-art methods group documents into fixed-sized blocks and compress each block with a general-purpose adaptive…

Data Structures and Algorithms · Computer Science 2015-03-19 Christopher Hoobin , Simon J. Puglisi , Justin Zobel

Compression is an important topic in computer science that allows us to store more data in the same storage space. There are several techniques for compressing files. This manuscript describes the most important…

Multimedia · Computer Science 2019-02-14 Pasquale De Luca , Vincenzo Maria Russiello , Raffaele Ciro Sannino , Lorenzo Valente

Tokenization efficiency plays a critical role in the performance and cost of large language models (LLMs), yet most models rely on static tokenizers optimized on general-purpose corpora. These tokenizers' fixed vocabularies often fail to…

Computation and Language · Computer Science 2025-10-27 Saibo Geng , Nathan Ranchin , Yunzhen Yao , Maxime Peyrard , Chris Wendler , Michael Gastpar , Robert West

Data compression continues to evolve, with traditional information theory methods being widely used for compressing text, images, and videos. Recently, there has been growing interest in leveraging Generative AI for predictive compression…

Information Theory · Computer Science 2024-09-24 Swathi Shree Narashiman , Nitin Chandrachoodan

Compression algorithms reduce the redundancy in data representation to decrease the storage required for that data. Data compression offers an attractive approach to reducing communication costs by using available bandwidth effectively.…

Information Theory · Computer Science 2007-07-13 B. S. Shajee Mohan , V. K. Govindan

The well-known dictionary-based algorithms of the Lempel-Ziv (LZ) 77 family are the basis of several universal lossless compression techniques. These algorithms are asymmetric regarding encoding/decoding time and memory requirements, with…

Data Structures and Algorithms · Computer Science 2009-12-31 Artur Ferreira , Arlindo Oliveira , Mario Figueiredo

Large language models have drastically changed the prospects of AI by introducing technologies for more complex natural language processing. However, current methodologies to train such LLMs require extensive resources including but not…

Computation and Language · Computer Science 2026-04-27 Noel Elias , Homa Esfahanizadeh , Kaan Kale , Sriram Vishwanath , Muriel Medard

We propose algorithms computing the semi-greedy Lempel-Ziv 78 (LZ78), the Lempel-Ziv Double (LZD), and the Lempel-Ziv-Miller-Wegman (LZMW) factorizations in linear time for integer alphabets. For LZD and LZMW, we additionally propose data…

Data Structures and Algorithms · Computer Science 2024-09-24 Dominik Köppl