Related papers: Crossword: A Semantic Approach to Data Compression…

Semantic Text Compression for Classification

We study semantic compression for text where meanings contained in the text are conveyed to a source decoder, e.g., for classification. The main motivator to move to such an approach of recovering the meaning without requiring exact…

Information Theory · Computer Science 2023-09-20 Emrecan Kutay, Aylin Yener

Combinatorial Entropy Encoding

This paper proposes a novel entropy encoding technique for lossless data compression. Representing a message string by its lexicographic index in the permutations of its symbols results in a compressed version matching Shannon entropy of…

Information Theory · Computer Science 2017-03-24 Abu Bakar Siddique

A Novel Approach to Compress Centralized Text Data using Indexed Dictionary

Data compression is a very important feature for saving memory space. In this proposal, an indexed dictionary based compression is used for text data, where the word's reference in the dictionary is used for compression. This approach…

Other Computer Science · Computer Science 2015-12-23 Vivek Dimri, Prof. Ranjit Biswas

Semantic Huffman Coding using Synonymous Mapping

Semantic communication stands out as a highly promising avenue for future developments in communications. Theoretically, source compression coding based on semantics can achieve lower rates than Shannon entropy. This paper introduces a…

Information Theory · Computer Science 2024-01-29 Jin Xu, Kai Niu, Zijian Liang, Ping Zhang

Statistical Mechanics of Semantic Compression

The basic problem of semantic compression is to minimize the length of a message while preserving its meaning. This differs from classical notions of compression in that the distortion is not measured directly at the level of bits, but…

Disordered Systems and Neural Networks · Physics 2025-03-04 Tankut Can

Learning Content-Weighted Deep Image Compression

Learning-based lossy image compression usually involves the joint optimization of rate-distortion performance. Most existing methods adopt spatially invariant bit length allocation and incorporate discrete entropy approximation to constrain…

Computer Vision and Pattern Recognition · Computer Science 2019-04-02 Mu Li, Wangmeng Zuo, Shuhang Gu, Jane You, David Zhang

Data Compression with Relative Entropy Coding

Over the last few years, machine learning unlocked previously infeasible features for compression, such as providing guarantees for users' privacy or tailoring compression to specific data statistics (e.g., satellite images or audio…

Information Theory · Computer Science 2026-03-25 Gergely Flamich

IDBE - An Intelligent Dictionary Based Encoding Algorithm for Text Data Compression for High Speed Data Transmission Over Internet

Compression algorithms reduce the redundancy in data representation to decrease the storage required for that data. Data compression offers an attractive approach to reducing communication costs by using available bandwidth effectively.…

Information Theory · Computer Science 2007-07-13 B. S. Shajee Mohan, V. K. Govindan

Adaptive Transform Coding for Semantic Compression

Visual data compression is shifting from human-centered reconstruction to machine-oriented representation coding. In this setting, an image is often mapped to a compact semantic embedding, which is then compressed and transmitted for…

Image and Video Processing · Electrical Eng. & Systems 2026-04-30 Andriy Enttsel, Vincent Corlay

Data Compression with Stochastic Codes

Machine learning has had a major impact on data compression over the last decade and inspired many new, exciting theoretical and applied questions. This paper describes one such direction -- relative entropy coding -- which focuses on…

Information Theory · Computer Science 2026-02-10 Gergely Flamich, Deniz Gündüz

Text Compression-aided Transformer Encoding

Text encoding is one of the most important steps in Natural Language Processing (NLP). It has been done well by the self-attention mechanism in the current state-of-the-art Transformer encoder, which has brought about significant…

Computation and Language · Computer Science 2021-02-12 Zuchao Li, Zhuosheng Zhang, Hai Zhao, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita

Critical Data Compression

A new approach to data compression is developed and applied to multimedia content. This method separates messages into components suitable for both lossless coding and 'lossy' or statistical coding techniques, compressing complex objects by…

Information Theory · Computer Science 2011-12-26 John Scoville

Deep Learning for Joint Source-Channel Coding of Text

We consider the problem of joint source and channel coding of structured data such as natural language over a noisy channel. The typical approach to this problem in both theory and practice involves performing source coding to first…

Information Theory · Computer Science 2018-02-21 Nariman Farsad, Milind Rao, Andrea Goldsmith

Prediction by Compression

It is well known that text compression can be achieved by predicting the next symbol in the stream of text data based on the history seen up to the current symbol. The better the prediction the more skewed the conditional probability…

Information Theory · Computer Science 2010-08-31 Joel Ratsaby

PivotCompress: Compression by Sorting

Sorted data is usually easier to compress than unsorted permutations of the same data. This motivates a simple compression scheme: specify the sorted permutation of the data along with a representation of the sorted data compressed…

Data Structures and Algorithms · Computer Science 2014-11-24 Oscar Stiffelman

An Efficient Technique for Text Compression

For storing a word or the whole text segment, we need a huge storage space. Typically a character requires 1 Byte for storing it in memory. Compression of the memory is very important for data management. In case of memory requirement…

Information Theory · Computer Science 2010-09-28 Md. Abul Kalam Azad, Rezwana Sharmeen, Shabbir Ahmad, S. M. Kamruzzaman

Deep Lossless Image Compression via Masked Sampling and Coarse-to-Fine Auto-Regression

Learning-based lossless image compression employs pixel-based or subimage-based auto-regression for probability estimation, which achieves desirable performances. However, the existing works only consider context dependencies in one…

Image and Video Processing · Electrical Eng. & Systems 2025-03-17 Tiantian Li, Qunbing Xia, Yue Li, Ruixiao Guo, Gaobo Yang

A Symbolic Dynamical System Approach to Lossy Source Coding with Feedforward

It is known that modeling an information source via a symbolic dynamical system evolving over the unit interval leads to a natural lossless compression scheme attaining the entropy rate of the source, under general conditions. We extend…

Information Theory · Computer Science 2010-01-21 Ofer Shayevitz

Text-Preserving Lossy Text Compression: A Study of Strategic Deletion and LLM Reconstruction

Traditional lossless text compression preserves every byte, but its gains on natural language are often modest in realistic operating regimes. We study lossy semantic text compression, where the encoder strategically deletes parts of…

Computation and Language · Computer Science 2026-05-29 Yuchun Zou, Junhong Tong, Jun Li

Extending Context Window of Large Language Models via Semantic Compression

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long…

Computation and Language · Computer Science 2023-12-18 Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han