Related papers: Pcodec: Better Compression for Numerical Sequences

Numerical performance of Penalized Comparison to Overfitting for multivariate kernel density estimation

Kernel density estimation is a well-known method involving a smoothing parameter (the bandwidth) that needs to be tuned by the user. Although this method has been widely used, bandwidth selection remains a challenging issue in terms of…

Statistics Theory · Mathematics 2019-02-05 Suzanne Varet, Claire Lacour, Pascal Massart, Vincent Rivoirard

Decoding billions of integers per second through vectorization

In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time.…

Information Retrieval · Computer Science 2021-02-02 Daniel Lemire, Leonid Boytsov

Partition and Code: learning how to compress graphs

Can we use machine learning to compress graph data? The absence of ordering in graphs poses a significant challenge to conventional compression algorithms, limiting their attainable gains as well as their ability to discover relevant…

Machine Learning · Computer Science 2023-09-26 Giorgos Bouritsas, Andreas Loukas, Nikolaos Karalias, Michael M. Bronstein

A PPO-Based Bitrate Allocation Conditional Diffusion Model for Remote Sensing Image Compression

Existing remote sensing image compression methods still explore how to balance high compression efficiency with the preservation of fine details and task-relevant information. Meanwhile, high-resolution drone imagery offers valuable structural…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Yuming Han, Jooho Kim, Anish Shakya

An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems

The recent many-fold increase in the size of deep neural networks makes efficient distributed training challenging. Many proposals exploit the compressibility of the gradients and propose lossy compression techniques to speed up the…

Machine Learning · Computer Science 2021-03-19 Ahmed M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini

Improved Data Encoding for Emerging Computing Paradigms: From Stochastic to Hyperdimensional Computing

Data encoding is a fundamental step in emerging computing paradigms, particularly in stochastic computing (SC) and hyperdimensional computing (HDC), where it plays a crucial role in determining the overall system performance and hardware…

Emerging Technologies · Computer Science 2025-01-07 Mehran Shoushtari Moghadam, Sercan Aygun, M. Hassan Najafi

TensorCodec: Compact Lossy Compression of Tensors without Strong Data Assumptions

Many real-world datasets are represented as tensors, i.e., multi-dimensional arrays of numerical values. Storing them without compression often requires substantial space, which grows exponentially with the order. While many tensor…

Machine Learning · Computer Science 2023-09-21 Taehyung Kwon, Jihoon Ko, Jinhong Jung, Kijung Shin

BIN@ERN: Binary-Ternary Compressing Data Coding

This paper describes a new method of data encoding which may be used in various modern digital, computer and telecommunication systems and devices. The method permits the compression of data for storage or transmission, allowing the exact…

Information Theory · Computer Science 2012-01-27 Igor Nesiolovskiy, Artem Nesiolovskiy

A General SIMD-based Approach to Accelerating Compression Algorithms

Compression algorithms are important for data-oriented tasks, especially in the era of Big Data. Modern processors equipped with powerful SIMD instruction sets provide us an opportunity for achieving better compression performance.…

Information Retrieval · Computer Science 2015-04-15 Wayne Xin Zhao, Xudong Zhang, Daniel Lemire, Dongdong Shan, Jian-Yun Nie, Hongfei Yan, Ji-Rong Wen

Lossy Compression via Sparse Linear Regression: Computationally Efficient Encoding and Decoding

We propose computationally efficient encoders and decoders for lossy compression using a Sparse Regression Code. The codebook is defined by a design matrix and codewords are structured linear combinations of columns of this matrix. The…

Information Theory · Computer Science 2014-05-20 Ramji Venkataramanan, Tuhin Sarkar, Sekhar Tatikonda

Random Cycle Coding: Lossless Compression of Cluster Assignments via Bits-Back Coding

We present an optimal method for encoding cluster assignments of arbitrary data sets. Our method, Random Cycle Coding (RCC), encodes data sequentially and sends assignment information as cycles of the permutation defined by the order of…

Machine Learning · Computer Science 2024-12-03 Daniel Severo, Ashish Khisti, Alireza Makhzani

pc-COP: An Efficient and Configurable 2048-p-Bit Fully-Connected Probabilistic Computing Accelerator for Combinatorial Optimization

Probabilistic computing is an emerging quantum-inspired computing paradigm capable of solving combinatorial optimization and various other classes of computationally hard problems. In this work, we present pc-COP, an efficient and…

Emerging Technologies · Computer Science 2025-04-08 Kiran Magar, Shreya Bharathan, Utsav Banerjee

Sparse p-Adic Data Coding for Computationally Efficient and Effective Big Data Analytics

We develop the theory and practical implementation of p-adic sparse coding of data. Rather than the standard, sparsifying criterion that uses the $L_0$ pseudo-norm, we use the p-adic norm. We require that the hierarchy or tree be…

Information Theory · Computer Science 2018-04-10 Fionn Murtagh

NCO: A Versatile Plug-in for Handling Negative Constraints in Decoding

Controlling Large Language Models (LLMs) to prevent the generation of undesirable content, such as profanity and personally identifiable information (PII), has become increasingly critical. While earlier approaches relied on post-processing…

Computation and Language · Computer Science 2026-05-12 Hyundong Jin, Yo-Sub Han

Challenges and Solutions in Selecting Optimal Lossless Data Compression Algorithms

The rapid growth of digital data has heightened the demand for efficient lossless compression methods. However, existing algorithms exhibit trade-offs: some achieve high compression ratios, others excel in encoding or decoding speed, and…

Information Theory · Computer Science 2025-10-01 Md. Atiqur Rahman, MM Fazle Rabbi

Reliable Detection of Compressed and Encrypted Data

Several cybersecurity domains, such as ransomware detection, forensics and data analysis, require methods to reliably identify encrypted data fragments. Typically, current approaches employ statistics derived from byte-level distribution,…

Cryptography and Security · Computer Science 2021-04-01 Fabio De Gaspari, Dorjan Hitaj, Giulio Pagnotta, Lorenzo De Carli, Luigi V. Mancini

Cryptographic Compression

We introduce a protocol called ENCORE which simultaneously compresses and encrypts data in a one-pass process that can be implemented efficiently and possesses a number of desirable features as a streaming encoder/decoder. Motivated by the…

Cryptography and Security · Computer Science 2025-01-28 Joshua Cooper, Grant Fickes

Discrete MMSE Precoding for Multiuser MIMO Systems with PSK Modulation

We propose an optimal MMSE precoding technique using quantized signals with constant envelope. Unlike the existing MMSE design that relies on 1-bit resolution, the proposed approach employs uniform phase quantization and the bounding step…

Information Theory · Computer Science 2021-05-26 Erico S. P. Lopes, Lukas T. N. Landau

PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural Framework

Generative-model-based lossless image compression algorithms have seen great success in improving compression ratio. However, the throughput for most of them is less than 1 MB/s even with the most advanced AI accelerated chips, preventing…

Image and Video Processing · Electrical Eng. & Systems 2022-06-14 Ning Kang, Shanzhao Qiu, Shifeng Zhang, Zhenguo Li, Shutao Xia

TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training

Handling communication overhead in large-scale tensor-parallel training remains a critical challenge due to the dense, near-zero distributions of intermediate tensors, which exacerbate errors under frequent communication and introduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-28 Man Liu, Xingchen Liu, Xingjian Tian, Bing Lu, Shengkay Lyu, Shengquan Yin, Wenjing Huang, Zheng Wei, Hairui Zhao, Guangming Tan, Dingwen Tao