Related papers: Pcodec: Better Compression for Numerical Sequences
Kernel density estimation is a well known method involving a smoothing parameter (the bandwidth) that needs to be tuned by the user. Although this method has been widely used the bandwidth selection remains a challenging issue in terms of…
In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time.…
Can we use machine learning to compress graph data? The absence of ordering in graphs poses a significant challenge to conventional compression algorithms, limiting their attainable gains as well as their ability to discover relevant…
Existing remote sensing image compression methods still explore to balance high compression efficiency with the preservation of fine details and task-relevant information. Meanwhile, high-resolution drone imagery offers valuable structural…
The recent many-fold increase in the size of deep neural networks makes efficient distributed training challenging. Many proposals exploit the compressibility of the gradients and propose lossy compression techniques to speed up the…
Data encoding is a fundamental step in emerging computing paradigms, particularly in stochastic computing (SC) and hyperdimensional computing (HDC), where it plays a crucial role in determining the overall system performance and hardware…
Many real-world datasets are represented as tensors, i.e., multi-dimensional arrays of numerical values. Storing them without compression often requires substantial space, which grows exponentially with the order. While many tensor…
This paper describes a new method of data encoding which may be used in various modern digital, computer and telecommunication systems and devices. The method permits the compression of data for storage or transmission, allowing the exact…
Compression algorithms are important for data oriented tasks, especially in the era of Big Data. Modern processors equipped with powerful SIMD instruction sets, provide us an opportunity for achieving better compression performance.…
We propose computationally efficient encoders and decoders for lossy compression using a Sparse Regression Code. The codebook is defined by a design matrix and codewords are structured linear combinations of columns of this matrix. The…
We present an optimal method for encoding cluster assignments of arbitrary data sets. Our method, Random Cycle Coding (RCC), encodes data sequentially and sends assignment information as cycles of the permutation defined by the order of…
Probabilistic computing is an emerging quantum-inspired computing paradigm capable of solving combinatorial optimization and various other classes of computationally hard problems. In this work, we present pc-COP, an efficient and…
We develop the theory and practical implementation of p-adic sparse coding of data. Rather than the standard, sparsifying criterion that uses the $L_0$ pseudo-norm, we use the p-adic norm. We require that the hierarchy or tree be…
Controlling Large Language Models (LLMs) to prevent the generation of undesirable content, such as profanity and personally identifiable information (PII), has become increasingly critical. While earlier approaches relied on post-processing…
The rapid growth of digital data has heightened the demand for efficient lossless compression methods. However, existing algorithms exhibit trade-offs: some achieve high compression ratios, others excel in encoding or decoding speed, and…
Several cybersecurity domains, such as ransomware detection, forensics and data analysis, require methods to reliably identify encrypted data fragments. Typically, current approaches employ statistics derived from byte-level distribution,…
We introduce a protocol called ENCORE which simultaneously compresses and encrypts data in a one-pass process that can be implemented efficiently and possesses a number of desirable features as a streaming encoder/decoder. Motivated by the…
We propose an optimal MMSE precoding technique using quantized signals with constant envelope. Unlike the existing MMSE design that relies on 1-bit resolution, the proposed approach employs uniform phase quantization and the bounding step…
Generative model based image lossless compression algorithms have seen a great success in improving compression ratio. However, the throughput for most of them is less than 1 MB/s even with the most advanced AI accelerated chips, preventing…
Handling communication overhead in large-scale tensor-parallel training remains a critical challenge due to the dense, near-zero distributions of intermediate tensors, which exacerbate errors under frequent communication and introduce…