Related papers: A Universal Parallel Two-Pass MDL Context Tree Com…
We present a novel lossless universal source coding algorithm that uses parallel computational units to increase the throughput. The length-$N$ input sequence is partitioned into $B$ blocks. Processing each block independently of the other…
Existing distribution compression methods reduce the number of observations in a dataset by minimising the Maximum Mean Discrepancy (MMD) between original and compressed sets, but modern datasets are often large in both sample size and…
A new run length encoding algorithm for lossless data compression that exploits positional redundancy by representing data in a two-dimensional model of concentric circles is presented. This visual transform enables detection of runs (each…
We present a new universal source code for distributions of unlabeled binary and ordinal trees that achieves optimal compression to within lower order terms for all tree sources covered by existing universal codes. At the same time, it…
Many applications require data processing to be performed on individual pieces of data which are of finite sizes, e.g., files in cloud storage units and packets in data networks. However, traditional universal compression solutions would…
The block tree [Belazzougui et al., J. Comput. Syst. Sci. '21] is a compressed representation of a length-$n$ text that supports access, rank, and select queries while requiring only $O(z\log\frac{n}{z})$ words of space, where $z$ is the…
This paper focuses on reducing memory usage in enumerative model checking, while maintaining the multi-core scalability obtained in earlier work. We present a tree-based multi-core compression method, which works by leveraging sharing among…
$k$d-trees are widely used in parallel databases to support efficient neighborhood/similarity queries. Supporting parallel updates to $k$d-trees is therefore an important operation. In this paper, we present BDL-tree, a parallel,…
In this paper, the context dependence multilevel pattern matching (CDMPM for short) grammar transform is proposed; based on this grammar transform, a universal lossless data compression algorithm, the CDMPM code, is then developed. Moreover, we…
In-context learning has established itself as an important learning paradigm for Large Language Models (LLMs). In this paper, we demonstrate that LLMs can learn encoding keys in-context and perform analysis directly on encoded…
Today's exponentially increasing data volumes and the high cost of storage make compression essential for the Big Data industry. Although research has concentrated on efficient compression, fast decompression is critical for analytics…
Binary neural networks (BNNs) have been widely adopted to reduce the computational cost and memory storage on edge-computing devices by using one-bit representation for activations and weights. However, as neural networks become…
In this paper we present and evaluate a parallel algorithm for solving a minimum spanning tree (MST) problem for supercomputers with distributed memory. The algorithm relies on the relaxation of the message processing order requirement for…
Compression is beneficial because it helps reduce resource usage. It reduces data storage space and transmission traffic, and it speeds up web page loading. Run-length coding (RLC) is a lossless data compression algorithm. Data are…
Consider the case where consecutive blocks of $N$ letters of a semi-infinite individual sequence $X$ over a finite alphabet are being compressed into binary sequences by some one-to-one mapping. No a priori information about $X$ is available at…
Recently, the existence of a considerable amount of redundancy in Internet traffic has stimulated the deployment of several redundancy elimination techniques within the network. These techniques are often based on either packet-level…
The problem of universally compressing a sequence from a library of several small to moderate length sequences from a similar context arises in many practical scenarios, such as the compression of storage data and the Internet…
In this paper, we propose distributed network compression via memory. We consider two spatially separated sources with correlated unknown source parameters. We wish to study the universal compression of a sequence of length $n$ from…
We propose BS-tree, an in-memory implementation of the B+-tree that adopts the structure of the disk-based index (i.e., a balanced, multiway tree), setting the node size to a memory block that can be processed fast and in parallel using…
CP tensor decomposition with alternating least squares (ALS) is dominated in cost by the matricized-tensor times Khatri-Rao product (MTTKRP) kernel that is necessary to set up the quadratic optimization subproblems. State-of-the-art parallel…