Related papers: Malleable Coding: Compressed Palimpsests
In cloud computing, storage area networks, remote backup storage, and similar settings, stored data is modified with updates from new versions. Representing information and modifying the representation are both expensive. Therefore it is…
Describes a near-linear-time algorithm for a variant of Huffman coding, in which the letters may have non-uniform lengths (as in Morse code), but with the restriction that each word to be encoded has equal probability. [See also "Huffman…
The explosion of the amount of data stored in cloud systems calls for more efficient paradigms for redundancy. While replication is widely used to ensure data availability, erasure correcting codes provide a much better trade-off between…
The eXtensible Markup Language (XML) provides a powerful and flexible means of encoding and exchanging data. As it turns out, its main advantage as an encoding format (namely, its requirement that all open and close markup tags are present…
We study the problem of designing systems in order to minimize cost while meeting a given flexibility target. Flexibility is attained by enforcing a joint chance constraint, which ensures that the system will exhibit feasible operation with…
This work considers the problem of transmitting multiple compressible sources over a network at minimum cost. The aim is to find the optimal rates at which the sources should be compressed and the network flows using which they should be…
Distributed storage systems must handle both data heterogeneity, arising from non-uniform access demands, and device heterogeneity, caused by time-varying node reliability. In this paper, we study convertible codes, which enable the…
This paper investigates, from information theoretic grounds, a learning problem based on the principle that any regularity in a given dataset can be exploited to extract compact features from data, i.e., using fewer bits than needed to…
In large-scale distributed storage systems, erasure codes are used to achieve fault tolerance in the face of node failures. Tuning code parameters to observed failure rates has been shown to significantly reduce storage cost. Such tuning of…
This paper presents prefix codes which minimize various criteria constructed as a convex combination of maximum codeword length and average codeword length or maximum redundancy and average redundancy, including a convex combination of the…
We consider large-scale linear inverse problems in Bayesian settings. We follow a recent line of work that applies the approximate message passing (AMP) framework to multi-processor (MP) computational systems, where each processor node…
Adaptive coding faces the following problem: given a collection of source classes such that each class in the collection has non-trivial minimax redundancy rate, can we design a single code which is asymptotically minimax over each class in…
Distributed storage systems for large-scale applications typically use replication for reliability. Recently, erasure codes were used to reduce the large storage overhead, while increasing data reliability. A main limitation of…
The explosion in the volumes of data being stored online has resulted in distributed storage systems transitioning to erasure coding based schemes. Yet, the codes being deployed in practice are fairly short. In this work, we address what we…
Cloud providers have recently introduced new offerings whereby spare computing resources are accessible at discounts compared to on-demand computing. Exploiting such opportunity is challenging inasmuch as such resources are accessed with…
In this paper we study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "embarrassingly parallel," the finer we partition the work of the reducers so that more parallelism can…
This paper investigates the problem of variable-length lossy source coding allowing a positive excess distortion probability and an overflow probability of codeword lengths. Novel one-shot achievability and converse bounds of the optimal…
The problem of variable-rate lossless data compression is considered, for codes with and without prefix constraints. Sharp bounds are derived for the best achievable compression rate of memoryless sources, when the excess-rate probability…
The weighted-Hamming metric generalizes the Hamming metric by assigning different weights to blocks of coordinates. It is well-suited for applications such as coding over independent parallel channels, each of which has a different level of…
We investigate the fundamental task of addition under uncertainty, namely, addends that are represented as intervals of numbers rather than single values. One potential source of such uncertainty can occur when obtaining discrete-valued…