Related papers: Coded Data Rebalancing for Decentralized Distribut…

Coded Data Rebalancing: Fundamental Limits and Constructions

Distributed databases often suffer from an unequal distribution of data among storage nodes, which is known as 'data skew'. Data skew arises from a number of causes such as removal of existing storage nodes and addition of new empty nodes to the…

Information Theory · Computer Science 2020-07-14 Prasad Krishnan, V. Lalitha, Lakshmi Natarajan

Coded Data Rebalancing for Distributed Data Storage Systems with Cyclic Storage

We consider replication-based distributed storage systems in which each node stores the same quantum of data and each data bit stored has the same replication factor across the nodes. Such systems are referred to as balanced distributed…

Information Theory · Computer Science 2024-12-13 Abhinav Vaishya, Athreya Chandramouli, Srikar Kale, Prasad Krishnan

Near Optimal Coded Data Shuffling for Distributed Learning

Data shuffling between a distributed cluster of nodes is one of the critical steps in implementing large-scale learning algorithms. Randomly shuffling the data-set among a cluster of workers allows different nodes to obtain fresh data…

Information Theory · Computer Science 2018-01-08 Mohamed A. Attia, Ravi Tandon

Controlling Data Access Load in Distributed Systems

Distributed systems store data objects redundantly to balance the data access load over multiple nodes. Load balancing performance depends mainly on 1) the level of storage redundancy and 2) the assignment of data objects to storage nodes.…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-12-19 Mehmet Aktas, Emina Soljanin

Quorum Sensing for Regenerating Codes in Distributed Storage

Distributed storage systems with replication are well known for storing large amounts of data. Data is replicated many times in order to provide reliability, which makes the system expensive. Various methods have been proposed over…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-10-01 Mit Sheth, Krishna Gopal Benerjee, Manish K. Gupta

Coded Caching with Distributed Storage

Content delivery networks store information distributed across multiple servers, so as to balance the load and avoid unrecoverable losses in case of node or disk failures. Coded caching has been shown to be a useful technique which can…

Information Theory · Computer Science 2016-11-22 Tianqiong Luo, Vaneet Aggarwal, Borja Peleato

Coded Load Balancing in Cache Networks

We consider the load balancing problem in a cache network consisting of storage-enabled servers forming a distributed content delivery scenario. Previously proposed load balancing solutions cannot perfectly balance out requests among servers,…

Information Theory · Computer Science 2019-08-06 Mahdi Jafari Siavoshani, Farzad Parvaresh, Ali Pourmiri, Seyed Pooya Shariatpanahi

Erasure Coding for Distributed Storage: An Overview

In a distributed storage system, code symbols are dispersed across space in nodes or storage units as opposed to time. In settings such as that of a large data center, an important consideration is the efficient repair of a failed node.…

Information Theory · Computer Science 2018-06-13 S. B. Balaji, M. Nikhil Krishnan, Myna Vajha, Vinayak Ramkumar, Birenjith Sasidharan, P. Vijay Kumar

Distributed Storage Allocations

We examine the problem of allocating a given total storage budget in a distributed storage system for maximum reliability. A source has a single data object that is to be coded and stored over a set of storage nodes; it is allowed to store…

Information Theory · Computer Science 2016-11-15 Derek Leong, Alexandros G. Dimakis, Tracey Ho

A Fundamental Tradeoff Among Storage, Computation, and Communication for Distributed Computing over Star Network

Coded distributed computing can alleviate the communication load by leveraging the redundant storage and computation resources with coding techniques in distributed computing. In this paper, we study a MapReduce-type distributed computing…

Information Theory · Computer Science 2023-01-11 Qifa Yan, Xiaohu Tang, Meixia Tao, Qin Huang

On the Worst-case Communication Overhead for Distributed Data Shuffling

Distributed learning platforms for processing large scale data-sets are becoming increasingly prevalent. In typical distributed implementations, a centralized master node breaks the data-set into smaller batches for parallel processing…

Information Theory · Computer Science 2016-10-03 Mohamed Attia, Ravi Tandon

Fundamental Limits of Decentralized Data Shuffling

Data shuffling of training data among different computing nodes (workers) has been identified as a core element to improve the statistical performance of modern large-scale machine learning algorithms. Data shuffling is often considered as…

Information Theory · Computer Science 2020-01-15 Kai Wan, Daniela Tuninetti, Mingyue Ji, Giuseppe Caire, Pablo Piantanida

Data Replication for Reducing Computing Time in Distributed Systems with Stragglers

In distributed computing systems with stragglers, various forms of redundancy can improve the average delay performance. We study the optimal replication of data in systems where the job execution time is a stochastically decreasing and…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-01 Amir Behrouzi-Far, Emina Soljanin

Quality-of-Data for Consistency Levels in Geo-replicated Cloud Data Stores

Cloud computing has recently emerged as a key technology to provide individuals and companies with access to remote computing and storage infrastructures. In order to achieve highly-available yet high-performing services, cloud data stores…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-08-10 Álvaro García-Recuero, Sérgio Esteves, Luís Veiga

The Non-IID Data Quagmire of Decentralized Machine Learning

Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their…

Machine Learning · Computer Science 2020-08-20 Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip B. Gibbons

Decentralized Coded Caching Attains Order-Optimal Memory-Rate Tradeoff

Replicating or caching popular content in memories distributed across the network is a technique to reduce peak network loads. Conventionally, the main performance gain of this caching was thought to result from making part of the requested…

Information Theory · Computer Science 2015-09-08 Mohammad Ali Maddah-Ali, Urs Niesen

On the Fundamental Limits of Coded Data Shuffling for Distributed Machine Learning

We consider the data shuffling problem in a distributed learning system, in which a master node is connected to a set of worker nodes, via a shared link, in order to communicate a set of files to the worker nodes. The master node has access…

Information Theory · Computer Science 2020-06-24 Adel Elmahdy, Soheil Mohajer

On Coded Caching Systems with Decentralized Linear Coding Placement

Coded caching is a technique that leverages locally cached contents at the end users to reduce the network's peak-time communication load. Coded caching has been shown to achieve significant performance gains with a centralized placement…

Information Theory · Computer Science 2026-05-01 Yinbin Ma, Daniela Tuninetti

Distributed source coding in dense sensor networks

We study the problem of the reconstruction of a Gaussian field defined in [0,1] using N sensors deployed at regular intervals. The goal is to quantify the total data rate required for the reconstruction of the field with a given mean square…

Information Theory · Computer Science 2007-10-23 Akshay Kashyap, Luis Alfonso Lastras-Montaño, Cathy Xia, Zhen Liu

Decentralized Coding Algorithms for Distributed Storage in Wireless Sensor Networks

We consider large-scale wireless sensor networks with $n$ nodes, out of which $k$ are in possession of (e.g., have sensed or collected in some other way) $k$ information packets. In the scenarios in which network nodes are vulnerable because of,…

Information Theory · Computer Science 2009-10-27 Zhenning Kong, Salah A. Aly, Emina Soljanin