Related papers: Coding for Distributed Multi-Agent Reinforcement L…
Most multi-agent reinforcement learning (MARL) methods are limited in the scale of problems they can handle. With increasing numbers of agents, the number of training iterations required to find the optimal behaviors increases exponentially…
Multi-agent reinforcement learning for incomplete information environments has attracted extensive attention from researchers. However, due to the slow sample collection and poor sample exploration, there are still some problems in…
Edge computing has recently emerged as a promising paradigm to boost the performance of distributed learning by leveraging the distributed resources at edge nodes. Architecturally, the introduction of edge nodes adds an additional…
In distributed machine learning (DML), the training data is distributed across multiple worker nodes to perform the underlying training in parallel. One major problem affecting the performance of DML algorithms is presence of stragglers.…
We consider distributed learning in the presence of slow and unresponsive worker nodes, referred to as stragglers. In order to mitigate the effect of stragglers, gradient coding redundantly assigns partial computations to the worker such…
In this paper, we propose an optimally structured gradient coding scheme to mitigate the straggler problem in distributed learning. Conventional gradient coding methods often assume homogeneous straggler models or rely on excessive data…
Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the…
Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems there are several types of noise that can affect the performance of distributed machine learning algorithms -- straggler nodes,…
Coded distributed computing framework enables large-scale machine learning (ML) models to be trained efficiently in a distributed manner, while mitigating the straggler effect. In this work, we consider a multi-task assignment problem in a…
Mapping deep neural networks (DNNs) to hardware is critical for optimizing latency, energy consumption, and resource utilization, making it a cornerstone of high-performance accelerator design. Due to the vast and complex mapping space,…
Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for…
Optimization in distributed networks plays a central role in almost all distributed machine learning problems. In principle, the use of distributed task allocation has reduced the computational time, allowing better response rates and…
Achieving distributed reinforcement learning (RL) for large-scale cooperative multi-agent systems (MASs) is challenging because: (i) each agent has access to only limited information; (ii) issues on convergence or computational complexity…
This paper studies a distributed policy gradient in collaborative multi-agent reinforcement learning (MARL), where agents over a communication network aim to find the optimal policy to maximize the average of all agents' local returns. Due…
Multi-Agent Reinforcement Learning (MARL) algorithms show amazing performance in simulation in recent years, but placing MARL in real-world applications may suffer safety problems. MARL with centralized shields was proposed and verified in…
Multi-agent pursuit-evasion tasks involving intelligent targets are notoriously challenging coordination problems. In this paper, we investigate new ways to learn such coordinated behaviors of unmanned aerial vehicles (UAVs) aimed at…
The current BigData era routinely requires the processing of large scale data on massive distributed computing clusters. Such large scale clusters often suffer from the problem of "stragglers", which are defined as slow or failed nodes. The…
We consider a distributed learning problem in which the computation is carried out on a system consisting of a master node and multiple worker nodes. In such systems, the existence of slow-running machines called stragglers will cause a…
Deep Reinforcement Learning has made significant progress in multi-agent systems in recent years. In this review article, we have focused on presenting recent approaches on Multi-Agent Reinforcement Learning (MARL) algorithms. In…
Today, various machine learning (ML) applications offer continuous data processing and real-time data analytics at the edge of a wireless network. Distributed real-time ML solutions are highly sensitive to the so-called straggler effect…