Related papers: Communication-Efficient Distributed Deep Learning:…
With the rapid growth in the volume of data sets, models, and devices in the domain of deep learning, there is increasing attention on large-scale distributed deep learning. In contrast to traditional distributed deep learning, the…
Nowadays, large and complex deep learning (DL) models are increasingly trained in a distributed manner across multiple worker machines, in which extensive communications between workers pose serious scaling problems. In this article, we…
The past few years have witnessed the flourishing of large-scale deep neural network models with ever-growing parameter numbers. Training such large-scale models typically requires massive memory and computing resources, necessitating…
The ever-growing volume and decentralized nature of data, coupled with the need to harness it and extract knowledge, have led to the extensive use of distributed deep learning (DDL) techniques for training. These techniques rely on local…
Recent trends in high-performance computing and deep learning have led to the proliferation of studies on large-scale deep neural network training. However, the frequent communication requirements among computation nodes drastically slows…
Deep Learning (DL) has had an immense success in the recent past, leading to state-of-the-art results in various domains such as image recognition and natural language processing. One of the reasons for this success is the increasing size…
Deep learning (DL) models based on the transformer architecture have revolutionized many DL applications such as large language models (LLMs), vision transformers, audio generation, and time series prediction. Much of this progress has been…
Deep learning (DL) has transformed applications in a variety of domains, including computer vision, natural language processing, and tabular data analysis. The search for improved DL model accuracy has led practitioners to explore…
The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields.…
Distributed edge learning (DL) is considered a cornerstone of intelligence enablers, since it allows for collaborative training without the necessity for local clients to share raw data with other parties, thereby preserving privacy and…
Distributed training in deep learning (DL) is common practice as data and models grow. The current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale, and…
Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge and techniques range from distributed algorithms to low-level circuit design. In this survey, we…
As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer…
In distributed statistical learning, $N$ samples are split across $m$ machines and a learner wishes to use minimal communication to learn as well as if the examples were on a single machine. This model has received substantial interest in…
In the era of deep learning (DL), convolutional neural networks (CNNs), and large language models (LLMs), machine learning (ML) models are becoming increasingly complex, demanding significant computational resources for both inference and…
Communication overhead poses an important obstacle to distributed DNN training and draws increasing attention in recent years. Despite continuous efforts, prior solutions such as gradient compression/reduction, compute/communication…
With rapidly increasing distributed deep learning workloads in large-scale data centers, efficient distributed deep learning framework strategies for resource allocation and workload scheduling have become the key to high-performance deep…
This thesis is concerned with the design of distributed algorithms for solving optimization problems. We consider networks where each node has exclusive access to a cost function, and design algorithms that make all nodes cooperate to find…
Wider coverage and a better solution to a latency reduction in 5G necessitate its combination with multi-access edge computing (MEC) technology. Decentralized deep learning (DDL) such as federated learning and swarm learning as a promising…
Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image…