Related papers: Hyper: Distributed Cloud Processing for Large-Scal…

200 papers

In recent years, the integration of artificial intelligence (AI) and cloud computing has emerged as a promising avenue for addressing the growing computational demands of AI applications. This paper presents a comprehensive study of…

Machine Learning · Computer Science 2023-04-28 Neelesh Mungoli

In this chapter we will argue that studying such multi-scale multi-science systems gives rise to inherently hybrid models containing many different algorithms best serviced by different types of computing environments (ranging from…

Astrophysics · Physics 2007-05-23 A. G. Hoekstra, S. F. Portegies Zwart, M. Bubak, P. M. A. Sloot

This paper describes the use of a distributed cloud computing system for high-throughput computing (HTC) scientific applications. The distributed cloud computing system is composed of a number of separate Infrastructure-as-a-Service (IaaS)…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-11 R. Sobie, A. Agarwal, I. Gable, C. Leavett-Brown, M. Paterson, R. Taylor, A. Charbonneau, R. Impey, W. Podiama

Distributed training frameworks, like TensorFlow, have been proposed as a means to reduce the training time of deep learning models by using a cluster of GPU servers. While such speedups are often desirable, e.g., for rapidly evaluating…

Performance · Computer Science 2019-05-07 Shijian Li, Robert J. Walls, Lijie Xu, Tian Guo

Distributed training techniques have been widely deployed in large-scale deep neural networks (DNNs) training on dense-GPU clusters. However, on public cloud clusters, due to the moderate inter-connection bandwidth between instances,…

Training large language models requires extensive processing, made possible by many high-performance computing resources. This study compares multi-node and multi-GPU environments for training large language models of electrocardiograms. It…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-28 Dimitar Mileski, Nikola Petrovski, Marjan Gusev

As the demand grows for scalable and privacy-aware AI systems, Federated Learning (FL) has emerged as a promising solution, allowing decentralized model training without moving raw data. At the same time, the combination of high-performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-26 Sangam Ghimire, Paribartan Timalsina, Nirjal Bhurtel, Bishal Neupane, Bigyan Byanju Shrestha, Subarna Bhattarai, Prajwal Gaire, Jessica Thapa, Sudan Jha

Cloud GPU servers have become the de facto way for deep learning practitioners to train complex models on large-scale datasets. However, it is challenging to determine the appropriate cluster configuration, e.g., server type and…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-08 Shijian Li, Robert J. Walls, Tian Guo

Deep learning has been postulated as a solution for numerous problems in different branches of science. Given the resource-intensive nature of these models, they often need to be executed on specialized hardware such as graphical processing…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-15 Jose González-Abad, Álvaro López García, Valentin Y. Kozlov

Distributed machine learning systems require strong privacy guarantees, verifiable compliance, and scalable deployment across heterogeneous and multi-cloud environments. This work introduces a cloud-native privacy-preserving architecture…

Distributed computing platforms provide a robust mechanism to perform large-scale computations by splitting the task and data among multiple locations, possibly located thousands of miles apart geographically. Although such distribution of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-24 Alok Singh, Eric Stephan, Malachi Schram, Ilkay Altintas

A high volume of data can be perceived as either a challenge or an opportunity. Deep learning architectures demand a high volume of data to effectively backpropagate and train the weights without bias. At the same time, a large volume of data demands…

Machine Learning · Statistics 2018-05-15 Kumarjit Pathak, Prabhukiran G, Jitin Kapila, Nikit Gawande

One key to deep learning's breakthroughs in various fields has been the use of high computing power centered on GPUs. Enabling the use of further computing capacity through distributed processing is essential not only to…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-01 Takuya Akiba, Keisuke Fukuda, Shuji Suzuki

Training deep networks is expensive and time-consuming, with the training period increasing with data size and growth in model parameters. In this paper, we provide a framework for distributed training of deep networks over a cluster of CPUs…

Machine Learning · Statistics 2017-08-22 Disha Shrivastava, Santanu Chaudhury, Dr. Jayadeva

Large-scale AI model training divides work across thousands of GPUs, then synchronizes gradients across them at each step. This incurs a significant network burden that only centralized, monolithic clusters can support, driving up…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 David McAllister, Matthew Tancik, Jiaming Song, Angjoo Kanazawa

The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields.…

Machine Learning · Computer Science 2022-07-04 Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele

Artificial Intelligence for scientific applications increasingly requires training large models on data that cannot be centralized due to privacy constraints, data sovereignty, or the sheer volume of data generated. Federated learning (FL)…

Machine Learning · Computer Science 2026-03-23 Yijiang Li, Zilinghan Li, Kyle Chard, Ian Foster, Todd Munson, Ravi Madduri, Kibaek Kim

Machine Learning (ML) techniques are indispensable in a wide range of fields. Unfortunately, the exponential increase of dataset sizes is rapidly extending the runtime of sequential algorithms and threatening to slow future progress in ML.…

Machine Learning · Computer Science 2011-07-06 Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin

Deep learning frameworks have been widely deployed on GPU servers for deep learning applications in both academia and industry. In training deep neural networks (DNNs), there are many standard processes or algorithms, such as convolution…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-21 Shaohuai Shi, Qiang Wang, Xiaowen Chu

This extended abstract explores the integration of federated learning with deep transfer hashing for distributed prediction tasks, emphasizing resource-efficient client training from evolving data streams. Federated learning allows multiple…

Machine Learning · Computer Science 2024-09-20 Manuel Röder, Frank-Michael Schleif