English
Related papers

Related papers: Consistent Bounded-Asynchronous Parameter Servers …

200 papers

As Machine Learning (ML) applications increase in data size and model complexity, practitioners turn to distributed clusters to satisfy the increased computational and memory demands. Unfortunately, effective use of clusters for ML requires…

Machine Learning · Computer Science 2014-10-31 Wei Dai , Abhimanu Kumar , Jinliang Wei , Qirong Ho , Garth Gibson , Eric P. Xing

The replication mechanism resolves some challenges with big data such as data durability, data access, and fault tolerance. Yet, replication itself gives birth to another challenge known as the consistency in distributed systems.…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-12 Hesam Nejati Sharif Aldin , Hossein Deldari , Mohammad Hossein Moattar , Mostafa Razavi Ghods

Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-19 Hanpeng Hu , Dan Wang , Chuan Wu

We propose a new data-centric synchronization framework for carrying out of machine learning (ML) tasks in a distributed environment. Our framework exploits the iterative nature of ML algorithms and relaxes the application agnostic bulk…

Databases · Computer Science 2015-08-06 Naman Goel , Divyakant Agrawal , Sanjay Chawla , Ahmed Elmagarmid

In many distributed learning problems, the heterogeneous loading of computing machines may harm the overall performance of synchronous strategies. In this paper, we propose an effective asynchronous distributed framework for the…

Machine Learning · Statistics 2017-05-23 Bikash Joshi , Franck Iutzeler , Massih-Reza Amini

Fault-tolerant distributed algorithms are central for building reliable spatially distributed systems. Unfortunately, the lack of a canonical precise framework for fault-tolerant algorithms is an obstacle for both verification and…

Formal Languages and Automata Theory · Computer Science 2012-10-16 Annu John , Igor Konnov , Ulrich Schmid , Helmut Veith , Josef Widder

In the rapidly evolving research on artificial intelligence (AI) the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizing the computing resources for distributed machine…

Machine Learning · Computer Science 2025-10-30 Mohammadreza Doostmohammadian , Zulfiya R. Gabidullina , Hamid R. Rabiee

Stochastic variance reduced methods have gained a lot of interest recently for empirical risk minimization due to its appealing run time complexity. When the data size is large and disjointly stored on different machines, it becomes…

Machine Learning · Computer Science 2020-08-26 Shicong Cen , Huishuai Zhang , Yuejie Chi , Wei Chen , Tie-Yan Liu

Distributed model fitting refers to the process of fitting a mathematical or statistical model to the data using distributed computing resources, such that computing tasks are divided among multiple interconnected computers or nodes, often…

Computation · Statistics 2024-06-04 Xiaofei Wu , Rongmei Liang , Fabio Roli , Marcello Pelillo , Jing Yuan

The purpose of this study is to test the effectiveness of current straggler mitigation techniques over different important iterative convergent machine learning(ML) algorithm including Matrix Factorization (MF), Multinomial Logistic…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-31 Benjamin Wong

The difficulty of developing reliable parallel software is generating interest in deterministic environments, where a given program and input can yield only one possible result. Languages or type systems can enforce determinism in new code,…

Operating Systems · Computer Science 2010-02-01 Amittai Aviram , Bryan Ford

Many distributed machine learning (ML) systems adopt the non-synchronous execution in order to alleviate the network communication bottleneck, resulting in stale parameters that do not reflect the latest updates. Despite much development in…

Machine Learning · Computer Science 2018-10-09 Wei Dai , Yi Zhou , Nanqing Dong , Hao Zhang , Eric P. Xing

Latent variable models have accumulated a considerable amount of interest from the industry and academia for their versatility in a wide range of applications. A large amount of effort has been made to develop systems that is able to extend…

Machine Learning · Computer Science 2015-11-19 Aaron Q. Li , Amr Ahmed , Mu Li , Vanja Josifovski

In this paper we consider a network of processors aiming at cooperatively solving linear programming problems subject to uncertainty. Each node only knows a common cost function and its local uncertain constraint set. We propose a…

Optimization and Control · Mathematics 2019-08-27 Mohammadreza Chamanbaz , Giuseppe Notarstefano , Roland Bouffanais

Most of today's distributed machine learning systems assume {\em reliable networks}: whenever two machines exchange information (e.g., gradients or models), the network should guarantee the delivery of the message. At the same time, recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-17 Chen Yu , Hanlin Tang , Cedric Renggli , Simon Kassing , Ankit Singla , Dan Alistarh , Ce Zhang , Ji Liu

This paper presents an asynchronous incremental aggregated gradient algorithm and its implementation in a parameter server framework for solving regularized optimization problems. The algorithm can handle both general convex (possibly…

Optimization and Control · Mathematics 2016-10-19 Arda Aytekin , Hamid Reza Feyzmahdavian , Mikael Johansson

The semantics of HPC storage systems are defined by the consistency models to which they abide. Storage consistency models have been less studied than their counterparts in memory systems, with the exception of the POSIX standard and its…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-03 Chen Wang , Kathryn Mohror , Marc Snir

To keep up with increasing dataset sizes and model complexity, distributed training has become a necessity for large machine learning tasks. Parameter servers ease the implementation of distributed parameter management---a key concern in…

Machine Learning · Computer Science 2020-07-06 Alexander Renz-Wieland , Rainer Gemulla , Steffen Zeuch , Volker Markl

The parallel and distributed processing are becoming de facto industry standard, and a large part of the current research is targeted on how to make computing scalable and distributed, dynamically, without allocating the resources on…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-10 Rajendra Purohit , K R Chowdhary , S D Purohit

Recent research works on distributed adaptive networks have intensively studied the case where the nodes estimate a common parameter vector collaboratively. However, there are many applications that are multitask-oriented in the sense that…

Systems and Control · Computer Science 2013-11-04 Jie Chen , Cédric Richard , Ali Sayed
‹ Prev 1 2 3 10 Next ›