Related papers: Parameter Database: Data-centric Synchronization …

200 papers

As Machine Learning (ML) applications increase in data size and model complexity, practitioners turn to distributed clusters to satisfy the increased computational and memory demands. Unfortunately, effective use of clusters for ML requires…

Machine Learning · Computer Science 2014-10-31 Wei Dai, Abhimanu Kumar, Jinliang Wei, Qirong Ho, Garth Gibson, Eric P. Xing

In distributed ML applications, shared parameters are usually replicated among computing nodes to minimize network overhead. Therefore, a proper consistency model must be carefully chosen to ensure the algorithm's correctness and provide high…

Machine Learning · Statistics 2014-01-03 Jinliang Wei, Wei Dai, Abhimanu Kumar, Xun Zheng, Qirong Ho, Eric P. Xing

Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-19 Hanpeng Hu, Dan Wang, Chuan Wu

The bulk synchronous parallel (BSP) is a celebrated synchronization model for general-purpose parallel computing that has successfully been employed for distributed training of machine learning models. A prevalent shortcoming of the BSP is…

Machine Learning · Computer Science 2020-01-07 Xing Zhao, Manos Papagelis, Aijun An, Bao Xin Chen, Junfeng Liu, Yonggang Hu

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization…

In the rapidly evolving research on artificial intelligence (AI) the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizing the computing resources for distributed machine…

Machine Learning · Computer Science 2025-10-30 Mohammadreza Doostmohammadian, Zulfiya R. Gabidullina, Hamid R. Rabiee

In many distributed learning problems, the heterogeneous loading of computing machines may harm the overall performance of synchronous strategies. In this paper, we propose an effective asynchronous distributed framework for the…

Machine Learning · Statistics 2017-05-23 Bikash Joshi, Franck Iutzeler, Massih-Reza Amini

Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT) based intelligent and ubiquitous computing. For fast-increasing applications and data amounts, distributed learning is a…

Machine Learning · Computer Science 2022-02-08 Hao Chen, Yu Ye, Ming Xiao, Mikael Skoglund

Many real-world machine learning applications involve several learning tasks which are inter-related. For example, in the healthcare domain, we need to learn a predictive model of a certain disease for many hospitals. The models for each…

Machine Learning · Computer Science 2016-10-03 Inci M. Baytas, Ming Yan, Anil K. Jain, Jiayu Zhou

Scheduling problems are a fundamental class of combinatorial optimization problems that underpin operational efficiency in manufacturing, logistics, and service systems. While operations research has traditionally developed solver-centric…

Optimization and Control · Mathematics 2026-02-03 Anbang Liu, Shaochong Lin, Jingchuan Chen, Peng Wu, Zuojun Max Shen

Deep learning is a popular machine learning technique and has been applied to many real-world problems. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-04 Xing Zhao, Aijun An, Junfeng Liu, Bao Xin Chen

Training machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-01 Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan R. K. Ports, Peter Richtárik

Decentralized learning (DL) has gained prominence for its potential benefits in terms of scalability, privacy, and fault tolerance. It consists of many nodes that coordinate without a central server and exchange millions of parameters in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-15 Akash Dhasade, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma, Milos Vujasinovic

Multi-task learning aims to learn multiple tasks jointly by exploiting their relatedness to improve the generalization performance for each task. Traditionally, to perform multi-task learning, one needs to centralize data from all the tasks…

Machine Learning · Computer Science 2017-06-21 Sulin Liu, Sinno Jialin Pan, Qirong Ho

ASYNC is a framework that supports the implementation of asynchrony and history for optimization methods on distributed computing platforms. The popularity of asynchronous optimization methods has increased in distributed machine learning.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-24 Saeed Soori, Bugra Can, Mert Gurbuzbalaban, Maryam Mehri Dehnavi

Current network training paradigms primarily focus on either centralized or decentralized data regimes. However, in practice, data availability often exhibits a hybrid nature, where both regimes coexist. This hybrid setting presents new…

Machine Learning · Computer Science 2025-12-01 Junyi Zhu, Ruicong Yao, Taha Ceritli, Savas Ozkan, Matthew B. Blaschko, Eunchung Noh, Jeongwon Min, Cho Jung Min, Mete Ozay

We describe a new software framework for fast training of generalized linear models. The framework, named Snap Machine Learning (Snap ML), combines recent advances in machine learning systems and algorithms in a nested manner to reflect the…

Distributed model fitting refers to the process of fitting a mathematical or statistical model to the data using distributed computing resources, such that computing tasks are divided among multiple interconnected computers or nodes, often…

Computation · Statistics 2024-06-04 Xiaofei Wu, Rongmei Liang, Fabio Roli, Marcello Pelillo, Jing Yuan

Recently, the database management system (DBMS) community has witnessed the power of machine learning (ML) solutions for DBMS tasks. Despite their promising performance, these existing solutions can hardly be considered satisfactory. First,…

Databases · Computer Science 2021-11-29 Ziniu Wu, Pei Yu, Peilun Yang, Rong Zhu, Yuxing Han, Yaliang Li, Defu Lian, Kai Zeng, Jingren Zhou

As ML applications are becoming ever more pervasive, fully-trained systems are made increasingly available to a wide public, allowing end-users to submit queries with their own data, and to efficiently retrieve results. With increasingly…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-01 Daniela Loreti, Marco Lippi, Paolo Torroni