Related papers: A Generic Online Parallel Learning Framework for L…

Online Learning Under A Separable Stochastic Approximation Framework

We propose an online learning algorithm for a class of machine learning models under a separable stochastic approximation framework. The essence of our idea lies in the observation that certain parameters in the models are easier to…

Machine Learning · Computer Science 2023-05-23 Min Gan , Xiang-xiang Su , Guang-yong Chen , Jing Chen

Parallel training of linear models without compromising convergence

In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start with a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks,…

Machine Learning · Computer Science 2018-12-20 Nikolas Ioannou , Celestine Dünner , Kornilios Kourtis , Thomas Parnell

Gear Training: A new way to implement high-performance model-parallel training

The training of Deep Neural Networks usually needs tremendous computing resources. Therefore many deep models are trained in large cluster instead of single machine or GPU. Though major researchs at present try to run whole model on all…

Machine Learning · Computer Science 2018-06-12 Hao Dong , Shuai Li , Dongchang Xu , Yi Ren , Di Zhang

Large Scale Local Online Similarity/Distance Learning Framework based on Passive/Aggressive

Similarity/Distance measures play a key role in many machine learning, pattern recognition, and data mining algorithms, which leads to the emergence of metric learning field. Many metric learning algorithms learn a global distance function…

Machine Learning · Computer Science 2022-01-04 Baida Hamdan , Davood Zabihzadeh , Monsefi Reza

Kernel machines that adapt to GPUs for effective large batch training

Modern machine learning models are typically trained using Stochastic Gradient Descent (SGD) on massively parallel computing resources such as GPUs. Increasing mini-batch size is a simple and direct way to utilize the parallel computing…

Machine Learning · Statistics 2019-03-05 Siyuan Ma , Mikhail Belkin

Hybrid Approach to Parallel Stochastic Gradient Descent

Stochastic Gradient Descent is used for large datasets to train models to reduce the training time. On top of that data parallelism is widely used as a method to efficiently train neural networks using multiple worker nodes in parallel.…

Machine Learning · Computer Science 2024-07-02 Aakash Sudhirbhai Vora , Dhrumil Chetankumar Joshi , Aksh Kantibhai Patel

Parallelizing non-linear sequential models over the sequence length

Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherent sequential nature. For many years this bottleneck has persisted, as many thought…

Machine Learning · Computer Science 2024-01-17 Yi Heng Lim , Qi Zhu , Joshua Selfridge , Muhammad Firmansyah Kasim

A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks

The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields.…

Machine Learning · Computer Science 2022-07-04 Daniel Nichols , Siddharth Singh , Shu-Huai Lin , Abhinav Bhatele

Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences

Parallelizing Gated Recurrent Unit (GRU) networks is a challenging task, as the training procedure of GRU is inherently sequential. Prior efforts to parallelize GRU have largely focused on conventional parallelization strategies such as…

Computer Vision and Pattern Recognition · Computer Science 2022-03-10 Gordon Euhyun Moon , Eric C. Cyr

When Meta-Learning Meets Online and Continual Learning: A Survey

Over the past decade, deep neural networks have demonstrated significant success using the training scheme that involves mini-batch stochastic gradient descent on extensive datasets. Expanding upon this accomplishment, there has been a…

Machine Learning · Computer Science 2024-11-11 Jaehyeon Son , Soochan Lee , Gunhee Kim

Parallelizing Over Artificial Neural Network Training Runs with Multigrid

Artificial neural networks are a popular and effective machine learning technique. Great progress has been made parallelizing the expensive training phase of an individual network, leading to highly specialized pieces of hardware, many…

Numerical Analysis · Computer Science 2017-10-03 Jacob B. Schroder

Multiplexed gradient descent: Fast online training of modern datasets on hardware neural networks without backpropagation

We present multiplexed gradient descent (MGD), a gradient descent framework designed to easily train analog or digital neural networks in hardware. MGD utilizes zero-order optimization techniques for online training of hardware neural…

Machine Learning · Computer Science 2023-08-17 Adam N. McCaughan , Bakhrom G. Oripov , Natesh Ganesh , Sae Woo Nam , Andrew Dienstfrey , Sonia M. Buckley

Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems

We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme allows to efficiently employ compute accelerators such as GPUs and FPGAs for the training of…

Machine Learning · Computer Science 2017-11-08 Celestine Dünner , Thomas Parnell , Martin Jaggi

GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training

The ability to train large-scale neural networks has resulted in state-of-the-art performance in many areas of computer vision. These results have largely come from computational break throughs of two forms: model parallelism, e.g. GPU…

Computer Vision and Pattern Recognition · Computer Science 2013-12-24 Thomas Paine , Hailin Jin , Jianchao Yang , Zhe Lin , Thomas Huang

Scalable Optimal Margin Distribution Machine

Optimal margin Distribution Machine (ODM) is a newly proposed statistical learning framework rooting in the novel margin theory, which demonstrates better generalization performance than the traditional large margin based counterparts.…

Machine Learning · Computer Science 2023-06-13 Yilin Wang , Nan Cao , Teng Zhang , Xuanhua Shi , Hai Jin

Distributed Training Large-Scale Deep Architectures

Scale of data and scale of computation infrastructures together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvement and careful system configuration. In this…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-21 Shang-Xuan Zou , Chun-Yen Chen , Jui-Lin Wu , Chun-Nan Chou , Chia-Chin Tsao , Kuan-Chieh Tung , Ting-Wei Lin , Cheng-Lung Sung , Edward Y. Chang

A Model Parallel Proximal Stochastic Gradient Algorithm for Partially Asynchronous Systems

Large models are prevalent in modern machine learning scenarios, including deep learning, recommender systems, etc., which can have millions or even billions of parameters. Parallel algorithms have become an essential solution technique to…

Machine Learning · Computer Science 2018-10-23 Rui Zhu , Di Niu

Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems

With the rapid adoption of large language models (LLMs) in recommendation systems, the computational and communication bottlenecks caused by their massive parameter sizes and large data volumes have become increasingly prominent. This paper…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-25 Haowei Yang , Yu Tian , Zhongheng Yang , Zhao Wang , Chengrui Zhou , Dannier Li

Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide

With the rapid growth of large language models (LLMs), a wide range of methods have been developed to distribute computation and memory across hardware devices for efficient training and inference. While existing surveys provide descriptive…

Machine Learning · Computer Science 2026-02-11 Hossam Amer , Rezaul Karim , Ali Pourranjbar , Weiwei Zhang , Walid Ahmed , Boxing Chen

Two-dimensional Sparse Parallelism for Large Scale Deep Learning Recommendation Model Training

The increasing complexity of deep learning recommendation models (DLRM) has led to a growing need for large-scale distributed systems that can efficiently train vast amounts of data. In DLRM, the sparse embedding table is a crucial…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-07 Xin Zhang , Quanyu Zhu , Liangbei Xu , Zain Huda , Wang Zhou , Jin Fang , Dennis van der Staay , Yuxi Hu , Jade Nie , Jiyan Yang , Chunzhi Yang