English
Related papers

Related papers: PyTorch Distributed: Experiences on Accelerating D…

200 papers

It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains. Despite the remarkable progress made in the field of machine learning systems research, which has enabled the…

In spite of showing unreasonable effectiveness in modalities like Text and Image, Deep Learning has always lagged Gradient Boosting in tabular data - both in popularity and performance. But recently there have been newer models created…

Machine Learning · Computer Science 2021-04-29 Manu Joseph

Training deep networks is expensive and time-consuming with the training period increasing with data size and growth in model parameters. In this paper, we provide a framework for distributed training of deep networks over a cluster of CPUs…

Machine Learning · Statistics 2017-08-22 Disha Shrivastava , Santanu Chaudhury , Dr. Jayadeva

Distributed deep learning is becoming increasingly popular due to the expanding demand for computing resources for deep learning models with a larger amount of parameters. Different from traditional training approaches, data-parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-14 Hao Bai

Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style…

An appropriate choice of batch sizes in large-scale model training is crucial, yet it involves an intrinsic yet inevitable dilemma: large-batch training improves training efficiency in terms of memory utilization, while generalization…

Machine Learning · Computer Science 2025-03-18 Tim Tsz-Kit Lau , Weijian Li , Chenwei Xu , Han Liu , Mladen Kolar

With the growth of large language models, now incorporating billions of parameters, the hardware prerequisites for their training and deployment have seen a corresponding increase. Although existing tools facilitate model parallelization…

Machine Learning · Computer Science 2023-12-07 Matthew Choi , Muhammad Adil Asif , John Willes , David Emerson

Deep reinforcement learning has led to dramatic breakthroughs in the field of artificial intelligence for the past few years. As the amount of rollout experience data and the size of neural networks for deep reinforcement learning have…

Machine Learning · Computer Science 2024-11-11 Zhihong Liu , Xin Xu , Peng Qiao , Dongsheng Li

We introduce PyTorch Geometric, a library for deep learning on irregularly structured input data such as graphs, point clouds and manifolds, built upon PyTorch. In addition to general graph data structures and processing methods, it…

Machine Learning · Computer Science 2019-04-26 Matthias Fey , Jan Eric Lenssen

We present PyTorch Frame, a PyTorch-based framework for deep learning over multi-modal tabular data. PyTorch Frame makes tabular deep learning easy by providing a PyTorch-based data structure to handle complex tabular data, introducing a…

Machine Learning · Computer Science 2024-12-17 Weihua Hu , Yiwen Yuan , Zecheng Zhang , Akihiro Nitta , Kaidi Cao , Vid Kocijan , Jinu Sunil , Jure Leskovec , Matthias Fey

The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields.…

Machine Learning · Computer Science 2022-07-04 Daniel Nichols , Siddharth Singh , Shu-Huai Lin , Abhinav Bhatele

Deep metric learning algorithms have a wide variety of applications, but implementing these algorithms can be tedious and time consuming. PyTorch Metric Learning is an open source library that aims to remove this barrier for both…

Computer Vision and Pattern Recognition · Computer Science 2020-08-24 Kevin Musgrave , Serge Belongie , Ser-Nam Lim

Deep learning (DL) has been a revolutionary technique in various domains. To facilitate the model development and deployment, many deep learning frameworks are proposed, among which PyTorch is one of the most popular solutions. The…

Machine Learning · Computer Science 2023-06-27 Yueming Hao , Xu Zhao , Bin Bao , David Berard , Will Constable , Adnan Aziz , Xu Liu

Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared machine learning model while keeping training data locally on the device, thereby removing the need to store and access the full…

Machine Learning · Computer Science 2022-07-19 Konstantin Burlachenko , Samuel Horváth , Peter Richtárik

The AI hardware boom has led modern data centers to adopt HPC-style architectures centered on distributed, GPU-centric computation. Large GPU clusters interconnected by fast RDMA networks and backed by high-bandwidth NVMe storage enable…

Databases · Computer Science 2026-05-21 Jigao Luo , Nils Boeschen , Muhammad El-Hindi , Carsten Binnig

The increasing complexity of deep neural networks (DNNs) has made it challenging to exploit existing large-scale data processing pipelines for handling massive data and parameters involved in DNN training. Distributed computing platforms…

Machine Learning · Computer Science 2016-10-04 Hanjoo Kim , Jaehong Park , Jaehee Jang , Sungroh Yoon

Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product…

As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies-data, model, sequence, and…

Machine Learning · Computer Science 2025-03-13 Ruifeng She , Bowen Pang , Kai Li , Zehua Liu , Tao Zhong

Although recent scaling up approaches to training deep neural networks have proven to be effective, the computational intensity of large and complex models, as well as the availability of large-scale datasets, require deep learning…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-21 Bita Hasheminezhad , Shahrzad Shirzad , Nanmiao Wu , Patrick Diehl , Hannes Schulz , Hartmut Kaiser

The $\texttt{torch-choice}$ is an open-source library for flexible, fast choice modeling with Python and PyTorch. $\texttt{torch-choice}$ provides a $\texttt{ChoiceDataset}$ data structure to manage databases flexibly and…

Machine Learning · Computer Science 2025-06-05 Tianyu Du , Ayush Kanodia , Susan Athey
‹ Prev 1 2 3 10 Next ›