Related papers: JanusPipe: Efficient Pipeline Parallel Training fo…

DawnPiper: A Memory-scablable Pipeline Parallel Training Framework

Pipeline parallelism is a crucial paradigm for large-scale model training. However, imbalances in memory footprint across stages can lead to significant GPU memory wastage, limiting the model sizes that pipeline parallelism can effectively…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-12 Xuan Peng, Xuanhua Shi, Haolin Zhang, Yunfei Zhao, Xuehai Qian

Automated Machine Learning Pipeline: Large Language Models-Assisted Automated Dataset Generation for Training Machine-Learned Interatomic Potentials

Machine learning interatomic potentials (MLIPs) have become powerful tools to extend molecular simulations beyond the limits of quantum methods, offering near-quantum accuracy at much lower computational cost. Yet, developing reliable MLIPs…

Materials Science · Physics 2025-12-30 Adam Lahouari, Jutta Rogal, Mark E. Tuckerman

Scaling Deep Learning Training with MPMD Pipeline Parallelism

We present JaxPP, a system for efficiently scaling the training of large deep learning models with flexible pipeline parallelism. We introduce a seamless programming model that allows implementing user-defined pipeline schedules for…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-20 Anxhelo Xhebraj, Sean Lee, Hanfeng Chen, Vinod Grover

SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training

Pipeline Parallelism (PP) serves as a crucial technique for training Large Language Models (LLMs), owing to its capability to alleviate memory pressure from model states with relatively low communication overhead. However, in long-context…

Machine Learning · Computer Science 2025-04-22 Zhouyang Li, Yuliang Liu, Wei Zhang, Tailing Yuan, Bin Chen, Chengru Song, Di Zhang

BitPipe: Bidirectional Interleaved Pipeline Parallelism for Accelerating Large Models Training

With the increasing scale of models, the need for efficient distributed training has become increasingly urgent. Recently, many synchronous pipeline parallelism approaches have been proposed to improve training throughput. However, these…

Machine Learning · Computer Science 2024-10-28 Houming Wu, Ling Chen, Wenjie Yu

BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training

The size of deep neural networks (DNNs) grows rapidly as the complexity of the machine learning algorithm increases. To satisfy the requirement of computation and memory of DNN training, distributed deep learning based on model parallelism…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-15 Letian Zhao, Rui Xu, Tianqi Wang, Teng Tian, Xiaotian Wang, Wei Wu, Chio-in Ieong, Xi Jin

Enhancing Memory Efficiency in Large Language Model Training Through Chronos-aware Pipeline Parallelism

Larger model sizes and longer sequence lengths have empowered the Large Language Model (LLM) to achieve outstanding performance across various domains. However, this progress brings significant storage capacity challenges for LLM…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Xinyuan Lin, Chenlu Li, Zongle Huang, Chunyu Wang, Bo Xiao, Huazhong Yang, Shishi Duan, Yongpan Liu

Machine Learning Interatomic Potentials: library for efficient training, model development and simulation of molecular systems

Machine Learning Interatomic Potentials (MLIP) are a novel in silico approach for molecular property prediction, creating an alternative to disrupt the accuracy/speed trade-off of empirical force fields and density functional theory (DFT).…

Chemical Physics · Physics 2025-08-18 Christoph Brunken, Olivier Peltre, Heloise Chomet, Lucien Walewski, Manus McAuliffe, Valentin Heyraud, Solal Attias, Martin Maarand, Yessine Khanfir, Edan Toledo, Fabio Falcioni, Marie Bluntzer, Silvia Acosta-Gutiérrez, Jules Tilly

Learning Interatomic Potentials at Multiple Scales

The need to use a short time step is a key limit on the speed of molecular dynamics (MD) simulations. Simulations governed by classical potentials are often accelerated by using a multiple-time-step (MTS) integrator that evaluates certain…

Chemical Physics · Physics 2023-10-24 Xiang Fu, Albert Musaelian, Anders Johansson, Tommi Jaakkola, Boris Kozinsky

AdaPtis: Reducing Pipeline Bubbles with Adaptive Pipeline Parallelism on Heterogeneous Models

Pipeline parallelism is widely used to train large language models (LLMs). However, increasing heterogeneity in model architectures exacerbates pipeline bubbles, thereby reducing training efficiency. Existing approaches overlook the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-30 Jihu Guo, Tenghui Ma, Wei Gao, Peng Sun, Jiaxing Li, Xun Chen, Yuyang Jin, Dahua Lin

Iterative Pretraining Framework for Interatomic Potentials

Machine learning interatomic potentials (MLIPs) enable efficient molecular dynamics (MD) simulations with ab initio accuracy and have been applied across various domains in physical science. However, their performance often relies on…

Computational Physics · Physics 2025-07-29 Taoyong Cui, Zhongyao Wang, Dongzhan Zhou, Yuqiang Li, Lei Bai, Wanli Ouyang, Mao Su, Shufei Zhang

Improving Reliability of Machine Learned Interatomic Potentials With Physics-Informed Pretraining

Machine learned interatomic potentials (MLIPs) have emerged as powerful tools for molecular dynamics (MD) simulations with their competitive accuracy and computational efficiency. However, MLIPs are often observed to exhibit un-physical…

Materials Science · Physics 2026-02-24 Qianyu Zheng, Victor Fung

DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials

Large-scale atomistic simulations are essential to bridge computational materials and chemistry to realistic materials and drug discovery applications. In the past few years, rapid developments of machine learning interatomic potentials…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-03 Kevin Han, Bowen Deng, Amir Barati Farimani, Gerbrand Ceder

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-03 Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin

Geometry-enhanced Pre-training on Interatomic Potentials

Machine learning interatomic potentials (MLIPs) enable molecular dynamics (MD) simulations with ab initio accuracy and have been applied to various fields of physical science. However, the performance and transferability of MLIPs are…

Chemical Physics · Physics 2024-04-16 Taoyong Cui, Chenyu Tang, Mao Su, Shufei Zhang, Yuqiang Li, Lei Bai, Yuhan Dong, Xingao Gong, Wanli Ouyang

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

Model parallelism has become a necessity for training modern large-scale deep language models. In this work, we identify a new and orthogonal dimension from existing model parallel approaches: it is possible to perform pipeline parallelism…

Machine Learning · Computer Science 2021-09-29 Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica

CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks

The increasing demand for intelligent mobile applications has made multi-agent collaboration with Transformer-based large language models (LLMs) essential in mobile edge computing (MEC) networks. However, training LLMs in such environments…

Systems and Control · Electrical Eng. & Systems 2025-09-25 Jiewei Chen, Xiumei Deng, Zehui Xiong, Shaoyong Guo, Xuesong Qiu, Ping Wang, Dusit Niyato

Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials

Universal Machine Learning Interatomic Potentials (uMLIPs), pre-trained on massively diverse datasets encompassing inorganic materials and organic molecules across the entire periodic table, serve as foundational models for quantum-accurate…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-20 Yuanchang Zhou, Hongyu Wang, Yiming Du, Yan Wang, Mingzhen Li, Siyu Hu, Xiangyu Zhang, Weijian Liu, Chen Wang, Zhuoqiang Guo, Long Wang, Jingde Bu, Yutong Lu, Guangming Tan, Weile Jia

Online Test-time Adaptation for Interatomic Potentials

Machine learning interatomic potentials (MLIPs) enable more efficient molecular dynamics (MD) simulations with ab initio accuracy, which have been used in various domains of physical science. However, distribution shift between training and…

Computational Physics · Physics 2024-05-15 Taoyong Cui, Chenyu Tang, Dongzhan Zhou, Yuqiang Li, Xingao Gong, Wanli Ouyang, Mao Su, Shufei Zhang

SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference

As inference workloads for large language models (LLMs) scale to meet growing user demand, pipeline parallelism (PP) has become a widely adopted strategy for multi-GPU deployment, particularly in cross-node setups, to improve key-value (KV)…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-30 Yongchao He, Bohan Zhao, Zheng Cao