Related papers: Does compressing activations help model parallel t…

Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism

Scaling models has led to significant advancements in deep learning, but training these models in decentralized settings remains challenging due to communication bottlenecks. While existing compression techniques are effective in…

Machine Learning · Computer Science 2025-06-03 Sameera Ramasinghe , Thalaiyasingam Ajanthan , Gil Avraham , Yan Zuo , Alexander Long

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting,…

Computation and Language · Computer Science 2020-06-24 Zhuohan Li , Eric Wallace , Sheng Shen , Kevin Lin , Kurt Keutzer , Dan Klein , Joseph E. Gonzalez

Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics

Overparameterized models have proven to be powerful tools for solving various machine learning tasks. However, overparameterization often leads to a substantial increase in computational and memory costs, which in turn requires extensive…

Machine Learning · Computer Science 2024-03-13 Soo Min Kwon , Zekai Zhang , Dogyoon Song , Laura Balzano , Qing Qu

Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems

With the rapid adoption of large language models (LLMs) in recommendation systems, the computational and communication bottlenecks caused by their massive parameter sizes and large data volumes have become increasingly prominent. This paper…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-25 Haowei Yang , Yu Tian , Zhongheng Yang , Zhao Wang , Chengrui Zhou , Dannier Li

Model Compression and Efficient Inference for Large Language Models: A Survey

Transformer based large language models have achieved tremendous success. However, the significant memory and computational costs incurred during the inference process make it challenging to deploy large models on resource-constrained…

Computation and Language · Computer Science 2024-02-16 Wenxiao Wang , Wei Chen , Yicong Luo , Yongliu Long , Zhengkai Lin , Liye Zhang , Binbin Lin , Deng Cai , Xiaofei He

Model Compression

With time, machine learning models have increased in their scope, functionality and size. Consequently, the increased functionality and size of such models requires high-end hardware to both train and provide inference after the fact. This…

Machine Learning · Computer Science 2021-09-07 Arhum Ishtiaq , Sara Mahmood , Maheen Anees , Neha Mumtaz

Activations and Gradients Compression for Model-Parallel Training

Large neural networks require enormous computational clusters of machines. Model-parallel training, when the model architecture is partitioned sequentially between workers, is a popular approach for training modern models. Information…

Machine Learning · Computer Science 2024-03-27 Mikhail Rudakov , Aleksandr Beznosikov , Yaroslav Kholodov , Alexander Gasnikov

A Survey on Transformer Compression

Transformer plays a vital role in the realms of natural language processing (NLP) and computer vision (CV), specially for constructing large language models (LLM) and large vision models (LVM). Model compression methods reduce the memory…

Machine Learning · Computer Science 2024-04-09 Yehui Tang , Yunhe Wang , Jianyuan Guo , Zhijun Tu , Kai Han , Hailin Hu , Dacheng Tao

SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-30 Max Ryabinin , Tim Dettmers , Michael Diskin , Alexander Borzunov

Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models

Deep learning models have achieved tremendous success in most of the industries in recent years. The evolution of these models has also led to an increase in the model size and energy requirement, making it difficult to deploy in production…

Machine Learning · Computer Science 2024-07-24 Aayush Saxena , Arit Kumar Bishwas , Ayush Ashok Mishra , Ryan Armstrong

Revisiting Data Augmentation in Model Compression: An Empirical and Comprehensive Study

The excellent performance of deep neural networks is usually accompanied by a large number of parameters and computations, which have limited their usage on the resource-limited edge devices. To address this issue, abundant methods such as…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Muzhou Yu , Linfeng Zhang , Kaisheng Ma

Model compression as constrained optimization, with application to neural nets. Part I: general framework

Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. We give a general formulation of model…

Machine Learning · Computer Science 2017-07-06 Miguel Á. Carreira-Perpiñán

Dive into Big Model Training

The increasing scale of model size and continuous improvement of performance herald the arrival of the Big Model era. In this report, we explore what and how the big model training works by diving into training objectives and training…

Machine Learning · Computer Science 2022-07-26 Qinghua Liu , Yuxiang Jiang

Communication Compression for Tensor Parallel LLM Inference

Large Language Models (LLMs) have pushed the frontier of artificial intelligence but are comprised of hundreds of billions of parameters and operations. For faster inference latency, LLMs are deployed on multiple hardware accelerators…

Machine Learning · Computer Science 2026-01-07 Jan Hansen-Palmus , Michael Truong Le , Oliver Hausdörfer , Alok Verma

Model Compression Techniques in Biometrics Applications: A Survey

The development of deep learning algorithms has extensively empowered humanity's task automatization capacity. However, the huge improvement in the performance of these models is highly correlated with their increasing level of complexity,…

Computer Vision and Pattern Recognition · Computer Science 2024-01-19 Eduarda Caldeira , Pedro C. Neto , Marco Huber , Naser Damer , Ana F. Sequeira

Homomorphic Parameter Compression for Distributed Deep Learning Training

Distributed training of deep neural networks has received significant research interest, and its major approaches include implementations on multiple GPUs and clusters. Parallelization can dramatically improve the efficiency of training…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-29 Jaehee Jang , Byungook Na , Sungroh Yoon

To prune, or not to prune: exploring the efficacy of pruning for model compression

Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks…

Machine Learning · Statistics 2017-11-15 Michael Zhu , Suyog Gupta

When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models

Large language models (LLMs) exhibit excellent performance in various tasks. However, the memory requirements of LLMs present a great challenge when deploying on memory-limited devices, even for quantized LLMs. This paper introduces a…

Computation and Language · Computer Science 2025-02-24 Weilan Wang , Yu Mao , Dongdong Tang , Hongchao Du , Nan Guan , Chun Jason Xue

Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression

In training of modern large natural language processing (NLP) models, it has become a common practice to split models using 3D parallelism to multiple GPUs. Such technique, however, suffers from a high overhead of inter-node communication.…

Machine Learning · Computer Science 2023-01-25 Jaeyong Song , Jinkyu Yim , Jaewon Jung , Hongsun Jang , Hyung-Jin Kim , Youngsok Kim , Jinho Lee

Towards a Better Theoretical Understanding of Independent Subnetwork Training

Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models imparts excessive pressure on communication channels,…

Machine Learning · Computer Science 2024-06-05 Egor Shulgin , Peter Richtárik