Related papers: Efficient Distributed MLLM Training with Cornstarc…

Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective

The rapid scaling of Large Language Models (LLMs) has pushed training workloads far beyond the limits of single-node analysis, demanding a deeper understanding of how these models behave across large-scale, multi-GPU systems. In this paper,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-22 Seokjin Go, Joongun Park, Spandan More, Hanjiang Wu, Irene Wang, Aaron Jezghani, Tushar Krishna, Divya Mahajan

A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks

In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data…

Artificial Intelligence · Computer Science 2024-08-05 Jiaqi Wang, Hanqi Jiang, Yiheng Liu, Chong Ma, Xu Zhang, Yi Pan, Mengyuan Liu, Peiran Gu, Sichen Xia, Wenjun Li, Yutong Zhang, Zihao Wu, Zhengliang Liu, Tianyang Zhong, Bao Ge, Tuo Zhang, Ning Qiang, Xintao Hu, Xi Jiang, Xin Zhang, Wei Zhang, Dinggang Shen, Tianming Liu, Shu Zhang

DHP: Efficient Scaling of MLLM Training with Dynamic Hybrid Parallelism

Scaling long-context capabilities is crucial for Multimodal Large Language Models (MLLMs). However, real-world multimodal datasets are extremely heterogeneous. Existing training frameworks predominantly rely on static parallelism…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-26 Yifan Niu, Han Xiao, Dongyi Liu, Wei Zhou, Jia Li

Enhancing Subtask Performance of Multi-modal Large Language Model

Multi-modal Large Language Model (MLLM) refers to a model expanded from a Large Language Model (LLM) that possesses the capability to handle and infer multi-modal data. Current MLLMs typically begin by using LLMs to decompose tasks into…

Computation and Language · Computer Science 2023-09-01 Yongqiang Zhao, Zhenyu Li, Feng Zhang, Xinhai Xu, Donghong Liu

DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline Optimization

Multimodal Large Language Models (MLLMs) have achieved remarkable advances by integrating text, image, and audio understanding within a unified architecture. However, existing distributed training frameworks remain fundamentally data-blind:…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-20 Hyeonjun An, Sihyun Kim, Chaerim Lim, Hyunjoon Kim, Rathijit Sen, Sangmin Jung, Hyeonsoo Lee, Dongwook Kim, Takki Yu, Jinkyu Jeong, Youngsok Kim, Kwanghyun Park

Efficient Multimodal Large Language Models: A Survey

In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning. However, the extensive model size and high training and…

Computer Vision and Pattern Recognition · Computer Science 2026-01-23 Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma

Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide

With the rapid growth of large language models (LLMs), a wide range of methods have been developed to distribute computation and memory across hardware devices for efficient training and inference. While existing surveys provide descriptive…

Machine Learning · Computer Science 2026-02-11 Hossam Amer, Rezaul Karim, Ali Pourranjbar, Weiwei Zhang, Walid Ahmed, Boxing Chen

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

The rapid development of large language models (LLMs) has been witnessed in recent years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from text to a broader spectrum of domains, attracting widespread attention…

Artificial Intelligence · Computer Science 2024-08-06 Zhen Qin, Daoyuan Chen, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, Shuiguang Deng

Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation

Multimodal large language models (MLLMs) have extended the success of large language models (LLMs) to multiple data types, such as image, text and audio, achieving significant performance in various domains, including multimodal…

Computation and Language · Computer Science 2025-06-03 Weiqi Feng, Yangrui Chen, Shaoyu Wang, Yanghua Peng, Haibin Lin, Minlan Yu

OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging

Foundation models update slowly due to resource-intensive training, whereas domain-specific models evolve rapidly between releases. Model merging seeks to combine multiple expert models into a single, more capable model, reducing storage…

Artificial Intelligence · Computer Science 2026-03-04 Yongxian Wei, Runxi Cheng, Weike Jin, Enneng Yang, Li Shen, Lu Hou, Sinan Du, Chun Yuan, Xiaochun Cao, Dacheng Tao

ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism

Multimodal large language models (MLLMs) extend LLMs to handle images, videos, and audio by incorporating feature extractors and projection modules. However, these additional components -- combined with complex inference pipelines and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-12 Zedong Liu, Shenggan Cheng, Guangming Tan, Yang You, Dingwen Tao

OrchMLLM: Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training

Multimodal large language models (MLLMs), such as GPT-4o, are garnering significant attention. During the exploration of MLLM training, we identified Modality Composition Incoherence, a phenomenon that the proportion of a certain modality…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-13 Yijie Zheng, Bangjun Xiao, Lei Shi, Xiaoyang Li, Faming Wu, Tianyu Li, Xuefeng Xiao, Yang Zhang, Yuxuan Wang, Shouda Liu

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning

In recent years, the training requirements of many state-of-the-art Deep Learning (DL) models have scaled beyond the compute and memory capabilities of a single processor, and necessitated distribution among processors. Training such…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-16 Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar Panda

MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production

As the foundational component of versatile AI applications, training a multimodal large language model (MLLM) relies on multimodal datasets with dynamic modality mixture proportions and sample length distributions. However, existing MLLM…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-12 Chunyu Xue, Yangrui Chen, Jianyu Jiang, Ningxin Zheng, Junda Feng, Jingji Chen, Shixiong Zhao, Shen Yan, Yi Lin, Lei Shi, Zanbo Wang, Lishu Luo, Faming Wu, Haibin Lin, Xin Liu, Yanghua Peng, Quan Chen

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Large Language Models (LLMs) represent a class of deep learning models adept at understanding natural language and generating coherent responses to various prompts or queries. These models far exceed the complexity of conventional neural…

Machine Learning · Computer Science 2024-12-05 Minghao Shao, Abdul Basit, Ramesh Karri, Muhammad Shafique

Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems

With the rapid adoption of large language models (LLMs) in recommendation systems, the computational and communication bottlenecks caused by their massive parameter sizes and large data volumes have become increasingly prominent. This paper…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-25 Haowei Yang, Yu Tian, Zhongheng Yang, Zhao Wang, Chengrui Zhou, Dannier Li

FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism

Extending the context length (i.e., the maximum supported sequence length) of LLMs is of paramount significance. To facilitate long context training of LLMs, sequence parallelism has emerged as an essential technique, which scatters each…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-12 Yujie Wang, Shiju Wang, Shenhan Zhu, Fangcheng Fu, Xinyi Liu, Xuefeng Xiao, Huixia Li, Jiashi Li, Faming Wu, Bin Cui

FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data

Multimodal Large Language Models (MLLMs) have made significant advancements, demonstrating powerful capabilities in processing and understanding multimodal data. Fine-tuning MLLMs with Federated Learning (FL) allows for expanding the…

Machine Learning · Computer Science 2025-03-11 Binqian Xu, Xiangbo Shu, Haiyang Mei, Guosen Xie, Basura Fernando, Jinhui Tang

RETLLM: Training and Data-Free MLLMs for Multimodal Information Retrieval

Multimodal information retrieval (MMIR) has gained attention for its flexibility in handling text, images, or mixed queries and candidates. Recent breakthroughs in multimodal large language models (MLLMs) boost MMIR performance by…

Information Retrieval · Computer Science 2026-02-27 Dawei Su, Dongsheng Wang

A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias

Based on the foundation of Large Language Models (LLMs), Multilingual LLMs (MLLMs) have been developed to address the challenges faced in multilingual natural language processing, hoping to achieve knowledge transfer from high-resource…

Computation and Language · Computer Science 2024-12-10 Yuemei Xu, Ling Hu, Jiayi Zhao, Zihan Qiu, Kexin Xu, Yuqi Ye, Hanwen Gu