Related papers: Sparse-LaViDa: Sparse Multimodal Discrete Diffusio…

SparseDM: Toward Sparse Efficient Diffusion Models

Diffusion models represent a powerful family of generative models widely used for image and video generation. However, the time-consuming deployment, long inference time, and requirements on large memory hinder their applications on…

Machine Learning · Computer Science 2025-04-18 Kafeng Wang, Jianfei Chen, He Li, Zhenpeng Mi, Jun Zhu

Sparse-to-Sparse Training of Diffusion Models

Diffusion models (DMs) are a powerful type of generative models that have achieved state-of-the-art results in various image synthesis tasks and have shown potential in other domains, such as natural language processing and temporal data…

Machine Learning · Computer Science 2026-02-05 Inês Cardoso Oliveira, Decebal Constantin Mocanu, Luis A. Leiva

LaViDa: A Large Diffusion Language Model for Multimodal Understanding

Modern Vision-Language Models (VLMs) can solve a wide range of tasks requiring visual reasoning. In real-world scenarios, desirable properties for VLMs include fast inference and controllable generation (e.g., constraining outputs to adhere…

Computer Vision and Pattern Recognition · Computer Science 2025-06-19 Shufan Li, Konstantinos Kallidromitis, Hritik Bansal, Akash Gokul, Yusuke Kato, Kazuki Kozuka, Jason Kuen, Zhe Lin, Kai-Wei Chang, Aditya Grover

Sparsely Supervised Diffusion

Diffusion models have shown remarkable success across a wide range of generative tasks. However, they often suffer from spatially inconsistent generation, arguably due to the inherent locality of their denoising mechanisms. This can yield…

Machine Learning · Computer Science 2026-02-04 Wenshuai Zhao, Zhiyuan Li, Yi Zhao, Mohammad Hassan Vali, Martin Trapp, Joni Pajarinen, Juho Kannala, Arno Solin

SparseD: Sparse Attention for Diffusion Language Models

While diffusion language models (DLMs) offer a promising alternative to autoregressive models (ARs), existing open-source DLMs suffer from high inference latency. This bottleneck is mainly due to the attention's quadratic complexity with…

Computation and Language · Computer Science 2025-09-30 Zeqing Wang, Gongfan Fang, Xinyin Ma, Xingyi Yang, Xinchao Wang

DualDiffusion: A Speculative Decoding Strategy for Masked Diffusion Models

Masked Diffusion Models (MDMs) offer a promising alternative to autoregressive language models by enabling parallel token generation and bidirectional context modeling. However, their inference speed is significantly limited by the…

Machine Learning · Computer Science 2026-04-08 Satyam Goyal, Kushal Patel, Tanush Mittal, Arjun Laxman

Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation

We propose Lavida-O, a unified Masked Diffusion Model (MDM) for multimodal understanding and generation. Unlike existing multimodal MDMs such as MMaDa and Muddit which only support simple image-level understanding tasks and low-resolution…

Computer Vision and Pattern Recognition · Computer Science 2025-09-25 Shufan Li, Jiuxiang Gu, Kangning Liu, Zhe Lin, Zijun Wei, Aditya Grover, Jason Kuen

Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking

Masked diffusion models (MDM) are powerful generative models for discrete data that generate samples by progressively unmasking tokens in a sequence. Each token can take one of two states: masked or unmasked. We observe that token sequences…

Machine Learning · Computer Science 2025-10-23 Chen-Hao Chao, Wei-Fang Sun, Hanwen Liang, Chun-Yi Lee, Rahul G. Krishnan

Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction

Diffusion Large Language Models (dLLMs) enable breakthroughs in reasoning and parallel decoding but suffer from prohibitive quadratic computational complexity and memory overhead during inference. Current caching techniques accelerate…

Computation and Language · Computer Science 2025-11-06 Yuerong Song, Xiaoran Liu, Ruixiao Li, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Ziwei He, Xipeng Qiu

Fast Inference in Denoising Diffusion Models via MMD Finetuning

Denoising Diffusion Models (DDMs) have become a popular tool for generating high-quality samples from complex data distributions. These models are able to capture sophisticated patterns and structures in the data, and can generate samples…

Computer Vision and Pattern Recognition · Computer Science 2024-08-20 Emanuele Aiello, Diego Valsesia, Enrico Magli

Looped Diffusion Language Models

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer architectures for MDMs remains underexplored. In this paper, we show that…

Machine Learning · Computer Science 2026-05-26 Sanghyun Lee, Chunsan Hong, Seungryong Kim, Jonghyun Lee, Jongho Park, Dongmin Park

The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents

Diffusion models have achieved success in high-fidelity data synthesis, yet their capacity for more complex, structured reasoning like text following tasks remains constrained. While advances in language models have leveraged strategies…

Computer Vision and Pattern Recognition · Computer Science 2026-04-29 Yuwei Sun, Yuxuan Yao, Hui Li, Siyu Zhu

SparseDiT: Token Sparsification for Efficient Diffusion Transformer

Diffusion Transformers (DiT) are renowned for their impressive generative performance; however, they are significantly constrained by considerable computational costs due to the quadratic complexity in self-attention and the extensive…

Computer Vision and Pattern Recognition · Computer Science 2025-09-24 Shuning Chang, Pichao Wang, Jiasheng Tang, Fan Wang, Yi Yang

MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer

Despite its success in image synthesis, we observe that diffusion probabilistic models (DPMs) often lack contextual reasoning ability to learn the relations among object parts in an image, leading to a slow learning process. To solve this…

Computer Vision and Pattern Recognition · Computer Science 2024-02-22 Shanghua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan

Efficient Token Pruning for LLaDA-V

Diffusion-based large multimodal models, such as LLaDA-V, have demonstrated impressive capabilities in vision-language understanding and generation. However, their bidirectional attention mechanism and diffusion-style iterative denoising…

Computer Vision and Pattern Recognition · Computer Science 2026-01-29 Zhewen Wan, Tianchen Song, Chen Lin, Zhiyong Zhao, Xianpeng Lang

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation. Whereas, as a way inherently built for continuous data, existing diffusion models still have…

Computation and Language · Computer Science 2023-04-11 Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

Learning Sparse Masks for Diffusion-based Image Inpainting

Diffusion-based inpainting is a powerful tool for the reconstruction of images from sparse data. Its quality strongly depends on the choice of known data. Optimising their spatial location -- the inpainting mask -- is challenging. A…

Image and Video Processing · Electrical Eng. & Systems 2022-05-17 Tobias Alt, Pascal Peter, Joachim Weickert

Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling

Discrete diffusion models have recently shown great promise for modeling complex discrete data, with masked diffusion models (MDMs) offering a compelling trade-off between quality and generation speed. MDMs denoise by progressively…

Machine Learning · Computer Science 2026-04-15 Tianyu Xie, Shuchen Xue, Zijin Feng, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Cheng Zhang

Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

Masked Image Generation Models (MIGMs) have achieved great success, yet their efficiency is hampered by the multiple steps of bi-directional attention. In fact, there exists notable redundancy in their computation: when sampling discrete…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Kaiwen Zhu, Quansheng Zeng, Yuandong Pu, Shuo Cao, Xiaohui Li, Yi Xin, Qi Qin, Jiayang Li, Yu Qiao, Jinjin Gu, Yihao Liu

MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations

Visual counterfactual explanations aim to reveal the minimal semantic modifications that can alter a model's prediction, providing causal and interpretable insights into deep neural networks. However, existing diffusion-based counterfactual…

Computer Vision and Pattern Recognition · Computer Science 2026-04-24 Changlu Guo, Anders Nymark Christensen, Anders Bjorholm Dahl, Morten Rieger Hannemose