Related papers: Token Merging for Fast Stable Diffusion

Cached Adaptive Token Merging: Dynamic Token Reduction and Redundant Computation Elimination in Diffusion Model

Diffusion models have emerged as a promising approach for generating high-quality, high-dimensional images. Nevertheless, these models are hindered by their high computational cost and slow inference, partly due to the quadratic…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Omid Saghatchian, Atiyeh Gh. Moghadam, Ahmad Nickabadi

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images

Attention mechanism has been crucial for image diffusion models, however, their quadratic computational complexity limits the sizes of images we can process within reasonable time and memory constraints. This paper investigates the…

Computer Vision and Pattern Recognition · Computer Science 2024-05-09 Ethan Smith, Nayan Saxena, Aninda Saha

ToMA: Token Merge with Attention for Diffusion Models

Diffusion models excel in high-fidelity image generation but face scalability limits due to transformers' quadratic attention complexity. Plug-and-play token reduction methods like ToMeSD and ToFu reduce FLOPs by merging redundant tokens in…

Machine Learning · Computer Science 2025-12-02 Wenbo Lu, Shaoyi Zheng, Yuxuan Xia, Shengjie Wang

Token Merging: Your ViT But Faster

We introduce Token Merging (ToMe), a simple method to increase the throughput of existing ViT models without needing to train. ToMe gradually combines similar tokens in a transformer using a general and light-weight matching algorithm that…

Computer Vision and Pattern Recognition · Computer Science 2023-03-03 Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, Judy Hoffman

VidToMe: Video Token Merging for Zero-Shot Video Editing

Diffusion models have made significant advances in generating high-quality images, but their application to video generation has remained challenging due to the complexity of temporal motion. Zero-shot video editing offers a solution by…

Computer Vision and Pattern Recognition · Computer Science 2023-12-20 Xirui Li, Chao Ma, Xiaokang Yang, Ming-Hsuan Yang

Accelerating Diffusion Transformers with Token-wise Feature Caching

Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs. To address this problem, feature caching methods have been introduced to accelerate diffusion…

Machine Learning · Computer Science 2025-02-20 Chang Zou, Xuyang Liu, Ting Liu, Siteng Huang, Linfeng Zhang

Importance-Based Token Merging for Efficient Image and Video Generation

Token merging can effectively accelerate various vision systems by processing groups of similar tokens only once and sharing the results across them. However, existing token grouping methods are often ad hoc and random, disregarding the…

Computer Vision and Pattern Recognition · Computer Science 2025-04-28 Haoyu Wu, Jingyi Xu, Hieu Le, Dimitris Samaras

Token Pruning for Caching Better: 9 Times Acceleration on Stable Diffusion for Free

Stable Diffusion has achieved remarkable success in the field of text-to-image generation, with its powerful generative capabilities and diverse generation results making a lasting impact. However, its iterative denoising introduces high…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Evelyn Zhang, Bang Xiao, Jiayi Tang, Qianli Ma, Chang Zou, Xuefei Ning, Xuming Hu, Linfeng Zhang

Local Representative Token Guided Merging for Text-to-Image Generation

Stable diffusion is an outstanding image generation model for text-to-image, but its time-consuming generation process remains a challenge due to the quadratic complexity of attention operations. Recent token merging methods improve…

Computer Vision and Pattern Recognition · Computer Science 2025-07-18 Min-Jeong Lee, Hee-Dong Kim, Seong-Whan Lee

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers

Diffusion models have demonstrated remarkable success in various image generation tasks, but their performance is often limited by the uniform processing of inputs across varying conditions and noise levels. To address this limitation, we…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Minglei Shi, Ziyang Yuan, Haotian Yang, Xintao Wang, Mingwu Zheng, Xin Tao, Wenliang Zhao, Wenzhao Zheng, Jie Zhou, Jiwen Lu, Pengfei Wan, Di Zhang, Kun Gai

TokenCompose: Text-to-Image Diffusion with Token-level Supervision

We present TokenCompose, a Latent Diffusion Model for text-to-image generation that achieves enhanced consistency between user-specified text prompts and model-generated images. Despite its tremendous success, the standard denoising process…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Zirui Wang, Zhizhou Sha, Zheng Ding, Yilin Wang, Zhuowen Tu

Accelerating Transformers with Spectrum-Preserving Token Merging

Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and…

Machine Learning · Computer Science 2024-10-31 Hoai-Chau Tran, Duy M. H. Nguyen, Duy M. Nguyen, Trung-Tin Nguyen, Ngan Le, Pengtao Xie, Daniel Sonntag, James Y. Zou, Binh T. Nguyen, Mathias Niepert

DiffuGen: Adaptable Approach for Generating Labeled Image Datasets using Stable Diffusion Models

Generating high-quality labeled image datasets is crucial for training accurate and robust machine learning models in the field of computer vision. However, the process of manually labeling real images is often time-consuming and costly. To…

Computer Vision and Pattern Recognition · Computer Science 2023-09-04 Michael Shenoda, Edward Kim

Nested Diffusion Processes for Anytime Image Generation

Diffusion models are the current state-of-the-art in image generation, synthesizing high-quality images by breaking down the generation process into many fine-grained denoising steps. Despite their good performance, diffusion models are…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Noam Elata, Bahjat Kawar, Tomer Michaeli, Michael Elad

A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies

The Stable Diffusion Model (SDM) is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation. Despite various attempts at sampler optimization, model distillation, and network quantification, these…

Computer Vision and Pattern Recognition · Computer Science 2024-06-18 Jinchao Zhu, Yuxuan Wang, Siyuan Pan, Pengfei Wan, Di Zhang, Gao Huang

DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity

Diffusion models demonstrate outstanding performance in image generation, but their multi-step inference mechanism requires immense computational cost. Previous works accelerate inference by leveraging layer or token cache techniques to…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Haowei Zhu, Ji Liu, Ziqiong Liu, Dong Li, Junhai Yong, Bin Wang, Emad Barsoum

Self-conditioned Embedding Diffusion for Text Generation

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as…

Computation and Language · Computer Science 2022-11-09 Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond

Stable Diffusion for Data Augmentation in COCO and Weed Datasets

Generative models have increasingly impacted various tasks, from computer vision to interior design and beyond. Stable Diffusion, a powerful diffusion model, enables the creation of high-resolution images with intricate details from text…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Boyang Deng

A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization

The Stable Diffusion Model (SDM) is a popular and efficient text-to-image (t2i) generation and image-to-image (i2i) generation model. Although there have been some attempts to reduce sampling steps, model distillation, and network…

Computer Vision and Pattern Recognition · Computer Science 2024-03-06 Jinchao Zhu, Yuxuan Wang, Xiaobing Tu, Siyuan Pan, Pengfei Wan, Gao Huang

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function (next token prediction) with diffusion to train a single transformer over…

Artificial Intelligence · Computer Science 2024-08-21 Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy