Related papers: Token Merging for Fast Stable Diffusion

200 papers

Diffusion models have emerged as a promising approach for generating high-quality, high-dimensional images. Nevertheless, these models are hindered by their high computational cost and slow inference, partly due to the quadratic…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Omid Saghatchian, Atiyeh Gh. Moghadam, Ahmad Nickabadi

Attention mechanism has been crucial for image diffusion models, however, their quadratic computational complexity limits the sizes of images we can process within reasonable time and memory constraints. This paper investigates the…

Computer Vision and Pattern Recognition · Computer Science 2024-05-09 Ethan Smith, Nayan Saxena, Aninda Saha

Diffusion models excel in high-fidelity image generation but face scalability limits due to transformers' quadratic attention complexity. Plug-and-play token reduction methods like ToMeSD and ToFu reduce FLOPs by merging redundant tokens in…

Machine Learning · Computer Science 2025-12-02 Wenbo Lu, Shaoyi Zheng, Yuxuan Xia, Shengjie Wang

We introduce Token Merging (ToMe), a simple method to increase the throughput of existing ViT models without needing to train. ToMe gradually combines similar tokens in a transformer using a general and light-weight matching algorithm that…

Computer Vision and Pattern Recognition · Computer Science 2023-03-03 Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, Judy Hoffman

Diffusion models have made significant advances in generating high-quality images, but their application to video generation has remained challenging due to the complexity of temporal motion. Zero-shot video editing offers a solution by…

Computer Vision and Pattern Recognition · Computer Science 2023-12-20 Xirui Li, Chao Ma, Xiaokang Yang, Ming-Hsuan Yang

Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs. To address this problem, feature caching methods have been introduced to accelerate diffusion…

Machine Learning · Computer Science 2025-02-20 Chang Zou, Xuyang Liu, Ting Liu, Siteng Huang, Linfeng Zhang

Token merging can effectively accelerate various vision systems by processing groups of similar tokens only once and sharing the results across them. However, existing token grouping methods are often ad hoc and random, disregarding the…

Computer Vision and Pattern Recognition · Computer Science 2025-04-28 Haoyu Wu, Jingyi Xu, Hieu Le, Dimitris Samaras

Stable Diffusion has achieved remarkable success in the field of text-to-image generation, with its powerful generative capabilities and diverse generation results making a lasting impact. However, its iterative denoising introduces high…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Evelyn Zhang, Bang Xiao, Jiayi Tang, Qianli Ma, Chang Zou, Xuefei Ning, Xuming Hu, Linfeng Zhang

Stable diffusion is an outstanding image generation model for text-to-image, but its time-consuming generation process remains a challenge due to the quadratic complexity of attention operations. Recent token merging methods improve…

Computer Vision and Pattern Recognition · Computer Science 2025-07-18 Min-Jeong Lee, Hee-Dong Kim, Seong-Whan Lee

Diffusion models have demonstrated remarkable success in various image generation tasks, but their performance is often limited by the uniform processing of inputs across varying conditions and noise levels. To address this limitation, we…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Minglei Shi, Ziyang Yuan, Haotian Yang, Xintao Wang, Mingwu Zheng, Xin Tao, Wenliang Zhao, Wenzhao Zheng, Jie Zhou, Jiwen Lu, Pengfei Wan, Di Zhang, Kun Gai

We present TokenCompose, a Latent Diffusion Model for text-to-image generation that achieves enhanced consistency between user-specified text prompts and model-generated images. Despite its tremendous success, the standard denoising process…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Zirui Wang, Zhizhou Sha, Zheng Ding, Yilin Wang, Zhuowen Tu

Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and…

Generating high-quality labeled image datasets is crucial for training accurate and robust machine learning models in the field of computer vision. However, the process of manually labeling real images is often time-consuming and costly. To…

Computer Vision and Pattern Recognition · Computer Science 2023-09-04 Michael Shenoda, Edward Kim

Diffusion models are the current state-of-the-art in image generation, synthesizing high-quality images by breaking down the generation process into many fine-grained denoising steps. Despite their good performance, diffusion models are…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Noam Elata, Bahjat Kawar, Tomer Michaeli, Michael Elad

The Stable Diffusion Model (SDM) is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation. Despite various attempts at sampler optimization, model distillation, and network quantification, these…

Computer Vision and Pattern Recognition · Computer Science 2024-06-18 Jinchao Zhu, Yuxuan Wang, Siyuan Pan, Pengfei Wan, Di Zhang, Gao Huang

Diffusion models demonstrate outstanding performance in image generation, but their multi-step inference mechanism requires immense computational cost. Previous works accelerate inference by leveraging layer or token cache techniques to…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Haowei Zhu, Ji Liu, Ziqiong Liu, Dong Li, Junhai Yong, Bin Wang, Emad Barsoum

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as…

Generative models have increasingly impacted various tasks, from computer vision to interior design and beyond. Stable Diffusion, a powerful diffusion model, enables the creation of high-resolution images with intricate details from text…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Boyang Deng

The Stable Diffusion Model (SDM) is a popular and efficient text-to-image (t2i) generation and image-to-image (i2i) generation model. Although there have been some attempts to reduce sampling steps, model distillation, and network…

Computer Vision and Pattern Recognition · Computer Science 2024-03-06 Jinchao Zhu, Yuxuan Wang, Xiaobing Tu, Siyuan Pan, Pengfei Wan, Gao Huang

We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function (next token prediction) with diffusion to train a single transformer over…

Artificial Intelligence · Computer Science 2024-08-21 Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy