Related papers: Token Pruning for In-Context Generation in Diffusi…

CoReDiT: Spatial Coherence-Guided Token Pruning and Reconstruction for Efficient Diffusion Transformers

Diffusion Transformers (DiTs) deliver remarkable image and video generation quality but incur high computational cost, limiting scalability and on-device deployment. We introduce CoReDiT, a structured token pruning framework for DiTs across…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-15 · Zhuojin Li, Hsin-Pai Cheng, Hong Cai, Shizhong Han, Fatih Porikli

CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models

Diffusion models have revolutionized generative tasks, especially in the domain of text-to-image synthesis; however, their iterative denoising process demands substantial computational resources. In this paper, we present a novel…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-04 · Xinle Cheng, Zhuoming Chen, Zhihao Jia

In-Context LoRA for Diffusion Transformers

Recent research arXiv:2410.15027 has explored the use of diffusion transformers (DiTs) for task-agnostic image generation by simply concatenating attention tokens across images. However, despite substantial computational resources, the…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-06 · Lianghua Huang, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Huanzhang Dou, Chen Liang, Yutong Feng, Yu Liu, Jingren Zhou

Rethinking Token Reduction for Diffusion Models via Output-Similarity-Awareness

Diffusion Transformers (DiTs) achieve superior image generation quality but suffer from quadratic computational complexity relative to token count. While various token reduction (TR) methods have been proposed to mitigate this cost, they…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-22 · Hangyeol Lee, Hyojeong Lee, Joo-Young Kim

Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

Deploying pre-trained transformer models like BERT on downstream tasks in resource-constrained scenarios is challenging due to their high inference cost, which grows rapidly with input sequence length. In this work, we propose a…

Computation and Language · Computer Science · 2023-06-27 · Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang

Revisiting Token Pruning for Object Detection and Instance Segmentation

Vision Transformers (ViTs) have shown impressive performance in computer vision, but their high computational cost, quadratic in the number of tokens, limits their adoption in computation-constrained applications. However, this large number…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-14 · Yifei Liu, Mathias Gehrig, Nico Messikommer, Marco Cannici, Davide Scaramuzza

Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens…

Computation and Language · Computer Science · 2024-06-03 · Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann

Token Pruning for Caching Better: 9 Times Acceleration on Stable Diffusion for Free

Stable Diffusion has achieved remarkable success in the field of text-to-image generation, with its powerful generative capabilities and diverse generation results making a lasting impact. However, its iterative denoising introduces high…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-03 · Evelyn Zhang, Bang Xiao, Jiayi Tang, Qianli Ma, Chang Zou, Xuefei Ning, Xuming Hu, Linfeng Zhang

Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation

Vision transformers have achieved leading performance on various visual tasks yet still suffer from high computational complexity. The situation deteriorates in dense prediction tasks like semantic segmentation, as high-resolution inputs…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-29 · Quan Tang, Bowen Zhang, Jiajun Liu, Fagui Liu, Yifan Liu

FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers

Fine-grained and efficient controllability of video diffusion transformers is increasingly desired for practical applications. Recently, In-context Conditioning emerged as a powerful paradigm for unified conditional video generation, which…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-06 · Xuanhua He, Quande Liu, Zixuan Ye, Weicai Ye, Qiulin Wang, Xintao Wang, Qifeng Chen, Pengfei Wan, Di Zhang, Kun Gai

CATP: Contextually Adaptive Token Pruning for Efficient and Enhanced Multimodal In-Context Learning

Modern large vision-language models (LVLMs) convert each input image into a large set of tokens that far outnumber the text tokens. Although this improves visual perception, it also introduces severe image token redundancy. Because image…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-10 · Yanshu Li, Jianjiang Yang, Zhennan Shen, Ligong Han, Haoyan Xu, Ruixiang Tang

Temporal Aware Pruning for Efficient Diffusion-based Video Generation

Video diffusion models have recently enabled high-quality video generation with ViT-based architectures, but remain computationally intensive because generation requires attention computation over long spatiotemporal sequences. Token…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-22 · Sheng Li, Yang Sui, Junhao Ran, Bo Yuan, Yue Dai, Xulong Tang

Token Fusion: Bridging the Gap between Token Pruning and Token Merging

Vision Transformers (ViTs) have emerged as powerful backbones in computer vision, outperforming many traditional CNNs. However, their computational overhead, largely attributed to the self-attention mechanism, makes deployment on…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-05 · Minchul Kim, Shangqian Gao, Yen-Chang Hsu, Yilin Shen, Hongxia Jin

PPT: Token Pruning and Pooling for Efficient Vision Transformers

Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision, delivering superior performance across various vision tasks. However, the high computational complexity poses a significant barrier to their…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-06 · Xinjian Wu, Fanhu Zeng, Xiudong Wang, Xinghao Chen

Prompt-based Dynamic Token Pruning for Efficient Segmentation of Medical Images

The high computational demands of Vision Transformers (ViTs) in processing a large number of tokens often constrain their practical application in analyzing medical images. This research proposes a Prompt-driven Adaptive Token (PrATo)…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-27 · Pallabi Dutta, Anubhab Maity, Sushmita Mitra

Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning

Token compression expedites the training and inference of Vision Transformers (ViTs) by reducing the number of the redundant tokens, e.g., pruning inattentive tokens or merging similar tokens. However, when applied to downstream tasks,…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-14 · Shibo Jie, Yehui Tang, Jianyuan Guo, Zhi-Hong Deng, Kai Han, Yunhe Wang

NanoControl: A Lightweight Framework for Precise and Efficient Control in Diffusion Transformer

Diffusion Transformers (DiTs) have demonstrated exceptional capabilities in text-to-image synthesis. However, in the domain of controllable text-to-image generation using DiTs, most existing methods still rely on the ControlNet paradigm…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-15 · Shanyuan Liu, Jian Zhu, Junda Lu, Yue Gong, Liuzhuozheng Li, Bo Cheng, Yuhang Ma, Liebucha Wu, Xiaoyu Wu, Dawei Leng, Yuhui Yin

Discriminative Class Tokens for Text-to-Image Diffusion Models

Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. While impressive, the images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-13 · Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim

Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape

Diffusion Transformers (DiT) have become the de facto model for generating high-quality visual content like videos and images. A huge bottleneck is the attention mechanism where complexity scales quadratically with resolution and video…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-30 · Ruichen Chen, Keith G. Mills, Liyao Jiang, Chao Gao, Di Niu

Personalize Anything for Free with Diffusion Transformer

Personalized image generation aims to produce images of user-specified concepts while enabling flexible editing. Recent training-free approaches, while exhibiting higher computational efficiency than training-based methods, struggle with…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-18 · Haoran Feng, Zehuan Huang, Lin Li, Hairong Lv, Lu Sheng