Related papers: Token Pruning for In-Context Generation in Diffusi…

200 papers

Diffusion Transformers (DiTs) deliver remarkable image and video generation quality but incur high computational cost, limiting scalability and on-device deployment. We introduce CoReDiT, a structured token pruning framework for DiTs across…

Computer Vision and Pattern Recognition · Computer Science 2026-05-15 Zhuojin Li, Hsin-Pai Cheng, Hong Cai, Shizhong Han, Fatih Porikli

Diffusion models have revolutionized generative tasks, especially in the domain of text-to-image synthesis; however, their iterative denoising process demands substantial computational resources. In this paper, we present a novel…

Computer Vision and Pattern Recognition · Computer Science 2025-02-04 Xinle Cheng, Zhuoming Chen, Zhihao Jia

Recent research arXiv:2410.15027 has explored the use of diffusion transformers (DiTs) for task-agnostic image generation by simply concatenating attention tokens across images. However, despite substantial computational resources, the…

Computer Vision and Pattern Recognition · Computer Science 2024-11-06 Lianghua Huang, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Huanzhang Dou, Chen Liang, Yutong Feng, Yu Liu, Jingren Zhou

Diffusion Transformers (DiTs) achieve superior image generation quality but suffer from quadratic computational complexity relative to token count. While various token reduction (TR) methods have been proposed to mitigate this cost, they…

Computer Vision and Pattern Recognition · Computer Science 2026-05-22 Hangyeol Lee, Hyojeong Lee, Joo-Young Kim

Deploying pre-trained transformer models like BERT on downstream tasks in resource-constrained scenarios is challenging due to their high inference cost, which grows rapidly with input sequence length. In this work, we propose a…

Computation and Language · Computer Science 2023-06-27 Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang

Vision Transformers (ViTs) have shown impressive performance in computer vision, but their high computational cost, quadratic in the number of tokens, limits their adoption in computation-constrained applications. However, this large number…

Computer Vision and Pattern Recognition · Computer Science 2023-12-14 Yifei Liu, Mathias Gehrig, Nico Messikommer, Marco Cannici, Davide Scaramuzza

Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens…

Computation and Language · Computer Science 2024-06-03 Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann

Stable Diffusion has achieved remarkable success in the field of text-to-image generation, with its powerful generative capabilities and diverse generation results making a lasting impact. However, its iterative denoising introduces high…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Evelyn Zhang, Bang Xiao, Jiayi Tang, Qianli Ma, Chang Zou, Xuefei Ning, Xuming Hu, Linfeng Zhang

Vision transformers have achieved leading performance on various visual tasks yet still suffer from high computational complexity. The situation deteriorates in dense prediction tasks like semantic segmentation, as high-resolution inputs…

Computer Vision and Pattern Recognition · Computer Science 2023-09-29 Quan Tang, Bowen Zhang, Jiajun Liu, Fagui Liu, Yifan Liu

Fine-grained and efficient controllability on video diffusion transformers has raised increasing desires for the applicability. Recently, In-context Conditioning emerged as a powerful paradigm for unified conditional video generation, which…

Computer Vision and Pattern Recognition · Computer Science 2025-06-06 Xuanhua He, Quande Liu, Zixuan Ye, Weicai Ye, Qiulin Wang, Xintao Wang, Qifeng Chen, Pengfei Wan, Di Zhang, Kun Gai

Modern large vision-language models (LVLMs) convert each input image into a large set of tokens that far outnumber the text tokens. Although this improves visual perception, it also introduces severe image token redundancy. Because image…

Computer Vision and Pattern Recognition · Computer Science 2025-12-10 Yanshu Li, Jianjiang Yang, Zhennan Shen, Ligong Han, Haoyan Xu, Ruixiang Tang

Video diffusion models have recently enabled high-quality video generation with ViT-based architectures, but remain computationally intensive because generation requires attention computation over long spatiotemporal sequences. Token…

Computer Vision and Pattern Recognition · Computer Science 2026-05-22 Sheng Li, Yang Sui, Junhao Ran, Bo Yuan, Yue Dai, Xulong Tang

Vision Transformers (ViTs) have emerged as powerful backbones in computer vision, outperforming many traditional CNNs. However, their computational overhead, largely attributed to the self-attention mechanism, makes deployment on…

Computer Vision and Pattern Recognition · Computer Science 2023-12-05 Minchul Kim, Shangqian Gao, Yen-Chang Hsu, Yilin Shen, Hongxia Jin

Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision, delivering superior performance across various vision tasks. However, the high computational complexity poses a significant barrier to their…

Computer Vision and Pattern Recognition · Computer Science 2024-02-06 Xinjian Wu, Fanhu Zeng, Xiudong Wang, Xinghao Chen

The high computational demands of Vision Transformers (ViTs) in processing a large number of tokens often constrain their practical application in analyzing medical images. This research proposes a Prompt-driven Adaptive Token (PrATo)…

Computer Vision and Pattern Recognition · Computer Science 2025-08-27 Pallabi Dutta, Anubhab Maity, Sushmita Mitra

Token compression expedites the training and inference of Vision Transformers (ViTs) by reducing the number of the redundant tokens, e.g., pruning inattentive tokens or merging similar tokens. However, when applied to downstream tasks,…

Computer Vision and Pattern Recognition · Computer Science 2024-08-14 Shibo Jie, Yehui Tang, Jianyuan Guo, Zhi-Hong Deng, Kai Han, Yunhe Wang

Diffusion Transformers (DiTs) have demonstrated exceptional capabilities in text-to-image synthesis. However, in the domain of controllable text-to-image generation using DiTs, most existing methods still rely on the ControlNet paradigm…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Shanyuan Liu, Jian Zhu, Junda Lu, Yue Gong, Liuzhuozheng Li, Bo Cheng, Yuhang Ma, Liebucha Wu, Xiaoyu Wu, Dawei Leng, Yuhui Yin

Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. While impressive, the images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim

Diffusion Transformers (DiT) have become the de-facto model for generating high-quality visual content like videos and images. A huge bottleneck is the attention mechanism where complexity scales quadratically with resolution and video…

Computer Vision and Pattern Recognition · Computer Science 2025-10-30 Ruichen Chen, Keith G. Mills, Liyao Jiang, Chao Gao, Di Niu

Personalized image generation aims to produce images of user-specified concepts while enabling flexible editing. Recent training-free approaches, while exhibiting higher computational efficiency than training-based methods, struggle with…

Computer Vision and Pattern Recognition · Computer Science 2025-03-18 Haoran Feng, Zehuan Huang, Lin Li, Hairong Lv, Lu Sheng