Related papers: DiffRate: Differentiable Compression Rate for Eff…

Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers

Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation quality but suffer from high latency and memory inefficiency, making them difficult to deploy on resource-constrained devices. One major efficiency…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-28 · Haoran You, Connelly Barnes, Yuqian Zhou, Yan Kang, Zhenbang Du, Wei Zhou, Lingzhi Zhang, Yotam Nitzan, Xiaoyang Liu, Zhe Lin, Eli Shechtman, Sohrab Amirghodsi, Yingyan Celine Lin

Dynamic Token Reduction during Generation for Vision Language Models

Vision-Language Models (VLMs) have achieved notable success in multimodal tasks but face practical limitations due to the quadratic complexity of decoder attention mechanisms and autoregressive generation. Existing methods like FASTV and…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-27 · Xiaoyu Liang, Chaofeng Guan, Jiaying Lu, Huiyao Chen, Huan Wang, Haoji Hu

Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning

Token compression expedites the training and inference of Vision Transformers (ViTs) by reducing the number of redundant tokens, e.g., by pruning inattentive tokens or merging similar tokens. However, when applied to downstream tasks,…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-14 · Shibo Jie, Yehui Tang, Jianyuan Guo, Zhi-Hong Deng, Kai Han, Yunhe Wang

Beyond Surrogate Gradients: Fully Differentiable Token Pruning for Vision-Language Models

Visual token pruning reduces the computational cost of Vision-Language Models (VLMs) by removing redundant visual tokens. Existing methods typically rely on Gumbel-Softmax to approximate discrete selection during training. However, the…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-28 · Landi He, Mingde Yao, Shawn Young, Lijian Xu

Efficient Token Compression for Vision Transformer with Spatial Information Preserved

Token compression is essential for reducing the computational and memory requirements of transformer models, enabling their deployment in resource-constrained environments. In this work, we propose an efficient and hardware-compatible token…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-01 · Junzhu Mao, Yang Shen, Jinyang Guo, Yazhou Yao, Xiansheng Hua

Towards Efficient Low-rate Image Compression with Frequency-aware Diffusion Prior Refinement

Recent advancements in diffusion-based generative priors have enabled visually plausible image compression at extremely low bit rates. However, existing approaches suffer from slow sampling processes and suboptimal bit allocation due to…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-16 · Yichong Xia, Yimin Zhou, Jinpeng Wang, Bin Chen

Token Compression Meets Compact Vision Transformers: A Survey and Comparative Evaluation for Edge AI

Token compression techniques have recently emerged as powerful tools for accelerating Vision Transformer (ViT) inference in computer vision. Due to the quadratic computational complexity with respect to the token sequence length, these…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-15 · Phat Nguyen, Ngai-Man Cheung

Recognition-Aware Learned Image Compression

Learned image compression methods generally optimize a rate-distortion loss, trading off improvements in visual distortion for added bitrate. Increasingly, however, compressed imagery is used as an input to deep learning networks for…

Image and Video Processing · Electrical Eng. & Systems · 2022-02-02 · Maxime Kawawa-Beaudan, Ryan Roggenkemper, Avideh Zakhor

Variation-aware Vision Token Dropping for Faster Large Vision-Language Models

Large vision-language models (LVLMs) have demonstrated remarkable capabilities in multimodal understanding tasks. However, the increasing demand for high-resolution image and long-video understanding results in substantial token counts,…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-26 · Junjie Chen, Xuyang Liu, Zichen Wen, Yiyu Wang, Siteng Huang, Honggang Chen

Variable Rate Deep Image Compression With a Conditional Autoencoder

In this paper, we propose a novel variable-rate learned image compression framework with a conditional autoencoder. Previous learning-based image compression methods mostly require training separate networks for different compression rates…

Image and Video Processing · Electrical Eng. & Systems · 2019-09-12 · Yoojin Choi, Mostafa El-Khamy, Jungwon Lee

Variable-Rate Learned Image Compression with Multi-Objective Optimization and Quantization-Reconstruction Offsets

Achieving successful variable bitrate compression with computationally simple algorithms from a single end-to-end learned image or video compression model remains a challenge. Many approaches have been proposed, including conditional…

Image and Video Processing · Electrical Eng. & Systems · 2024-03-01 · Fatih Kamisli, Fabien Racape, Hyomin Choi

Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration

Vision transformers have been widely explored in various vision tasks. Due to their heavy computational cost, much interest has arisen in compressing vision transformers dynamically at the token level. Current methods mainly pay attention…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-09 · Fanhu Zeng, Deli Yu, Zhenglun Kong, Hao Tang

Variance-based Gradient Compression for Efficient Distributed Deep Learning

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science · 2018-02-21 · Yusuke Tsuzuku, Hiroto Imachi, Takuya Akiba

A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models

Visual token compression is critical for Large Vision-Language Models (LVLMs) to efficiently process high-resolution inputs. Existing methods that typically adopt fixed compression ratios cannot adapt to scenes of varying complexity, often…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-05 · Quan-Sheng Zeng, Yunheng Li, Qilong Wang, Peng-Tao Jiang, Zuxuan Wu, Ming-Ming Cheng, Qibin Hou

COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models

Attention-based vision models, such as Vision Transformer (ViT) and its variants, have shown promising performance in various computer vision tasks. However, these emerging architectures suffer from large model sizes and high computational…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-04 · Jinqi Xiao, Miao Yin, Yu Gong, Xiao Zang, Jian Ren, Bo Yuan

Accelerating Diffusion Transformers with Token-wise Feature Caching

Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs. To address this problem, feature caching methods have been introduced to accelerate diffusion…

Machine Learning · Computer Science · 2025-02-20 · Chang Zou, Xuyang Liu, Ting Liu, Siteng Huang, Linfeng Zhang

DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity

Diffusion models demonstrate outstanding performance in image generation, but their multi-step inference mechanism requires immense computational cost. Previous works accelerate inference by leveraging layer or token cache techniques to…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-07 · Haowei Zhu, Ji Liu, Ziqiong Liu, Dong Li, Junhai Yong, Bin Wang, Emad Barsoum

Which Tokens to Use? Investigating Token Reduction in Vision Transformers

Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-10 · Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund

Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers

Although vision transformers (ViTs) have shown promising results in various computer vision tasks recently, their high computational cost limits their practical applications. Previous approaches that prune redundant tokens have demonstrated…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-24 · Siyuan Wei, Tianzhu Ye, Shen Zhang, Yao Tang, Jiajun Liang

Flexible Variable-Rate Image Feature Compression for Edge-Cloud Systems

Feature compression is a promising direction for coding for machines. Existing methods have made substantial progress, but they require designing and training separate neural network models to meet different specifications of compression…

Image and Video Processing · Electrical Eng. & Systems · 2024-04-02 · Md Adnan Faisal Hossain, Zhihao Duan, Yuning Huang, Fengqing Zhu