Related papers: DiffRate: Differentiable Compression Rate for Eff…

200 papers

Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation quality but suffer from high latency and memory inefficiency, making them difficult to deploy on resource-constrained devices. One major efficiency…

Computer Vision and Pattern Recognition · Computer Science 2025-03-28 Haoran You, Connelly Barnes, Yuqian Zhou, Yan Kang, Zhenbang Du, Wei Zhou, Lingzhi Zhang, Yotam Nitzan, Xiaoyang Liu, Zhe Lin, Eli Shechtman, Sohrab Amirghodsi, Yingyan Celine Lin

Vision-Language Models (VLMs) have achieved notable success in multimodal tasks but face practical limitations due to the quadratic complexity of decoder attention mechanisms and autoregressive generation. Existing methods like FASTV and…

Computer Vision and Pattern Recognition · Computer Science 2025-01-27 Xiaoyu Liang, Chaofeng Guan, Jiaying Lu, Huiyao Chen, Huan Wang, Haoji Hu

Token compression expedites the training and inference of Vision Transformers (ViTs) by reducing the number of redundant tokens, e.g., pruning inattentive tokens or merging similar tokens. However, when applied to downstream tasks,…

Computer Vision and Pattern Recognition · Computer Science 2024-08-14 Shibo Jie, Yehui Tang, Jianyuan Guo, Zhi-Hong Deng, Kai Han, Yunhe Wang

Visual token pruning reduces the computational cost of Vision-Language Models (VLMs) by removing redundant visual tokens. Existing methods typically rely on Gumbel-Softmax to approximate discrete selection during training. However, the…

Computer Vision and Pattern Recognition · Computer Science 2026-05-28 Landi He, Mingde Yao, Shawn Young, Lijian Xu

Token compression is essential for reducing the computational and memory requirements of transformer models, enabling their deployment in resource-constrained environments. In this work, we propose an efficient and hardware-compatible token…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Junzhu Mao, Yang Shen, Jinyang Guo, Yazhou Yao, Xiansheng Hua

Recent advancements in diffusion-based generative priors have enabled visually plausible image compression at extremely low bit rates. However, existing approaches suffer from slow sampling processes and suboptimal bit allocation due to…

Computer Vision and Pattern Recognition · Computer Science 2026-01-16 Yichong Xia, Yimin Zhou, Jinpeng Wang, Bin Chen

Token compression techniques have recently emerged as powerful tools for accelerating Vision Transformer (ViT) inference in computer vision. Due to the quadratic computational complexity with respect to the token sequence length, these…

Computer Vision and Pattern Recognition · Computer Science 2025-07-15 Phat Nguyen, Ngai-Man Cheung

Learned image compression methods generally optimize a rate-distortion loss, trading off improvements in visual distortion for added bitrate. Increasingly, however, compressed imagery is used as an input to deep learning networks for…

Image and Video Processing · Electrical Eng. & Systems 2022-02-02 Maxime Kawawa-Beaudan, Ryan Roggenkemper, Avideh Zakhor

Large vision-language models (LVLMs) have demonstrated remarkable capabilities in multimodal understanding tasks. However, the increasing demand for high-resolution image and long-video understanding results in substantial token counts,…

Computer Vision and Pattern Recognition · Computer Science 2026-02-26 Junjie Chen, Xuyang Liu, Zichen Wen, Yiyu Wang, Siteng Huang, Honggang Chen

In this paper, we propose a novel variable-rate learned image compression framework with a conditional autoencoder. Previous learning-based image compression methods mostly require training separate networks for different compression rates…

Image and Video Processing · Electrical Eng. & Systems 2019-09-12 Yoojin Choi, Mostafa El-Khamy, Jungwon Lee

Achieving successful variable bitrate compression with computationally simple algorithms from a single end-to-end learned image or video compression model remains a challenge. Many approaches have been proposed, including conditional…

Image and Video Processing · Electrical Eng. & Systems 2024-03-01 Fatih Kamisli, Fabien Racape, Hyomin Choi

Vision transformers have been widely explored in various vision tasks. Due to their heavy computational cost, much interest has arisen in dynamically compressing vision transformers at the token level. Current methods mainly pay attention…

Computer Vision and Pattern Recognition · Computer Science 2025-06-09 Fanhu Zeng, Deli Yu, Zhenglun Kong, Hao Tang

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku, Hiroto Imachi, Takuya Akiba

Visual token compression is critical for Large Vision-Language Models (LVLMs) to efficiently process high-resolution inputs. Existing methods that typically adopt fixed compression ratios cannot adapt to scenes of varying complexity, often…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Quan-Sheng Zeng, Yunheng Li, Qilong Wang, Peng-Tao Jiang, Zuxuan Wu, Ming-Ming Cheng, Qibin Hou

Attention-based vision models, such as Vision Transformer (ViT) and its variants, have shown promising performance in various computer vision tasks. However, these emerging architectures suffer from large model sizes and high computational…

Computer Vision and Pattern Recognition · Computer Science 2024-12-04 Jinqi Xiao, Miao Yin, Yu Gong, Xiao Zang, Jian Ren, Bo Yuan

Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs. To address this problem, feature caching methods have been introduced to accelerate diffusion…

Machine Learning · Computer Science 2025-02-20 Chang Zou, Xuyang Liu, Ting Liu, Siteng Huang, Linfeng Zhang

Diffusion models demonstrate outstanding performance in image generation, but their multi-step inference mechanism requires immense computational cost. Previous works accelerate inference by leveraging layer or token cache techniques to…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Haowei Zhu, Ji Liu, Ziqiong Liu, Dong Li, Junhai Yong, Bin Wang, Emad Barsoum

Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still…

Computer Vision and Pattern Recognition · Computer Science 2023-08-10 Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund

Although vision transformers (ViTs) have shown promising results in various computer vision tasks recently, their high computational cost limits their practical applications. Previous approaches that prune redundant tokens have demonstrated…

Computer Vision and Pattern Recognition · Computer Science 2023-04-24 Siyuan Wei, Tianzhu Ye, Shen Zhang, Yao Tang, Jiajun Liang

Feature compression is a promising direction for coding for machines. Existing methods have made substantial progress, but they require designing and training separate neural network models to meet different specifications of compression…

Image and Video Processing · Electrical Eng. & Systems 2024-04-02 Md Adnan Faisal Hossain, Zhihao Duan, Yuning Huang, Fengqing Zhu