Related papers: CAT: Content-Adaptive Image Tokenization

Adaptive Length Image Tokenization via Recurrent Allocation

Current vision systems typically assign fixed-length representations to images, regardless of the information content. This contrasts with human intelligence - and even large language models - which allocate varying representational…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Shivam Duggal, Phillip Isola, Antonio Torralba, William T. Freeman

CAT: Compression-Aware Training for bandwidth reduction

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory…

Computer Vision and Pattern Recognition · Computer Science 2019-09-26 Chaim Baskin, Brian Chmiel, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson

Language-Guided Image Tokenization for Generation

Image tokenization, the process of transforming raw image pixels into a compact low-dimensional latent representation, has proven crucial for scalable and efficient image generation. However, mainstream image tokenization methods generally…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Kaiwen Zha, Lijun Yu, Alireza Fathi, David A. Ross, Cordelia Schmid, Dina Katabi, Xiuye Gu

Content Adaptive Latents and Decoder for Neural Image Compression

In recent years, neural image compression (NIC) algorithms have shown powerful coding performance. However, most of them are not adaptive to the image content. Although several content adaptive methods have been proposed by updating the…

Computer Vision and Pattern Recognition · Computer Science 2022-12-22 Guanbo Pan, Guo Lu, Zhihao Hu, Dong Xu

CADC: Content Adaptive Diffusion-Based Generative Image Compression

Diffusion-based generative image compression has demonstrated remarkable potential for achieving realistic reconstruction at ultra-low bitrates. The key to unlocking this potential lies in making the entire compression process…

Computer Vision and Pattern Recognition · Computer Science 2026-03-26 Xihua Sheng, Lingyu Zhu, Tianyu Zhang, Dong Liu, Shiqi Wang, Jing Wang

Single-pass Adaptive Image Tokenization for Minimum Program Search

According to Algorithmic Information Theory (AIT), intelligent representations compress data into the shortest possible program that can reconstruct its content, exhibiting low Kolmogorov Complexity (KC). In contrast, most visual…

Computer Vision and Pattern Recognition · Computer Science 2025-07-11 Shivam Duggal, Sanghyun Byun, William T. Freeman, Antonio Torralba, Phillip Isola

CALLIC: Content Adaptive Learning for Lossless Image Compression

Learned lossless image compression has achieved significant advancements in recent years. However, existing methods often rely on training amortized generative models on massive datasets, resulting in sub-optimal probability distribution…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, Wen Gao

ATD: Improved Transformer with Adaptive Token Dictionary for Image Restoration

Recently, Transformers have gained significant popularity in image restoration tasks such as image super-resolution and denoising, owing to their superior performance. However, balancing performance and computational burden remains a…

Computer Vision and Pattern Recognition · Computer Science 2026-03-04 Leheng Zhang, Wei Long, Yawei Li, Xingyu Zhou, Xiaorui Zhao, Shuhang Gu

Communication-Inspired Tokenization for Structured Image Representations

Discrete image tokenizers have emerged as a key component of modern vision and multimodal systems, providing a sequential interface for transformer-based architectures. However, most existing approaches remain primarily optimized for…

Computer Vision and Pattern Recognition · Computer Science 2026-02-25 Aram Davtyan, Yusuf Sahin, Yasaman Haghighi, Sebastian Stapf, Pablo Acuaviva, Alexandre Alahi, Paolo Favaro

CATP: Contextually Adaptive Token Pruning for Efficient and Enhanced Multimodal In-Context Learning

Modern large vision-language models (LVLMs) convert each input image into a large set of tokens that far outnumber the text tokens. Although this improves visual perception, it also introduces severe image token redundancy. Because image…

Computer Vision and Pattern Recognition · Computer Science 2025-12-10 Yanshu Li, Jianjiang Yang, Zhennan Shen, Ligong Han, Haoyan Xu, Ruixiang Tang

Attention and Compression is all you need for Controllably Efficient Language Models

The quadratic cost of attention in transformers motivated the development of efficient approaches: namely sparse and sliding window attention, convolutions and linear attention. Although these approaches result in impressive reductions in…

Machine Learning · Computer Science 2025-11-10 Jatin Prakash, Aahlad Puli, Rajesh Ranganath

Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach

Traditional image codecs emphasize signal fidelity and human perception, often at the expense of machine vision tasks. Deep learning methods have demonstrated promising coding performance by utilizing rich semantic embeddings optimized for…

Computer Vision and Pattern Recognition · Computer Science 2024-10-10 Sha Guo, Zhuo Chen, Yang Zhao, Ning Zhang, Xiaotong Li, Lingyu Duan

CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models

Diffusion models have revolutionized generative tasks, especially in the domain of text-to-image synthesis; however, their iterative denoising process demands substantial computational resources. In this paper, we present a novel…

Computer Vision and Pattern Recognition · Computer Science 2025-02-04 Xinle Cheng, Zhuoming Chen, Zhihao Jia

Adaptively Aligned Image Captioning via Adaptive Attention Time

Recent neural models for image captioning usually employ an encoder-decoder framework with an attention mechanism. However, the attention mechanism in such a framework aligns one single (attended) image feature vector to one caption word,…

Computer Vision and Pattern Recognition · Computer Science 2020-01-07 Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen

AICT: An Adaptive Image Compression Transformer

Motivated by the efficiency investigation of the Transformer-based transform coding framework, namely SwinT-ChARM, we propose to enhance the latter, as first, with a more straightforward yet effective Transformer-based channel-wise…

Computer Vision and Pattern Recognition · Computer Science 2023-07-13 Ahmed Ghorbel, Wassim Hamidouche, Luce Morin

InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression

Accurate and efficient discrete video tokenization is essential for processing long video sequences. Yet, the inherent complexity and variable information density of videos present a significant bottleneck for current tokenizers, which…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Haotian Ye, Qiyuan He, Jiaqi Han, Puheng Li, Jiaojiao Fan, Zekun Hao, Fitsum Reda, Yogesh Balaji, Huayu Chen, Sheng Liu, Angela Yao, James Zou, Stefano Ermon, Haoxiang Wang, Ming-Yu Liu

Rethinking Video Tokenization: A Conditioned Diffusion-based Approach

Existing video tokenizers typically use the traditional Variational Autoencoder (VAE) architecture for video compression and reconstruction. However, to achieve good performance, its training process often relies on complex multi-stage…

Computer Vision and Pattern Recognition · Computer Science 2025-03-28 Nianzu Yang, Pandeng Li, Liming Zhao, Yang Li, Chen-Wei Xie, Yehui Tang, Xudong Lu, Zhihang Liu, Yun Zheng, Yu Liu, Junchi Yan

Learning Physics-Informed Color-Aware Transforms for Low-Light Image Enhancement

Image decomposition offers deep insights into the imaging factors of visual data and significantly enhances various advanced computer vision tasks. In this work, we introduce a novel approach to low-light image enhancement based on…

Computer Vision and Pattern Recognition · Computer Science 2025-04-17 Xingxing Yang, Jie Chen, Zaifeng Yang

RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning

Research on continual learning has led to a variety of approaches to mitigating catastrophic forgetting in feed-forward classification networks. Until now, surprisingly little attention has been focused on continual learning of recurrent…

Computer Vision and Pattern Recognition · Computer Science 2020-10-30 Riccardo Del Chiaro, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer

Make A Long Image Short: Adaptive Token Length for Vision Transformers

The vision transformer is a model that breaks down each image into a sequence of tokens with a fixed length and processes them similarly to words in natural language processing. Although increasing the number of tokens typically results in…

Machine Learning · Computer Science 2023-07-06 Qiqi Zhou, Yichen Zhu