Related papers: CAT: Content-Adaptive Image Tokenization

200 papers

Current vision systems typically assign fixed-length representations to images, regardless of the information content. This contrasts with human intelligence - and even large language models - which allocate varying representational…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Shivam Duggal, Phillip Isola, Antonio Torralba, William T. Freeman

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory…

Computer Vision and Pattern Recognition · Computer Science 2019-09-26 Chaim Baskin, Brian Chmiel, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson

Image tokenization, the process of transforming raw image pixels into a compact low-dimensional latent representation, has proven crucial for scalable and efficient image generation. However, mainstream image tokenization methods generally…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Kaiwen Zha, Lijun Yu, Alireza Fathi, David A. Ross, Cordelia Schmid, Dina Katabi, Xiuye Gu

In recent years, neural image compression (NIC) algorithms have shown powerful coding performance. However, most of them are not adaptive to the image content. Although several content adaptive methods have been proposed by updating the…

Computer Vision and Pattern Recognition · Computer Science 2022-12-22 Guanbo Pan, Guo Lu, Zhihao Hu, Dong Xu

Diffusion-based generative image compression has demonstrated remarkable potential for achieving realistic reconstruction at ultra-low bitrates. The key to unlocking this potential lies in making the entire compression process…

Computer Vision and Pattern Recognition · Computer Science 2026-03-26 Xihua Sheng, Lingyu Zhu, Tianyu Zhang, Dong Liu, Shiqi Wang, Jing Wang

According to Algorithmic Information Theory (AIT), intelligent representations compress data into the shortest possible program that can reconstruct its content, exhibiting low Kolmogorov Complexity (KC). In contrast, most visual…

Computer Vision and Pattern Recognition · Computer Science 2025-07-11 Shivam Duggal, Sanghyun Byun, William T. Freeman, Antonio Torralba, Phillip Isola

Learned lossless image compression has achieved significant advancements in recent years. However, existing methods often rely on training amortized generative models on massive datasets, resulting in sub-optimal probability distribution…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, Wen Gao

Recently, Transformers have gained significant popularity in image restoration tasks such as image super-resolution and denoising, owing to their superior performance. However, balancing performance and computational burden remains a…

Computer Vision and Pattern Recognition · Computer Science 2026-03-04 Leheng Zhang, Wei Long, Yawei Li, Xingyu Zhou, Xiaorui Zhao, Shuhang Gu

Discrete image tokenizers have emerged as a key component of modern vision and multimodal systems, providing a sequential interface for transformer-based architectures. However, most existing approaches remain primarily optimized for…

Computer Vision and Pattern Recognition · Computer Science 2026-02-25 Aram Davtyan, Yusuf Sahin, Yasaman Haghighi, Sebastian Stapf, Pablo Acuaviva, Alexandre Alahi, Paolo Favaro

Modern large vision-language models (LVLMs) convert each input image into a large set of tokens that far outnumber the text tokens. Although this improves visual perception, it also introduces severe image token redundancy. Because image…

Computer Vision and Pattern Recognition · Computer Science 2025-12-10 Yanshu Li, Jianjiang Yang, Zhennan Shen, Ligong Han, Haoyan Xu, Ruixiang Tang

The quadratic cost of attention in transformers motivated the development of efficient approaches, namely sparse and sliding window attention, convolutions, and linear attention. Although these approaches result in impressive reductions in…

Machine Learning · Computer Science 2025-11-10 Jatin Prakash, Aahlad Puli, Rajesh Ranganath

Traditional image codecs emphasize signal fidelity and human perception, often at the expense of machine vision tasks. Deep learning methods have demonstrated promising coding performance by utilizing rich semantic embeddings optimized for…

Computer Vision and Pattern Recognition · Computer Science 2024-10-10 Sha Guo, Zhuo Chen, Yang Zhao, Ning Zhang, Xiaotong Li, Lingyu Duan

Diffusion models have revolutionized generative tasks, especially in the domain of text-to-image synthesis; however, their iterative denoising process demands substantial computational resources. In this paper, we present a novel…

Computer Vision and Pattern Recognition · Computer Science 2025-02-04 Xinle Cheng, Zhuoming Chen, Zhihao Jia

Recent neural models for image captioning usually employ an encoder-decoder framework with an attention mechanism. However, the attention mechanism in such a framework aligns one single (attended) image feature vector to one caption word,…

Computer Vision and Pattern Recognition · Computer Science 2020-01-07 Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen

Motivated by the efficiency investigation of the Transformer-based transform coding framework, namely SwinT-ChARM, we propose to enhance the latter, first, with a more straightforward yet effective Transformer-based channel-wise…

Computer Vision and Pattern Recognition · Computer Science 2023-07-13 Ahmed Ghorbel, Wassim Hamidouche, Luce Morin

Accurate and efficient discrete video tokenization is essential for processing long video sequences. Yet, the inherent complexity and variable information density of videos present a significant bottleneck for current tokenizers, which…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Haotian Ye, Qiyuan He, Jiaqi Han, Puheng Li, Jiaojiao Fan, Zekun Hao, Fitsum Reda, Yogesh Balaji, Huayu Chen, Sheng Liu, Angela Yao, James Zou, Stefano Ermon, Haoxiang Wang, Ming-Yu Liu

Existing video tokenizers typically use the traditional Variational Autoencoder (VAE) architecture for video compression and reconstruction. However, to achieve good performance, its training process often relies on complex multi-stage…

Computer Vision and Pattern Recognition · Computer Science 2025-03-28 Nianzu Yang, Pandeng Li, Liming Zhao, Yang Li, Chen-Wei Xie, Yehui Tang, Xudong Lu, Zhihang Liu, Yun Zheng, Yu Liu, Junchi Yan

Image decomposition offers deep insights into the imaging factors of visual data and significantly enhances various advanced computer vision tasks. In this work, we introduce a novel approach to low-light image enhancement based on…

Computer Vision and Pattern Recognition · Computer Science 2025-04-17 Xingxing Yang, Jie Chen, Zaifeng Yang

Research on continual learning has led to a variety of approaches to mitigating catastrophic forgetting in feed-forward classification networks. Until now, surprisingly little attention has been focused on continual learning of recurrent…

Computer Vision and Pattern Recognition · Computer Science 2020-10-30 Riccardo Del Chiaro, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer

The vision transformer is a model that breaks down each image into a sequence of tokens with a fixed length and processes them similarly to words in natural language processing. Although increasing the number of tokens typically results in…

Machine Learning · Computer Science 2023-07-06 Qiqi Zhou, Yichen Zhu