Related papers: Spectral Image Tokenizer

Wavelet-Based Image Tokenizer for Vision Transformers

Non-overlapping patch-wise convolution is the default image tokenizer for all state-of-the-art vision Transformer (ViT) models. Even though many ViT variants have been proposed to improve its efficiency and accuracy, little research on…

Computer Vision and Pattern Recognition · Computer Science 2024-05-30 Zhenhai Zhu, Radu Soricut

ImageFolder: Autoregressive Image Generation with Folded Tokens

Image tokenizers are crucial for visual generative models, e.g., diffusion models (DMs) and autoregressive (AR) models, as they construct the latent representation for modeling. Increasing token length is a common approach to improve the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-05 Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Jiuxiang Gu, Bhiksha Raj, Zhe Lin

SFTok: Bridging the Performance Gap in Discrete Tokenizers

Recent advances in multimodal models highlight the pivotal role of image tokenization in high-resolution image generation. By compressing images into compact latent representations, tokenizers enable generative models to operate in…

Computer Vision and Pattern Recognition · Computer Science 2025-12-19 Qihang Rao, Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Differentiable Hierarchical Visual Tokenization

Vision Transformers rely on fixed patch tokens that ignore the spatial and semantic structure of images. In this work, we introduce an end-to-end differentiable tokenizer that adapts to image content with pixel-level granularity while…

Computer Vision and Pattern Recognition · Computer Science 2025-11-05 Marius Aasan, Martine Hjelkrem-Tan, Nico Catalano, Changkyu Choi, Adín Ramírez Rivera

TokenLight: Precise Lighting Control in Images using Attribute Tokens

This paper presents a method for image relighting that enables precise and continuous control over multiple illumination attributes in a photograph. We formulate relighting as a conditional image generation task and introduce attribute…

Computer Vision and Pattern Recognition · Computer Science 2026-04-20 Sumit Chaturvedi, Yannick Hold-Geoffroy, Mengwei Ren, Jingyuan Liu, He Zhang, Yiqun Mei, Julie Dorsey, Zhixin Shu

Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution

Discrete Wavelet Transform (DWT) has been widely explored to enhance the performance of image superresolution (SR). Despite some DWT-based methods improving SR by capturing fine-grained frequency signals, most existing approaches neglect…

Computer Vision and Pattern Recognition · Computer Science 2025-11-05 Peng Du, Hui Li, Han Xu, Paul Barom Jeon, Dongwook Lee, Daehyun Ji, Ran Yang, Feng Zhu

WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting

Image inpainting, which refers to the synthesis of missing regions in an image, can help restore occluded or degraded areas and also serve as a precursor task for self-supervision. The current state-of-the-art models for image inpainting…

Computer Vision and Pattern Recognition · Computer Science 2023-07-04 Pranav Jeevan, Dharshan Sampath Kumar, Amit Sethi

Tokenize Image as a Set

This paper proposes a fundamentally new paradigm for image generation through set-based tokenization and distribution modeling. Unlike conventional methods that serialize images into fixed-position latent codes with a uniform compression…

Computer Vision and Pattern Recognition · Computer Science 2025-03-21 Zigang Geng, Mengde Xu, Han Hu, Shuyang Gu

Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition

Vision Transformers (ViT) have achieved remarkable success in large-scale image recognition. They split every 2D image into a fixed number of patches, each of which is treated as a token. Generally, representing an image with more tokens…

Computer Vision and Pattern Recognition · Computer Science 2021-10-27 Yulin Wang, Rui Huang, Shiji Song, Zeyi Huang, Gao Huang

Wavelets Are All You Need for Autoregressive Image Generation

In this paper, we take a new approach to autoregressive image generation that is based on two main ingredients. The first is wavelet image coding, which allows to tokenize the visual details of an image from coarse to fine details by…

Machine Learning · Computer Science 2025-08-28 Wael Mattar, Idan Levy, Nir Sharon, Shai Dekel

An Image is Worth 32 Tokens for Reconstruction and Generation

Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational…

Computer Vision and Pattern Recognition · Computer Science 2024-06-12 Qihang Yu, Mark Weber, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen

Learning Token-based Representation for Image Retrieval

In image retrieval, deep local features learned in a data-driven manner have been demonstrated effective to improve retrieval performance. To realize efficient retrieval on large image database, some approaches quantize deep local features…

Image and Video Processing · Electrical Eng. & Systems 2021-12-14 Hui Wu, Min Wang, Wengang Zhou, Yang Hu, Houqiang Li

End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

Autoregressive image modeling relies on visual tokenizers to compress images into compact latent representations. We design an end-to-end training pipeline that jointly optimizes reconstruction and generation, enabling direct supervision…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Wenda Chu, Bingliang Zhang, Jiaqi Han, Yizhuo Li, Linjie Yang, Yisong Yue, Qiushan Guo

Communication-Inspired Tokenization for Structured Image Representations

Discrete image tokenizers have emerged as a key component of modern vision and multimodal systems, providing a sequential interface for transformer-based architectures. However, most existing approaches remain primarily optimized for…

Computer Vision and Pattern Recognition · Computer Science 2026-02-25 Aram Davtyan, Yusuf Sahin, Yasaman Haghighi, Sebastian Stapf, Pablo Acuaviva, Alexandre Alahi, Paolo Favaro

Text-Guided Token Communication for Wireless Image Transmission

With the emergence of 6G networks and proliferation of visual applications, efficient image transmission under adverse channel conditions is critical. We present a text-guided token communication system leveraging pre-trained foundation…

Information Theory · Computer Science 2025-07-09 Bole Liu, Li Qiao, Ye Wang, Zhen Gao, Yu Ma, Keke Ying, Tong Qin

Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation

Image tokenization plays a critical role in reducing the computational demands of modeling high-resolution images, significantly improving the efficiency of image and multimodal understanding and generation. Recent advances in 1D latent…

Computer Vision and Pattern Recognition · Computer Science 2025-06-27 Ze Wang, Hao Chen, Benran Hu, Jiang Liu, Ximeng Sun, Jialian Wu, Yusheng Su, Xiaodong Yu, Emad Barsoum, Zicheng Liu

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Autoregressive visual generation models typically rely on tokenizers to compress images into tokens that can be predicted sequentially. A fundamental dilemma exists in token representation: discrete tokens enable straightforward modeling…

Computer Vision and Pattern Recognition · Computer Science 2025-09-01 Yuqing Wang, Zhijie Lin, Yao Teng, Yuanzhi Zhu, Shuhuai Ren, Jiashi Feng, Xihui Liu

Spectral Vision Transformer for Efficient Tokenization with Limited Data

We propose a novel spectral vision transformer architecture for efficient tokenization in limited data, with an emphasis on medical imaging. We outline convenient theoretical properties arising from the choice of basis including spatial…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Alexandra G. Roberts, Maneesh John, Jinwei Zhang, Dominick Romano, Mert Sisman, Ki Sueng Choi, Heejong Kim, Mert R. Sabuncu, Thanh D. Nguyen, Alexey V. Dimov, Pascal Spincemaille, Brian H. Kopell, Yi Wang

CAT: Content-Adaptive Image Tokenization

Most existing image tokenizers encode images into a fixed number of tokens or patches, overlooking the inherent variability in image complexity. To address this, we introduce Content-Adaptive Tokenizer (CAT), which dynamically adjusts…

Computer Vision and Pattern Recognition · Computer Science 2025-01-07 Junhong Shen, Kushal Tirumala, Michihiro Yasunaga, Ishan Misra, Luke Zettlemoyer, Lili Yu, Chunting Zhou

Semantic One-Dimensional Tokenizer for Image Reconstruction and Generation

Visual generative models based on latent space have achieved great success, underscoring the significance of visual tokenization. Mapping images to latents boosts efficiency and enables multimodal alignment for scaling up in downstream…

Computer Vision and Pattern Recognition · Computer Science 2026-03-18 Yunpeng Qu, Kaidong Zhang, Yukang Ding, Ying Chen, Jian Wang