
Related papers: Image and Video Tokenization with Binary Spherical…

200 papers

Vision tokenizers have gained a lot of traction due to their scalability and compactness; previous works rely on old-school GAN-based hyperparameters and biased comparisons, and lack a comprehensive analysis of the scaling behaviours.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-05 Jiangtao Wang, Zhen Qin, Yifan Zhang, Vincent Tao Hu, Björn Ommer, Rania Briq, Stefan Kesselheim

Hashing methods, which encode high-dimensional images with compact discrete codes, have been widely applied to enhance large-scale image retrieval. In this paper, we put forward Deep Spherical Quantization (DSQ), a novel method to make deep…

Computer Vision and Pattern Recognition · Computer Science 2019-06-10 Sepehr Eghbali, Ladan Tahvildari

This paper proposes a novel matrix quantization method, Binary Quadratic Quantization (BQQ). In contrast to conventional first-order quantization approaches, such as uniform quantization and binary coding quantization, that approximate…

Computer Vision and Pattern Recognition · Computer Science 2025-10-22 Kyo Kuroki, Yasuyuki Okoshi, Thiem Van Chu, Kazushi Kawamura, Masato Motomura

Existing vector quantization (VQ) methods struggle with scalability, largely attributed to the instability of the codebook that undergoes partial updates during training. The codebook is prone to collapse as utilization decreases, due to…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Fengyuan Shi, Zhuoyan Luo, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang

Visual tokenizers are fundamental to image generation. They convert visual data into discrete tokens, enabling transformer-based models to excel at image generation. Despite their success, VQ-based tokenizers like VQGAN face significant…

Computer Vision and Pattern Recognition · Computer Science 2024-11-28 Zechen Bai, Jianxiong Gao, Ziteng Gao, Pichao Wang, Zheng Zhang, Tong He, Mike Zheng Shou

Text-to-image diffusion models have emerged as a powerful framework for high-quality image generation given textual prompts. Their success has driven the rapid development of production-grade diffusion models that consistently increase in…

Computer Vision and Pattern Recognition · Computer Science 2024-09-04 Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk

How can we accurately quantize a pre-trained Vision Transformer model? Quantization algorithms compress Vision Transformers (ViTs) into low-bit formats, reducing memory and computation demands with minimal accuracy degradation. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-17 Minjun Kim, Jaeri Lee, Jongjin Kim, Jeongin Yun, Yongmo Kwon, U Kang

Despite significant advancements in human motion generation, current motion representations, typically formulated as discrete frame sequences, still face two critical limitations: (i) they fail to capture motion from a multi-scale…

Computer Vision and Pattern Recognition · Computer Science 2025-08-13 Zan Wang, Jingze Zhang, Yixin Chen, Baoxiong Jia, Wei Liang, Siyuan Huang

The image tokenizer is a critical component in AR image generation, as it determines how rich and structured visual content is encoded into compact representations. Existing quantization-based tokenizers such as VQ-GAN primarily focus on…

Computer Vision and Pattern Recognition · Computer Science 2025-08-21 Yiang Shi, Xiaoyang Guo, Wei Yin, Mingkai Jia, Qian Zhang, Xiaolin Hu, Wenyu Liu, Xinggang Wang

We present Channel-wise Vector Quantization (CVQ), a novel image tokenization paradigm that replaces patch-wise tokens with channel-wise tokens. Unlike conventional vector quantization, which assigns a discrete token to each patch feature…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Wei Song, Tianhang Wang, Yitong Chen, Tong Zhang, Zuxuan Wu, Ming Li, Jiaqi Wang, Kaicheng Yu

Diffusion transformers have demonstrated remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions. However, larger model sizes and multi-frame processing for video…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

Image tokenizers play a critical role in shaping the performance of subsequent generative models. Since the introduction of VQ-GAN, discrete image tokenization has undergone remarkable advancements. Improvements in architecture,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Jiuxiang Gu, Jindong Wang, Zhe Lin, Bhiksha Raj

Vector Quantization (VQ) techniques face significant challenges in codebook utilization, limiting reconstruction fidelity in image modeling. We introduce a Dual Codebook mechanism that effectively addresses this limitation by partitioning…

Non-parametric quantization has received much attention due to its parameter efficiency and scalability to large codebooks. In this paper, we present a unified formulation of different non-parametric quantization methods through the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-17 Yue Zhao, Hanwen Jiang, Zhenlin Xu, Chutong Yang, Ehsan Adeli, Philipp Krähenbühl

Semantic communications provide significant performance gains over traditional communications by transmitting task-relevant semantic features through wireless channels. However, most existing studies rely on end-to-end (E2E) training of…

Signal Processing · Electrical Eng. & Systems 2024-12-10 Joohyuk Park, Yongjeong Oh, Yongjune Kim, Yo-Seb Jeon

Vision Transformer (ViT)-based models have shown state-of-the-art performance (e.g., accuracy) in vision-based AI tasks. However, realizing their capability in resource-constrained embedded AI systems is challenging due to their inherent…

Neural and Evolutionary Computing · Computer Science 2026-01-06 Rachmad Vidya Wicaksana Putra, Saad Iftikhar, Muhammad Shafique

Post-training quantization (PTQ), which only requires a tiny dataset for calibration without end-to-end retraining, is a light and practical model compression technique. Recently, several PTQ schemes for vision transformers (ViTs) have been…

Computer Vision and Pattern Recognition · Computer Science 2023-08-08 Zhikai Li, Junrui Xiao, Lianwei Yang, Qingyi Gu

We introduce a multi-scale Image Super Resolution (ISR) method building on recent advances in Visual Auto-Regressive (VAR) modeling. VAR models break image tokenization into additive, gradually increasing scales, using Residual Quantization…

Computer Vision and Pattern Recognition · Computer Science 2026-05-15 Isma Hadji, Enrique Sanchez, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks, and thus has been widely investigated. However, it lacks a systematic method to determine the…

Machine Learning · Computer Science 2021-02-23 Huanrui Yang, Lin Duan, Yiran Chen, Hai Li

Video tokenizers are essential for latent video diffusion models, converting raw video data into spatiotemporally compressed latent spaces for efficient training. However, extending state-of-the-art video tokenizers to achieve a temporal…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Aniruddha Mahapatra, Long Mai, David Bourgin, Yitian Zhang, Feng Liu