Related papers: Channel-wise Vector Quantization

VAEVQ: Enhancing Discrete Visual Tokenization through Variational Modeling

Vector quantization (VQ) transforms continuous image features into discrete representations, providing compressed, tokenized inputs for generative models. However, VQ-based frameworks suffer from several issues, such as non-smooth latent…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Sicheng Yang, Xing Hu, Qiang Wu, Dawei Yang

2D Gaussians Meet Visual Tokenizer

The image tokenizer is a critical component in AR image generation, as it determines how rich and structured visual content is encoded into compact representations. Existing quantization-based tokenizers such as VQ-GAN primarily focus on…

Computer Vision and Pattern Recognition · Computer Science 2025-08-21 Yiang Shi, Xiaoyang Guo, Wei Yin, Mingkai Jia, Qian Zhang, Xiaolin Hu, Wenyu Liu, Xinggang Wang

Channel-Aware Vector Quantization for Robust Semantic Communication on Discrete Channels

Deep learning-based semantic communication has largely relied on analog or semi-digital transmission, which limits compatibility with modern digital communication infrastructures. Recent studies have employed vector quantization (VQ) to…

Signal Processing · Electrical Eng. & Systems 2025-10-22 Zian Meng, Qiang Li, Wenqian Tang, Mingdie Yan, Xiaohu Ge

Soft Convex Quantization: Revisiting Vector Quantization with Convex Optimization

Vector Quantization (VQ) is a well-known technique in deep learning for extracting informative discrete latent representations. VQ-embedded models have shown impressive results in a range of applications including image and speech…

Machine Learning · Computer Science 2023-10-05 Tanmay Gautam, Reid Pryzant, Ziyi Yang, Chenguang Zhu, Somayeh Sojoudi

MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization

Vector Quantized Variational Autoencoders (VQ-VAEs) are fundamental models that compress continuous visual data into discrete tokens. Existing methods have tried to improve the quantization strategy for better reconstruction quality,…

Computer Vision and Pattern Recognition · Computer Science 2025-07-15 Mingkai Jia, Wei Yin, Xiaotao Hu, Jiaxin Guo, Xiaoyang Guo, Qian Zhang, Xiao-Xiao Long, Ping Tan

LG-VQ: Language-Guided Codebook Learning

Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image synthesis, which aims to learn a codebook to encode an image with a sequence of discrete codes and then generate an image in an auto-regression manner.…

Computer Vision and Pattern Recognition · Computer Science 2024-10-10 Guotao Liang, Baoquan Zhang, Yaowei Wang, Xutao Li, Yunming Ye, Huaibin Wang, Chuyao Luo, Kola Ye, Linfeng Luo

Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution

Vector-quantized based models have recently demonstrated strong potential for visual prior modeling. However, existing VQ-based methods simply encode visual features with nearest codebook items and train index predictor with code-level…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Qifan Li, Jiale Zou, Jinhua Zhang, Wei Long, Xingyu Zhou, Shuhang Gu

SGC-VQGAN: Towards Complex Scene Representation via Semantic Guided Clustering Codebook

Vector quantization (VQ) is a method for deterministically learning features through discrete codebook representations. Recent works have utilized visual tokenizers to discretize visual regions for self-supervised representation learning.…

Computer Vision and Pattern Recognition · Computer Science 2024-09-11 Chenjing Ding, Chiyu Wang, Boshi Liu, Xi Guo, Weixuan Tang, Wei Wu

Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization

Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm that first learns a codebook to encode images as discrete codes, and then completes generation based on the learned codebook. However, they…

Computer Vision and Pattern Recognition · Computer Science 2023-05-22 Mengqi Huang, Zhendong Mao, Zhuowei Chen, Yongdong Zhang

VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation

Autoregressive (AR) models have recently shown strong performance in image generation, where a critical component is the visual tokenizer (VT) that maps continuous pixel inputs to discrete token sequences. The quality of the VT largely…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Huawei Lin, Tong Geng, Zhaozhuo Xu, Weijie Zhao

Autoregressive Image Generation using Residual Quantization

For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short sequence length is important for an AR model to reduce its computational costs to consider…

Computer Vision and Pattern Recognition · Computer Science 2022-03-10 Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, Wook-Shin Han

Character-Centric Story Visualization via Visual Planning and Token Alignment

Story visualization advances the traditional text-to-image generation by enabling multiple image generation based on a complete story. This task requires machines to 1) understand long text inputs and 2) produce a globally consistent image…

Computer Vision and Pattern Recognition · Computer Science 2022-10-25 Hong Chen, Rujun Han, Te-Lin Wu, Hideki Nakayama, Nanyun Peng

VQ-DeepVSC: A Dual-Stage Vector Quantization Framework for Video Semantic Communication

In response to the rapid growth of global video traffic and the limitations of traditional wireless transmission systems, we propose a novel dual-stage vector quantization framework, VQ-DeepVSC, tailored to enhance video transmission over…

Networking and Internet Architecture · Computer Science 2024-09-06 Yongyi Miao, Zhongdang Li, Yang Wang, Die Hu, Jun Yan, Youfang Wang

Factorized Visual Tokenization and Generation

Visual tokenizers are fundamental to image generation. They convert visual data into discrete tokens, enabling transformer-based models to excel at image generation. Despite their success, VQ-based tokenizers like VQGAN face significant…

Computer Vision and Pattern Recognition · Computer Science 2024-11-28 Zechen Bai, Jianxiong Gao, Ziteng Gao, Pichao Wang, Zheng Zhang, Tong He, Mike Zheng Shou

DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick

Vector quantization is common in deep models, yet its hard assignments block gradients and hinder end-to-end training. We propose DiVeQ, which treats quantization as adding an error vector that mimics the quantization distortion, keeping…

Machine Learning · Computer Science 2026-05-27 Mohammad Hassan Vali, Tom Bäckström, Arno Solin

Progressive Text-to-Image Generation

Recently, Vector Quantized AutoRegressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by equally predicting discrete image tokens from the top left to bottom right in the latent space. Although the simple…

Computer Vision and Pattern Recognition · Computer Science 2023-09-21 Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang

Channel-Level Variable Quantization Network for Deep Image Compression

Deep image compression systems mainly contain four components: encoder, quantizer, entropy model, and decoder. To optimize these four components, a joint rate-distortion framework was proposed, and many deep neural network-based methods…

Image and Video Processing · Electrical Eng. & Systems 2020-07-27 Zhisheng Zhong, Hiroaki Akutsu, Kiyoharu Aizawa

XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation

Image tokenizers play a critical role in shaping the performance of subsequent generative models. Since the introduction of VQ-GAN, discrete image tokenization has undergone remarkable advancements. Improvements in architecture,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Jiuxiang Gu, Jindong Wang, Zhe Lin, Bhiksha Raj

Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data

In quantised autoencoders, images are usually split into local patches, each encoded by one token. This representation is redundant in the sense that the same number of tokens is spent per region, regardless of the visual information…

Computer Vision and Pattern Recognition · Computer Science 2024-08-06 Tim Elsner, Paula Usinger, Victor Czech, Gregor Kobsik, Yanjiang He, Isaak Lim, Leif Kobbelt

HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes

Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical…

Machine Learning · Computer Science 2024-03-29 Yuhta Takida, Yukara Ikemiya, Takashi Shibuya, Kazuki Shimada, Woosung Choi, Chieh-Hsin Lai, Naoki Murata, Toshimitsu Uesaka, Kengo Uchida, Wei-Hsiang Liao, Yuki Mitsufuji