Related papers: Vector Quantized Diffusion Model for Text-to-Image…

Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation

The integration of Vector Quantised Variational AutoEncoder (VQ-VAE) with autoregressive models as generation part has yielded high-quality results on image generation. However, the autoregressive models will strictly follow the progressive…

Computer Vision and Pattern Recognition · Computer Science 2024-03-01 Minghui Hu, Yujie Wang, Tat-Jen Cham, Jianfei Yang, P. N. Suganthan

Improved Vector Quantized Diffusion Models

Vector quantized diffusion (VQ-Diffusion) is a powerful generative model for text-to-image synthesis, but sometimes can still generate low-quality samples or weakly correlated images with text input. We find these issues are mainly due to…

Computer Vision and Pattern Recognition · Computer Science 2023-02-09 Zhicong Tang, Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

Progressive Text-to-Image Generation

Recently, Vector Quantized AutoRegressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by equally predicting discrete image tokens from the top left to bottom right in the latent space. Although the simple…

Computer Vision and Pattern Recognition · Computer Science 2023-09-21 Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

Text-to-image diffusion models have emerged as a powerful framework for high-quality image generation given textual prompts. Their success has driven the rapid development of production-grade diffusion models that consistently increase in…

Computer Vision and Pattern Recognition · Computer Science 2024-09-04 Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk

Diffusion bridges vector quantized Variational AutoEncoders

Vector Quantized-Variational AutoEncoders (VQ-VAE) are generative models based on discrete latent representations of the data, where inputs are mapped to a finite set of learned embeddings. To generate new samples, an autoregressive prior…

Machine Learning · Statistics 2022-08-04 Max Cohen, Guillaume Quispe, Sylvain Le Corff, Charles Ollion, Eric Moulines

Diffsound: Discrete Diffusion Model for Text-to-sound Generation

Generating sound effects that humans want is an important topic. However, there are few studies in this area for sound generation. In this study, we investigate generating sound conditioned on a text prompt and propose a novel text-to-sound…

Sound · Computer Science 2023-05-01 Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu

Improving Vector-Quantized Image Modeling with Latent Consistency-Matching Diffusion

By embedding discrete representations into a continuous latent space, we can leverage continuous-space latent diffusion models to handle generative modeling of discrete data. However, despite their initial success, most latent diffusion…

Machine Learning · Computer Science 2025-04-02 Bac Nguyen, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Stefano Ermon, Yuki Mitsufuji

Hybrid Quantum-Classical Latent Diffusion Models for Medical Image Generation

Generative learning models in medical research are crucial in developing training data for deep learning models and advancing diagnostic tools, but the problem of high-quality, diverse images is an open topic of research. Quantum-enhanced…

Quantum Physics · Physics 2025-08-14 Kübra Yeter-Aydeniz, Nora M. Bauer, Pranay Jain, Max Masnick

Vector Quantized Image-to-Image Translation

Current image-to-image translation methods formulate the task with conditional generation models, leading to learning only the recolorization or regional changes as being constrained by the rich structural information provided by the…

Computer Vision and Pattern Recognition · Computer Science 2022-07-28 Yu-Jie Chen, Shin-I Cheng, Wei-Chen Chiu, Hung-Yu Tseng, Hsin-Ying Lee

High-Fidelity Text-to-Image Generation from Pre-Trained Vision-Language Models via Distribution-Conditioned Diffusion Decoding

Recent large-scale vision-language models (VLMs) have shown remarkable text-to-image generation capabilities, yet their visual fidelity remains constrained by the discrete image tokenization, which poses a major challenge. Although several…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Ji Woo Hong, Hee Suk Yoon, Gwanhyeong Koo, Eunseop Yoon, SooHwan Eom, Qi Dai, Chong Luo, Chang D. Yoo

CLIP-VQDiffusion : Langauge Free Training of Text To Image generation using CLIP and vector quantized diffusion model

There has been significant progress in text conditional image generation models. Recent advancements in this field depend not only on improvements in model structures, but also vast quantities of text-image paired datasets. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-03-25 Seungdae Han, Joohee Kim

Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression

Since 2023, Vector Quantization (VQ)-based discrete generation methods have rapidly dominated human motion generation, primarily surpassing diffusion-based continuous generation methods in standard performance metrics. However, VQ-based…

Computer Vision and Pattern Recognition · Computer Science 2025-07-10 Zichong Meng, Yiming Xie, Xiaogang Peng, Zeyu Han, Huaizu Jiang

DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

Recently most successful image synthesis models are multi stage process to combine the advantages of different methods, which always includes a VAE-like model for faithfully reconstructing embedding to image and a prior model to generate…

Computer Vision and Pattern Recognition · Computer Science 2022-06-02 Jie Shi, Chenfei Wu, Jian Liang, Xiang Liu, Nan Duan

VAEVQ: Enhancing Discrete Visual Tokenization through Variational Modeling

Vector quantization (VQ) transforms continuous image features into discrete representations, providing compressed, tokenized inputs for generative models. However, VQ-based frameworks suffer from several issues, such as non-smooth latent…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Sicheng Yang, Xing Hu, Qiang Wu, Dawei Yang

Autoregressive Image Generation using Residual Quantization

For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short sequence length is important for an AR model to reduce its computational costs to consider…

Computer Vision and Pattern Recognition · Computer Science 2022-03-10 Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, Wook-Shin Han

SVGFusion: A VAE-Diffusion Transformer for Vector Graphic Generation

Generating high-quality Scalable Vector Graphics (SVGs) from text remains a significant challenge. Existing LLM-based models that generate SVG code as a flat token sequence struggle with poor structural understanding and error accumulation,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Ximing Xing, Juncheng Hu, Ziteng Xue, Jing Zhang, Buyu Li, Sheng Wang, Dong Xu, Qian Yu

Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization

Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm that first learns a codebook to encode images as discrete codes, and then completes generation based on the learned codebook. However, they…

Computer Vision and Pattern Recognition · Computer Science 2023-05-22 Mengqi Huang, Zhendong Mao, Zhuowei Chen, Yongdong Zhang

Text-to-CT Generation via 3D Latent Diffusion Model with Contrastive Vision-Language Pretraining

Objective: While recent advances in text-conditioned generative models have enabled the synthesis of realistic medical images, progress has been largely confined to 2D modalities such as chest X-rays. Extending text-to-image generation to…

Computer Vision and Pattern Recognition · Computer Science 2025-10-02 Daniele Molino, Camillo Maria Caruso, Filippo Ruffini, Paolo Soda, Valerio Guarrasi

Image and Video Quality Assessment using Prompt-Guided Latent Diffusion Models for Cross-Dataset Generalization

The design of image and video quality assessment (QA) algorithms is extremely important to benchmark and calibrate user experience in modern visual systems. A major drawback of the state-of-the-art QA methods is their limited ability to…

Image and Video Processing · Electrical Eng. & Systems 2025-12-30 Shankhanil Mitra, Diptanu De, Shika Rao, Rajiv Soundararajan

Text-driven Visual Synthesis with Latent Diffusion Prior

There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models enabling versatile downstream applications such as 3D object synthesis from texts, image editing, and customized generation. We present a…

Computer Vision and Pattern Recognition · Computer Science 2023-04-05 Ting-Hsuan Liao, Songwei Ge, Yiran Xu, Yao-Chih Lee, Badour AlBahar, Jia-Bin Huang