Related papers: Vector Quantized Diffusion Model for Text-to-Image…

200 papers

The integration of Vector Quantised Variational AutoEncoder (VQ-VAE) with autoregressive models as the generation part has yielded high-quality results on image generation. However, the autoregressive models will strictly follow the progressive…

Computer Vision and Pattern Recognition · Computer Science 2024-03-01 Minghui Hu, Yujie Wang, Tat-Jen Cham, Jianfei Yang, P. N. Suganthan

Vector quantized diffusion (VQ-Diffusion) is a powerful generative model for text-to-image synthesis, but it can still sometimes generate low-quality samples or images weakly correlated with the text input. We find these issues are mainly due to…

Computer Vision and Pattern Recognition · Computer Science 2023-02-09 Zhicong Tang, Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

Recently, Vector Quantized AutoRegressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by equally predicting discrete image tokens from the top left to bottom right in the latent space. Although the simple…

Computer Vision and Pattern Recognition · Computer Science 2023-09-21 Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang

Text-to-image diffusion models have emerged as a powerful framework for high-quality image generation given textual prompts. Their success has driven the rapid development of production-grade diffusion models that consistently increase in…

Computer Vision and Pattern Recognition · Computer Science 2024-09-04 Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk

Vector Quantized-Variational AutoEncoders (VQ-VAE) are generative models based on discrete latent representations of the data, where inputs are mapped to a finite set of learned embeddings. To generate new samples, an autoregressive prior…

Machine Learning · Statistics 2022-08-04 Max Cohen, Guillaume Quispe, Sylvain Le Corff, Charles Ollion, Eric Moulines

Generating sound effects that humans want is an important topic. However, there are few studies in this area for sound generation. In this study, we investigate generating sound conditioned on a text prompt and propose a novel text-to-sound…

Sound · Computer Science 2023-05-01 Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu

By embedding discrete representations into a continuous latent space, we can leverage continuous-space latent diffusion models to handle generative modeling of discrete data. However, despite their initial success, most latent diffusion…

Machine Learning · Computer Science 2025-04-02 Bac Nguyen, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Stefano Ermon, Yuki Mitsufuji

Generative learning models in medical research are crucial in developing training data for deep learning models and advancing diagnostic tools, but the problem of high-quality, diverse images is an open topic of research. Quantum-enhanced…

Quantum Physics · Physics 2025-08-14 Kübra Yeter-Aydeniz, Nora M. Bauer, Pranay Jain, Max Masnick

Current image-to-image translation methods formulate the task with conditional generation models, leading to learning only the recolorization or regional changes as being constrained by the rich structural information provided by the…

Computer Vision and Pattern Recognition · Computer Science 2022-07-28 Yu-Jie Chen, Shin-I Cheng, Wei-Chen Chiu, Hung-Yu Tseng, Hsin-Ying Lee

Recent large-scale vision-language models (VLMs) have shown remarkable text-to-image generation capabilities, yet their visual fidelity remains constrained by the discrete image tokenization, which poses a major challenge. Although several…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Ji Woo Hong, Hee Suk Yoon, Gwanhyeong Koo, Eunseop Yoon, SooHwan Eom, Qi Dai, Chong Luo, Chang D. Yoo

There has been significant progress in text-conditional image generation models. Recent advancements in this field depend not only on improvements in model structures, but also on vast quantities of text-image paired datasets. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-03-25 Seungdae Han, Joohee Kim

Since 2023, Vector Quantization (VQ)-based discrete generation methods have rapidly dominated human motion generation, primarily surpassing diffusion-based continuous generation methods in standard performance metrics. However, VQ-based…

Computer Vision and Pattern Recognition · Computer Science 2025-07-10 Zichong Meng, Yiming Xie, Xiaogang Peng, Zeyu Han, Huaizu Jiang

Recently, the most successful image synthesis models are multi-stage processes that combine the advantages of different methods, which always include a VAE-like model for faithfully reconstructing embeddings into images and a prior model to generate…

Computer Vision and Pattern Recognition · Computer Science 2022-06-02 Jie Shi, Chenfei Wu, Jian Liang, Xiang Liu, Nan Duan

Vector quantization (VQ) transforms continuous image features into discrete representations, providing compressed, tokenized inputs for generative models. However, VQ-based frameworks suffer from several issues, such as non-smooth latent…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Sicheng Yang, Xing Hu, Qiang Wu, Dawei Yang

For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short sequence length is important for an AR model to reduce its computational costs to consider…

Computer Vision and Pattern Recognition · Computer Science 2022-03-10 Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, Wook-Shin Han

Generating high-quality Scalable Vector Graphics (SVGs) from text remains a significant challenge. Existing LLM-based models that generate SVG code as a flat token sequence struggle with poor structural understanding and error accumulation,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Ximing Xing, Juncheng Hu, Ziteng Xue, Jing Zhang, Buyu Li, Sheng Wang, Dong Xu, Qian Yu

Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm that first learns a codebook to encode images as discrete codes, and then completes generation based on the learned codebook. However, they…

Computer Vision and Pattern Recognition · Computer Science 2023-05-22 Mengqi Huang, Zhendong Mao, Zhuowei Chen, Yongdong Zhang

Objective: While recent advances in text-conditioned generative models have enabled the synthesis of realistic medical images, progress has been largely confined to 2D modalities such as chest X-rays. Extending text-to-image generation to…

Computer Vision and Pattern Recognition · Computer Science 2025-10-02 Daniele Molino, Camillo Maria Caruso, Filippo Ruffini, Paolo Soda, Valerio Guarrasi

The design of image and video quality assessment (QA) algorithms is extremely important to benchmark and calibrate user experience in modern visual systems. A major drawback of the state-of-the-art QA methods is their limited ability to…

Image and Video Processing · Electrical Eng. & Systems 2025-12-30 Shankhanil Mitra, Diptanu De, Shika Rao, Rajiv Soundararajan

There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models enabling versatile downstream applications such as 3D object synthesis from texts, image editing, and customized generation. We present a…

Computer Vision and Pattern Recognition · Computer Science 2023-04-05 Ting-Hsuan Liao, Songwei Ge, Yiran Xu, Yao-Chih Lee, Badour AlBahar, Jia-Bin Huang