Related papers: DGAE: Diffusion-Guided Autoencoder for Efficient L…

200 papers

Latent diffusion models have established a new state-of-the-art in high-resolution visual generation. Integrating Vision Foundation Model priors improves generative efficiency, yet existing latent designs remain largely heuristic. These…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-13 · Hangyu Liu, Jianyong Wang, Yutao Sun

Diffusion probabilistic models (DPMs) have shown remarkable results on various image synthesis tasks such as text-to-image generation and image inpainting. However, compared to other generative methods like VAEs and GANs, DPMs lack a…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-13 · Yipeng Leng, Qiangjuan Huang, Zhiyuan Wang, Yangyang Liu, Haoyu Zhang

Latent generative modeling, where a pretrained autoencoder maps pixels into a latent space for the diffusion process, has become the standard strategy for Diffusion Transformers (DiT); however, the autoencoder component has barely evolved.…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-14 · Boyang Zheng, Nanye Ma, Shengbang Tong, Saining Xie

We present DC-AE 1.5, a new family of deep compression autoencoders for high-resolution diffusion models. Increasing the autoencoder's latent channel number is a highly effective approach for improving its reconstruction quality. However,…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-04 · Junyu Chen, Dongyun Zou, Wenkun He, Junsong Chen, Enze Xie, Song Han, Han Cai

We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoder models have demonstrated impressive results at a moderate spatial compression ratio…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Junyu Chen, Han Cai, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, Song Han

Autoencoder (AE) is the key to the success of latent diffusion models for image and video generation, reducing the denoising resolution and improving efficiency. However, the power of AE has long been underexplored in terms of network…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-02 · Yushu Wu, Yanyu Li, Ivan Skorokhodov, Anil Kag, Willi Menapace, Sharath Girish, Aliaksandr Siarohin, Yanzhi Wang, Sergey Tulyakov

Visual generative models (e.g., diffusion models) typically operate in compressed latent spaces to balance training efficiency and sample quality. In parallel, there has been growing interest in leveraging high-quality pre-trained visual…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-17 · Yuan Gao, Chen Chen, Tianrong Chen, Jiatao Gu

Existing hyperspectral image (HSI) super-resolution (SR) methods struggle to effectively capture the complex spectral-spatial relationships and low-level details, while diffusion models represent a promising generative model known for their…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-09 · Zhaoyang Wang, Dongyang Li, Mingyang Zhang, Hao Luo, Maoguo Gong

Video variational autoencoders (VAEs) used in latent diffusion models typically require a sufficiently large number of latent channels to ensure high-quality video reconstruction. However, recent studies have revealed that an excessive…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-21 · Jiarui Guan, Wenshuai Zhao, Zhengtao Zou, Juho Kannala, Arno Solin

Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a new…

Machine Learning · Computer Science · 2025-01-22 · Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

Diffusion models have become the dominant paradigm for image generation and editing, with latent diffusion models shifting denoising to a compact latent space for efficiency and scalability. Recent attempts to leverage pretrained visual…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-20 · Yue Gong, Hongyu Li, Shanyuan Liu, Bo Cheng, Yuhang Ma, Liebucha Wu, Xiaoyu Wu, Manyuan Zhang, Dawei Leng, Yuhui Yin, Lijun Zhang

This study presents Latent Diffusion Autoencoder (LDAE), a novel encoder-decoder diffusion-based framework for efficient and meaningful unsupervised learning in medical imaging, focusing on Alzheimer disease (AD) using brain MR from the…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-12 · Gabriele Lozupone, Alessandro Bria, Francesco Fontanella, Frederick J. A. Meijer, Claudio De Stefano, Henkjan Huisman

Diffusion models have attained impressive visual quality for image synthesis. However, how to interpret and manipulate the latent space of diffusion models has not been extensively explored. Prior work diffusion autoencoders encode the…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-26 · Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao, Xihui Liu

Diffusion models are emerging as powerful solutions for generating high-fidelity and diverse images, often surpassing GANs under many circumstances. However, their slow inference speed hinders their potential for real-time applications. To…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-18 · Luan Thanh Trinh, Tomoki Hamagami

In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion models the latent space induced by an encoder and generates images through a paired decoder. Although the selection of…

Machine Learning · Computer Science · 2023-10-31 · Tianyang Hu, Fei Chen, Haonan Wang, Jiawei Li, Wenjia Wang, Jiacheng Sun, Zhenguo Li

Constructing a compressed latent space through a variational autoencoder (VAE) is the key for efficient 3D diffusion models. This paper introduces COD-VAE that encodes 3D shapes into a COmpact set of 1D latent vectors without sacrificing…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-29 · In Cho, Youngbeom Yoo, Subin Jeon, Seon Joo Kim

Existing video tokenizers typically use the traditional Variational Autoencoder (VAE) architecture for video compression and reconstruction. However, to achieve good performance, its training process often relies on complex multi-stage…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-28 · Nianzu Yang, Pandeng Li, Liming Zhao, Yang Li, Chen-Wei Xie, Yehui Tang, Xudong Lu, Zhihang Liu, Yun Zheng, Yu Liu, Junchi Yan

Generative modeling aims to generate new data samples that resemble a given dataset, with diffusion models recently becoming the most popular generative model. One of the main challenges of diffusion models is solving the problem in the…

Numerical Analysis · Mathematics · 2025-10-08 · Wonjun Lee, Riley C. W. O'Neill, Dongmian Zou, Jeff Calder, Gilad Lerman

Diffusion autoencoders (DAEs) are typically formulated as a noise prediction model and trained with a linear-$\beta$ noise schedule that spends much of its sampling steps at high noise levels. Because high noise levels are associated with…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-01 · Pramook Khungurn, Sukit Seripanitkarn, Phonphrm Thawatdamrongkit, Supasorn Suwajanakorn

Reducing token count is crucial for efficient training and inference of latent diffusion models, especially at high resolution. A common strategy is to build high-compression image tokenizers with more channels per token. However, when…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-24 · Xin Cai, Zhiyuan You, Zhoutong Zhang, Tianfan Xue