Related papers: Deep Compression Autoencoder for Efficient High-Re…

DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space

We present DC-AE 1.5, a new family of deep compression autoencoders for high-resolution diffusion models. Increasing the autoencoder's latent channel number is a highly effective approach for improving its reconstruction quality. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-08-04 Junyu Chen, Dongyun Zou, Wenkun He, Junsong Chen, Enze Xie, Song Han, Han Cai

H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models

Autoencoder (AE) is the key to the success of latent diffusion models for image and video generation, reducing the denoising resolution and improving efficiency. However, the power of AE has long been underexplored in terms of network…

Computer Vision and Pattern Recognition · Computer Science 2025-10-02 Yushu Wu, Yanyu Li, Ivan Skorokhodov, Anil Kag, Willi Menapace, Sharath Girish, Aliaksandr Siarohin, Yanzhi Wang, Sergey Tulyakov

DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning

Autoencoders empower state-of-the-art image and video generative models by compressing pixels into a latent space through visual tokenization. Although recent advances have alleviated the performance degradation of autoencoders under high…

Computer Vision and Pattern Recognition · Computer Science 2026-01-14 Dongxu Liu, Jiahui Zhu, Yuang Peng, Haomiao Tang, Yuwei Chen, Chunrui Han, Zheng Ge, Daxin Jiang, Mingxue Liao

TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders

We propose TC-AE, a ViT-based architecture for deep compression autoencoders. Existing methods commonly increase the channel number of latent representations to maintain reconstruction quality under high compression ratios. However, this…

Computer Vision and Pattern Recognition · Computer Science 2026-04-09 Teng Li, Ziyuan Huang, Cong Chen, Yangfu Li, Yuanhuiyi Lyu, Dandan Zheng, Chunhua Shen, Jun Zhang

DA-VAE: Plug-in Latent Compression for Diffusion via Detail Alignment

Reducing token count is crucial for efficient training and inference of latent diffusion models, especially at high resolution. A common strategy is to build high-compression image tokenizers with more channels per token. However, when…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Xin Cai, Zhiyuan You, Zhoutong Zhang, Tianfan Xue

Pathology Image Compression with Pre-trained Autoencoders

The growing volume of high-resolution Whole Slide Images in digital histopathology poses significant storage, transmission, and computational efficiency challenges. Standard compression methods, such as JPEG, reduce file sizes but often…

Image and Video Processing · Electrical Eng. & Systems 2025-03-17 Srikar Yellapragada, Alexandros Graikos, Kostas Triaridis, Zilinghan Li, Tarak Nath Nandi, Ravi K Madduri, Prateek Prasanna, Joel Saltz, Dimitris Samaras

Deep Compressive Autoencoder for Action Potential Compression in Large-Scale Neural Recording

Understanding the coordinated activity underlying brain computations requires large-scale, simultaneous recordings from distributed neuronal structures at a cellular-level resolution. One major hurdle to design high-bandwidth,…

Neural and Evolutionary Computing · Computer Science 2018-09-18 Tong Wu, Wenfeng Zhao, Edward Keefer, Zhi Yang

Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models

Constructing a compressed latent space through a variational autoencoder (VAE) is the key for efficient 3D diffusion models. This paper introduces COD-VAE that encodes 3D shapes into a COmpact set of 1D latent vectors without sacrificing…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 In Cho, Youngbeom Yoo, Subin Jeon, Seon Joo Kim

Hi-VAE: Efficient Video Autoencoding with Global and Detailed Motion

Recent breakthroughs in video autoencoders (Video AEs) have advanced video generation, but existing methods fail to efficiently model spatio-temporal redundancies in dynamics, resulting in suboptimal compression factors. This shortfall…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Huaize Liu, Wenzhang Sun, Qiyuan Zhang, Donglin Di, Biao Gong, Hao Li, Chen Wei, Changqing Zou

Edge-Aware Autoencoder Design for Real-Time Mixture-of-Experts Image Compression

Steered-Mixtures-of-Experts (SMoE) models provide sparse, edge-aware representations, applicable to many use-cases in image processing. This includes denoising, super-resolution and compression of 2D- and higher dimensional pixel data.…

Image and Video Processing · Electrical Eng. & Systems 2022-07-26 Elvira Fleig, Jonas Geistert, Erik Bochinski, Rolf Jongebloed, Thomas Sikora

Revisiting Diffusion Autoencoder Training for Image Reconstruction Quality

Diffusion autoencoders (DAEs) are typically formulated as a noise prediction model and trained with a linear-$\beta$ noise schedule that spends much of its sampling steps at high noise levels. Because high noise levels are associated with…

Computer Vision and Pattern Recognition · Computer Science 2025-05-01 Pramook Khungurn, Sukit Seripanitkarn, Phonphrm Thawatdamrongkit, Supasorn Suwajanakorn

Deep Convolutional AutoEncoder-based Lossy Image Compression

Image compression has been investigated as a fundamental research topic for many decades. Recently, deep learning has achieved great success in many computer vision tasks, and is gradually being used in image compression. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2018-04-26 Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

Geometric Autoencoder for Diffusion Models

Latent diffusion models have established a new state-of-the-art in high-resolution visual generation. Integrating Vision Foundation Model priors improves generative efficiency, yet existing latent designs remain largely heuristic. These…

Computer Vision and Pattern Recognition · Computer Science 2026-03-13 Hangyu Liu, Jianyong Wang, Yutao Sun

LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a new…

Machine Learning · Computer Science 2025-01-22 Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

Latent-Compressed Variational Autoencoder for Video Diffusion Models

Video variational autoencoders (VAEs) used in latent diffusion models typically require a sufficiently large number of latent channels to ensure high-quality video reconstruction. However, recent studies have revealed that an excessive…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Jiarui Guan, Wenshuai Zhao, Zhengtao Zou, Juho Kannala, Arno Solin

Domain-Specific Latent Representations Improve the Fidelity of Diffusion-Based Medical Image Super-Resolution

Latent diffusion models for medical image super-resolution universally inherit variational autoencoders designed for natural photographs. We show that this default choice, not the diffusion architecture, is the dominant constraint on…

Computer Vision and Pattern Recognition · Computer Science 2026-04-15 Sebastian Cajas, Ashaba Judith, Rahul Gorijavolu, Sahil Kapadia, Hillary Clinton Kasimbazi, Leo Kinyera, Emmanuel Paul Kwesiga, Sri Sri Jaithra Varma Manthena, Luis Filipe Nakayama, Ninsiima Doreen, Leo Anthony Celi

Diffusion Transformers with Representation Autoencoders

Latent generative modeling, where a pretrained autoencoder maps pixels into a latent space for the diffusion process, has become the standard strategy for Diffusion Transformers (DiT); however, the autoencoder component has barely evolved.…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Boyang Zheng, Nanye Ma, Shengbang Tong, Saining Xie

Deep clustering with fusion autoencoder

Embracing the deep learning techniques for representation learning in clustering research has attracted broad attention in recent years, yielding a newly developed clustering paradigm, viz. the deep clustering (DC). Typically, the DC models…

Machine Learning · Computer Science 2022-01-17 Shuai Chang

Large Motion Video Autoencoding with Cross-modal Video VAE

Learning a robust video Variational Autoencoder (VAE) is essential for reducing video redundancy and facilitating efficient video generation. Directly applying image VAEs to individual frames in isolation can result in temporal…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Yazhou Xing, Yang Fei, Yingqing He, Jingye Chen, Jiaxin Xie, Xiaowei Chi, Qifeng Chen

Qwen-Image-VAE-2.0 Technical Report

We present Qwen-Image-VAE-2.0, a suite of high-compression Variational Autoencoders (VAEs) that achieve significant advances in both reconstruction fidelity and diffusability. To address the reconstruction bottlenecks of high compression,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-14 Zekai Zhang, Deqing Li, Kuan Cao, Yujia Wu, Chenfei Wu, Yu Wu, Liang Peng, Hao Meng, Jiahao Li, Jie Zhang, Kaiyuan Gao, Kun Yan, Lihan Jiang, Ningyuan Tang, Shengming Yin, Tianhe Wu, Xiao Xu, Xiaoyue Chen, Yan Shu, Yanran Zhang, Yilei Chen, Yixian Xu, Yuxiang Chen, Zhendong Wang, Zihao Liu, Zikai Zhou, Yiliang Gu, Yi Wang, Xiaoxiao Xu, Lin Qu