Related papers: Deep Compression Autoencoder for Efficient High-Re…

200 papers

We present DC-AE 1.5, a new family of deep compression autoencoders for high-resolution diffusion models. Increasing the autoencoder's latent channel number is a highly effective approach for improving its reconstruction quality. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-08-04 Junyu Chen, Dongyun Zou, Wenkun He, Junsong Chen, Enze Xie, Song Han, Han Cai

Autoencoder (AE) is the key to the success of latent diffusion models for image and video generation, reducing the denoising resolution and improving efficiency. However, the power of AE has long been underexplored in terms of network…

Computer Vision and Pattern Recognition · Computer Science 2025-10-02 Yushu Wu, Yanyu Li, Ivan Skorokhodov, Anil Kag, Willi Menapace, Sharath Girish, Aliaksandr Siarohin, Yanzhi Wang, Sergey Tulyakov

Autoencoders empower state-of-the-art image and video generative models by compressing pixels into a latent space through visual tokenization. Although recent advances have alleviated the performance degradation of autoencoders under high…

Computer Vision and Pattern Recognition · Computer Science 2026-01-14 Dongxu Liu, Jiahui Zhu, Yuang Peng, Haomiao Tang, Yuwei Chen, Chunrui Han, Zheng Ge, Daxin Jiang, Mingxue Liao

We propose TC-AE, a ViT-based architecture for deep compression autoencoders. Existing methods commonly increase the channel number of latent representations to maintain reconstruction quality under high compression ratios. However, this…

Computer Vision and Pattern Recognition · Computer Science 2026-04-09 Teng Li, Ziyuan Huang, Cong Chen, Yangfu Li, Yuanhuiyi Lyu, Dandan Zheng, Chunhua Shen, Jun Zhang

Reducing token count is crucial for efficient training and inference of latent diffusion models, especially at high resolution. A common strategy is to build high-compression image tokenizers with more channels per token. However, when…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Xin Cai, Zhiyuan You, Zhoutong Zhang, Tianfan Xue

The growing volume of high-resolution Whole Slide Images in digital histopathology poses significant storage, transmission, and computational efficiency challenges. Standard compression methods, such as JPEG, reduce file sizes but often…

Image and Video Processing · Electrical Eng. & Systems 2025-03-17 Srikar Yellapragada, Alexandros Graikos, Kostas Triaridis, Zilinghan Li, Tarak Nath Nandi, Ravi K Madduri, Prateek Prasanna, Joel Saltz, Dimitris Samaras

Understanding the coordinated activity underlying brain computations requires large-scale, simultaneous recordings from distributed neuronal structures at a cellular-level resolution. One major hurdle to design high-bandwidth,…

Neural and Evolutionary Computing · Computer Science 2018-09-18 Tong Wu, Wenfeng Zhao, Edward Keefer, Zhi Yang

Constructing a compressed latent space through a variational autoencoder (VAE) is the key for efficient 3D diffusion models. This paper introduces COD-VAE that encodes 3D shapes into a COmpact set of 1D latent vectors without sacrificing…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 In Cho, Youngbeom Yoo, Subin Jeon, Seon Joo Kim

Recent breakthroughs in video autoencoders (Video AEs) have advanced video generation, but existing methods fail to efficiently model spatio-temporal redundancies in dynamics, resulting in suboptimal compression factors. This shortfall…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Huaize Liu, Wenzhang Sun, Qiyuan Zhang, Donglin Di, Biao Gong, Hao Li, Chen Wei, Changqing Zou

Steered-Mixtures-of-Experts (SMoE) models provide sparse, edge-aware representations, applicable to many use-cases in image processing. This includes denoising, super-resolution and compression of 2D- and higher dimensional pixel data.…

Image and Video Processing · Electrical Eng. & Systems 2022-07-26 Elvira Fleig, Jonas Geistert, Erik Bochinski, Rolf Jongebloed, Thomas Sikora

Diffusion autoencoders (DAEs) are typically formulated as a noise prediction model and trained with a linear-$\beta$ noise schedule that spends much of its sampling steps at high noise levels. Because high noise levels are associated with…

Computer Vision and Pattern Recognition · Computer Science 2025-05-01 Pramook Khungurn, Sukit Seripanitkarn, Phonphrm Thawatdamrongkit, Supasorn Suwajanakorn

Image compression has been investigated as a fundamental research topic for many decades. Recently, deep learning has achieved great success in many computer vision tasks, and is gradually being used in image compression. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2018-04-26 Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

Latent diffusion models have established a new state-of-the-art in high-resolution visual generation. Integrating Vision Foundation Model priors improves generative efficiency, yet existing latent designs remain largely heuristic. These…

Computer Vision and Pattern Recognition · Computer Science 2026-03-13 Hangyu Liu, Jianyong Wang, Yutao Sun

Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a new…

Machine Learning · Computer Science 2025-01-22 Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

Video variational autoencoders (VAEs) used in latent diffusion models typically require a sufficiently large number of latent channels to ensure high-quality video reconstruction. However, recent studies have revealed that an excessive…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Jiarui Guan, Wenshuai Zhao, Zhengtao Zou, Juho Kannala, Arno Solin

Latent diffusion models for medical image super-resolution universally inherit variational autoencoders designed for natural photographs. We show that this default choice, not the diffusion architecture, is the dominant constraint on…

Latent generative modeling, where a pretrained autoencoder maps pixels into a latent space for the diffusion process, has become the standard strategy for Diffusion Transformers (DiT); however, the autoencoder component has barely evolved.…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Boyang Zheng, Nanye Ma, Shengbang Tong, Saining Xie

Embracing the deep learning techniques for representation learning in clustering research has attracted broad attention in recent years, yielding a newly developed clustering paradigm, viz. the deep clustering (DC). Typically, the DC models…

Machine Learning · Computer Science 2022-01-17 Shuai Chang

Learning a robust video Variational Autoencoder (VAE) is essential for reducing video redundancy and facilitating efficient video generation. Directly applying image VAEs to individual frames in isolation can result in temporal…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Yazhou Xing, Yang Fei, Yingqing He, Jingye Chen, Jiaxin Xie, Xiaowei Chi, Qifeng Chen

We present Qwen-Image-VAE-2.0, a suite of high-compression Variational Autoencoders (VAEs) that achieve significant advances in both reconstruction fidelity and diffusability. To address the reconstruction bottlenecks of high compression,…
