Related papers: Adaptive 1D Video Diffusion Autoencoder

Latent-Compressed Variational Autoencoder for Video Diffusion Models

Video variational autoencoders (VAEs) used in latent diffusion models typically require a sufficiently large number of latent channels to ensure high-quality video reconstruction. However, recent studies have revealed that an excessive…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Jiarui Guan, Wenshuai Zhao, Zhengtao Zou, Juho Kannala, Arno Solin

DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning

Autoencoders empower state-of-the-art image and video generative models by compressing pixels into a latent space through visual tokenization. Although recent advances have alleviated the performance degradation of autoencoders under high…

Computer Vision and Pattern Recognition · Computer Science 2026-01-14 Dongxu Liu, Jiahui Zhu, Yuang Peng, Haomiao Tang, Yuwei Chen, Chunrui Han, Zheng Ge, Daxin Jiang, Mingxue Liao

Rethinking Video Tokenization: A Conditioned Diffusion-based Approach

Existing video tokenizers typically use the traditional Variational Autoencoder (VAE) architecture for video compression and reconstruction. However, to achieve good performance, its training process often relies on complex multi-stage…

Computer Vision and Pattern Recognition · Computer Science 2025-03-28 Nianzu Yang, Pandeng Li, Liming Zhao, Yang Li, Chen-Wei Xie, Yehui Tang, Xudong Lu, Zhihang Liu, Yun Zheng, Yu Liu, Junchi Yan

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). With the same reconstruction quality, the more sufficient the VAE's compression for…

Computer Vision and Pattern Recognition · Computer Science 2024-09-10 Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinhua Cheng, Li Yuan

Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models

Constructing a compressed latent space through a variational autoencoder (VAE) is the key for efficient 3D diffusion models. This paper introduces COD-VAE that encodes 3D shapes into a COmpact set of 1D latent vectors without sacrificing…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 In Cho, Youngbeom Yoo, Subin Jeon, Seon Joo Kim

LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a new…

Machine Learning · Computer Science 2025-01-22 Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

MTC-VAE: Multi-Level Temporal Compression with Content Awareness

Latent Video Diffusion Models (LVDMs) rely on Variational Autoencoders (VAEs) to compress videos into compact latent representations. For continuous Variational Autoencoders (VAEs), achieving higher compression rates is desirable; yet, the…

Computer Vision and Pattern Recognition · Computer Science 2026-02-03 Yubo Dong, Linchao Zhu

LTX-Video: Realtime Video Latent Diffusion

We introduce LTX-Video, a transformer-based latent diffusion model that adopts a holistic approach to video generation by seamlessly integrating the responsibilities of the Video-VAE and the denoising transformer. Unlike existing methods,…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richardson, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon, Poriya Panet, Sapir Weissbuch, Victor Kulikov, Yaki Bitterman, Zeev Melumian, Ofir Bibi

Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation

Generating high-quality videos that synthesize desired realistic content is a challenging task due to their intricate high-dimensionality and complexity of videos. Several recent diffusion-based methods have shown comparable performance by…

Computer Vision and Pattern Recognition · Computer Science 2024-04-05 Kihong Kim, Haneol Lee, Jihye Park, Seyeon Kim, Kwanghee Lee, Seungryong Kim, Jaejun Yoo

Autoregressive Video Autoencoder with Decoupled Temporal and Spatial Context

Video autoencoders compress videos into compact latent representations for efficient reconstruction, playing a vital role in enhancing the quality and efficiency of video generation. However, existing video autoencoders often entangle…

Computer Vision and Pattern Recognition · Computer Science 2025-12-15 Cuifeng Shen, Lumin Xu, Xingguo Zhu, Gengdai Liu

Improving the Diffusability of Autoencoders

Latent diffusion models have emerged as the leading approach for generating high-quality images and videos, utilizing compressed latent representations to reduce the computational burden of the diffusion process. While recent advancements…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Ivan Skorokhodov, Sharath Girish, Benran Hu, Willi Menapace, Yanyu Li, Rameen Abdal, Sergey Tulyakov, Aliaksandr Siarohin

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

Spatio-temporal compression of videos, utilizing networks such as Variational Autoencoders (VAE), plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the…

Computer Vision and Pattern Recognition · Computer Science 2024-10-24 Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan

LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models

Recent advances in Latent Video Diffusion Models (LVDMs) have revolutionized video generation by leveraging Video Variational Autoencoders (Video VAEs) to compress intricate video data into a compact latent space. However, as LVDM training…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Yu Cheng, Fajie Yuan

Denoising Vision Transformer Autoencoder with Spectral Self-Regularization

Variational autoencoders (VAEs) typically encode images into a compact latent space, reducing computational cost but introducing an optimization dilemma: a higher-dimensional latent space improves reconstruction fidelity but often hampers…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Xunzhi Xiang, Xingye Tian, Guiyu Zhang, Yabo Chen, Shaofeng Zhang, Xuebo Wang, Xin Tao, Qi Fan

Improved Video VAE for Latent Video Diffusion Model

Variational Autoencoder (VAE) aims to compress pixel data into low-dimensional latent space, playing an important role in OpenAI's Sora and other latent video diffusion generation models. While most of existing video VAEs inflate a…

Computer Vision and Pattern Recognition · Computer Science 2024-11-12 Pingyu Wu, Kai Zhu, Yu Liu, Liming Zhao, Wei Zhai, Yang Cao, Zheng-Jun Zha

Large Motion Video Autoencoding with Cross-modal Video VAE

Learning a robust video Variational Autoencoder (VAE) is essential for reducing video redundancy and facilitating efficient video generation. Directly applying image VAEs to individual frames in isolation can result in temporal…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Yazhou Xing, Yang Fei, Yingqing He, Jingye Chen, Jiaxin Xie, Xiaowei Chi, Qifeng Chen

Hi-VAE: Efficient Video Autoencoding with Global and Detailed Motion

Recent breakthroughs in video autoencoders (Video AEs) have advanced video generation, but existing methods fail to efficiently model spatio-temporal redundancies in dynamics, resulting in suboptimal compression factors. This shortfall…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Huaize Liu, Wenzhang Sun, Qiyuan Zhang, Donglin Di, Biao Gong, Hao Li, Chen Wei, Changqing Zou

Video Generation with Predictive Latents

Video Variational Autoencoder (VAE) enables latent video generative modeling by mapping the visual world into compact spatiotemporal latent spaces, improving training efficiency and stability. While existing video VAEs achieve commendable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Yian Zhao, Feng Wang, Qiushan Guo, Chang Liu, Xiangyang Ji, Jian Zhang, Jie Chen

DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

We introduce DC-VideoGen, a post-training acceleration framework for efficient video generation. DC-VideoGen can be applied to any pre-trained video diffusion model, improving efficiency by adapting it to a deep compression latent space…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Junyu Chen, Wenkun He, Yuchao Gu, Yuyang Zhao, Jincheng Yu, Junsong Chen, Dongyun Zou, Yujun Lin, Zhekai Zhang, Muyang Li, Haocheng Xi, Ligeng Zhu, Enze Xie, Song Han, Han Cai

Geometry-Preserving Encoder/Decoder in Latent Generative Models

Generative modeling aims to generate new data samples that resemble a given dataset, with diffusion models recently becoming the most popular generative model. One of the main challenges of diffusion models is solving the problem in the…

Numerical Analysis · Mathematics 2025-10-08 Wonjun Lee, Riley C. W. O'Neill, Dongmian Zou, Jeff Calder, Gilad Lerman