Related papers: Adaptive 1D Video Diffusion Autoencoder

200 papers

Video variational autoencoders (VAEs) used in latent diffusion models typically require a sufficiently large number of latent channels to ensure high-quality video reconstruction. However, recent studies have revealed that an excessive…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Jiarui Guan, Wenshuai Zhao, Zhengtao Zou, Juho Kannala, Arno Solin

Autoencoders empower state-of-the-art image and video generative models by compressing pixels into a latent space through visual tokenization. Although recent advances have alleviated the performance degradation of autoencoders under high…

Computer Vision and Pattern Recognition · Computer Science 2026-01-14 Dongxu Liu, Jiahui Zhu, Yuang Peng, Haomiao Tang, Yuwei Chen, Chunrui Han, Zheng Ge, Daxin Jiang, Mingxue Liao

Existing video tokenizers typically use the traditional Variational Autoencoder (VAE) architecture for video compression and reconstruction. However, to achieve good performance, its training process often relies on complex multi-stage…

Computer Vision and Pattern Recognition · Computer Science 2025-03-28 Nianzu Yang, Pandeng Li, Liming Zhao, Yang Li, Chen-Wei Xie, Yehui Tang, Xudong Lu, Zhihang Liu, Yun Zheng, Yu Liu, Junchi Yan

Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). With the same reconstruction quality, the more sufficient the VAE's compression for…

Computer Vision and Pattern Recognition · Computer Science 2024-09-10 Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinhua Cheng, Li Yuan

Constructing a compressed latent space through a variational autoencoder (VAE) is the key for efficient 3D diffusion models. This paper introduces COD-VAE that encodes 3D shapes into a COmpact set of 1D latent vectors without sacrificing…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 In Cho, Youngbeom Yoo, Subin Jeon, Seon Joo Kim

Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a new…

Machine Learning · Computer Science 2025-01-22 Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

Latent Video Diffusion Models (LVDMs) rely on Variational Autoencoders (VAEs) to compress videos into compact latent representations. For continuous Variational Autoencoders (VAEs), achieving higher compression rates is desirable; yet, the…

Computer Vision and Pattern Recognition · Computer Science 2026-02-03 Yubo Dong, Linchao Zhu

We introduce LTX-Video, a transformer-based latent diffusion model that adopts a holistic approach to video generation by seamlessly integrating the responsibilities of the Video-VAE and the denoising transformer. Unlike existing methods,…

Generating high-quality videos that synthesize desired realistic content is a challenging task due to the high dimensionality and complexity of videos. Several recent diffusion-based methods have shown comparable performance by…

Computer Vision and Pattern Recognition · Computer Science 2024-04-05 Kihong Kim, Haneol Lee, Jihye Park, Seyeon Kim, Kwanghee Lee, Seungryong Kim, Jaejun Yoo

Video autoencoders compress videos into compact latent representations for efficient reconstruction, playing a vital role in enhancing the quality and efficiency of video generation. However, existing video autoencoders often entangle…

Computer Vision and Pattern Recognition · Computer Science 2025-12-15 Cuifeng Shen, Lumin Xu, Xingguo Zhu, Gengdai Liu

Latent diffusion models have emerged as the leading approach for generating high-quality images and videos, utilizing compressed latent representations to reduce the computational burden of the diffusion process. While recent advancements…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Ivan Skorokhodov, Sharath Girish, Benran Hu, Willi Menapace, Yanyu Li, Rameen Abdal, Sergey Tulyakov, Aliaksandr Siarohin

Spatio-temporal compression of videos, utilizing networks such as Variational Autoencoders (VAE), plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the…

Computer Vision and Pattern Recognition · Computer Science 2024-10-24 Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan

Recent advances in Latent Video Diffusion Models (LVDMs) have revolutionized video generation by leveraging Video Variational Autoencoders (Video VAEs) to compress intricate video data into a compact latent space. However, as LVDM training…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Yu Cheng, Fajie Yuan

Variational autoencoders (VAEs) typically encode images into a compact latent space, reducing computational cost but introducing an optimization dilemma: a higher-dimensional latent space improves reconstruction fidelity but often hampers…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Xunzhi Xiang, Xingye Tian, Guiyu Zhang, Yabo Chen, Shaofeng Zhang, Xuebo Wang, Xin Tao, Qi Fan

A Variational Autoencoder (VAE) aims to compress pixel data into a low-dimensional latent space, playing an important role in OpenAI's Sora and other latent video diffusion generation models. While most existing video VAEs inflate a…

Computer Vision and Pattern Recognition · Computer Science 2024-11-12 Pingyu Wu, Kai Zhu, Yu Liu, Liming Zhao, Wei Zhai, Yang Cao, Zheng-Jun Zha

Learning a robust video Variational Autoencoder (VAE) is essential for reducing video redundancy and facilitating efficient video generation. Directly applying image VAEs to individual frames in isolation can result in temporal…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Yazhou Xing, Yang Fei, Yingqing He, Jingye Chen, Jiaxin Xie, Xiaowei Chi, Qifeng Chen

Recent breakthroughs in video autoencoders (Video AEs) have advanced video generation, but existing methods fail to efficiently model spatio-temporal redundancies in dynamics, resulting in suboptimal compression factors. This shortfall…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Huaize Liu, Wenzhang Sun, Qiyuan Zhang, Donglin Di, Biao Gong, Hao Li, Chen Wei, Changqing Zou

Video Variational Autoencoder (VAE) enables latent video generative modeling by mapping the visual world into compact spatiotemporal latent spaces, improving training efficiency and stability. While existing video VAEs achieve commendable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Yian Zhao, Feng Wang, Qiushan Guo, Chang Liu, Xiangyang Ji, Jian Zhang, Jie Chen

We introduce DC-VideoGen, a post-training acceleration framework for efficient video generation. DC-VideoGen can be applied to any pre-trained video diffusion model, improving efficiency by adapting it to a deep compression latent space…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Junyu Chen, Wenkun He, Yuchao Gu, Yuyang Zhao, Jincheng Yu, Junsong Chen, Dongyun Zou, Yujun Lin, Zhekai Zhang, Muyang Li, Haocheng Xi, Ligeng Zhu, Enze Xie, Song Han, Han Cai

Generative modeling aims to generate new data samples that resemble a given dataset, with diffusion models recently becoming the most popular generative model. One of the main challenges of diffusion models is solving the problem in the…

Numerical Analysis · Mathematics 2025-10-08 Wonjun Lee, Riley C. W. O'Neill, Dongmian Zou, Jeff Calder, Gilad Lerman