Related papers: Factorized Video Autoencoders for Efficient Genera…

Video Probabilistic Diffusion Models in Projected Latent Space

Despite the remarkable progress in deep generative models, synthesizing high-resolution and temporally coherent videos still remains a challenge due to their high-dimensionality and complex temporal dynamics along with large spatial…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-31 · Sihyun Yu, Kihyuk Sohn, Subin Kim, Jinwoo Shin

Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction

Generative models have demonstrated strong performance in conditional settings and can be viewed as a form of data compression, where the condition serves as a compact representation. However, their limited controllability and…

Machine Learning · Computer Science · 2025-07-04 · Xiao Li, Liangji Zhu, Anand Rangarajan, Sanjay Ranka

Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces

Video tokenizers are essential for latent video diffusion models, converting raw video data into spatiotemporally compressed latent spaces for efficient training. However, extending state-of-the-art video tokenizers to achieve a temporal…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-05 · Aniruddha Mahapatra, Long Mai, David Bourgin, Yitian Zhang, Feng Liu

Towards Multi-Task Multi-Modal Models: A Video Generative Perspective

Advancements in language foundation models have primarily fueled the recent surge in artificial intelligence. In contrast, generative learning of non-textual modalities, especially videos, significantly trails behind language modeling. This…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-28 · Lijun Yu

Multi-modal Latent Diffusion

Multi-modal data-sets are ubiquitous in modern applications, and multi-modal Variational Autoencoders are a popular family of models that aim to learn a joint representation of the different modalities. However, existing approaches suffer…

Machine Learning · Computer Science · 2023-12-19 · Mustapha Bounoua, Giulio Franzese, Pietro Michiardi

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-29 · Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis

Video Generation with Predictive Latents

Video Variational Autoencoder (VAE) enables latent video generative modeling by mapping the visual world into compact spatiotemporal latent spaces, improving training efficiency and stability. While existing video VAEs achieve commendable…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-05 · Yian Zhao, Feng Wang, Qiushan Guo, Chang Liu, Xiangyang Ji, Jian Zhang, Jie Chen

Feedback Recurrent Autoencoder for Video Compression

Recent advances in deep generative modeling have enabled efficient modeling of high dimensional data distributions and opened up a new horizon for solving data compression problems. Specifically, autoencoder based learned image or video…

Machine Learning · Computer Science · 2020-04-10 · Adam Golinski, Reza Pourreza, Yang Yang, Guillaume Sautiere, Taco S Cohen

Factorized Deep Generative Models for Trajectory Generation with Spatiotemporal-Validity Constraints

Trajectory data generation is an important domain that characterizes the generative process of mobility data. Traditional methods heavily rely on predefined heuristics and distributions and are weak in learning unknown mechanisms. Inspired…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-22 · Liming Zhang, Liang Zhao, Dieter Pfoser

Conditional Generative Modeling via Learning the Latent Space

Although deep learning has achieved appealing results on several machine learning tasks, most of the models are deterministic at inference, limiting their application to single-modal settings. We propose a novel general-purpose framework…

Machine Learning · Computer Science · 2020-10-12 · Sameera Ramasinghe, Kanchana Ranasinghe, Salman Khan, Nick Barnes, Stephen Gould

LaDDer: Latent Data Distribution Modelling with a Generative Prior

In this paper, we show that the performance of a learnt generative model is closely related to the model's ability to accurately represent the inferred latent data distribution, i.e. its topology and structural properties. We…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-02 · Shuyu Lin, Ronald Clark

Multiscale Augmented Normalizing Flows for Image Compression

Most learning-based image compression methods lack efficiency for high image quality due to their non-invertible design. The decoding function of the frequently applied compressive autoencoder architecture is only an approximated inverse of…

Image and Video Processing · Electrical Eng. & Systems · 2024-05-24 · Marc Windsheimer, Fabian Brand, André Kaup

Latent-Compressed Variational Autoencoder for Video Diffusion Models

Video variational autoencoders (VAEs) used in latent diffusion models typically require a sufficiently large number of latent channels to ensure high-quality video reconstruction. However, recent studies have revealed that an excessive…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-21 · Jiarui Guan, Wenshuai Zhao, Zhengtao Zou, Juho Kannala, Arno Solin

Efficient training for future video generation based on hierarchical disentangled representation of latent variables

Generating videos predicting the future of a given sequence has been an area of active research in recent years. However, an essential problem remains unsolved: most of the methods require large computational cost and memory usage for…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-09 · Naoya Fushishita, Antonio Tejero-de-Pablos, Yusuke Mukuta, Tatsuya Harada

Flow Matching in Latent Space

Flow matching is a recent framework to train generative models that exhibits impressive empirical performance while being relatively easier to train compared with diffusion-based models. Despite its advantageous properties, prior methods…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-18 · Quan Dao, Hao Phung, Binh Nguyen, Anh Tran

LatentFM: A Latent Flow Matching Approach for Generative Medical Image Segmentation

Generative models have achieved remarkable progress with the emergence of flow matching (FM). It has demonstrated strong generative capabilities and attracted significant attention as a simulation-free flow-based framework capable of…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-01 · Huynh Trinh Ngoc, Hoang Anh Nguyen Kim, Toan Nguyen Hai, Long Tran Quoc

Latent Generative Modeling of Random Fields from Limited Training Data

The ability to accurately model random fields plays a critical role in science and engineering for problems involving uncertain, spatially-varying quantities such as heterogeneous material properties and turbulent flows. Deep generative…

Machine Learning · Computer Science · 2026-05-04 · James E. Warner, Tristan A. Shah, Patrick E. Leser, Geoffrey F. Bomarito, Joshua D. Pribe, Michael C. Stanley

Improving the Diffusability of Autoencoders

Latent diffusion models have emerged as the leading approach for generating high-quality images and videos, utilizing compressed latent representations to reduce the computational burden of the diffusion process. While recent advancements…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-10 · Ivan Skorokhodov, Sharath Girish, Benran Hu, Willi Menapace, Yanyu Li, Rameen Abdal, Sergey Tulyakov, Aliaksandr Siarohin

Exploring the Latent Space of Autoencoders with Interventional Assays

Autoencoders exhibit impressive abilities to embed the data manifold into a low-dimensional latent space, making them a staple of representation learning methods. However, without explicit supervision, which is often unavailable, the…

Machine Learning · Computer Science · 2023-01-12 · Felix Leeb, Stefan Bauer, Michel Besserve, Bernhard Schölkopf

Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation

Generating high-quality videos that synthesize desired realistic content is a challenging task due to their intricate high-dimensionality and complexity of videos. Several recent diffusion-based methods have shown comparable performance by…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-05 · Kihong Kim, Haneol Lee, Jihye Park, Seyeon Kim, Kwanghee Lee, Seungryong Kim, Jaejun Yoo