Related papers: Clockwork Variational Autoencoders

Predicting Video with VQVAE

In recent years, the task of video prediction (forecasting future video given past video frames) has attracted attention in the research community. In this paper we propose a novel approach to this problem with Vector Quantized Variational…

Computer Vision and Pattern Recognition · Computer Science 2021-03-03 Jacob Walker, Ali Razavi, Aäron van den Oord

Video Generation with Predictive Latents

Video Variational Autoencoder (VAE) enables latent video generative modeling by mapping the visual world into compact spatiotemporal latent spaces, improving training efficiency and stability. While existing video VAEs achieve commendable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Yian Zhao, Feng Wang, Qiushan Guo, Chang Liu, Xiangyang Ji, Jian Zhang, Jie Chen

Deep Hierarchical Video Compression

Recently, probabilistic predictive coding that directly models the conditional distribution of latent features across successive frames for temporal redundancy removal has yielded promising results. Existing methods using a single-scale…

Image and Video Processing · Electrical Eng. & Systems 2023-12-13 Ming Lu, Zhihao Duan, Fengqing Zhu, Zhan Ma

Clockwork Convnets for Video Semantic Segmentation

Recent years have seen tremendous progress in still-image segmentation; however the naïve application of these state-of-the-art algorithms to every video frame requires considerable computation and ignores the temporal continuity inherent…

Computer Vision and Pattern Recognition · Computer Science 2016-08-15 Evan Shelhamer, Kate Rakelly, Judy Hoffman, Trevor Darrell

Improved Conditional VRNNs for Video Prediction

Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Auto-Encoder. While VAEs can handle uncertainty and model…

Computer Vision and Pattern Recognition · Computer Science 2019-04-30 Lluis Castrejon, Nicolas Ballas, Aaron Courville

Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction

A video prediction model that generalizes to diverse scenes would enable intelligent agents such as robots to perform a variety of tasks via planning with the model. However, while existing video prediction models have produced promising…

Computer Vision and Pattern Recognition · Computer Science 2021-06-22 Bohan Wu, Suraj Nair, Roberto Martin-Martin, Li Fei-Fei, Chelsea Finn

A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music

The Variational Autoencoder (VAE) has proven to be an effective model for producing semantically meaningful latent representations for natural data. However, it has thus far seen limited application to sequential data, and, as we…

Machine Learning · Computer Science 2019-11-12 Adam Roberts, Jesse Engel, Colin Raffel, Curtis Hawthorne, Douglas Eck

Benchmarking Generative Latent Variable Models for Speech

Stochastic latent variable models (LVMs) achieve state-of-the-art performance on natural image generation but are still inferior to deterministic models on speech. In this paper, we develop a speech benchmark of popular temporal LVMs and…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-06 Jakob D. Havtorn, Lasse Borgholt, Søren Hauberg, Jes Frellsen, Lars Maaløe

Correlated Variational Auto-Encoders

Variational Auto-Encoders (VAEs) are capable of learning latent representations for high dimensional data. However, due to the i.i.d. assumption, VAEs only optimize the singleton variational distributions and fail to account for the…

Machine Learning · Computer Science 2020-04-20 Da Tang, Dawen Liang, Tony Jebara, Nicholas Ruozzi

Large Motion Video Autoencoding with Cross-modal Video VAE

Learning a robust video Variational Autoencoder (VAE) is essential for reducing video redundancy and facilitating efficient video generation. Directly applying image VAEs to individual frames in isolation can result in temporal…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Yazhou Xing, Yang Fei, Yingqing He, Jingye Chen, Jiaxin Xie, Xiaowei Chi, Qifeng Chen

Conditional Temporal Variational AutoEncoder for Action Video Prediction

To synthesize a realistic action sequence based on a single human image, it is crucial to model both motion patterns and diversity in the action video. This paper proposes an Action Conditional Temporal Variational AutoEncoder (ACT-VAE) to…

Computer Vision and Pattern Recognition · Computer Science 2021-08-13 Xiaogang Xu, Yi Wang, Liwei Wang, Bei Yu, Jiaya Jia

Hybrid Variational Autoencoder for Time Series Forecasting

Variational autoencoders (VAE) are powerful generative models that learn the latent representations of input data as random variables. Recent studies show that VAE can flexibly learn the complex temporal dynamics of time series and achieve…

Machine Learning · Computer Science 2023-11-14 Borui Cai, Shuiqiao Yang, Longxiang Gao, Yong Xiang

A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction

Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena. Prior approaches to solve this task typically estimate a latent prior characterizing this stochasticity, however…

Computer Vision and Pattern Recognition · Computer Science 2021-10-08 Moitreya Chatterjee, Narendra Ahuja, Anoop Cherian

Variational Inference Aided Estimation of Time Varying Channels

One way to improve the estimation of time varying channels is to incorporate knowledge of previous observations. In this context, Dynamical VAEs (DVAEs) build a promising deep learning (DL) framework which is well suited to learn the…

Signal Processing · Electrical Eng. & Systems 2022-11-04 Benedikt Böck, Michael Baur, Valentina Rizzello, Wolfgang Utschick

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

Video Variational Autoencoder (VAE) encodes videos into a low-dimensional latent space, becoming a key component of most Latent Video Diffusion Models (LVDMs) to reduce model training costs. However, as the resolution and duration of…

Computer Vision and Pattern Recognition · Computer Science 2025-04-14 Zongjian Li, Bin Lin, Yang Ye, Liuhan Chen, Xinhua Cheng, Shenghai Yuan, Li Yuan

VAE^2: Preventing Posterior Collapse of Variational Video Predictions in the Wild

Predicting future frames of video sequences is challenging due to the complex and stochastic nature of the problem. Video prediction methods based on variational auto-encoders (VAEs) have been a great success, but they require the training…

Computer Vision and Pattern Recognition · Computer Science 2021-01-29 Yizhou Zhou, Chong Luo, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng

Predictive variational autoencoder for learning robust representations of time-series data

Variational autoencoders (VAEs) have been used extensively to discover low-dimensional latent factors governing neural activity and animal behavior. However, without careful model selection, the uncovered latent factors may reflect noise in…

Machine Learning · Computer Science 2023-12-13 Julia Huiming Wang, Dexter Tsin, Tatiana Engel

Theory and Experiments on Vector Quantized Autoencoders

Deep neural networks with discrete latent variables offer the promise of better symbolic reasoning, and learning abstractions that are more useful to new tasks. There has been a surge in interest in discrete latent variable models, however,…

Machine Learning · Computer Science 2018-07-23 Aurko Roy, Ashish Vaswani, Arvind Neelakantan, Niki Parmar

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

Spatio-temporal compression of videos, utilizing networks such as Variational Autoencoders (VAE), plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the…

Computer Vision and Pattern Recognition · Computer Science 2024-10-24 Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction

Learning to predict the long-term future of video frames is notoriously challenging due to inherent ambiguities in the distant future and dramatic amplifications of prediction error through time. Despite the recent advances in the…

Computer Vision and Pattern Recognition · Computer Science 2021-04-15 Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas Huang, Hyungsuk Yoon, Honglak Lee, Seunghoon Hong