Related papers: Probabilistic Video Generation using Holistic Attr…

Video Generation with Predictive Latents

Video Variational Autoencoder (VAE) enables latent video generative modeling by mapping the visual world into compact spatiotemporal latent spaces, improving training efficiency and stability. While existing video VAEs achieve commendable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Yian Zhao, Feng Wang, Qiushan Guo, Chang Liu, Xiangyang Ji, Jian Zhang, Jie Chen

Generating Long Videos of Dynamic Scenes

We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time…

Computer Vision and Pattern Recognition · Computer Science 2022-06-10 Tim Brooks, Janne Hellsten, Miika Aittala, Ting-Chun Wang, Timo Aila, Jaakko Lehtinen, Ming-Yu Liu, Alexei A. Efros, Tero Karras

Efficient training for future video generation based on hierarchical disentangled representation of latent variables

Generating videos predicting the future of a given sequence has been an area of active research in recent years. However, an essential problem remains unsolved: most of the methods require large computational cost and memory usage for…

Computer Vision and Pattern Recognition · Computer Science 2021-06-09 Naoya Fushishita, Antonio Tejero-de-Pablos, Yusuke Mukuta, Tatsuya Harada

Improved Conditional VRNNs for Video Prediction

Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Auto-Encoder. While VAEs can handle uncertainty and model…

Computer Vision and Pattern Recognition · Computer Science 2019-04-30 Lluis Castrejon, Nicolas Ballas, Aaron Courville

Generating Videos with Scene Dynamics

We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction). We propose a generative…

Computer Vision and Pattern Recognition · Computer Science 2016-10-27 Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Simple Video Generation using Neural ODEs

Despite having been studied to a great extent, the task of conditional generation of sequences of frames, or videos, remains extremely challenging. It is a common belief that a key step towards solving this task resides in modelling…

Computer Vision and Pattern Recognition · Computer Science 2021-09-09 David Kanaa, Vikram Voleti, Samira Ebrahimi Kahou, Christopher Pal

VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation

Generative models that can model and predict sequences of future events can, in principle, learn to capture complex real-world phenomena, such as physical interactions. However, a central challenge in video prediction is that the future is…

Computer Vision and Pattern Recognition · Computer Science 2020-02-13 Manoj Kumar, Mohammad Babaeizadeh, Dumitru Erhan, Chelsea Finn, Sergey Levine, Laurent Dinh, Durk Kingma

Consistent Generative Query Networks

Stochastic video prediction models take in a sequence of image frames, and generate a sequence of consecutive future image frames. These models typically generate future frames in an autoregressive fashion, which is slow and requires the…

Computer Vision and Pattern Recognition · Computer Science 2019-04-23 Ananya Kumar, S. M. Ali Eslami, Danilo J. Rezende, Marta Garnelo, Fabio Viola, Edward Lockhart, Murray Shanahan

Deep Hierarchical Video Compression

Recently, probabilistic predictive coding that directly models the conditional distribution of latent features across successive frames for temporal redundancy removal has yielded promising results. Existing methods using a single-scale…

Image and Video Processing · Electrical Eng. & Systems 2023-12-13 Ming Lu, Zhihao Duan, Fengqing Zhu, Zhan Ma

Future Frame Prediction Using Convolutional VRNN for Anomaly Detection

Anomaly detection in videos aims at reporting anything that does not conform to the normal behaviour or distribution. However, due to the sparsity of abnormal video clips in real life, collecting annotated data for supervised learning is…

Computer Vision and Pattern Recognition · Computer Science 2019-10-22 Yiwei Lu, Mahesh Kumar Krishna Reddy, Seyed shahabeddin Nabavi, Yang Wang

Learning Temporal Regularity in Video Sequences

Perceiving meaningful activities in a long video sequence is a challenging problem due to ambiguous definition of 'meaningfulness' as well as clutters in the scene. We approach this problem by learning a generative model for regular motion…

Computer Vision and Pattern Recognition · Computer Science 2016-04-18 Mahmudul Hasan, Jonghyun Choi, Jan Neumann, Amit K. Roy-Chowdhury, Larry S. Davis

Stochastic Video Generation with a Learned Prior

Generating video frames that accurately predict future world states is challenging. Existing approaches either fail to capture the full distribution of outcomes, or yield blurry generations, or both. In this paper we introduce an…

Computer Vision and Pattern Recognition · Computer Science 2024-03-14 Remi Denton, Rob Fergus

Towards Smooth Video Composition

Video generation requires synthesizing consistent and persistent frames with dynamic content over time. This work investigates modeling the temporal relations for composing video with arbitrary length, from a few frames to even infinite,…

Computer Vision and Pattern Recognition · Computer Science 2022-12-15 Qihang Zhang, Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou

Xp-GAN: Unsupervised Multi-object Controllable Video Generation

Video Generation is a relatively new and yet popular subject in machine learning due to its vast variety of potential applications and its numerous challenges. Current methods in Video Generation provide the user with little or no control…

Computer Vision and Pattern Recognition · Computer Science 2021-11-22 Bahman Rouhani, Mohammad Rahmati

Video Generation Beyond a Single Clip

We tackle the long video generation problem, i.e., generating videos beyond the output length of video generation models. Due to the computation resource constraints, video generation models can only generate video clips that are relatively…

Computer Vision and Pattern Recognition · Computer Science 2023-04-18 Hsin-Ping Huang, Yu-Chuan Su, Ming-Hsuan Yang

The Pose Knows: Video Forecasting by Generating Pose Futures

Current approaches in video forecasting attempt to generate videos directly in pixel space using Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs). However, since these approaches try to model all the structure and…

Computer Vision and Pattern Recognition · Computer Science 2017-05-02 Jacob Walker, Kenneth Marino, Abhinav Gupta, Martial Hebert

Video Prediction with Appearance and Motion Conditions

Video prediction aims to generate realistic future frames by learning dynamic visual patterns. One fundamental challenge is to deal with future uncertainty: How should a model behave when there are multiple correct, equally probable futures?…

Computer Vision and Pattern Recognition · Computer Science 2018-07-10 Yunseok Jang, Gunhee Kim, Yale Song

Motion Prompting: Controlling Video Generation with Motion Trajectories

Motion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts for control, which struggle to capture the nuances of dynamic actions and temporal…

Computer Vision and Pattern Recognition · Computer Science 2025-03-31 Daniel Geng, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Carl Doersch, Yusuf Aytar, Michael Rubinstein, Chen Sun, Oliver Wang, Andrew Owens, Deqing Sun

Video Prediction Models as General Visual Encoders

This study explores the potential of open-source video conditional generation models as encoders for downstream tasks, focusing on instance segmentation using the BAIR Robot Pushing Dataset. The researchers propose using video prediction…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 James Maier, Nishanth Mohankumar

A Survey: Spatiotemporal Consistency in Video Generation

Video generation aims to produce temporally coherent sequences of visual frames, representing a pivotal advancement in Artificial Intelligence Generated Content (AIGC). Compared to static image generation, video generation poses unique…

Computer Vision and Pattern Recognition · Computer Science 2026-02-19 Zhiyu Yin, Kehai Chen, Xuefeng Bai, Ruili Jiang, Juntao Li, Hongdong Li, Jin Liu, Yang Xiang, Jun Yu, Min Zhang