Related papers: Stable Virtual Camera: Generative View Synthesis w…

Consistent View Synthesis with Pose-Guided Diffusion Models

Novel view synthesis from a single image has been a cornerstone problem for many Virtual Reality applications that provide immersive experiences. However, most existing techniques can only synthesize novel views within a limited range of…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-31 · Hung-Yu Tseng, Qinbo Li, Changil Kim, Suhib Alsisan, Jia-Bin Huang, Johannes Kopf

Enhanced Creativity and Ideation through Stable Video Synthesis

This paper explores the innovative application of Stable Video Diffusion (SVD), a diffusion model that revolutionizes the creation of dynamic video content from static images. As digital media and design industries accelerate, SVD emerges…

Human-Computer Interaction · Computer Science · 2024-05-24 · Elijah Miller, Thomas Dupont, Mingming Wang

ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models

Generating novel views of an object from a single image is a challenging task. It requires an understanding of the underlying 3D structure of the object from an image and rendering high-quality, spatially consistent new views. While recent…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-05 · Jeong-gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, Kwang Moo Yi

Stable View Synthesis

We present Stable View Synthesis (SVS). Given a set of source images depicting a scene from freely distributed viewpoints, SVS synthesizes new views of the scene. The method operates on a geometric scaffold computed via…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-04 · Gernot Riegler, Vladlen Koltun

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

We present Stable Video 3D (SV3D) -- a latent video diffusion model for high-resolution, image-to-multi-view generation of orbital videos around a 3D object. Recent work on 3D generation proposes techniques to adapt 2D generative models for…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-19 · Vikram Voleti, Chun-Han Yao, Mark Boss, Adam Letts, David Pankratz, Dmitry Tochilkin, Christian Laforte, Robin Rombach, Varun Jampani

WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image

Generating high-quality novel views of a scene from a single image requires maintaining structural coherence across different views, referred to as view consistency. While diffusion models have driven advancements in novel view synthesis,…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-07 · Jiwoo Park, Tae Eun Choi, Youngjun Jun, Seong Jae Hwang

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. Unlike previous methods that rely on separately trained generative models for video generation and…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-03 · Yiming Xie, Chun-Han Yao, Vikram Voleti, Huaizu Jiang, Varun Jampani

Look Beyond: Two-Stage Scene View Generation via Panorama and Video Diffusion

Novel view synthesis (NVS) from a single image is highly ill-posed due to large unobserved regions, especially for views that deviate significantly from the input. While existing methods focus on consistency between the source and generated…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-03 · Xueyang Kang, Zhengkang Xiang, Zezheng Zhang, Kourosh Khoshelham

Novel View Synthesis with Pixel-Space Diffusion Models

Synthesizing a novel view from a single input image is a challenging task. Traditionally, this task was approached by estimating scene depth, warping, and inpainting, with machine learning models enabling parts of the pipeline. More…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-13 · Noam Elata, Bahjat Kawar, Yaron Ostrovsky-Berman, Miriam Farber, Ron Sokolovsky

Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models

Novel view synthesis from a single input image is a challenging task, where the goal is to generate a new view of a scene from a desired camera pose that may be separated by a large motion. The highly uncertain nature of this synthesis task…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-23 · Jason J. Yu, Fereshteh Forghani, Konstantinos G. Derpanis, Marcus A. Brubaker

Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering

Modern video generative models based on diffusion models can produce very realistic clips, but they are computationally inefficient, often requiring minutes of GPU time for just a few seconds of video. This inefficiency poses a critical…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-15 · Jieying Chen, Jeffrey Hu, Joan Lasenby, Ayush Tewari

SimVS: Simulating World Inconsistencies for Robust View Synthesis

Novel-view synthesis techniques achieve impressive results for static scenes but struggle when faced with the inconsistencies inherent to casual capture settings: varying illumination, scene motion, and other unintended effects that are…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-11 · Alex Trevithick, Roni Paiss, Philipp Henzler, Dor Verbin, Rundi Wu, Hadi Alzayer, Ruiqi Gao, Ben Poole, Jonathan T. Barron, Aleksander Holynski, Ravi Ramamoorthi, Pratul P. Srinivasan

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

Diffusion-based methods can generate realistic images and videos, but they struggle to edit existing objects in a video while preserving their appearance over time. This prevents diffusion models from being applied to natural video editing…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-21 · Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu

Generative View Stitching

Autoregressive video diffusion models are capable of long rollouts that are stable and consistent with history, but they are unable to guide the current generation with conditioning from the future. In camera-guided video generation with a…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-13 · Chonghyuk Song, Michal Stary, Boyuan Chen, George Kopanas, Vincent Sitzmann

GRVS: a Generalizable and Recurrent Approach to Monocular Dynamic View Synthesis

Synthesizing novel views from monocular videos of dynamic scenes remains a challenging problem. Scene-specific methods that optimize 4D representations with explicit motion priors often break down in highly dynamic regions where multi-view…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-01 · Thomas Tanay, Mohammed Brahimi, Michal Nazarczuk, Qingwen Zhang, Sibi Catley-Chandar, Arthur Moreau, Zhensong Zhang, Eduardo Pérez-Pellitero

DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer

Collecting multi-view driving scenario videos to enhance the performance of 3D visual perception tasks presents significant challenges and incurs substantial costs, making generative models for realistic data an appealing alternative. Yet,…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-29 · Junpeng Jiang, Gangyi Hong, Miao Zhang, Hengtong Hu, Kun Zhan, Rui Shao, Liqiang Nie

Generative Novel View Synthesis with 3D-Aware Diffusion Models

We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image. Our model samples from the distribution of possible renderings consistent with the input and, even in the presence of…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-06 · Eric R. Chan, Koki Nagano, Matthew A. Chan, Alexander W. Bergman, Jeong Joon Park, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, Gordon Wetzstein

BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model

We present BetterScene, an approach to enhance novel view synthesis (NVS) quality for diverse real-world scenes using extremely sparse, unconstrained photos. BetterScene leverages the production-ready Stable Video Diffusion (SVD) model…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-27 · Yuci Han, Charles Toth, John E. Anderson, William J. Shuart, Alper Yilmaz

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-28 · Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach

OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis

Prior approaches injecting camera control into diffusion models have focused on specific subsets of 4D consistency tasks: novel view synthesis, text-to-video with camera control, image-to-video, amongst others. Therefore, these fragmented…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-26 · Xiang Fan, Sharath Girish, Vivek Ramanujan, Chaoyang Wang, Ashkan Mirzaei, Petr Sushko, Aliaksandr Siarohin, Sergey Tulyakov, Ranjay Krishna