Related papers: Generative Rendering: Controllable 4D-Guided Video…

Strong and Controllable 3D Motion Generation

Human motion generation is a significant pursuit in generative computer vision with widespread applications in film-making, video games, AR/VR, and human-robot interaction. Current methods mainly utilize either diffusion-based generative…

Computer Vision and Pattern Recognition · Computer Science 2025-02-03 Canxuan Gang

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image, and fast adaptation to new tasks still remains an open challenge,…

Computer Vision and Pattern Recognition · Computer Science 2023-02-17 Omer Bar-Tal, Lior Yariv, Yaron Lipman, Tali Dekel

Structure and Content-Guided Video Synthesis with Diffusion Models

Text-guided generative diffusion models unlock powerful image creation and editing tools. While these have been extended to video generation, current approaches that edit the content of existing footage while retaining structure require…

Computer Vision and Pattern Recognition · Computer Science 2023-02-07 Patrick Esser, Johnathan Chiu, Parmida Atighehchian, Jonathan Granskog, Anastasis Germanidis

Control3D: Towards Controllable Text-to-3D Generation

Recent remarkable advances in large-scale text-to-image diffusion models have inspired a significant breakthrough in text-to-3D generation, pursuing 3D content creation solely from a given text prompt. However, existing text-to-3D…

Computer Vision and Pattern Recognition · Computer Science 2023-11-10 Yang Chen, Yingwei Pan, Yehao Li, Ting Yao, Tao Mei

V3D: Video Diffusion Models are Effective 3D Generators

Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated the generation speed, but usually produce less-detailed objects due to limited model capacity or 3D data. Motivated by recent…

Computer Vision and Pattern Recognition · Computer Science 2024-03-12 Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, Huaping Liu

Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance

Recent advances in diffusion models bring new vitality to visual content creation. However, current text-to-video generation models still face significant challenges such as high training costs, substantial data requirements, and…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Sicong Feng, Jielong Yang, Li Peng

MeshDiffusion: Score-based Generative 3D Mesh Modeling

We consider the task of generating realistic 3D shapes, which is useful for a variety of applications such as automatic scene generation and physical simulation. Compared to other 3D representations like voxels and point clouds, meshes are…

Graphics · Computer Science 2023-04-18 Zhen Liu, Yao Feng, Michael J. Black, Derek Nowrouzezahrai, Liam Paull, Weiyang Liu

3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models

Text-guided diffusion models have shown superior performance in image/video generation and editing. However, few explorations have been performed in 3D scenarios. In this paper, we discuss three fundamental and interesting problems on this…

Computer Vision and Pattern Recognition · Computer Science 2023-10-13 Gang Li, Heliang Zheng, Chaoyue Wang, Chang Li, Changwen Zheng, Dacheng Tao

ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion

Generating animated 3D objects is at the heart of many applications, yet most advanced works are typically difficult to apply in practice because of their limited setup, their long runtime, or their limited quality. We introduce ActionMesh,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-02 Remy Sabathier, David Novotny, Niloy J. Mitra, Tom Monnier

Lighting-grounded Video Generation with Renderer-based Agent Reasoning

Diffusion models have achieved remarkable progress in video generation, but their controllability remains a major limitation. Key scene factors such as layout, lighting, and camera trajectory are often entangled or only weakly modeled,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Ziqi Cai, Taoyu Yang, Zheng Chang, Si Li, Han Jiang, Shuchen Weng, Boxin Shi

MEVG: Multi-event Video Generation with Text-to-Video Models

We introduce a novel diffusion-based video generation method, generating a video showing multiple events given multiple individual sentences from the user. Our method does not require a large-scale video dataset since our method uses a…

Computer Vision and Pattern Recognition · Computer Science 2024-07-17 Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Sangpil Kim

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of complex videos from a text description. However, most existing models lack fine-grained control over camera movement, which is critical for downstream…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin, Willi Menapace, Guocheng Qian, Michael Vasilkovsky, Hsin-Ying Lee, Chaoyang Wang, Jiaxu Zou, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov

CMD: Controllable Multiview Diffusion for 3D Editing and Progressive Generation

Recently, 3D generation methods have shown their powerful ability to automate 3D model creation. However, most 3D generation methods only rely on an input image or a text prompt to generate a 3D model, which lacks the control of each…

Computer Vision and Pattern Recognition · Computer Science 2025-07-08 Peng Li, Suizhi Ma, Jialiang Chen, Yuan Liu, Congyi Zhang, Wei Xue, Wenhan Luo, Alla Sheffer, Wenping Wang, Yike Guo

Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering

Modern video generative models based on diffusion models can produce very realistic clips, but they are computationally inefficient, often requiring minutes of GPU time for just a few seconds of video. This inefficiency poses a critical…

Computer Vision and Pattern Recognition · Computer Science 2026-01-15 Jieying Chen, Jeffrey Hu, Joan Lasenby, Ayush Tewari

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and…

Computer Vision and Pattern Recognition · Computer Science 2024-11-22 Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee

Text2Tex: Text-driven Texture Synthesis via Diffusion Models

We present Text2Tex, a novel method for generating high-quality textures for 3D meshes from the given text prompts. Our method incorporates inpainting into a pre-trained depth-aware image diffusion model to progressively synthesize high…

Computer Vision and Pattern Recognition · Computer Science 2023-03-22 Dave Zhenyu Chen, Yawar Siddiqui, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nießner

Controllable Audio-Visual Viewpoint Generation from 360° Spatial Information

The generation of sounding videos has seen significant advancements with the advent of diffusion models. However, existing methods often lack the fine-grained control needed to generate viewpoint-specific content from larger, immersive…

Multimedia · Computer Science 2025-10-08 Christian Marinoni, Riccardo Fosco Gramaccioni, Eleonora Grassucci, Danilo Comminiello

Curved Diffusion: A Generative Model With Optical Geometry Control

State-of-the-art diffusion models can generate highly realistic images based on various conditioning like text, segmentation, and depth. However, an essential aspect often overlooked is the specific camera geometry used during image…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Andrey Voynov, Amir Hertz, Moab Arar, Shlomi Fruchter, Daniel Cohen-Or

GD-VDM: Generated Depth for better Diffusion-based Video Generation

The field of generative models has recently witnessed significant progress, with diffusion models showing remarkable performance in image generation. In light of this success, there is a growing interest in exploring the application of…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Ariel Lapid, Idan Achituve, Lior Bracha, Ethan Fetaya

VidSketch: Hand-drawn Sketch-Driven Video Generation with Diffusion Control

With the advancement of generative artificial intelligence, previous studies have achieved the task of generating aesthetic images from hand-drawn sketches, fulfilling the public's needs for drawing. However, these methods are limited to…

Computer Vision and Pattern Recognition · Computer Science 2025-02-18 Lifan Jiang, Shuang Chen, Boxi Wu, Xiaotong Guan, Jiahui Zhang