Related papers: Motion-Zero: Zero-Shot Moving Object Control Frame…

200 papers

Zero-shot Text-to-Video synthesis generates videos based on prompts without any videos. Without motion information from videos, motion priors implied in prompts are vital guidance. For example, the prompt "airplane landing on the runway"…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Sitong Su, Litao Guo, Lianli Gao, Hengtao Shen, Jingkuan Song

Generating videos with realistic and physically plausible motion is one of the main recent challenges in computer vision. While diffusion models are achieving compelling results in image generation, video diffusion models are limited by…

Machine Learning · Computer Science 2024-10-28 Luca Savant Aira, Antonio Montanaro, Emanuele Aiello, Diego Valsesia, Enrico Magli

Large-scale text-to-video (T2V) diffusion models have made great progress in recent years in terms of visual quality, motion, and temporal consistency. However, the generation process is still a black box, where all attributes (e.g., appearance,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-08 Jiwen Yu, Xiaodong Cun, Chenyang Qi, Yong Zhang, Xintao Wang, Ying Shan, Jian Zhang

We present a new method for text-driven motion transfer - synthesizing a video that complies with an input text prompt describing the target objects and scene while maintaining an input video's motion and scene layout. Prior methods are…

Computer Vision and Pattern Recognition · Computer Science 2023-12-05 Danah Yatim, Rafail Fridman, Omer Bar-Tal, Yoni Kasten, Tali Dekel

Text-to-video models have demonstrated impressive capabilities in producing diverse and captivating video content, showcasing a notable advancement in generative AI. However, these models generally lack fine-grained control over motion…

Computer Vision and Pattern Recognition · Computer Science 2024-12-09 Tuna Han Salih Meral, Hidir Yesiltepe, Connor Dunlop, Pinar Yanardag

Large text-to-image diffusion models have exhibited impressive proficiency in generating high-quality images. However, when applying these models to the video domain, ensuring temporal consistency across video frames remains a formidable…

Computer Vision and Pattern Recognition · Computer Science 2023-09-19 Shuai Yang, Yifan Zhou, Ziwei Liu, Chen Change Loy

We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models. Previous text-to-motion methods focus on characters in isolation without considering scenes due to the limited availability of…

Computer Vision and Pattern Recognition · Computer Science 2024-04-17 Hongwei Yi, Justus Thies, Michael J. Black, Xue Bin Peng, Davis Rempe

Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets. In this paper, we introduce a new task of zero-shot text-to-video generation and propose a low-cost approach (without…

Computer Vision and Pattern Recognition · Computer Science 2023-03-24 Levon Khachatryan, Andranik Movsisyan, Vahram Tadevosyan, Roberto Henschel, Zhangyang Wang, Shant Navasardyan, Humphrey Shi

While recent years have witnessed great progress on using diffusion models for video generation, most of them are simple extensions of image generation frameworks, which fail to explicitly consider one of the key differences between videos…

Computer Vision and Pattern Recognition · Computer Science 2024-07-31 Jingyun Liang, Yuchen Fan, Kai Zhang, Radu Timofte, Luc Van Gool, Rakesh Ranjan

Recently, diffusion models like StableDiffusion have achieved impressive image generation results. However, the generation process of such diffusion models is uncontrollable, which makes it hard to generate videos with continuous and…

Computer Vision and Pattern Recognition · Computer Science 2023-08-04 Zhihao Hu, Dong Xu

We present DiffIR2VR-Zero, a zero-shot framework that enables any pre-trained image restoration diffusion model to perform high-quality video restoration without additional training. While image diffusion models have shown remarkable…

Computer Vision and Pattern Recognition · Computer Science 2026-01-01 Chang-Han Yeh, Hau-Shiang Shiu, Chin-Yang Lin, Zhixiang Wang, Chi-Wei Hsiao, Ting-Hsuan Chen, Yu-Lun Liu

Diffusion models are capable of generating impressive images conditioned on text descriptions, and extensions of these models allow users to edit images at a relatively coarse scale. However, the ability to precisely edit the layout,…

Computer Vision and Pattern Recognition · Computer Science 2024-02-01 Daniel Geng, Andrew Owens

Existing video deraining methods are often trained on paired datasets, either synthetic, which limits their ability to generalize to real-world rain, or captured by static cameras, which restricts their effectiveness in dynamic scenes with…

Computer Vision and Pattern Recognition · Computer Science 2026-02-03 Tuomas Varanka, Juan Luis Gonzalez, Hyeongwoo Kim, Pablo Garrido, Xu Yao

In this paper, we present DreaMoving, a diffusion-based controllable video generation framework to produce high-quality customized human videos. Specifically, given target identity and posture sequences, DreaMoving can generate a video of…

Computer Vision and Pattern Recognition · Computer Science 2023-12-12 Mengyang Feng, Jinlin Liu, Kai Yu, Yuan Yao, Zheng Hui, Xiefan Guo, Xianhui Lin, Haolan Xue, Chen Shi, Xiaowen Li, Aojie Li, Xiaoyang Kang, Biwen Lei, Miaomiao Cui, Peiran Ren, Xuansong Xie

Controllable video generation has attracted significant attention, largely due to advances in video diffusion models. In domains such as autonomous driving, it is essential to develop highly accurate predictions for object motions. This…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Ge Ya Luo, Zhi Hao Luo, Anthony Gosselin, Alexia Jolicoeur-Martineau, Christopher Pal

Specifying nuanced and compelling camera motion remains a significant hurdle for non-expert creators using generative tools, creating an "expressive gap" where generic text prompts fail to capture cinematic vision. This barrier limits…

Computer Vision and Pattern Recognition · Computer Science 2026-03-30 Pooja Guhan, Divya Kothandaraman, Geonsun Lee, Tsung-Wei Huang, Guan-Ming Su, Dinesh Manocha

In recent years, large-scale pre-trained diffusion transformer models have made significant progress in video generation. While current DiT models can produce high-definition, high-frame-rate, and highly diverse videos, there is a lack of…

Computer Vision and Pattern Recognition · Computer Science 2025-05-21 Changgu Chen, Xiaoyan Yang, Junwei Shu, Changbo Wang, Yang Li

In Omnimatte, one aims to decompose a given video into semantically meaningful layers, including the background and individual objects along with their associated effects, such as shadows and reflections. Existing methods often require…

Computer Vision and Pattern Recognition · Computer Science 2025-10-17 Dvir Samuel, Matan Levy, Nir Darshan, Gal Chechik, Rami Ben-Ari

Diffusion models have demonstrated remarkable capability in video generation, which has further sparked interest in introducing trajectory control into the generation process. While existing works mainly focus on training-based methods (e.g.,…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Haonan Qiu, Zhaoxi Chen, Zhouxia Wang, Yingqing He, Menghan Xia, Ziwei Liu

Recent advances in diffusion-based text-to-video (T2V) models have demonstrated remarkable progress, but these models still face challenges in generating videos with multiple objects. Most models struggle with accurately capturing complex…

Computer Vision and Pattern Recognition · Computer Science 2025-05-30 Aimon Rahman, Jiang Liu, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Yusheng Su, Vishal M. Patel, Zicheng Liu, Emad Barsoum