Related papers: SneakPeek: Future-Guided Instructional Streaming V…

200 papers

While text-to-video diffusion models have made significant strides, many still face challenges in generating videos with temporal consistency. Within diffusion frameworks, guidance techniques have proven effective in enhancing output…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Hyelin Nam, Jaemin Kim, Dohun Lee, Jong Chul Ye

Video generation aims to produce temporally coherent sequences of visual frames, representing a pivotal advancement in Artificial Intelligence Generated Content (AIGC). Compared to static image generation, video generation poses unique…

Computer Vision and Pattern Recognition · Computer Science 2026-02-19 Zhiyu Yin, Kehai Chen, Xuefeng Bai, Ruili Jiang, Juntao Li, Hongdong Li, Jin Liu, Yang Xiang, Jun Yu, Min Zhang

Recently video generation has achieved substantial progress with realistic results. Nevertheless, existing AI-generated videos are usually very short clips ("shot-level") depicting a single scene. To deliver a coherent long video…

Computer Vision and Pattern Recognition · Computer Science 2023-11-07 Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu

Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial…

Computer Vision and Pattern Recognition · Computer Science 2022-06-24 Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet

Large-scale video generative models have recently demonstrated strong visual capabilities, enabling the prediction of future frames that adhere to the logical and physical cues in the current observation. In this work, we investigate…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Gongfan Fang, Xinyin Ma, Xinchao Wang

Motion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts for control, which struggle to capture the nuances of dynamic actions and temporal…

Recent advances in diffusion models bring new vitality to visual content creation. However, current text-to-video generation models still face significant challenges such as high training costs, substantial data requirements, and…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Sicong Feng, Jielong Yang, Li Peng

Creating a vivid video from the event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient…

Computer Vision and Pattern Recognition · Computer Science 2023-06-02 Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong

Text-guided generative diffusion models unlock powerful image creation and editing tools. While these have been extended to video generation, current approaches that edit the content of existing footage while retaining structure require…

Computer Vision and Pattern Recognition · Computer Science 2023-02-07 Patrick Esser, Johnathan Chiu, Parmida Atighehchian, Jonathan Granskog, Anastasis Germanidis

We introduce InstructVid2Vid, an end-to-end diffusion-based methodology for video editing guided by human language instructions. Our approach empowers video manipulation guided by natural language directives, eliminating the need for…

Computer Vision and Pattern Recognition · Computer Science 2024-05-30 Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang

We introduce a novel diffusion-based video generation method, generating a video showing multiple events given multiple individual sentences from the user. Our method does not require a large-scale video dataset since our method uses a…

Computer Vision and Pattern Recognition · Computer Science 2024-07-17 Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Sangpil Kim

Current motion-conditioned video generation methods suffer from prohibitive latency (minutes per video) and non-causal processing that prevents real-time interaction. We present MotionStream, enabling sub-second latency with up to 29 FPS…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Joonghyuk Shin, Zhengqi Li, Richard Zhang, Jun-Yan Zhu, Jaesik Park, Eli Shechtman, Xun Huang

We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focuses on the…

Computer Vision and Pattern Recognition · Computer Science 2023-10-13 Mohamed Ashraf Abdelsalam, Samrudhdhi B. Rangrej, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Afsaneh Fazly

We introduce $\textit{InteractiveVideo}$, a user-centric framework for video generation. Different from traditional generative approaches that operate based on user-provided images or text, our framework is designed for dynamic interaction,…

Computer Vision and Pattern Recognition · Computer Science 2024-02-06 Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue

Generated video scenes for action-centric sequence descriptions, such as recipe instructions and do-it-yourself projects, often include non-linear patterns, where the next video may need to be visually consistent not with the immediately…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Vasco Ramos, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes

This paper addresses the challenge of text-conditioned streaming motion generation, which requires us to predict the next-step human pose based on variable-length historical motions and incoming texts. Existing methods struggle to achieve…

Computer Vision and Pattern Recognition · Computer Science 2025-08-08 Lixing Xiao, Shunlin Lu, Huaijin Pi, Ke Fan, Liang Pan, Yueer Zhou, Ziyong Feng, Xiaowei Zhou, Sida Peng, Jingbo Wang

Panoramic video generation aims to synthesize 360-degree immersive videos, holding significant importance in the fields of VR, world models, and spatial intelligence. Existing works fail to synthesize high-quality panoramic videos due to…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Zixun Fang, Kai Zhu, Zhiheng Liu, Yu Liu, Wei Zhai, Yang Cao, Zheng-Jun Zha

We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time…

Computer Vision and Pattern Recognition · Computer Science 2022-06-10 Tim Brooks, Janne Hellsten, Miika Aittala, Ting-Chun Wang, Timo Aila, Jaakko Lehtinen, Ming-Yu Liu, Alexei A. Efros, Tero Karras

Large pretrained diffusion models have significantly enhanced the quality of generated videos, and yet their use in real-time streaming remains limited. Autoregressive models offer a natural framework for sequential frame synthesis but…

Computer Vision and Pattern Recognition · Computer Science 2026-03-12 Jinxiu Liu, Xuanming Liu, Kangfu Mei, Yandong Wen, Ming-Hsuan Yang, Weiyang Liu

With the rapid development of AI-generated content (AIGC), video generation has emerged as one of its most dynamic and impactful subfields. In particular, the advancement of video generation foundation models has led to growing demand for…
