Related papers: Compositional Video Generation as Flow Equalizatio…

200 papers

Diffusion models have demonstrated great success in text-to-video (T2V) generation. However, existing methods may face challenges when handling complex (long) video generation scenarios that involve multiple objects or dynamic changes in…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Ye Tian, Ling Yang, Haotian Yang, Yuan Gao, Yufan Deng, Jingmin Chen, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, Di Zhang, Bin Cui

Personalized text-to-image generation using diffusion models has recently emerged and garnered significant interest. This task learns a novel concept (e.g., a unique toy), illustrated in a handful of images, into a generative model that…

Computer Vision and Pattern Recognition · Computer Science 2023-12-08 Shaozhe Hao, Kai Han, Shihao Zhao, Kwan-Yee K. Wong

Text-to-video (T2V) generative models have advanced significantly, yet their ability to compose different objects, attributes, actions, and motions into a video remains unexplored. Previous text-to-video benchmarks also neglect this…

Computer Vision and Pattern Recognition · Computer Science 2025-01-16 Kaiyue Sun, Kaiyi Huang, Xian Liu, Yue Wu, Zihan Xu, Zhenguo Li, Xihui Liu

Text-driven Image to Video Generation (TI2V) aims to generate controllable video given the first frame and corresponding textual description. The primary challenges of this task lie in two parts: (i) how to identify the target objects and…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Xingrui Wang, Xin Li, Yaosi Hu, Hanxin Zhu, Chen Hou, Cuiling Lan, Zhibo Chen

Video generation has many unique challenges beyond those of image generation. The temporal dimension introduces extensive possible variations across frames, over which consistency and continuity may be violated. In this study, we move…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Weixi Feng, Jiachen Li, Michael Saxon, Tsu-jui Fu, Wenhu Chen, William Yang Wang

Text-to-video diffusion models have advanced video generation significantly. However, customizing these models to generate videos with tailored motions presents a substantial challenge. Specifically, they encounter hurdles in (a) accurately…

Computer Vision and Pattern Recognition · Computer Science 2023-12-05 Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye

Generating consistent long videos is a complex challenge: while diffusion-based generative models generate visually impressive short clips, extending them to longer durations often leads to memory bottlenecks and long-term inconsistency. In…

Computer Vision and Pattern Recognition · Computer Science 2025-07-22 Wenqi Ouyang, Zeqi Xiao, Danni Yang, Yifan Zhou, Shuai Yang, Lei Yang, Jianlou Si, Xingang Pan

Diffusion-based text-to-video (T2V) and image-to-video (I2V) generation have emerged as a prominent research focus. However, there exists a challenge in integrating the two generative paradigms into a unified model. In this paper,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Xinyu Xiao, Binbin Yang, Tingtian Li, Yipeng Yu, Sen Lei

The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art video models are still lagging behind image models in terms of visual quality and user control over the generated content. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2023-11-21 Michal Geyer, Omer Bar-Tal, Shai Bagon, Tali Dekel

Text-to-image diffusion models have shown impressive capabilities in generating realistic visuals from natural-language prompts, yet they often struggle with accurately binding attributes to corresponding objects, especially in prompts…

Computer Vision and Pattern Recognition · Computer Science 2025-05-05 Do Huu Dat, Nam Hyeonu, Po-Yuan Mao, Tae-Hyun Oh

Video composition is the core task of video editing. Although image composition based on diffusion models has been highly successful, it is not straightforward to extend the achievement to video object composition tasks, which not only…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Wei Wang, Yaosen Chen, Yuegen Liu, Qi Yuan, Shubin Yang, Yanru Zhang

Text-to-video (T2V) generation technology holds potential to transform multiple domains such as education, marketing, entertainment, and assistive technologies for individuals with visual or reading comprehension challenges, by creating…

Graphics · Computer Science 2025-10-07 Nilay Kumar, Priyansh Bhandari, G. Maragatham

Text-to-video diffusion models enable the generation of high-quality videos that follow text instructions, making it easy to create diverse and individual content. However, existing approaches mostly focus on high-quality short video…

Computer Vision and Pattern Recognition · Computer Science 2025-04-17 Roberto Henschel, Levon Khachatryan, Hayk Poghosyan, Daniil Hayrapetyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi

Diffusion-based image-to-video (I2V) generation has become a central direction in generative models by turning a reference image, with optional conditions, into a temporally coherent video. Compared with broader video generation…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Xianlong Wang, Wenbo Pan, Shijia Zhou, Ke Li, Yuqi Wang, Zeyu Ye, Hangtao Zhang, Leo Yu Zhang, Xiaohua Jia

Text-to-video diffusion models generate realistic videos, but often fail on prompts requiring fine-grained compositional understanding, such as relations between entities, attributes, actions, and motion directions. We hypothesize that…

Computer Vision and Pattern Recognition · Computer Science 2026-05-15 Ariel Shaulov, Eitan Shaar, Amit Edenzon, Gal Chechik, Lior Wolf

This work aims to learn a high-quality text-to-video (T2V) generative model by leveraging a pre-trained text-to-image (T2I) model as a basis. It is a highly desirable yet challenging task to simultaneously a) accomplish the synthesis of…

Video generation has increasingly gained interest in both academia and industry. Although commercial tools can generate plausible videos, there is a limited number of open-source models available for researchers and engineers. In this work,…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan

State-of-the-art Text-to-Video (T2V) diffusion models can generate visually impressive results, yet they still frequently fail to compose complex scenes or follow logical temporal instructions. In this paper, we argue that many errors,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-26 Mariam Hassan, Bastien Van Delft, Wuyang Li, Alexandre Alahi

Text-to-video generation aims to produce a video based on a given prompt. Recently, several commercial video models have been able to generate plausible videos with minimal noise, excellent details, and high aesthetic scores. However, these…

Computer Vision and Pattern Recognition · Computer Science 2024-01-18 Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, Ying Shan

Visuals can enhance our experience of music, owing to the way they can amplify the emotions and messages conveyed within it. However, creating music visualization is a complex, time-consuming, and resource-intensive process. We introduce…

Human-Computer Interaction · Computer Science 2023-09-29 Vivian Liu, Tao Long, Nathan Raw, Lydia Chilton