Related papers: LatentMan: Generating Consistent Animated Characte…

200 papers

Large-scale text-to-video (T2V) diffusion models have made great progress in recent years in terms of visual quality, motion and temporal consistency. However, the generation process is still a black box, where all attributes (e.g., appearance,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-08 Jiwen Yu , Xiaodong Cun , Chenyang Qi , Yong Zhang , Xintao Wang , Ying Shan , Jian Zhang

Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets. In this paper, we introduce a new task of zero-shot text-to-video generation and propose a low-cost approach (without…

Computer Vision and Pattern Recognition · Computer Science 2023-03-24 Levon Khachatryan , Andranik Movsisyan , Vahram Tadevosyan , Roberto Henschel , Zhangyang Wang , Shant Navasardyan , Humphrey Shi

Leveraging the generative ability of image diffusion models offers great potential for zero-shot video-to-video translation. The key lies in how to maintain temporal consistency across generated video frames by image diffusion models.…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Yuxiang Bao , Di Qiu , Guoliang Kang , Baochang Zhang , Bo Jin , Kaiye Wang , Pengfei Yan

Image-to-video (I2V) generation seeks to produce realistic motion sequences from a single reference image. Although recent methods exhibit strong temporal consistency, they often struggle when dealing with complex, non-repetitive human…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Ashkan Taghipour , Morteza Ghahremani , Mohammed Bennamoun , Farid Boussaid , Aref Miri Rekavandi , Zinuo Li , Qiuhong Ke , Hamid Laga

In the paradigm of AI-generated content (AIGC), there has been increasing attention to transferring knowledge from pre-trained text-to-image (T2I) models to text-to-video (T2V) generation. Despite their effectiveness, these frameworks face…

Computer Vision and Pattern Recognition · Computer Science 2024-02-07 Susung Hong , Junyoung Seo , Heeseong Shin , Sunghwan Hong , Seungryong Kim

We propose Latent-Shift -- an efficient text-to-video generation method based on a pretrained text-to-image generation model that consists of an autoencoder and a U-Net diffusion model. Learning a video diffusion model in the latent space…

Computer Vision and Pattern Recognition · Computer Science 2023-04-19 Jie An , Songyang Zhang , Harry Yang , Sonal Gupta , Jia-Bin Huang , Jiebo Luo , Xi Yin

Character Animation aims to generate character videos from still images through driving signals. Currently, diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Li Hu , Xin Gao , Peng Zhang , Ke Sun , Bang Zhang , Liefeng Bo

We present a method to create diffusion-based video models from pretrained Text-to-Image (T2I) models. Recently, AnimateDiff proposed freezing the T2I model while only training temporal layers. We advance this method by proposing a unique…

Computer Vision and Pattern Recognition · Computer Science 2024-10-11 Mingi Kwon , Seoung Wug Oh , Yang Zhou , Difan Liu , Joon-Young Lee , Haoran Cai , Baqiao Liu , Feng Liu , Youngjung Uh

The rising demand for creating lifelike avatars in the digital realm has led to an increased need for generating high-quality human videos guided by textual descriptions and poses. We propose Dancing Avatar, designed to fabricate human…

Computer Vision and Pattern Recognition · Computer Science 2023-08-16 Bosheng Qin , Wentao Ye , Qifan Yu , Siliang Tang , Yueting Zhuang

Text-to-video diffusion models enable the generation of high-quality videos that follow text instructions, making it easy to create diverse and individual content. However, existing approaches mostly focus on high-quality short video…

Computer Vision and Pattern Recognition · Computer Science 2025-04-17 Roberto Henschel , Levon Khachatryan , Hayk Poghosyan , Daniil Hayrapetyan , Vahram Tadevosyan , Zhangyang Wang , Shant Navasardyan , Humphrey Shi

Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution…

Computer Vision and Pattern Recognition · Computer Science 2023-12-29 Andreas Blattmann , Robin Rombach , Huan Ling , Tim Dockhorn , Seung Wook Kim , Sanja Fidler , Karsten Kreis

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator. Despite their promising results, such paradigm is computationally expensive. In this work,…

Computer Vision and Pattern Recognition · Computer Science 2023-03-20 Jay Zhangjie Wu , Yixiao Ge , Xintao Wang , Weixian Lei , Yuchao Gu , Yufei Shi , Wynne Hsu , Ying Shan , Xiaohu Qie , Mike Zheng Shou

Building on recent advances in diffusion models, text-to-image (T2I) generation models have demonstrated their capability to generate diverse and high-quality images. However, leveraging their potential for real-world content creation,…

Computer Vision and Pattern Recognition · Computer Science 2025-05-08 Sandra Zhang Ding , Jiafeng Mao , Kiyoharu Aizawa

The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art video models are still lagging behind image models in terms of visual quality and user control over the generated content. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2023-11-21 Michal Geyer , Omer Bar-Tal , Shai Bagon , Tali Dekel

While Text-To-Video (T2V) models have advanced rapidly, they continue to struggle with generating legible and coherent text within videos. In particular, existing models often fail to correctly render even short phrases or words and…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Ziyang Liu , Kevin Valencia , Justin Cui

Although diffusion models are powerful for image generation, consistent and controllable video generation remains a longstanding problem for them. Video models require extensive training and computational resources, leading to high costs and large environmental…

Computer Vision and Pattern Recognition · Computer Science 2024-10-10 Muhammad Haaris Khan , Hadrien Reynaud , Bernhard Kainz

Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions that correspond to a single sentence describing a single action. However, when a text stream describes a…

Computer Vision and Pattern Recognition · Computer Science 2023-08-04 Zhao Yang , Bing Su , Ji-Rong Wen

AI-generated content has attracted lots of attention recently, but photo-realistic video synthesis is still challenging. Although many attempts using GANs and autoregressive models have been made in this area, the visual quality and length…

Computer Vision and Pattern Recognition · Computer Science 2023-03-21 Yingqing He , Tianyu Yang , Yong Zhang , Ying Shan , Qifeng Chen

We present T2Bs, a framework for generating high-quality, animatable character head morphable models from text by combining static text-to-3D generation with video diffusion. Text-to-3D models produce detailed static geometry but lack…

Text-conditioned image-to-video generation (TI2V) aims to synthesize a realistic video starting from a given image (e.g., a woman's photo) and a text description (e.g., "a woman is drinking water."). Existing TI2V frameworks often require…

Computer Vision and Pattern Recognition · Computer Science 2024-04-26 Haomiao Ni , Bernhard Egger , Suhas Lohit , Anoop Cherian , Ye Wang , Toshiaki Koike-Akino , Sharon X. Huang , Tim K. Marks