Related papers: Bootstrap3D: Improving Multi-view Diffusion Model …

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

Recent strides in Text-to-3D techniques have been propelled by distilling knowledge from powerful large text-to-image diffusion models (LDMs). Nonetheless, existing Text-to-3D approaches often grapple with challenges such as…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-23 · Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin

Image Captioning with Multi-Context Synthetic Data

Image captioning requires numerous annotated image-text pairs, resulting in substantial annotation costs. Recently, large models (e.g. diffusion models and large language models) have excelled in producing high-quality images and text. This…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-20 · Feipeng Ma, Yizhou Zhou, Fengyun Rao, Yueyi Zhang, Xiaoyan Sun

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model

Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency.…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-19 · Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang

DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model

With the increasing popularity of autonomous driving based on the powerful and unified bird's-eye-view (BEV) representation, a demand for high-quality and large-scale multi-view video data with accurate annotation is urgently required.…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-13 · Xiaofan Li, Yifu Zhang, Xiaoqing Ye

Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion

A recent frontier in computer vision has been the task of 3D video generation, which consists of generating a time-varying 3D representation of a scene. To generate dynamic 3D scenes, current methods explicitly model 3D temporal dynamics by…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-01 · Rishab Parthasarathy, Zachary Ankner, Aaron Gokaslan

Vivid-ZOO: Multi-View Video Generation with Diffusion Model

While diffusion models have shown impressive performance in 2D image/video generation, diffusion-based Text-to-Multi-view-Video (T2MVid) generation remains underexplored. The new challenges posed by T2MVid generation lie in the lack of…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-14 · Bing Li, Cheng Zheng, Wenxuan Zhu, Jinjie Mai, Biao Zhang, Peter Wonka, Bernard Ghanem

Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models

Recent advancements in 3D generation are predominantly propelled by improvements in 3D-aware image diffusion models. These models are pretrained on Internet-scale image data and fine-tuned on massive 3D data, offering the capability of…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-03 · Zeyu Yang, Zijie Pan, Chun Gu, Li Zhang

ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models

3D asset generation is getting massive amounts of attention, inspired by the recent success of text-guided 2D content creation. Existing text-to-3D methods use pretrained text-to-image diffusion models in an optimization problem or…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-30 · Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner

Flex3D: Feed-Forward 3D Generation with Flexible Reconstruction Model and Input View Curation

Generating high-quality 3D content from text, single images, or sparse view images remains a challenging task with broad applications. Existing methods typically employ multi-view diffusion models to synthesize multi-view images, followed…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-03 · Junlin Han, Jianyuan Wang, Andrea Vedaldi, Philip Torr, Filippos Kokkinos

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

Text-to-3D with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization, which suffers from slow inference, low diversity, and Janus problems, or are…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-27 · Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image and fast adaptation to new tasks still remain an open challenge,…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-17 · Omer Bar-Tal, Lior Yariv, Yaron Lipman, Tali Dekel

Improving face generation quality and prompt following with synthetic captions

Recent advancements in text-to-image generation using diffusion models have significantly improved the quality of generated images and expanded the ability to depict a wide range of objects. However, ensuring that these models adhere…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-20 · Michail Tarasiou, Stylianos Moschoglou, Jiankang Deng, Stefanos Zafeiriou

DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation

Recent advancements in generative models have provided promising solutions for synthesizing realistic driving videos, which are crucial for training autonomous driving perception models. However, existing approaches often struggle with…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-13 · Wei Wu, Xi Guo, Weixuan Tang, Tingxuan Huang, Chiyu Wang, Dongyue Chen, Chenjing Ding

MVDream: Multi-view Diffusion for 3D Generation

We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt. Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-19 · Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, Xiao Yang

TurboPortrait3D: Single-step diffusion-based fast portrait novel-view synthesis

We introduce TurboPortrait3D: a method for low-latency novel-view synthesis of human portraits. Our approach builds on the observation that existing image-to-3D models for portrait generation, while capable of producing renderable 3D…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-29 · Emily Kim, Julieta Martinez, Timur Bagautdinov, Jessica Hodgins

BlendFusion -- Scalable Synthetic Data Generation for Diffusion Model Training

With the rapid adoption of diffusion models, synthetic data generation has emerged as a promising approach for addressing the growing demand for large-scale image datasets. However, images generated purely by diffusion models often exhibit…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-13 · Thejas Venkatesh, Suguna Varshini Velury

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-27 · Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, Yebin Liu

Multi-view Image Prompted Multi-view Diffusion for Improved 3D Generation

Using images as prompts for 3D generation demonstrates particularly strong performance compared to using text prompts alone, for images provide more intuitive guidance for the 3D generation process. In this work, we delve into the…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-29 · Seungwook Kim, Yichun Shi, Kejie Li, Minsu Cho, Peng Wang

MV-RAG: Retrieval Augmented Multiview Diffusion

Text-to-3D generation approaches have advanced significantly by leveraging pretrained 2D diffusion priors, producing high-quality and 3D-consistent outputs. However, they often fail to produce out-of-domain (OOD) or rare concepts, yielding…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-25 · Yosef Dayani, Omer Benishu, Sagie Benaim

Envision3D: One Image to 3D with Anchor Views Interpolation

We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image. Recent methods that extract 3D content from multi-view images generated by diffusion models show great potential. However, it is…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-15 · Yatian Pang, Tanghui Jia, Yujun Shi, Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Xing Zhou, Francis E. H. Tay, Li Yuan