Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data

Zeyi Sun; Tong Wu; Pan Zhang; Yuhang Zang; Xiaoyi Dong; Yuanjun Xiong; Dahua Lin; Jiaqi Wang

Computer Vision and Pattern Recognition; 2024-10-04 (v2); Artificial Intelligence, Graphics, Machine Learning, Multimedia

Abstract

Recent years have witnessed remarkable progress in multi-view diffusion models for 3D content creation. However, there remains a significant gap in image quality and prompt-following ability compared to 2D diffusion models. A critical bottleneck is the scarcity of high-quality 3D objects with detailed captions. To address this challenge, we propose Bootstrap3D, a novel framework that automatically generates an arbitrary quantity of multi-view images to assist in training multi-view diffusion models. Specifically, we introduce a data generation pipeline that employs (1) 2D and video diffusion models to generate multi-view images based on constructed text prompts, and (2) our fine-tuned 3D-aware MV-LLaVA for filtering high-quality data and rewriting inaccurate captions. Leveraging this pipeline, we have generated 1 million high-quality synthetic multi-view images with dense descriptive captions to address the shortage of high-quality 3D data. Furthermore, we present a Training Timestep Reschedule (TTR) strategy that leverages the denoising process to learn multi-view consistency while maintaining the original 2D diffusion prior. Extensive experiments demonstrate that Bootstrap3D can generate high-quality multi-view images with superior aesthetic quality, image-text alignment, and maintained view consistency.
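
The Training Timestep Reschedule (TTR) idea can be read as restricting which denoising timesteps each kind of training data is sampled at, so that synthetic multi-view data shapes cross-view structure while higher-quality data preserves the 2D prior. The sketch below illustrates only that reading. The abstract gives no concrete schedule, so the source names, the timestep ranges, and the total of 1000 steps are all hypothetical assumptions, not values from the paper.

```python
import random

# Hypothetical per-source timestep ranges (inclusive) out of 1000 diffusion steps.
# These numbers and source names are illustrative assumptions, not the paper's values.
TIMESTEP_RANGES = {
    "synthetic_multiview": (200, 999),  # assumed: noisier steps, where global layout is set
    "real_3d_render": (0, 999),         # assumed: full range
    "high_quality_2d": (0, 199),        # assumed: cleaner steps, where fine detail is set
}


def sample_timestep(source, rng):
    """Draw a training timestep uniformly from the range assigned to `source`."""
    low, high = TIMESTEP_RANGES[source]
    return rng.randint(low, high)  # randint includes both endpoints


def sample_batch_timesteps(sources, seed=0):
    """Return one timestep per sample, given each sample's data source."""
    rng = random.Random(seed)
    return [sample_timestep(source, rng) for source in sources]


if __name__ == "__main__":
    batch_sources = ["synthetic_multiview", "high_quality_2d", "real_3d_render"]
    print(sample_batch_timesteps(batch_sources, seed=0))
```

In a real training loop, the sampled timestep would select the noise level applied to each sample before the denoising loss is computed; that part is omitted here because it depends on the specific diffusion framework.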

Keywords

novel view synthesis; text-to-3D generation; video generation

Cite

@article{arxiv.2406.00093,
  title  = {Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data},
  author = {Zeyi Sun and Tong Wu and Pan Zhang and Yuhang Zang and Xiaoyi Dong and Yuanjun Xiong and Dahua Lin and Jiaqi Wang},
  journal= {arXiv preprint arXiv:2406.00093},
  year   = {2024}
}

Comments

Project Page: https://sunzey.github.io/Bootstrap3D/
