MaterialPicker: Multi-Modal DiT-Based Material Generation

Xiaohe Ma; Valentin Deschaintre; Miloš Hašan; Fujun Luan; Kun Zhou; Hongzhi Wu; Yiwei Hu

doi:10.1145/3731199

Computer Vision and Pattern Recognition 2025-07-29 v3

Abstract

High-quality material generation is key for virtual environment authoring and inverse rendering. We propose MaterialPicker, a multi-modal material generator leveraging a Diffusion Transformer (DiT) architecture, improving and simplifying the creation of high-quality materials from text prompts and/or photographs. Our method can generate a material based on an image crop of a material sample, even if the captured surface is distorted, viewed at an angle or partially occluded, as is often the case in photographs of natural scenes. We further allow the user to specify a text prompt to provide additional guidance for the generation. We finetune a pre-trained DiT-based video generator into a material generator, where each material map is treated as a frame in a video sequence. We evaluate our approach both quantitatively and qualitatively and show that it enables more diverse material generation and better distortion correction than previous work.
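The idea of treating each material map as a frame in a video sequence can be illustrated with a minimal sketch. The map names, their order, and the choice to replicate single-channel maps to three channels are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np

# Hypothetical map names and order; the abstract does not list them.
MAP_ORDER = ("albedo", "normal", "roughness", "height")
# Hypothetical: which maps are stored as single-channel images.
SINGLE_CHANNEL = ("roughness", "height")


def maps_to_frames(maps, order=MAP_ORDER):
    """Stack material maps into a (T, H, W, 3) video-like array, one map per frame."""
    frames = []
    for name in order:
        m = np.asarray(maps[name], dtype=np.float32)
        if m.ndim == 2:
            # Single-channel map: add a trailing channel axis.
            m = m[..., None]
        if m.shape[-1] == 1:
            # Replicate to 3 channels so every frame has the same shape.
            m = np.repeat(m, 3, axis=-1)
        frames.append(m)
    return np.stack(frames, axis=0)


def frames_to_maps(frames, order=MAP_ORDER, single_channel=SINGLE_CHANNEL):
    """Split a (T, H, W, 3) array back into named maps."""
    maps = {}
    for i, name in enumerate(order):
        frame = frames[i]
        # Keep only one channel for maps that were replicated on the way in.
        maps[name] = frame[..., 0] if name in single_channel else frame
    return maps
```

With this layout, a video model that operates on a sequence of frames sees the material as a short clip of length `len(MAP_ORDER)`, and the generated frames can be split back into individual maps afterwards.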

Keywords

image generation, diffusion model, video generation

Cite

@article{arxiv.2412.03225,
  title  = {MaterialPicker: Multi-Modal DiT-Based Material Generation},
  author = {Xiaohe Ma and Valentin Deschaintre and Miloš Hašan and Fujun Luan and Kun Zhou and Hongzhi Wu and Yiwei Hu},
  journal= {arXiv preprint arXiv:2412.03225},
  year   = {2025}
}
