Related papers: MaterialPicker: Multi-Modal DiT-Based Material Gen…

Enhancing Image Generation Fidelity via Progressive Prompts

The diffusion transformer (DiT) architecture has attracted significant attention in image generation, achieving better fidelity, performance, and diversity. However, most existing DiT-based image generation methods focus on global-aware…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Zhen Xiong, Yuqi Li, Chuanguang Yang, Tiao Tan, Zhihong Zhu, Siyuan Li, Yue Ma

MCMat: Multiview-Consistent and Physically Accurate PBR Material Generation

Existing 2D methods utilize UNet-based diffusion models to generate multi-view physically-based rendering (PBR) maps but struggle with multi-view inconsistency, while some 3D methods directly generate UV maps, encountering generalization…

Computer Vision and Pattern Recognition · Computer Science 2024-12-19 Shenhao Zhu, Lingteng Qiu, Xiaodong Gu, Zhengyi Zhao, Chao Xu, Yuxiao He, Zhe Li, Xiaoguang Han, Yao Yao, Xun Cao, Siyu Zhu, Weihao Yuan, Zilong Dong, Hao Zhu

VideoMatGen: PBR Materials through Joint Generative Modeling

We present a method for generating physically-based materials for 3D shapes based on a video diffusion transformer architecture. Our method is conditioned on input geometry and a text description, and jointly models multiple material…

Computer Vision and Pattern Recognition · Computer Science 2026-03-18 Jon Hasselgren, Zheng Zeng, Milos Hasan, Jacob Munkberg

DiTPainter: Efficient Video Inpainting with Diffusion Transformers

Many existing video inpainting algorithms utilize optical flows to construct the corresponding maps and then propagate pixels from adjacent frames to missing areas by mapping. Despite the effectiveness of the propagation mechanism, they…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Xian Wu, Chang Liu

MotionGrounder: Grounded Multi-Object Motion Transfer via Diffusion Transformer

Motion transfer enables controllable video generation by transferring temporal dynamics from a reference video to synthesize a new video conditioned on a target caption. However, existing Diffusion Transformer (DiT)-based methods are…

Computer Vision and Pattern Recognition · Computer Science 2026-04-02 Samuel Teodoro, Yun Chen, Agus Gunawan, Soo Ye Kim, Jihyong Oh, Munchurl Kim

BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration

Diffusion Transformer has shown remarkable abilities in generating high-fidelity videos, delivering visually coherent frames and rich details over extended durations. However, existing video generation models still fall short in…

Computer Vision and Pattern Recognition · Computer Science 2026-03-04 Zhaoyang Li, Dongjun Qian, Kai Su, Qishuai Diao, Xiangyang Xia, Chang Liu, Wenfei Yang, Tianzhu Zhang, Zehuan Yuan

Dual Prompting Image Restoration with Diffusion Transformers

Recent state-of-the-art image restoration methods mostly adopt latent diffusion models with U-Net backbones, yet still face challenges in achieving high-quality restoration due to their limited capabilities. Diffusion transformers (DiTs),…

Computer Vision and Pattern Recognition · Computer Science 2025-04-28 Dehong Kong, Fan Li, Zhixin Wang, Jiaqi Xu, Renjing Pei, Wenbo Li, WenQi Ren

DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation

In this work, we empirically study Diffusion Transformers (DiTs) for text-to-image generation, focusing on architectural choices, text-conditioning strategies, and training protocols. We evaluate a range of DiT-based…

Computer Vision and Pattern Recognition · Computer Science 2025-03-18 Chen Chen, Rui Qian, Wenze Hu, Tsu-Jui Fu, Jialing Tong, Xinze Wang, Lezhi Li, Bowen Zhang, Alex Schwing, Wei Liu, Yinfei Yang

GenCompositor: Generative Video Compositing with Diffusion Transformer

Video compositing combines live-action footage to create video production, serving as a crucial technique in video creation and film production. Traditional pipelines require intensive labor efforts and expert collaboration, resulting in…

Computer Vision and Pattern Recognition · Computer Science 2026-03-20 Shuzhou Yang, Xiaoyu Li, Xiaodong Cun, Guangzhi Wang, Lingen Li, Ying Shan, Jian Zhang

LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer

Layout generation is a foundational task of graphic design, which requires the integration of visual aesthetics and harmonious expression of content delivery. However, existing methods still face challenges in generating precise and visually…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Yu Li, Yifan Chen, Gongye Liu, Fei Yin, Qingyan Bai, Jie Wu, Hongfa Wang, Ruihang Chu, Yujiu Yang

Dual Diffusion for Unified Image Generation and Understanding

Diffusion models have gained tremendous success in text-to-image generation, yet still lag behind with visual understanding tasks, an area dominated by autoregressive vision-language models. We propose a large-scale and fully end-to-end…

Computer Vision and Pattern Recognition · Computer Science 2025-04-03 Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang

ControlMat: A Controlled Generative Approach to Material Capture

Material reconstruction from a photograph is a key component of 3D content creation democratization. We propose to formulate this ill-posed problem as a controlled synthesis one, leveraging the recent progress in generative deep networks.…

Computer Vision and Pattern Recognition · Computer Science 2024-08-28 Giuseppe Vecchio, Rosalie Martin, Arthur Roullier, Adrien Kaiser, Romain Rouffet, Valentin Deschaintre, Tamy Boubekeur

SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices

Recent advances in diffusion transformers (DiTs) have set new standards in image generation, yet remain impractical for on-device deployment due to their high computational and memory costs. In this work, we present an efficient DiT…

Computer Vision and Pattern Recognition · Computer Science 2026-02-12 Dongting Hu, Aarush Gupta, Magzhan Gabidolla, Arpit Sahni, Huseyin Coskun, Yanyu Li, Yerlan Idelbayev, Ahsan Mahmood, Aleksei Lebedev, Dishani Lahiri, Anujraaj Goyal, Ju Hu, Mingming Gong, Sergey Tulyakov, Anil Kag

DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers

In e-commerce and digital marketing, generating high-fidelity human-product demonstration videos is important for effective product presentation. However, most existing frameworks either fail to preserve the identities of both humans and…

Computer Vision and Pattern Recognition · Computer Science 2025-08-28 Lizhen Wang, Zhurong Xia, Tianshu Hu, Pengrui Wang, Pengfei Wei, Zerong Zheng, Ming Zhou, Yuan Zhang, Mingyuan Gao

Material Anything: Generating Materials for Any 3D Object via Diffusion

We present Material Anything, a fully-automated, unified diffusion framework designed to generate physically-based materials for 3D objects. Unlike existing methods that rely on complex pipelines or case-specific optimizations, Material…

Computer Vision and Pattern Recognition · Computer Science 2024-11-25 Xin Huang, Tengfei Wang, Ziwei Liu, Qing Wang

DiffBlender: Composable and Versatile Multimodal Text-to-Image Diffusion Models

In this study, we aim to enhance the capabilities of diffusion-based text-to-image (T2I) generation models by integrating diverse modalities beyond textual descriptions within a unified framework. To this end, we categorize widely used…

Computer Vision and Pattern Recognition · Computer Science 2025-08-27 Sungnyun Kim, Junsoo Lee, Kibeom Hong, Daesik Kim, Namhyuk Ahn

Advance Fake Video Detection via Vision Transformers

Recent advancements in AI-based multimedia generation have enabled the creation of hyper-realistic images and videos, raising concerns about their potential use in spreading misinformation. The widespread accessibility of generative…

Computer Vision and Pattern Recognition · Computer Science 2025-04-30 Joy Battocchio, Stefano Dell'Anna, Andrea Montibeller, Giulia Boato

Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering

The correct insertion of virtual objects in images of real-world scenes requires a deep understanding of the scene's lighting, geometry and materials, as well as the image formation process. While recent large-scale diffusion models have…

Computer Vision and Pattern Recognition · Computer Science 2024-08-20 Ruofan Liang, Zan Gojcic, Merlin Nimier-David, David Acuna, Nandita Vijaykumar, Sanja Fidler, Zian Wang

Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration

We present Vivid-VR, a DiT-based generative video restoration method built upon an advanced T2V foundation model, where ControlNet is leveraged to control the generation process, ensuring content consistency. However, conventional…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Haoran Bai, Xiaoxu Chen, Canqian Yang, Zongyao He, Sibin Deng, Ying Chen

MatMart: Material Reconstruction of 3D Objects via Diffusion

Applying diffusion models to physically-based material estimation and generation has recently gained prominence. In this paper, we propose MatMart, a novel material reconstruction framework for 3D objects, offering the following advantages.…

Graphics · Computer Science 2025-11-25 Xiuchao Wu, Pengfei Zhu, Jiangjing Lyu, Xinguo Liu, Jie Guo, Yanwen Guo, Weiwei Xu, Chengfei Lyu