Related papers: GriDiT: Factorized Grid-Based Diffusion for Effici…

DiffiT: Diffusion Vision Transformers for Image Generation

Diffusion models with their powerful expressivity and high sample quality have achieved State-Of-The-Art (SOTA) performance in the generative domain. The pioneering Vision Transformer (ViT) has also demonstrated strong modeling capabilities…

Computer Vision and Pattern Recognition · Computer Science 2024-08-30 Ali Hatamizadeh, Jiaming Song, Guilin Liu, Jan Kautz, Arash Vahdat

EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer Models

Diffusion models are highly regarded for their controllability and the diversity of images they generate. However, class-conditional generation methods based on diffusion models often focus on more common categories. In large-scale…

Computer Vision and Pattern Recognition · Computer Science 2025-12-08 Kun Wang, Donglin Di, Tonghua Su, Lei Fan

Filter-Guided Diffusion for Controllable Image Generation

Recent advances in diffusion-based generative models have shown incredible promise for zero-shot image-to-image translation and editing. Most of these approaches work by combining or replacing network-specific features used in the…

Computer Vision and Pattern Recognition · Computer Science 2025-10-06 Zeqi Gu, Ethan Yang, Abe Davis

Utilizing Image Transforms and Diffusion Models for Generative Modeling of Short and Long Time Series

Lately, there has been a surge in interest surrounding generative modeling of time series data. Most existing approaches are designed either to process short sequences or to handle long-range sequences. This dichotomy can be attributed to…

Machine Learning · Computer Science 2024-10-28 Ilan Naiman, Nimrod Berman, Itai Pemper, Idan Arbiv, Gal Fadlon, Omri Azencot

Joint Generative Modeling of Grounded Scene Graphs and Images via Diffusion Models

We introduce a framework for joint grounded scene graph - image generation, a challenging task involving high-dimensional, multi-modal structured data. To effectively model this complex joint distribution, we adopt a factorized approach:…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Bicheng Xu, Qi Yan, Renjie Liao, Lele Wang, Leonid Sigal

Grid: Omni Visual Generation

Visual generation has witnessed remarkable progress in single-image tasks, yet extending these capabilities to temporal sequences remains challenging. Current approaches either build specialized video models from scratch with enormous…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Cong Wan, Xiangyang Luo, Hao Luo, Zijian Cai, Yiren Song, Yunlong Zhao, Yifan Bai, Fan Wang, Yuhang He, Yihong Gong

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Latent diffusion models (LDMs) dominate high-quality image generation, yet integrating representation learning with generative modeling remains a challenge. We introduce a novel generative image modeling framework that seamlessly bridges…

Computer Vision and Pattern Recognition · Computer Science 2026-01-23 Theodoros Kouzelis, Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis

Diffusion Models Need Visual Priors for Image Generation

Conventional class-guided diffusion models generally succeed in generating images with correct semantic content, but often struggle with texture details. This limitation stems from the usage of class priors, which only provide coarse and…

Computer Vision and Pattern Recognition · Computer Science 2024-10-14 Xiaoyu Yue, Zidong Wang, Zeyu Lu, Shuyang Sun, Meng Wei, Wanli Ouyang, Lei Bai, Luping Zhou

Synthetic Data from Diffusion Models Improves ImageNet Classification

Deep generative models are becoming increasingly powerful, now generating diverse high fidelity photo-realistic samples given text prompts. Have they reached the point where models of natural images can be used for generative data…

Computer Vision and Pattern Recognition · Computer Science 2023-04-18 Shekoofeh Azizi, Simon Kornblith, Chitwan Saharia, Mohammad Norouzi, David J. Fleet

GMem: A Modular Approach for Ultra-Efficient Generative Models

Recent studies indicate that the denoising process in deep generative diffusion models implicitly learns and memorizes semantic information from the data distribution. These findings suggest that capturing more complex data distributions…

Computer Vision and Pattern Recognition · Computer Science 2025-02-13 Yi Tang, Peng Sun, Zhenglin Cheng, Tao Lin

FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes

The class-conditional image generation based on diffusion models is renowned for generating high-quality and diverse images. However, most prior efforts focus on generating images for general categories, e.g., 1000 classes in ImageNet-1k. A…

Computer Vision and Pattern Recognition · Computer Science 2024-06-05 Ziying Pan, Kun Wang, Gang Li, Feihong He, Yongxuan Lai

DyDiT++: Diffusion Transformers with Timestep and Spatial Dynamics for Efficient Visual Generation

Diffusion Transformer (DiT), an emerging diffusion model for visual generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs primarily stem from the…

Computer Vision and Pattern Recognition · Computer Science 2026-01-15 Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You

DiffusionSat: A Generative Foundation Model for Satellite Imagery

Diffusion models have achieved state-of-the-art results on many modalities including images, speech, and video. However, existing models are not tailored to support remote sensing data, which is widely used in important applications…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Samar Khanna, Patrick Liu, Linqi Zhou, Chenlin Meng, Robin Rombach, Marshall Burke, David Lobell, Stefano Ermon

Dynamic Dual-Output Diffusion Models

Iterative denoising-based generation, also known as denoising diffusion models, has recently been shown to be comparable in quality to other classes of generative models, and even surpass them. Including, in particular, Generative…

Computer Vision and Pattern Recognition · Computer Science 2022-03-16 Yaniv Benny, Lior Wolf

Enhancing Image Generation Fidelity via Progressive Prompts

The diffusion transformer (DiT) architecture has attracted significant attention in image generation, achieving better fidelity, performance, and diversity. However, most existing DiT-based image generation methods focus on global-aware…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Zhen Xiong, Yuqi Li, Chuanguang Yang, Tiao Tan, Zhihong Zhu, Siyuan Li, Yue Ma

Difficulty Controlled Diffusion Model for Synthesizing Effective Training Data

Generative models have become a powerful tool for synthesizing training data in computer vision tasks. Current approaches solely focus on aligning generated images with the target dataset distribution. As a result, they capture only the…

Computer Vision and Pattern Recognition · Computer Science 2026-01-08 Zerun Wang, Jiafeng Mao, Xueting Wang, Toshihiko Yamasaki

xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Diffusion models are pivotal for generating high-quality images and videos. Inspired by the success of OpenAI's Sora, the backbone of diffusion models is evolving from U-Net to Transformer, known as Diffusion Transformers (DiTs). However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-05 Jiarui Fang, Jinzhe Pan, Xibo Sun, Aoyu Li, Jiannan Wang

DiffuseRAW: End-to-End Generative RAW Image Processing for Low-Light Images

Imaging under extremely low-light conditions presents a significant challenge and is an ill-posed problem due to the low signal-to-noise ratio (SNR) caused by minimal photon capture. Previously, diffusion models have been used for multiple…

Image and Video Processing · Electrical Eng. & Systems 2024-03-01 Rishit Dagli

Active Generation for Image Classification

Recently, the growing capabilities of deep generative models have underscored their potential in enhancing image classification accuracy. However, existing methods often demand the generation of a disproportionately large number of images…

Computer Vision and Pattern Recognition · Computer Science 2024-08-16 Tao Huang, Jiaqi Liu, Shan You, Chang Xu

MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize

While diffusion-based generative models have made significant strides in visual content creation, conventional approaches face computational challenges, especially for high-resolution images, as they denoise the entire image from noisy…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Haohang Xu, Longyu Chen, Yichen Zhang, Shuangrui Ding, Zhipeng Zhang