Related papers: GriDiT: Factorized Grid-Based Diffusion for Effici…

200 papers

Diffusion models with their powerful expressivity and high sample quality have achieved State-Of-The-Art (SOTA) performance in the generative domain. The pioneering Vision Transformer (ViT) has also demonstrated strong modeling capabilities…

Computer Vision and Pattern Recognition · Computer Science 2024-08-30 Ali Hatamizadeh, Jiaming Song, Guilin Liu, Jan Kautz, Arash Vahdat

Diffusion models are highly regarded for their controllability and the diversity of images they generate. However, class-conditional generation methods based on diffusion models often focus on more common categories. In large-scale…

Computer Vision and Pattern Recognition · Computer Science 2025-12-08 Kun Wang, Donglin Di, Tonghua Su, Lei Fan

Recent advances in diffusion-based generative models have shown incredible promise for zero shot image-to-image translation and editing. Most of these approaches work by combining or replacing network-specific features used in the…

Computer Vision and Pattern Recognition · Computer Science 2025-10-06 Zeqi Gu, Ethan Yang, Abe Davis

Lately, there has been a surge in interest surrounding generative modeling of time series data. Most existing approaches are designed either to process short sequences or to handle long-range sequences. This dichotomy can be attributed to…

Machine Learning · Computer Science 2024-10-28 Ilan Naiman, Nimrod Berman, Itai Pemper, Idan Arbiv, Gal Fadlon, Omri Azencot

We introduce a framework for joint grounded scene graph - image generation, a challenging task involving high-dimensional, multi-modal structured data. To effectively model this complex joint distribution, we adopt a factorized approach:…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Bicheng Xu, Qi Yan, Renjie Liao, Lele Wang, Leonid Sigal

Visual generation has witnessed remarkable progress in single-image tasks, yet extending these capabilities to temporal sequences remains challenging. Current approaches either build specialized video models from scratch with enormous…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Cong Wan, Xiangyang Luo, Hao Luo, Zijian Cai, Yiren Song, Yunlong Zhao, Yifan Bai, Fan Wang, Yuhang He, Yihong Gong

Latent diffusion models (LDMs) dominate high-quality image generation, yet integrating representation learning with generative modeling remains a challenge. We introduce a novel generative image modeling framework that seamlessly bridges…

Computer Vision and Pattern Recognition · Computer Science 2026-01-23 Theodoros Kouzelis, Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis

Conventional class-guided diffusion models generally succeed in generating images with correct semantic content, but often struggle with texture details. This limitation stems from the usage of class priors, which only provide coarse and…

Computer Vision and Pattern Recognition · Computer Science 2024-10-14 Xiaoyu Yue, Zidong Wang, Zeyu Lu, Shuyang Sun, Meng Wei, Wanli Ouyang, Lei Bai, Luping Zhou

Deep generative models are becoming increasingly powerful, now generating diverse high fidelity photo-realistic samples given text prompts. Have they reached the point where models of natural images can be used for generative data…

Computer Vision and Pattern Recognition · Computer Science 2023-04-18 Shekoofeh Azizi, Simon Kornblith, Chitwan Saharia, Mohammad Norouzi, David J. Fleet

Recent studies indicate that the denoising process in deep generative diffusion models implicitly learns and memorizes semantic information from the data distribution. These findings suggest that capturing more complex data distributions…

Computer Vision and Pattern Recognition · Computer Science 2025-02-13 Yi Tang, Peng Sun, Zhenglin Cheng, Tao Lin

The class-conditional image generation based on diffusion models is renowned for generating high-quality and diverse images. However, most prior efforts focus on generating images for general categories, e.g., 1000 classes in ImageNet-1k. A…

Computer Vision and Pattern Recognition · Computer Science 2024-06-05 Ziying Pan, Kun Wang, Gang Li, Feihong He, Yongxuan Lai

Diffusion Transformer (DiT), an emerging diffusion model for visual generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs primarily stem from the…

Computer Vision and Pattern Recognition · Computer Science 2026-01-15 Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You

Diffusion models have achieved state-of-the-art results on many modalities including images, speech, and video. However, existing models are not tailored to support remote sensing data, which is widely used in important applications…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Samar Khanna, Patrick Liu, Linqi Zhou, Chenlin Meng, Robin Rombach, Marshall Burke, David Lobell, Stefano Ermon

Iterative denoising-based generation, also known as denoising diffusion models, has recently been shown to be comparable in quality to other classes of generative models, and even surpass them. Including, in particular, Generative…

Computer Vision and Pattern Recognition · Computer Science 2022-03-16 Yaniv Benny, Lior Wolf

The diffusion transformer (DiT) architecture has attracted significant attention in image generation, achieving better fidelity, performance, and diversity. However, most existing DiT-based image generation methods focus on global-aware…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Zhen Xiong, Yuqi Li, Chuanguang Yang, Tiao Tan, Zhihong Zhu, Siyuan Li, Yue Ma

Generative models have become a powerful tool for synthesizing training data in computer vision tasks. Current approaches solely focus on aligning generated images with the target dataset distribution. As a result, they capture only the…

Computer Vision and Pattern Recognition · Computer Science 2026-01-08 Zerun Wang, Jiafeng Mao, Xueting Wang, Toshihiko Yamasaki

Diffusion models are pivotal for generating high-quality images and videos. Inspired by the success of OpenAI's Sora, the backbone of diffusion models is evolving from U-Net to Transformer, known as Diffusion Transformers (DiTs). However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-05 Jiarui Fang, Jinzhe Pan, Xibo Sun, Aoyu Li, Jiannan Wang

Imaging under extremely low-light conditions presents a significant challenge and is an ill-posed problem due to the low signal-to-noise ratio (SNR) caused by minimal photon capture. Previously, diffusion models have been used for multiple…

Image and Video Processing · Electrical Eng. & Systems 2024-03-01 Rishit Dagli

Recently, the growing capabilities of deep generative models have underscored their potential in enhancing image classification accuracy. However, existing methods often demand the generation of a disproportionately large number of images…

Computer Vision and Pattern Recognition · Computer Science 2024-08-16 Tao Huang, Jiaqi Liu, Shan You, Chang Xu

While diffusion-based generative models have made significant strides in visual content creation, conventional approaches face computational challenges, especially for high-resolution images, as they denoise the entire image from noisy…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Haohang Xu, Longyu Chen, Yichen Zhang, Shuangrui Ding, Zhipeng Zhang