Related papers: Latent Beam Diffusion Models for Generating Visual…

Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts

Recent advances in image generation have made diffusion models powerful tools for creating high-quality images. However, their iterative denoising process makes understanding and interpreting their semantic latent spaces more challenging…

Computation and Language · Computer Science 2024-11-06 E. Zhixuan Zeng , Yuhao Chen , Alexander Wong

Nested Diffusion Models Using Hierarchical Latent Priors

We introduce nested diffusion models, an efficient and powerful hierarchical generative framework that substantially enhances the generation quality of diffusion models, particularly for images of complex scenes. Our approach employs a…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Xiao Zhang , Ruoxi Jiang , Rebecca Willett , Michael Maire

High-Resolution Image Synthesis with Latent Diffusion Models

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a…

Computer Vision and Pattern Recognition · Computer Science 2022-04-14 Robin Rombach , Andreas Blattmann , Dominik Lorenz , Patrick Esser , Björn Ommer

Exploring Compositional Visual Generation with Latent Classifier Guidance

Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space…

Computer Vision and Pattern Recognition · Computer Science 2023-05-25 Changhao Shi , Haomiao Ni , Kai Li , Shaobo Han , Mingfu Liang , Martin Renqiang Min

Optical Diffusion Models for Image Generation

Diffusion models generate new samples by progressively decreasing the noise from the initially provided random distribution. This inference procedure generally utilizes a trained neural network numerous times to obtain the final output,…

Optics · Physics 2024-11-01 Ilker Oguz , Niyazi Ulas Dinc , Mustafa Yildirim , Junjie Ke , Innfarn Yoo , Qifei Wang , Feng Yang , Christophe Moser , Demetri Psaltis

Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search

The remarkable progress in text-to-video diffusion models enables the generation of photorealistic videos, although the content of these generated videos often includes unnatural movement or deformation, reverse playback, and motionless…

Computer Vision and Pattern Recognition · Computer Science 2025-10-08 Yuta Oshima , Masahiro Suzuki , Yutaka Matsuo , Hiroki Furuta

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Latent diffusion models (LDMs) dominate high-quality image generation, yet integrating representation learning with generative modeling remains a challenge. We introduce a novel generative image modeling framework that seamlessly bridges…

Computer Vision and Pattern Recognition · Computer Science 2026-01-23 Theodoros Kouzelis , Efstathios Karypidis , Ioannis Kakogeorgiou , Spyros Gidaris , Nikos Komodakis

How to Train Your Latent Diffusion Language Model Jointly With the Latent Space

Latent diffusion models offer an attractive alternative to discrete diffusion for non-autoregressive text generation by operating on continuous text representations and denoising entire sequences in parallel. The major challenge in latent…

Computation and Language · Computer Science 2026-05-11 Viacheslav Meshchaninov , Alexander Shabalin , Egor Chimbulatov , Nikita Gushchin , Ilya Koziev , Alexander Korotin , Dmitry Vetrov

Video Diffusion Models

Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial…

Computer Vision and Pattern Recognition · Computer Science 2022-06-24 Jonathan Ho , Tim Salimans , Alexey Gritsenko , William Chan , Mohammad Norouzi , David J. Fleet

Move Anything with Layered Scene Diffusion

Diffusion models generate images with an unprecedented level of quality, but how can we freely rearrange image layouts? Recent works generate controllable scenes via learning spatially disentangled latent codes, but these methods do not…

Computer Vision and Pattern Recognition · Computer Science 2024-04-11 Jiawei Ren , Mengmeng Xu , Jui-Chieh Wu , Ziwei Liu , Tao Xiang , Antoine Toisoul

IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition

Nowadays, scene text recognition has attracted more and more attention due to its diverse applications. Most state-of-the-art methods adopt an encoder-decoder framework with the attention mechanism, autoregressively generating text from…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Xiaomeng Yang , Zhi Qiao , Yu Zhou

Latent Diffusion for Language Generation

Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have…

Computation and Language · Computer Science 2023-11-08 Justin Lovelace , Varsha Kishore , Chao Wan , Eliot Shekhtman , Kilian Q. Weinberger

Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling

While inference-time scaling through search has revolutionized Large Language Models, translating these gains to image generation has proven difficult. Recent attempts to apply search strategies to continuous diffusion models show limited…

Computer Vision and Pattern Recognition · Computer Science 2025-10-28 Erik Riise , Mehmet Onurcan Kaya , Dim P. Papadopoulos

Diffusion Models in Low-Level Vision: A Survey

Deep generative models have garnered significant attention in low-level vision tasks due to their generative capabilities. Among them, diffusion model-based solutions, characterized by a forward diffusion process and a reverse denoising…

Computer Vision and Pattern Recognition · Computer Science 2025-02-26 Chunming He , Yuqi Shen , Chengyu Fang , Fengyang Xiao , Longxiang Tang , Yulun Zhang , Wangmeng Zuo , Zhenhua Guo , Xiu Li

Enhancing Image Layout Control with Loss-Guided Diffusion Models

Diffusion models are a powerful class of generative models capable of producing high-quality images from pure noise using a simple text prompt. While most methods which introduce additional spatial constraints into the generated images…

Computer Vision and Pattern Recognition · Computer Science 2024-09-18 Zakaria Patel , Kirill Serkh

Context Diffusion: In-Context Aware Image Generation

We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is…

Computer Vision and Pattern Recognition · Computer Science 2025-07-24 Ivona Najdenkoska , Animesh Sinha , Abhimanyu Dubey , Dhruv Mahajan , Vignesh Ramanathan , Filip Radenovic

Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models

Fashionable image generation aims to synthesize images of diverse fashion prevalent around the globe, helping fashion designers in real-time visualization by giving them a basic customized structure of how a specific design preference would…

Computer Vision and Pattern Recognition · Computer Science 2023-06-14 Krishna Sri Ipsit Mantri , Nevasini Sasikumar

Latent Forcing: Reordering the Diffusion Trajectory for Pixel-Space Image Generation

Latent diffusion models excel at generating high-quality images but lose the benefits of end-to-end modeling. They discard information during image encoding, require a separately trained decoder, and model an auxiliary distribution to the…

Computer Vision and Pattern Recognition · Computer Science 2026-02-13 Alan Baade , Eric Ryan Chan , Kyle Sargent , Changan Chen , Justin Johnson , Ehsan Adeli , Li Fei-Fei

Towards Spatially Consistent Image Generation: On Incorporating Intrinsic Scene Properties into Diffusion Models

Image generation models trained on large datasets can synthesize high-quality images but often produce spatially inconsistent and distorted images due to limited information about the underlying structures and spatial layouts. In this work,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Hyundo Lee , Suhyung Choi , Inwoo Hwang , Byoung-Tak Zhang

Localized Control in Diffusion Models via Latent Vector Prediction

Diffusion models emerged as a leading approach in text-to-image generation, producing high-quality images from textual descriptions. However, attempting to achieve detailed control to get a desired image solely through text remains a…

Computer Vision and Pattern Recognition · Computer Science 2026-02-12 Pablo Domingo-Gregorio , Javier Ruiz-Hidalgo