Related papers for LayerSync: Self-aligning Intermediate Layers

No Alignment Needed for Generation: Learning Linearly Separable Representations in Diffusion Models

Efficient training strategies for large-scale diffusion models have recently emphasized the importance of improving discriminative feature representations in these models. A central line of work in this direction is representation alignment…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-29 · Junno Yun, Yaşar Utku Alçalar, Mehmet Akçakaya

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Latent diffusion models (LDMs) dominate high-quality image generation, yet integrating representation learning with generative modeling remains a challenge. We introduce a novel generative image modeling framework that seamlessly bridges…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-23 · Theodoros Kouzelis, Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis

Learning Diffusion Models with Flexible Representation Guidance

Diffusion models can be improved with additional guidance towards more effective representations of input. Indeed, prior empirical work has already shown that aligning internal representations of the diffusion model with those of…

Machine Learning · Computer Science · 2025-10-14 · Chenyu Wang, Cai Zhou, Sharut Gupta, Zongyu Lin, Stefanie Jegelka, Stephen Bates, Tommi Jaakkola

Diffuse and Disperse: Image Generation with Representation Regularization

The development of diffusion-based generative models over the past decade has largely proceeded independently of progress in representation learning. These diffusion models typically rely on regression-based objectives and generally lack…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-25 · Runqian Wang, Kaiming He

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Recent studies have shown that the denoising process in (generative) diffusion models can induce meaningful (discriminative) representations inside the model, though the quality of these representations still lags behind those learned…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-19 · Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, Saining Xie

Text2Layer: Layered Image Generation using Latent Diffusion Model

Layer compositing is one of the most popular image editing workflows among both amateurs and professionals. Motivated by the success of diffusion models, we explore layer compositing from a layered image generation perspective. Instead of…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-20 · Xinyang Zhang, Wentian Zhao, Xin Lu, Jeff Chien

Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback

Despite significant progress in text-to-image diffusion models, achieving precise spatial control over generated outputs remains challenging. ControlNet addresses this by introducing an auxiliary conditioning module, while ControlNet++…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-04 · Nina Konovalova, Maxim Nikolaev, Andrey Kuznetsov, Aibek Alanov

StorySync: Training-Free Subject Consistency in Text-to-Image Generation via Region Harmonization

Generating a coherent sequence of images that tells a visual story, using text-to-image diffusion models, often faces the critical challenge of maintaining subject consistency across all story scenes. Existing approaches, which typically…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-07 · Gopalji Gaur, Mohammadreza Zolfaghari, Thomas Brox

High-Resolution Image Synthesis with Latent Diffusion Models

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-14 · Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer

Improved Generation of Synthetic Imaging Data Using Feature-Aligned Diffusion

Synthetic data generation is an important application of machine learning in the field of medical imaging. While existing approaches have successfully applied fine-tuned diffusion models for synthesizing medical images, we explore potential…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-04 · Lakshmi Nair

EdgeSync: Faster Edge-model Updating via Adaptive Continuous Learning for Video Data Drift

Real-time video analytics systems typically place models with fewer weights on edge devices to reduce latency. The distribution of video content features may change over time for various reasons (e.g., changes in lighting and weather), leading to…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-06 · Peng Zhao, Runchu Dong, Guiqin Wang, Cong Zhao

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-29 · Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis

CycleDiff: Cycle Diffusion Models for Unpaired Image-to-image Translation

We introduce a diffusion-based cross-domain image translator in the absence of paired training data. Unlike GAN-based methods, our approach integrates diffusion models to learn the image translation process, allowing for more coverable…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-30 · Shilong Zou, Yuhang Huang, Renjiao Yi, Chenyang Zhu, Kai Xu

DiffHarmony: Latent Diffusion Model Meets Image Harmonization

Image harmonization, which involves adjusting the foreground of a composite image to attain a unified visual consistency with the background, can be conceptualized as an image-to-image translation task. Diffusion models have recently…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-10 · Pengfei Zhou, Fangxiang Feng, Xiaojie Wang

TempoSyncDiff: Distilled Temporally-Consistent Diffusion for Low-Latency Audio-Driven Talking Head Generation

Diffusion models have recently advanced photorealistic human synthesis, although practical talking-head generation (THG) remains constrained by high inference latency, temporal instability such as flicker and identity drift, and imperfect…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-09 · Soumya Mazumdar, Vineet Kumar Rakesh

DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model

Text-driven image generation using diffusion models has recently gained significant attention. To enable more flexible image manipulation and editing, recent research has expanded from single image generation to transparent layer generation…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-18 · Junjia Huang, Pengxiang Yan, Jinhang Cai, Jiyang Liu, Zhao Wang, Yitong Wang, Xinglong Wu, Guanbin Li

LSSGen: Leveraging Latent Space Scaling in Flow and Diffusion for Efficient Text to Image Generation

Flow matching and diffusion models have shown impressive results in text-to-image generation, producing photorealistic images through an iterative denoising process. A common strategy to speed up synthesis is to perform early denoising at…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-23 · Jyun-Ze Tang, Chih-Fan Hsu, Jeng-Lin Li, Ming-Ching Chang, Wei-Chao Chen

Exploring Compositional Visual Generation with Latent Classifier Guidance

Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-25 · Changhao Shi, Haomiao Ni, Kai Li, Shaobo Han, Mingfu Liang, Martin Renqiang Min

LayerDiffusion: Layered Controlled Image Editing with Diffusion Models

Text-guided image editing has recently experienced rapid development. However, simultaneously performing multiple editing actions on a single image, such as background replacement and specific subject attribute changes, while maintaining…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-09 · Pengzhi Li, Qinxuan Huang, Yikang Ding, Zhiheng Li

Training-Free Representation Guidance for Diffusion Models with a Representation Alignment Projector

Recent progress in generative modeling has enabled high-quality visual synthesis with diffusion-based frameworks, supporting controllable sampling and large-scale training. Inference-time guidance methods such as classifier-free and…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-02 · Wenqiang Zu, Shenghao Xie, Bo Lei, Lei Ma