Related papers: Cognitively Inspired Cross-Modal Data Generation U…

Diffusion Models For Multi-Modal Generative Modeling

Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to single-generation modeling. Can we generalize diffusion models with the ability of…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-26 · Changyou Chen, Han Ding, Bunyamin Sisman, Yi Xu, Ouye Xie, Benjamin Z. Yao, Son Dinh Tran, Belinda Zeng

Conditional Image Generation with Pretrained Generative Model

In recent years, diffusion models have gained popularity for their ability to generate higher-quality images than GAN models. However, like any other large generative model, these models require a huge amount of data,…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-21 · Rajesh Shrestha, Bowen Xie

A Simple Approach to Unifying Diffusion-based Conditional Generation

Recent progress in image generation has sparked research into controlling these models through condition signals, with various methods addressing specific challenges in conditional generation. Instead of proposing another specialized…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-08 · Xirui Li, Charles Herrmann, Kelvin C. K. Chan, Yinxiao Li, Deqing Sun, Chao Ma, Ming-Hsuan Yang

Diffusion Models with Double Guidance: Generate with aggregated datasets

Creating large-scale datasets for training high-performance generative models is often prohibitively expensive, especially when associated attributes or annotations must be provided. As a result, merging existing datasets has become a…

Machine Learning · Statistics · 2026-03-31 · Yanfeng Yang, Kenji Fukumizu

Collaborative Diffusion for Multi-Modal Face Generation and Editing

Diffusion models have recently emerged as a powerful generative tool. Despite the great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-21 · Ziqi Huang, Kelvin C. K. Chan, Yuming Jiang, Ziwei Liu

DiffX: Guide Your Layout to Cross-Modal Generative Modeling

Diffusion models have made significant strides in language-driven and layout-driven image generation. However, most diffusion models are limited to visible RGB image generation. In fact, human perception of the world is enriched by diverse…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-22 · Zeyu Wang, Jingyu Lin, Yifei Qian, Yi Huang, Shicen Tian, Bosong Chai, Juncan Deng, Qu Yang, Lan Du, Cunjian Chen, Kejie Huang

Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces

Diffusion models have demonstrated remarkable performance in generating unimodal data across various tasks, including image, video, and text generation. In contrast, the joint generation of multimodal data through diffusion models is…

Machine Learning · Computer Science · 2025-06-16 · Kevin Rojas, Yuchen Zhu, Sichen Zhu, Felix X.-F. Ye, Molei Tao

Diffusion idea exploration for art generation

Cross-modal learning tasks have picked up pace in recent times. Despite a plethora of applications in diverse areas, generating novel content from multiple data modalities has remained a challenging problem. To address this, various…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-12 · Nikhil Verma

Towards Controllable Image Generation through Representation-Conditioned Diffusion Models

Diffusion models have emerged as powerful tools for high-quality image generation and editing, but guiding these models to produce specific outputs remains a challenge. Conventional approaches rely on conditioning mechanisms, such as text…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-27 · Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen

Exploring Compositional Visual Generation with Latent Classifier Guidance

Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-25 · Changhao Shi, Haomiao Ni, Kai Li, Shaobo Han, Mingfu Liang, Martin Renqiang Min

Discriminative Multimodal Learning via Conditional Priors in Generative Models

Deep generative models with latent variables have been used lately to learn joint representations and generative processes from multi-modal data. These two learning mechanisms can, however, conflict with each other and representations can…

Machine Learning · Computer Science · 2023-01-24 · Rogelio A. Mancisidor, Michael Kampffmeyer, Kjersti Aas, Robert Jenssen

Diffusion Models as Data Mining Tools

This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining. Our insight is that since contemporary generative models learn an accurate representation of their training data, we can use…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-07 · Ioannis Siglidis, Aleksander Holynski, Alexei A. Efros, Mathieu Aubry, Shiry Ginosar

Multi-modal Latent Diffusion

Multi-modal data-sets are ubiquitous in modern applications, and multi-modal Variational Autoencoders are a popular family of models that aim to learn a joint representation of the different modalities. However, existing approaches suffer…

Machine Learning · Computer Science · 2023-12-19 · Mustapha Bounoua, Giulio Franzese, Pietro Michiardi

Diffusion Active Learning: Towards Data-Driven Experimental Design in Computed Tomography

We introduce Diffusion Active Learning, a novel approach that combines generative diffusion modeling with data-driven sequential experimental design to adaptively acquire data for inverse problems. Although broadly applicable, we focus on…

Machine Learning · Computer Science · 2025-04-07 · Luis Barba, Johannes Kirschner, Tomas Aidukas, Manuel Guizar-Sicairos, Benjamín Béjar

Cognitively Inspired Learning of Incremental Drifting Concepts

Humans continually expand their learned knowledge to new domains and learn new concepts without any interference with past learned experiences. In contrast, machine learning models perform poorly in a continual learning setting, where input…

Machine Learning · Computer Science · 2023-04-24 · Mohammad Rostami, Aram Galstyan

Input-Adaptive Generative Dynamics in Diffusion Models

Diffusion models typically generate data through a fixed denoising trajectory that is shared across all samples. However, generation targets can differ in complexity, suggesting that a single pre-defined diffusion process may not be optimal…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-10 · Yucheng Xing, Xiaodong Liu, Xin Wang

Is Conditional Generative Modeling all you need for Decision-Making?

Recent improvements in conditional generative modeling have made it possible to generate high-quality images from language descriptions alone. We investigate whether these methods can directly address the problem of sequential…

Machine Learning · Computer Science · 2023-07-11 · Anurag Ajay, Yilun Du, Abhi Gupta, Joshua Tenenbaum, Tommi Jaakkola, Pulkit Agrawal

Diffusion based Zero-shot Medical Image-to-Image Translation for Cross Modality Segmentation

Cross-modality image segmentation aims to segment images of the target modality using a method designed for the source modality. Deep generative models can translate the target modality images into the source modality, thus enabling cross-modality…

Image and Video Processing · Electrical Eng. & Systems · 2024-04-11 · Zihao Wang, Yingyu Yang, Yuzhou Chen, Tingting Yuan, Maxime Sermesant, Herve Delingette, Ona Wu

Generative-based Fusion Mechanism for Multi-Modal Tracking

Generative models (GMs) have received increasing research interest for their remarkable capacity to achieve comprehensive understanding. However, their potential application in the domain of multi-modal tracking has remained relatively…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-01 · Zhangyong Tang, Tianyang Xu, Xuefeng Zhu, Xiao-Jun Wu, Josef Kittler

MMGen: Unified Multi-modal Image Generation and Understanding in One Go

A unified diffusion framework for multi-modal generation and understanding has the transformative potential to achieve seamless and controllable image diffusion and other cross-modal tasks. In this paper, we introduce MMGen, a unified…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-27 · Jiepeng Wang, Zhaoqing Wang, Hao Pan, Yuan Liu, Dongdong Yu, Changhu Wang, Wenping Wang