Related papers: Diffexplainer: Towards Cross-modal Global Explanat…

Explaining in Diffusion: Explaining a Classifier Through Hierarchical Semantics with Text-to-Image Diffusion Models

Classifiers are important components in many computer vision tasks, serving as the foundational backbone of a wide variety of models employed across diverse applications. However, understanding the decision-making process of classifiers…

Computer Vision and Pattern Recognition · Computer Science 2024-12-25 Tahira Kazimi, Ritika Allada, Pinar Yanardag

DEXTER: Diffusion-Guided EXplanations with TExtual Reasoning for Vision Models

Understanding and explaining the behavior of machine learning models is essential for building transparent and trustworthy AI systems. We introduce DEXTER, a data-free framework that employs diffusion models and large language models to…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Simone Carnemolla, Matteo Pennisi, Sarinda Samarasinghe, Giovanni Bellitto, Simone Palazzo, Daniela Giordano, Mubarak Shah, Concetto Spampinato

Diffusion Explainer: Visual Explanation for Text-to-image Stable Diffusion

Diffusion-based generative models' impressive ability to create convincing images has garnered global attention. However, their complex structures and operations often pose challenges for non-experts to grasp. We present Diffusion…

Computation and Language · Computer Science 2024-09-04 Seongmin Lee, Benjamin Hoover, Hendrik Strobelt, Zijie J. Wang, ShengYun Peng, Austin Wright, Kevin Li, Haekyu Park, Haoyang Yang, Duen Horng Chau

Dual Diffusion for Unified Image Generation and Understanding

Diffusion models have gained tremendous success in text-to-image generation, yet still lag behind with visual understanding tasks, an area dominated by autoregressive vision-language models. We propose a large-scale and fully end-to-end…

Computer Vision and Pattern Recognition · Computer Science 2025-04-03 Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

While recent Multimodal Large Language Models (MLLMs) have attained significant strides in multimodal reasoning, their reasoning processes remain predominantly text-centric, leading to suboptimal performance in complex long-horizon,…

Computer Vision and Pattern Recognition · Computer Science 2026-01-01 Zefeng He, Xiaoye Qu, Yafu Li, Tong Zhu, Siyuan Huang, Yu Cheng

DiffEx: Explaining a Classifier with Diffusion Models to Identify Microscopic Cellular Variations

In recent years, deep learning models have been extensively applied to biological data across various modalities. Discriminative deep learning models have excelled at classifying images into categories (e.g., healthy versus diseased,…

Computer Vision and Pattern Recognition · Computer Science 2025-02-17 Anis Bourou, Saranga Kingkor Mahanta, Thomas Boyer, Valérie Mezger, Auguste Genovesio

Interactive Visual Learning for Stable Diffusion

Diffusion-based generative models' impressive ability to create convincing images has garnered global attention. However, their complex internal structures and operations often pose challenges for non-experts to grasp. We introduce…

Human-Computer Interaction · Computer Science 2024-04-26 Seongmin Lee, Benjamin Hoover, Hendrik Strobelt, Zijie J. Wang, ShengYun Peng, Austin Wright, Kevin Li, Haekyu Park, Haoyang Yang, Polo Chau

DiffCL: A Diffusion-Based Contrastive Learning Framework with Semantic Alignment for Multimodal Recommendations

Multimodal recommendation systems integrate diverse multimodal information into the feature representations of both items and users, thereby enabling a more comprehensive modeling of user preferences. However, existing methods are hindered…

Multimedia · Computer Science 2025-01-03 Qiya Song, Jiajun Hu, Lin Xiao, Bin Sun, Xieping Gao, Shutao Li

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

Beyond high-fidelity image synthesis, diffusion models have recently exhibited promising results in dense visual perception tasks. However, most existing work treats diffusion models as a standalone component for perception tasks, employing…

Computer Vision and Pattern Recognition · Computer Science 2025-12-18 Shuhong Zheng, Zhipeng Bao, Ruoyu Zhao, Martial Hebert, Yu-Xiong Wang

KnowDiffuser: A Knowledge-Guided Diffusion Planner with LLM Reasoning

Recent advancements in Language Models (LMs) have demonstrated strong semantic reasoning capabilities, enabling their application in high-level decision-making for autonomous driving (AD). However, LMs operate over discrete token spaces and…

Robotics · Computer Science 2026-04-02 Fan Ding, Xuewen Luo, Fengze Yang, Bo Yu, HwaHui Tew, Ganesh Krishnasamy, Junn Yong Loo

Conditional Diffusion Models are Medical Image Classifiers that Provide Explainability and Uncertainty for Free

Discriminative classifiers have become a foundational tool in deep learning for medical imaging, excelling at learning separable features of complex data distributions. However, these models often need careful design, augmentation, and…

Computer Vision and Pattern Recognition · Computer Science 2025-08-11 Gian Mario Favero, Parham Saremi, Emily Kaczmarek, Brennan Nichyporuk, Tal Arbel

DiffX: Guide Your Layout to Cross-Modal Generative Modeling

Diffusion models have made significant strides in language-driven and layout-driven image generation. However, most diffusion models are limited to visible RGB image generation. In fact, human perception of the world is enriched by diverse…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Zeyu Wang, Jingyu Lin, Yifei Qian, Yi Huang, Shicen Tian, Bosong Chai, Juncan Deng, Qu Yang, Lan Du, Cunjian Chen, Kejie Huang

From Visual Explanations to Counterfactual Explanations with Latent Diffusion

Visual counterfactual explanations are ideal hypothetical images that change the decision-making of the classifier with high confidence toward the desired class while remaining visually plausible and close to the initial image. In this…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Tung Luu, Nam Le, Duc Le, Bac Le

LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multimodal Large Language Models

Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in…

Machine Learning · Computer Science 2025-12-22 Mengdan Zhu, Raasikh Kanjiani, Jiahui Lu, Andrew Choi, Qirui Ye, Liang Zhao

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

The diffusion model has been proven a powerful generative model in recent years, yet remains a challenge in generating visual text. Several methods alleviated this issue by incorporating explicit text position and content as guidance on…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

DiffExplainer: Unveiling Black Box Models Via Counterfactual Generation

In the field of medical imaging, particularly in tasks related to early disease detection and prognosis, understanding the reasoning behind AI model predictions is imperative for assessing their reliability. Conventional explanation methods…

Computer Vision and Pattern Recognition · Computer Science 2024-06-28 Yingying Fang, Shuang Wu, Zihao Jin, Caiwen Xu, Shiyi Wang, Simon Walsh, Guang Yang

DiffEdit: Diffusion-based semantic image editing with mask guidance

Image generation has recently seen tremendous advances, with diffusion models allowing to synthesize convincing images for a large variety of text prompts. In this article, we propose DiffEdit, a method to take advantage of text-conditioned…

Computer Vision and Pattern Recognition · Computer Science 2022-10-21 Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord

DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder

Generating high-quality and person-generic visual dubbing remains a challenge. Recent innovation has seen the advent of a two-stage paradigm, decoupling the rendering and lip synchronization process facilitated by intermediate…

Computer Vision and Pattern Recognition · Computer Science 2024-01-15 Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen, Kai Yu

Do text-free diffusion models learn discriminative visual representations?

While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which addresses both families of tasks simultaneously. We…

Computer Vision and Pattern Recognition · Computer Science 2024-09-25 Soumik Mukhopadhyay, Matthew Gwilliam, Yosuke Yamaguchi, Vatsal Agarwal, Namitha Padmanabhan, Archana Swaminathan, Tianyi Zhou, Jun Ohya, Abhinav Shrivastava

Explaining generative diffusion models via visual analysis for interpretable decision-making process

Diffusion models have demonstrated remarkable performance in generation tasks. Nevertheless, explaining the diffusion process remains challenging due to it being a sequence of denoising noisy images that are difficult for experts to…

Computer Vision and Pattern Recognition · Computer Science 2024-02-19 Ji-Hoon Park, Yeong-Joon Ju, Seong-Whan Lee