Related papers: MacDiff: Unified Skeleton Modeling with Masked Con…

PointDico: Contrastive 3D Representation Learning Guided by Diffusion Models

Self-supervised representation learning has shown significant improvement in Natural Language Processing and 2D Computer Vision. However, existing methods face difficulties in representing 3D data because of its unordered and uneven…

Computer Vision and Pattern Recognition · Computer Science 2025-12-10 Pengbo Li, Yiding Sun, Haozhe Cheng

Search-Augmented Masked Diffusion Models for Constrained Generation

Discrete diffusion models generate sequences by iteratively denoising samples corrupted by categorical noise, offering an appealing alternative to autoregressive decoding for structured and symbolic generation. However, standard training…

Machine Learning · Computer Science 2026-02-04 Huu Binh Ta, Michael Cardei, Alvaro Velasquez, Ferdinando Fioretto

UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning

Recent advances in vision-language pre-training have enabled machines to perform better in multimodal object discrimination (e.g., image-text semantic alignment) and image synthesis (e.g., text-to-image generation). On the other hand,…

Computer Vision and Pattern Recognition · Computer Science 2023-06-02 Xiao Dong, Runhui Huang, Xiaoyong Wei, Zequn Jie, Jianxing Yu, Jian Yin, Xiaodan Liang

PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions

Diffusion-based generative models have shown promise in synthesizing histopathology images to address data scarcity caused by privacy constraints. Diagnostic text reports provide high-level semantic descriptions, and masks offer…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Mahesh Bhosale, Abdul Wasi, Yuanhao Zhai, Yunjie Tian, Samuel Border, Nan Xi, Pinaki Sarder, Junsong Yuan, David Doermann, Xuan Gong

MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation

Semantic segmentation is essential in computer vision for various applications, yet traditional approaches face significant challenges, including the high cost of annotation and extensive training for supervised learning. Additionally, due…

Computer Vision and Pattern Recognition · Computer Science 2024-03-19 Yasufumi Kawano, Yoshimitsu Aoki

Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction

Predicting human gaze scanpaths is crucial for understanding visual attention, with applications in human-computer interaction, autonomous systems, and cognitive robotics. While deep learning models have advanced scanpath prediction, most…

Computer Vision and Pattern Recognition · Computer Science 2025-08-01 Giuseppe Cartella, Vittorio Cuculo, Alessandro D'Amelio, Marcella Cornia, Giuseppe Boccignone, Rita Cucchiara

Masked Diffusion as Self-supervised Representation Learner

Denoising diffusion probabilistic models have recently demonstrated state-of-the-art generative performance and have been used as strong pixel-level representation learners. This paper decomposes the interrelation between the generative…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Zixuan Pan, Jianxu Chen, Yiyu Shi

MetaMask: Revisiting Dimensional Confounder for Self-Supervised Learning

As a successful approach to self-supervised learning, contrastive learning aims to learn invariant information shared among distortions of the input sample. While contrastive learning has yielded continuous advancements in sampling strategy…

Machine Learning · Computer Science 2023-08-11 Jiangmeng Li, Wenwen Qiang, Yanan Zhang, Wenyi Mo, Changwen Zheng, Bing Su, Hui Xiong

Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training

Generating images from graph-structured inputs, such as scene graphs, is uniquely challenging due to the difficulty of aligning nodes and connections in graphs with objects and their relations in images. Most existing methods address this…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Ling Yang, Zhilin Huang, Yang Song, Shenda Hong, Guohao Li, Wentao Zhang, Bin Cui, Bernard Ghanem, Ming-Hsuan Yang

StructDiff: Structure-aware Diffusion Model for 3D Fine-grained Medical Image Synthesis

Solving medical imaging data scarcity through semantic image generation has attracted growing attention in recent years. However, existing generative models mainly focus on synthesizing whole-organ or large-tissue structures, showing…

Image and Video Processing · Electrical Eng. & Systems 2025-12-19 Jiahao Xia, Yutao Hu, Yaolei Qi, Zhenliang Li, Wenqi Shao, Junjun He, Ying Fu, Longjiang Zhang, Guanyu Yang

Masked Conditional Diffusion Model for Enhancing Deepfake Detection

Recent studies on deepfake detection have achieved promising results when training and testing faces are from the same dataset. However, their results severely degrade when confronted with forged samples that the model has not yet seen…

Computer Vision and Pattern Recognition · Computer Science 2024-02-02 Tiewen Chen, Shanmin Yang, Shu Hu, Zhenghan Fang, Ying Fu, Xi Wu, Xin Wang

Unified Auto-Encoding with Masked Diffusion

At the core of both successful generative and self-supervised representation learning models there is a reconstruction objective that incorporates some form of image corruption. Diffusion models implement this approach through a scheduled…

Computer Vision and Pattern Recognition · Computer Science 2024-06-26 Philippe Hansen-Estruch, Sriram Vishwanath, Amy Zhang, Manan Tomar

Mask prior-guided denoising diffusion improves inverse protein folding

Inverse protein folding generates valid amino acid sequences that can fold into a desired protein structure, with recent deep-learning advances showing strong potential and competitive performance. However, challenges remain, such as…

Biomolecules · Quantitative Biology 2025-07-29 Peizhen Bai, Filip Miljković, Xianyuan Liu, Leonardo De Maria, Rebecca Croasdale-Wood, Owen Rackham, Haiping Lu

Dynamic Entity-Masked Graph Diffusion Model for histopathological image Representation Learning

Significant disparities between the features of natural images and those inherent to histopathological images make it challenging to directly apply and transfer pre-trained models from natural images to histopathology tasks. Moreover, the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Zhenfeng Zhuang, Min Cen, Yanfeng Li, Fangyu Zhou, Lequan Yu, Baptiste Magnier, Liansheng Wang

MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations

Visual counterfactual explanations aim to reveal the minimal semantic modifications that can alter a model's prediction, providing causal and interpretable insights into deep neural networks. However, existing diffusion-based counterfactual…

Computer Vision and Pattern Recognition · Computer Science 2026-04-24 Changlu Guo, Anders Nymark Christensen, Anders Bjorholm Dahl, Morten Rieger Hannemose

Realistic Human Motion Generation with Cross-Diffusion Models

We introduce the Cross Human Motion Diffusion Model (CrossDiff), a novel approach for generating high-quality human motion based on textual descriptions. Our method integrates 3D and 2D information using a shared transformer network within…

Computer Vision and Pattern Recognition · Computer Science 2024-08-06 Zeping Ren, Shaoli Huang, Xiu Li

MAEDiff: Masked Autoencoder-enhanced Diffusion Models for Unsupervised Anomaly Detection in Brain Images

Unsupervised anomaly detection has gained significant attention in the field of medical imaging due to its capability of relieving the costly pixel-level annotation. To achieve this, modern approaches usually utilize generative models to…

Image and Video Processing · Electrical Eng. & Systems 2024-01-22 Rui Xu, Yunke Wang, Bo Du

Text-driven Human Motion Generation with Motion Masked Diffusion Model

Text-driven human motion generation is a multimodal task that synthesizes human motion sequences conditioned on natural language. It requires the model to satisfy textual descriptions under varying conditional inputs, while generating…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Xingyu Chen

MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

Equipping a deep model with the ability of few-shot learning, i.e., learning quickly from only a few examples, is a core challenge for artificial intelligence. Gradient-based meta-learning approaches effectively address the challenge by learning…

Machine Learning · Computer Science 2024-01-09 Baoquan Zhang, Chuyao Luo, Demin Yu, Huiwei Lin, Xutao Li, Yunming Ye, Bowen Zhang

Masked Diffusion for Generative Recommendation

Generative recommendation (GR) with semantic IDs (SIDs) has emerged as a promising alternative to traditional recommendation approaches due to its performance gains, capitalization on semantic information provided through language model…

Machine Learning · Computer Science 2025-12-19 Kulin Shah, Bhuvesh Kumar, Neil Shah, Liam Collins