Related papers: DiffAugment: Diffusion based Long-Tailed Visual Re…

Generalized Visual Relation Detection with Diffusion Models

Visual relation detection (VRD) aims to identify relationships (or interactions) between object pairs in an image. Although recent VRD models have achieved impressive performance, they are all restricted to pre-defined relation categories,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-17 Kaifeng Gao, Siqi Chen, Hanwang Zhang, Jun Xiao, Yueting Zhuang, Qianru Sun

Detail Reinforcement Diffusion Model: Augmentation Fine-Grained Visual Categorization in Few-Shot Conditions

The challenge in fine-grained visual categorization lies in how to explore the subtle differences between different subclasses and achieve accurate discrimination. Previous research has relied on large-scale annotated data and pre-trained…

Computer Vision and Pattern Recognition · Computer Science 2024-05-16 Tianxu Wu, Shuo Ye, Shuhuang Chen, Qinmu Peng, Xinge You

Visual Relationship Detection with Low Rank Non-Negative Tensor Decomposition

We address the problem of Visual Relationship Detection (VRD) which aims to describe the relationships between pairs of objects in the form of triplets of (subject, predicate, object). We observe that given a pair of bounding box proposals,…

Computer Vision and Pattern Recognition · Computer Science 2019-11-25 Mohammed Haroon Dupty, Zhen Zhang, Wee Sun Lee

Knowledge-augmented Few-shot Visual Relation Detection

Visual Relation Detection (VRD) aims to detect relationships between objects for image understanding. Most existing VRD methods rely on thousands of training samples of each relationship to achieve satisfactory performance. Some recent…

Computer Vision and Pattern Recognition · Computer Science 2023-03-10 Tianyu Yu, Yangning Li, Jiaoyan Chen, Yinghui Li, Hai-Tao Zheng, Xi Chen, Qingbin Liu, Wenqiang Liu, Dongxiao Huang, Bei Wu, Yexin Wang

DiffAug: A Diffuse-and-Denoise Augmentation for Training Robust Classifiers

We introduce DiffAug, a simple and efficient diffusion-based augmentation technique to train image classifiers for the crucial yet challenging goal of improved classifier robustness. Applying DiffAug to a given example consists of one…

Computer Vision and Pattern Recognition · Computer Science 2024-05-30 Chandramouli Sastry, Sri Harsha Dumpala, Sageev Oore

Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection

Detecting visual relationships, i.e. <Subject, Predicate, Object> triplets, is a challenging Scene Understanding task approached in the past via linguistic priors or spatial information in a single feature branch. We introduce a new deeply…

Computer Vision and Pattern Recognition · Computer Science 2019-02-18 Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Koutras, Athanasia Zlatintsi, Petros Maragos

Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On

Diffusion models have shown preliminary success in the virtual try-on (VTON) task. The typical dual-branch architecture comprises two UNets for implicit garment deformation and synthesized image generation respectively, and has emerged as the…

Computer Vision and Pattern Recognition · Computer Science 2025-05-23 Siqi Wan, Jingwen Chen, Yingwei Pan, Ting Yao, Tao Mei

RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition

The visual relationship recognition (VRR) task aims at understanding the pairwise visual relationships between interacting objects in an image. These relationships typically have a long-tail distribution due to their compositional nature.…

Computer Vision and Pattern Recognition · Computer Science 2022-03-30 Jun Chen, Aniket Agarwal, Sherif Abdelkarim, Deyao Zhu, Mohamed Elhoseiny

Large-scale Reinforcement Learning for Diffusion Models

Text-to-image diffusion models are a class of deep generative models that have demonstrated an impressive capacity for high-quality image generation. However, these models are susceptible to implicit biases that arise from web-scale…

Computer Vision and Pattern Recognition · Computer Science 2024-01-24 Yinan Zhang, Eric Tzeng, Yilun Du, Dmitry Kislyuk

DiffuLT: How to Make Diffusion Model Useful for Long-tail Recognition

This paper proposes a new pipeline for long-tail (LT) recognition. Instead of re-weighting or re-sampling, we utilize the long-tailed dataset itself to generate a balanced proxy that can be optimized through cross-entropy (CE).…

Computer Vision and Pattern Recognition · Computer Science 2024-03-11 Jie Shao, Ke Zhu, Hanxiao Zhang, Jianxin Wu

Latent-based Diffusion Model for Long-tailed Recognition

Long-tailed imbalance distribution is a common issue in practical computer vision applications. Previous works proposed methods to address this problem, which can be categorized into several classes: re-sampling, re-weighting, transfer…

Computer Vision and Pattern Recognition · Computer Science 2024-04-24 Pengxiao Han, Changkun Ye, Jieming Zhou, Jing Zhang, Jie Hong, Xuesong Li

Feedback Efficient Online Fine-Tuning of Diffusion Models

Diffusion models excel at modeling complex data distributions, including those of images, proteins, and small molecules. However, in many cases, our goal is to model parts of the distribution that maximize certain properties: for example,…

Machine Learning · Computer Science 2024-07-19 Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, Tommaso Biancalani

ReVersion: Diffusion-Based Relation Inversion from Images

Diffusion models gain increasing popularity for their generative capabilities. Recently, there have been surging needs to generate customized images by inverting diffusion models from exemplar images, and existing inversion methods mainly…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Ziqi Huang, Tianxing Wu, Yuming Jiang, Kelvin C. K. Chan, Ziwei Liu

DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts

Image retouching aims to enhance the visual quality of photos. Considering the different aesthetic preferences of users, the target of retouching is subjective. However, current retouching methods mostly adopt deterministic models, which…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Zheng-Peng Duan, Jiawei Zhang, Zheng Lin, Xin Jin, Dongqing Zou, Chunle Guo, Chongyi Li

Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships

One of the most difficult tasks in scene understanding is recognizing interactions between objects in an image. This task is often called visual relationship detection (VRD). We consider the question of whether, given auxiliary textual data…

Computer Vision and Pattern Recognition · Computer Science 2019-10-29 Gal Sadeh Kenigsfield, Ran El-Yaniv

DiffFit: Disentangled Garment Warping and Texture Refinement for Virtual Try-On

Virtual try-on (VTON) aims to synthesize realistic images of a person wearing a target garment, with broad applications in e-commerce and digital fashion. While recent advances in latent diffusion models have substantially improved visual…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Xiang Xu

Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation

Long-tailed class imbalance remains a fundamental obstacle in semantic segmentation of high-resolution remote-sensing imagery, where dominant classes shape learned representations and rare classes are systematically under-segmented. This…

Computer Vision and Pattern Recognition · Computer Science 2026-04-28 Buddhi Wijenayake, Nichula Wasalathilake, Roshan Godaliyadda, Vijitha Herath, Parakrama Ekanayake, Vishal M. Patel

VLMDiff: Leveraging Vision-Language Models for Multi-Class Anomaly Detection with Diffusion

Detecting visual anomalies in diverse, multi-class real-world images is a significant challenge. We introduce VLMDiff, a novel unsupervised multi-class visual anomaly detection framework. It integrates a Latent Diffusion Model (LDM) with a…

Computer Vision and Pattern Recognition · Computer Science 2025-11-12 Samet Hicsonmez, Abd El Rahman Shabayek, Djamila Aouada

D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition

Applications of diffusion models for visual tasks have been quite noteworthy. This paper targets making classification models more robust to occlusions for the task of object recognition by proposing a pipeline that utilizes a frozen…

Computer Vision and Pattern Recognition · Computer Science 2025-04-14 Rupayan Mallick, Sibo Dong, Nataniel Ruiz, Sarah Adel Bargal

VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL

Diffusion models have emerged as powerful generative tools across various domains, yet tailoring pre-trained models to exhibit specific desirable properties remains challenging. While reinforcement learning (RL) offers a promising…

Computer Vision and Pattern Recognition · Computer Science 2025-06-03 Fengyuan Dai, Zifeng Zhuang, Yufei Huang, Siteng Huang, Bangyan Liao, Donglin Wang, Fajie Yuan