Related papers: GMAIL: Generative Modality Alignment for generated…

GAMMA: Generalizable Alignment via Multi-task and Manipulation-Augmented Training for AI-Generated Image Detection

With generative models becoming increasingly sophisticated and diverse, detecting AI-generated images has become increasingly challenging. While existing AI-generated image detectors achieve promising performance on in-distribution…

Computer Vision and Pattern Recognition · Computer Science 2026-01-26 Haozhen Yan, Yan Hong, Suning Lang, Jiahui Zhan, Yikun Ji, Yujie Gao, Huijia Zhu, Jun Lan, Jianfu Zhang

Generative Modeling for Multi-task Visual Learning

Generative modeling has recently shown great promise in computer vision, but it has mostly focused on synthesizing visually realistic images. In this paper, motivated by multi-task learning of shareable feature representations, we consider…

Computer Vision and Pattern Recognition · Computer Science 2021-06-28 Zhipeng Bao, Martial Hebert, Yu-Xiong Wang

Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images

Recent advances in generative deep learning have enabled the creation of high-quality synthetic images in text-to-image generation. Prior work shows that fine-tuning a pretrained diffusion model on ImageNet and generating synthetic training…

Computer Vision and Pattern Recognition · Computer Science 2025-01-22 Zhuoran Yu, Chenchen Zhu, Sean Culatana, Raghuraman Krishnamoorthi, Fanyi Xiao, Yong Jae Lee

Semantic Granularity Metric Learning for Visual Search

Deep metric learning applied to various applications has shown promising results in identification, retrieval and recognition. Existing methods often do not consider different granularity in visual similarity. However, in many domain…

Computer Vision and Pattern Recognition · Computer Science 2021-05-17 Dipu Manandhar, Muhammet Bastan, Kim-Hui Yap

Detecting Generated Images by Fitting Natural Image Distributions

The increasing realism of generated images has raised significant concerns about their potential misuse, necessitating robust detection methods. Current approaches mainly rely on training binary classifiers, which depend heavily on the…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Yonggang Zhang, Jun Nie, Xinmei Tian, Mingming Gong, Kun Zhang, Bo Han

Ensembling with Deep Generative Views

Recent generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose, simply by learning from unlabeled image collections. Here, we investigate whether such views can be…

Computer Vision and Pattern Recognition · Computer Science 2021-04-30 Lucy Chai, Jun-Yan Zhu, Eli Shechtman, Phillip Isola, Richard Zhang

Generative Multi-modal Models are Good Class-Incremental Learners

In class-incremental learning (CIL) scenarios, the phenomenon of catastrophic forgetting caused by the classifier's bias towards the current task has long posed a significant challenge. It is mainly caused by the characteristic of…

Computer Vision and Pattern Recognition · Computer Science 2024-03-28 Xusheng Cao, Haori Lu, Linlan Huang, Xialei Liu, Ming-Ming Cheng

Thinking with Generated Images

We present Thinking with Generated Images, a novel paradigm that fundamentally transforms how large multimodal models (LMMs) engage with visual reasoning by enabling them to natively think across text and vision modalities through…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Ethan Chern, Zhulin Hu, Steffi Chern, Siqi Kou, Jiadi Su, Yan Ma, Zhijie Deng, Pengfei Liu

Generative Visual Instruction Tuning

We propose to use automatically generated instruction-following data to improve the zero-shot capabilities of a large multimodal model with additional support for generative and image editing tasks. We achieve this by curating a new…

Computer Vision and Pattern Recognition · Computer Science 2024-10-04 Jefferson Hernandez, Ruben Villegas, Vicente Ordonez

You Only Submit One Image to Find the Most Suitable Generative Model

Deep generative models have achieved promising results in image generation, and various generative model hubs, e.g., Hugging Face and Civitai, have been developed that enable model developers to upload models and users to download models.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-18 Zhi Zhou, Lan-Zhe Guo, Peng-Xiao Song, Yu-Feng Li

CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training

We present variational generative adversarial networks, a general learning framework that combines a variational auto-encoder with a generative adversarial network, for synthesizing images in fine-grained categories, such as faces of a…

Computer Vision and Pattern Recognition · Computer Science 2018-02-06 Jianmin Bao, Dong Chen, Fang Wen, Houqiang Li, Gang Hua

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, Wei Liu, Jie Hu

GHIL-Glue: Hierarchical Control with Filtered Subgoal Images

Image and video generative models that are pre-trained on Internet-scale data can greatly increase the generalization capacity of robot learning systems. These models can function as high-level planners, generating intermediate subgoals for…

Robotics · Computer Science 2024-10-29 Kyle B. Hatch, Ashwin Balakrishna, Oier Mees, Suraj Nair, Seohong Park, Blake Wulfe, Masha Itkina, Benjamin Eysenbach, Sergey Levine, Thomas Kollar, Benjamin Burchfiel

Harmonized Tabular-Image Fusion via Gradient-Aligned Alternating Learning

Multimodal tabular-image fusion is an emerging task that has received increasing attention in various domains. However, existing methods may be hindered by gradient conflicts between modalities, misleading the optimization of the unimodal…

Computer Vision and Pattern Recognition · Computer Science 2026-04-03 Longfei Huang, Yang Yang

SMILE: Semantically-guided Multi-attribute Image and Layout Editing

Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs). Exploring the disentangled attribute space within a transformation is a very challenging task due to the multiple…

Computer Vision and Pattern Recognition · Computer Science 2020-10-07 Andrés Romero, Luc Van Gool, Radu Timofte

Diverse Image Generation via Self-Conditioned GANs

We introduce a simple but effective unsupervised method for generating realistic and diverse images. We train a class-conditional GAN model without using manually annotated class labels. Instead, our model is conditional on labels…

Computer Vision and Pattern Recognition · Computer Science 2022-02-11 Steven Liu, Tongzhou Wang, David Bau, Jun-Yan Zhu, Antonio Torralba

Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?

Multi-modal learning has emerged as an increasingly promising avenue in vision recognition, driving innovations across diverse domains ranging from media and education to healthcare and transportation. Despite its success, the robustness of…

Computer Vision and Pattern Recognition · Computer Science 2024-02-15 Tiantian Feng, Daniel Yang, Digbalay Bose, Shrikanth Narayanan

Towards Generative Class Prompt Learning for Fine-grained Visual Recognition

Although foundational vision-language models (VLMs) have proven to be very successful for various semantic discrimination tasks, they still struggle to perform faithfully for fine-grained categorization. Moreover, foundational models…

Computer Vision and Pattern Recognition · Computer Science 2024-09-10 Soumitri Chattopadhyay, Sanket Biswas, Emanuele Vivoli, Josep Lladós

Zero-Shot Image Harmonization with Generative Model Prior

We propose a zero-shot approach to image harmonization, aiming to overcome the reliance on large amounts of synthetic composite images in existing methods. These methods, while showing promising results, involve significant training…

Computer Vision and Pattern Recognition · Computer Science 2024-03-12 Jianqi Chen, Yilan Zhang, Zhengxia Zou, Keyan Chen, Zhenwei Shi

Detecting Generated Images by Real Images Only

As deep learning technology continues to evolve, the images yielded by generative models are becoming more and more realistic, triggering people to question the authenticity of images. Existing generated image detection methods detect…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Xiuli Bi, Bo Liu, Fan Yang, Bin Xiao, Weisheng Li, Gao Huang, Pamela C. Cosman