Related papers: Visual Bridge: Universal Visual Perception Represe…

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Recent progress in diffusion models significantly advances various image generation tasks. However, the current mainstream approach remains focused on building task-specific models, which have limited efficiency when supporting a wide range…

Computer Vision and Pattern Recognition · Computer Science 2026-01-08 Zhong-Yu Li, Ruoyi Du, Juncheng Yan, Le Zhuo, Qilong Wu, Zhen Li, Peng Gao, Zhanyu Ma, Ming-Ming Cheng

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

This paper's primary objective is to develop a robust generalist perception model capable of addressing multiple tasks under constraints of computational resources and limited training data. We leverage text-to-image diffusion models…

Computer Vision and Pattern Recognition · Computer Science 2025-10-10 Canyu Zhao, Yanlong Sun, Mingyu Liu, Huanyi Zheng, Muzhi Zhu, Zhiyue Zhao, Hao Chen, Tong He, Chunhua Shen

UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation

We present UniModel, a unified generative model that jointly supports visual understanding and visual generation within a single pixel-to-pixel diffusion framework. Our goal is to achieve unification along three axes: the model, the tasks,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-24 Chi Zhang, Jiepeng Wang, Youming Wang, Yuanzhi Liang, Xiaoyan Yang, Zuoxin Li, Haibin Huang, Xuelong Li

Textualize Visual Prompt for Image Editing via Diffusion Bridge

Visual prompt, a pair of before-and-after edited images, can convey indescribable imagery transformations and prosper in image editing. However, current visual prompt methods rely on a pretrained text-guided image-to-image generative model…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Pengcheng Xu, Qingnan Fan, Fei Kou, Shuai Qin, Hong Gu, Ruoyu Zhao, Charles Ling, Boyu Wang

Scaling Properties of Diffusion Models for Perceptual Tasks

In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik

Dual Diffusion for Unified Image Generation and Understanding

Diffusion models have gained tremendous success in text-to-image generation, yet still lag behind with visual understanding tasks, an area dominated by autoregressive vision-language models. We propose a large-scale and fully end-to-end…

Computer Vision and Pattern Recognition · Computer Science 2025-04-03 Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Despite the remarkable success of foundation models, their task-specific fine-tuning paradigm makes them inconsistent with the goal of general perception modeling. The key to eliminating this inconsistency is to use generalist models for…

Computer Vision and Pattern Recognition · Computer Science 2022-11-18 Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai

Diffusion-based Visual Anagram as Multi-task Learning

Visual anagrams are images that change appearance upon transformation, like flipping or rotation. With the advent of diffusion models, generating such optical illusions can be achieved by averaging noise across multiple views during the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-04 Zhiyuan Xu, Yinhe Chen, Huan-ang Gao, Weiyan Zhao, Guiyu Zhang, Hao Zhao

One Diffusion to Generate Them All

We introduce OneDiffusion, a versatile, large-scale diffusion model that seamlessly supports bidirectional image synthesis and understanding across diverse tasks. It enables conditional generation from inputs such as text, depth, pose,…

Computer Vision and Pattern Recognition · Computer Science 2025-06-16 Duong H. Le, Tuan Pham, Sangho Lee, Christopher Clark, Aniruddha Kembhavi, Stephan Mandt, Ranjay Krishna, Jiasen Lu

Unifying Visual Perception by Dispersible Points Learning

We present a conceptually simple, flexible, and universal visual perception head for variant visual tasks, e.g., classification, object detection, instance segmentation and pose estimation, and different frameworks, such as one-stage or…

Computer Vision and Pattern Recognition · Computer Science 2022-09-13 Jianming Liang, Guanglu Song, Biao Leng, Yu Liu

VMDiff: Visual Mixing Diffusion for Limitless Cross-Object Synthesis

Creating novel images by fusing visual cues from multiple sources is a fundamental yet underexplored problem in image-to-image generation, with broad applications in artistic creation, virtual reality and visual media. Existing methods…

Computer Vision and Pattern Recognition · Computer Science 2025-09-30 Zeren Xiong, Yue Yu, Zedong Zhang, Shuo Chen, Jian Yang, Jun Li

From Image to Video: An Empirical Study of Diffusion Representations

Diffusion models have revolutionized generative modeling, enabling unprecedented realism in image and video synthesis. This success has sparked interest in leveraging their representations for visual understanding tasks. While recent works…

Computer Vision and Pattern Recognition · Computer Science 2025-03-21 Pedro Vélez, Luisa F. Polanía, Yi Yang, Chuhan Zhang, Rishabh Kabra, Anurag Arnab, Mehdi S. M. Sajjadi

Toward a Diffusion-Based Generalist for Dense Vision Tasks

Building generalized models that can solve many computer vision tasks simultaneously is an intriguing direction. Recent works have shown image itself can be used as a natural interface for general-purpose visual perception and demonstrated…

Computer Vision and Pattern Recognition · Computer Science 2024-07-02 Yue Fan, Yongqin Xian, Xiaohua Zhai, Alexander Kolesnikov, Muhammad Ferjad Naeem, Bernt Schiele, Federico Tombari

Diffusion Models in 3D Vision: A Survey

In recent years, 3D vision has become a crucial field within computer vision, powering a wide range of applications such as autonomous driving, robotics, augmented reality, and medical imaging. This field relies on accurate perception,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-02 Zhen Wang, Dongyuan Li, Yaozu Wu, Tianyu He, Jiang Bian, Renhe Jiang

Diffusion Models in Low-Level Vision: A Survey

Deep generative models have garnered significant attention in low-level vision tasks due to their generative capabilities. Among them, diffusion model-based solutions, characterized by a forward diffusion process and a reverse denoising…

Computer Vision and Pattern Recognition · Computer Science 2025-02-26 Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, Xiu Li

Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors

The remarkable prowess of diffusion models in image generation has spurred efforts to extend their application beyond generative tasks. However, a persistent challenge exists in lacking a unified approach to apply diffusion models to visual…

Computer Vision and Pattern Recognition · Computer Science 2024-01-31 Shiyin Dong, Mingrui Zhu, Kun Cheng, Nannan Wang, Xinbo Gao

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

Beyond high-fidelity image synthesis, diffusion models have recently exhibited promising results in dense visual perception tasks. However, most existing work treats diffusion models as a standalone component for perception tasks, employing…

Computer Vision and Pattern Recognition · Computer Science 2025-12-18 Shuhong Zheng, Zhipeng Bao, Ruoyu Zhao, Martial Hebert, Yu-Xiong Wang

Imagine2Servo: Intelligent Visual Servoing with Diffusion-Driven Goal Generation for Robotic Tasks

Visual servoing, the method of controlling robot motion through feedback from visual sensors, has seen significant advancements with the integration of optical flow-based methods. However, its application remains limited by inherent…

Robotics · Computer Science 2024-12-10 Pranjali Pathre, Gunjan Gupta, M. Nomaan Qureshi, Mandyam Brunda, Samarth Brahmbhatt, K. Madhava Krishna

ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

Novel-view synthesis through diffusion models has demonstrated remarkable potential for generating diverse and high-quality images. Yet, the independent process of image generation in these prevailing methods leads to challenges in…

Computer Vision and Pattern Recognition · Computer Science 2024-03-01 Xianghui Yang, Yan Zuo, Sameera Ramasinghe, Loris Bazzani, Gil Avraham, Anton van den Hengel

Growing Visual Generative Capacity for Pre-Trained MLLMs

Multimodal large language models (MLLMs) extend the success of language models to visual understanding, and recent efforts have sought to build unified MLLMs that support both understanding and generation. However, constructing such models…

Computer Vision and Pattern Recognition · Computer Science 2025-10-03 Hanyu Wang, Jiaming Han, Ziyan Yang, Qi Zhao, Shanchuan Lin, Xiangyu Yue, Abhinav Shrivastava, Zhenheng Yang, Hao Chen