Related papers: Image Translation as Diffusion Visual Programmers

Unleashing Text-to-Image Diffusion Models for Visual Perception

Diffusion models (DMs) have become the new trend of generative models and have demonstrated a powerful ability of conditional synthesis. Among those, text-to-image diffusion models pre-trained on large-scale image-text pairs are highly…

Computer Vision and Pattern Recognition · Computer Science 2023-03-06 Wenliang Zhao, Yongming Rao, Zuyan Liu, Benlin Liu, Jie Zhou, Jiwen Lu

4D Visual Pre-training for Robot Learning

General visual representations learned from web-scale datasets for robotics have achieved great success in recent years, enabling data-efficient robot learning on manipulation tasks; yet these pre-trained representations are mostly on 2D…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Chengkai Hou, Yanjie Ze, Yankai Fu, Zeyu Gao, Songbo Hu, Yue Yu, Shanghang Zhang, Huazhe Xu

A Diffusion Model Translator for Efficient Image-to-Image Translation

Applying diffusion models to image-to-image translation (I2I) has recently received increasing attention due to its practical applications. Previous attempts inject information from the source image into each denoising step for an iterative…

Computer Vision and Pattern Recognition · Computer Science 2025-02-04 Mengfei Xia, Yu Zhou, Ran Yi, Yong-Jin Liu, Wenping Wang

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting

Pre-trained language models (PLMs) have played an increasing role in multimedia research. In terms of vision-language (VL) tasks, they often serve as a language encoder and still require an additional fusion network for VL reasoning,…

Computer Vision and Pattern Recognition · Computer Science 2023-08-23 Shubin Huang, Qiong Wu, Yiyi Zhou, Weijie Chen, Rongsheng Zhang, Xiaoshuai Sun, Rongrong Ji

A Modular Conditional Diffusion Framework for Image Reconstruction

Diffusion Probabilistic Models (DPMs) have been recently utilized to deal with various blind image restoration (IR) tasks, where they have demonstrated outstanding performance in terms of perceptual quality. However, the task-specific…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Magauiya Zhussip, Iaroslav Koshelev, Stamatis Lefkimmiatis

CycleDiff: Cycle Diffusion Models for Unpaired Image-to-image Translation

We introduce a diffusion-based cross-domain image translator in the absence of paired training data. Unlike GAN-based methods, our approach integrates diffusion models to learn the image translation process, allowing for more coverable…

Computer Vision and Pattern Recognition · Computer Science 2026-01-30 Shilong Zou, Yuhang Huang, Renjiao Yi, Chenyang Zhu, Kai Xu

EditTransfer++: Toward Faithful and Efficient Visual-Prompt-Guided Image Editing

Visual-prompt-guided edit transfer aims to learn image transformations directly from example pairs, offering more precise and controllable editing than purely text-driven approaches. However, existing diffusion transformer-based methods…

Computer Vision and Pattern Recognition · Computer Science 2026-05-11 Lan Chen, Qi Mao, Yiren Song, Yuchao Gu, Siwei Ma

DDP: Diffusion Model for Dense Visual Prediction

We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a…

Computer Vision and Pattern Recognition · Computer Science 2023-05-16 Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo

MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer

The Diffusion Probabilistic Model (DPM) has recently gained popularity in the field of computer vision, thanks to its image generation applications, such as Imagen, Latent Diffusion Models, and Stable Diffusion, which have demonstrated…

Image and Video Processing · Electrical Eng. & Systems 2023-12-27 Junde Wu, Wei Ji, Huazhu Fu, Min Xu, Yueming Jin, Yanwu Xu

Image-to-Image Translation with Diffusion Transformers and CLIP-Based Image Conditioning

Image-to-image translation aims to learn a mapping between a source and a target domain, enabling tasks such as style transfer, appearance transformation, and domain adaptation. In this work, we explore a diffusion-based framework for…

Computer Vision and Pattern Recognition · Computer Science 2026-02-06 Qiang Zhu, Kuan Lu, Menghao Huo, Yuxiao Li

Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models

Transformers have catalyzed advancements in computer vision and natural language processing (NLP) fields. However, substantial computational complexity poses limitations for their application in long-context tasks, such as high-resolution…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang

DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework

Video virtual try-on (VVT) technology has garnered considerable academic interest owing to its promising applications in e-commerce advertising and entertainment. However, most existing end-to-end methods rely heavily on scarce paired…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Tongchun Zuo, Zaiyu Huang, Shuliang Ning, Ente Lin, Chao Liang, Zerong Zheng, Jianwen Jiang, Yuan Zhang, Mingyuan Gao, Xin Dong

Design Booster: A Text-Guided Diffusion Model for Image Translation with Spatial Layout Preservation

Diffusion models are able to generate photorealistic images in arbitrary scenes. However, when applying diffusion models to image translation, there exists a trade-off between maintaining spatial structure and high-quality content. Besides,…

Computer Vision and Pattern Recognition · Computer Science 2023-02-07 Shiqi Sun, Shancheng Fang, Qian He, Wei Liu

Textualize Visual Prompt for Image Editing via Diffusion Bridge

Visual prompt, a pair of before-and-after edited images, can convey indescribable imagery transformations and prosper in image editing. However, current visual prompt methods rely on a pretrained text-guided image-to-image generative model…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Pengcheng Xu, Qingnan Fan, Fei Kou, Shuai Qin, Hong Gu, Ruoyu Zhao, Charles Ling, Boyu Wang

MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer

Despite its success in image synthesis, we observe that diffusion probabilistic models (DPMs) often lack contextual reasoning ability to learn the relations among object parts in an image, leading to a slow learning process. To solve this…

Computer Vision and Pattern Recognition · Computer Science 2024-02-22 Shanghua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan

Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning

Spatial reasoning in 3D scenes requires precise geometric calculations that challenge vision-language models. Visual programming addresses this by decomposing problems into steps calling specialized tools, yet existing methods rely on…

Computer Vision and Pattern Recognition · Computer Science 2025-12-25 Shengguang Wu, Xiaohan Wang, Yuhui Zhang, Hao Zhu, Serena Yeung-Levy

I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors

Visual metaphors are powerful rhetorical devices used to persuade or communicate creative ideas through images. Similar to linguistic metaphors, they convey meaning implicitly through symbolism and juxtaposition of the symbols. We propose a…

Computation and Language · Computer Science 2023-07-17 Tuhin Chakrabarty, Arkadiy Saakyan, Olivia Winn, Artemis Panagopoulou, Yue Yang, Marianna Apidianaki, Smaranda Muresan

Implicit and Explicit Language Guidance for Diffusion-based Visual Perception

Text-to-image diffusion models have shown powerful ability on conditional image synthesis. With large-scale vision-language pre-training, diffusion models are able to generate high-quality images with rich texture and reasonable structure…

Computer Vision and Pattern Recognition · Computer Science 2024-08-16 Hefeng Wang, Jiale Cao, Jin Xie, Aiping Yang, Yanwei Pang

Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning

A visual metaphor constitutes a high-order form of human creativity, employing cross-domain semantic fusion to transform abstract concepts into impactful visual rhetoric. Despite the remarkable progress of generative AI, existing models…

Computer Vision and Pattern Recognition · Computer Science 2026-02-03 Yu Xu, Yuxin Zhang, Juan Cao, Lin Gao, Chunyu Wang, Oliver Deussen, Tong-Yee Lee, Fan Tang

A Unified Conditional Framework for Diffusion-based Image Restoration

Diffusion Probabilistic Models (DPMs) have recently shown remarkable performance in image generation tasks, which are capable of generating highly realistic images. When adopting DPMs for image restoration tasks, the crucial aspect lies in…

Computer Vision and Pattern Recognition · Computer Science 2023-06-01 Yi Zhang, Xiaoyu Shi, Dasong Li, Xiaogang Wang, Jian Wang, Hongsheng Li