Related papers: Semantic-Conditional Diffusion Networks for Image …

DiffEdit: Diffusion-based semantic image editing with mask guidance

Image generation has recently seen tremendous advances, with diffusion models allowing to synthesize convincing images for a large variety of text prompts. In this article, we propose DiffEdit, a method to take advantage of text-conditioned…

Computer Vision and Pattern Recognition · Computer Science 2022-10-21 Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord

Comprehending and Ordering Semantics for Image Captioning

Comprehending the rich semantics in an image and ordering them in linguistic order are essential to compose a visually-grounded and linguistically coherent description for image captioning. Modern techniques commonly capitalize on a…

Computer Vision and Pattern Recognition · Computer Science 2022-06-15 Yehao Li, Yingwei Pan, Ting Yao, Tao Mei

Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning

While impressive performance has been achieved in image captioning, the limited diversity of the generated captions and the large parameter scale remain major barriers to the real-world application of these systems. In this work, we propose…

Computer Vision and Pattern Recognition · Computer Science 2023-10-18 Guisheng Liu, Yi Li, Zhengcong Fei, Haiyan Fu, Xiangyang Luo, Yanqing Guo

Conditional Diffusion Models for Weakly Supervised Medical Image Segmentation

Recent advances in denoising diffusion probabilistic models have shown great success in image synthesis tasks. While there are already works exploring the potential of this powerful tool in image semantic segmentation, its application in…

Computer Vision and Pattern Recognition · Computer Science 2023-09-19 Xinrong Hu, Yu-Jen Chen, Tsung-Yi Ho, Yiyu Shi

Multimodal Data Augmentation for Image Captioning using Diffusion Models

Image captioning, an important vision-language task, often requires a tremendous number of finely labeled image-caption pairs for learning the underlying alignment between images and texts. In this paper, we propose a multimodal data…

Computer Vision and Pattern Recognition · Computer Science 2023-11-14 Changrong Xiao, Sean Xin Xu, Kunpeng Zhang

Conditional Image Synthesis with Diffusion Models: A Survey

Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective way for conditional image synthesis,…

Computer Vision and Pattern Recognition · Computer Science 2025-06-03 Zheyuan Zhan, Defang Chen, Jian-Ping Mei, Zhenghe Zhao, Jiawei Chen, Chun Chen, Siwei Lyu, Can Wang

SC-CDM: Enhancing Quality of Image Semantic Communication with a Compact Diffusion Model

Semantic Communication (SC) is an emerging technology that has attracted much attention in the sixth-generation (6G) mobile communication systems. However, little of the literature has fully considered the perceptual quality of the reconstructed…

Image and Video Processing · Electrical Eng. & Systems 2024-10-04 Kexin Zhang, Lixin Li, Wensheng Lin, Yuna Yan, Wenchi Cheng, Zhu Han

SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning

Controllable image semantic understanding tasks, such as captioning or segmentation, necessitate users to input a prompt (e.g., text or bounding boxes) to predict a unique outcome, presenting challenges such as high-cost prompt input or…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Xu Zhang, Jin Yuan, Hanwang Zhang, Guojin Zhong, Yongsheng Zang, Jiacheng Lin, Zhiyong Li

Improving Image Captioning via Predicting Structured Concepts

Because the semantic gap between images and texts is difficult to close in the image captioning task, conventional studies in this area paid some attention to treating semantic concepts as a bridge between the two modalities and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Ting Wang, Weidong Chen, Yuanhe Tian, Yan Song, Zhendong Mao

Bypass Network for Semantics Driven Image Paragraph Captioning

Image paragraph captioning aims to describe a given image with a sequence of coherent sentences. Most existing methods model the coherence through the topic transition that dynamically infers a topic vector from preceding sentences.…

Computer Vision and Pattern Recognition · Computer Science 2022-06-22 Qi Zheng, Chaoyue Wang, Dadong Wang

SPDiffusion: Semantic Protection Diffusion Models for Multi-concept Text-to-image Generation

Recent text-to-image models have achieved impressive results in generating high-quality images. However, when tasked with multi-concept generation, that is, creating images that contain multiple characters or objects, existing methods often suffer…

Computer Vision and Pattern Recognition · Computer Science 2025-03-04 Yang Zhang, Rui Zhang, Xuecheng Nie, Haochen Li, Jikun Chen, Yifan Hao, Xin Zhang, Luoqi Liu, Ling Li

Animate Your Motion: Turning Still Images into Dynamic Videos

In recent years, diffusion models have made remarkable strides in text-to-video generation, sparking a quest for enhanced control over video outputs to more accurately reflect user intentions. Traditional efforts predominantly focus on…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Mingxiao Li, Bo Wan, Marie-Francine Moens, Tinne Tuytelaars

Diffusion-RSCC: Diffusion Probabilistic Model for Change Captioning in Remote Sensing Images

Remote sensing image change captioning (RSICC) aims at generating human-like language to describe the semantic changes between bi-temporal remote sensing image pairs. It provides valuable insights into environmental dynamics and land…

Computer Vision and Pattern Recognition · Computer Science 2024-05-22 Xiaofei Yu, Yitong Li, Jie Ma

Self-conditioned Embedding Diffusion for Text Generation

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as…

Computation and Language · Computer Science 2022-11-09 Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond

An Empirical Study of Language CNN for Image Captioning

Language Models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a Language CNN model which is suitable for statistical language modeling tasks and shows competitive…

Computer Vision and Pattern Recognition · Computer Science 2017-08-03 Jiuxiang Gu, Gang Wang, Jianfei Cai, Tsuhan Chen

Contextualized Diffusion Models for Text-Guided Image and Video Generation

Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing. Nevertheless, prevailing text-guided visual diffusion models primarily focus on incorporating text-visual…

Computer Vision and Pattern Recognition · Computer Science 2024-06-05 Ling Yang, Zhilong Zhang, Zhaochen Yu, Jingwei Liu, Minkai Xu, Stefano Ermon, Bin Cui

Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications

Semantic communications mark a paradigm shift from bit-accurate transmission toward meaning-centric communication, essential as wireless systems approach theoretical capacity limits. The emergence of generative AI has catalyzed generative…

Signal Processing · Electrical Eng. & Systems 2026-05-08 Hai-Long Qin, Jincheng Dai, Guo Lu, Shuo Shao, Sixian Wang, Tongda Xu, Wenjun Zhang, Ping Zhang, Khaled B. Letaief

MAT: A Multimodal Attentive Translator for Image Captioning

In this work we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural networks (RNN) model for image caption generation. Different…

Computer Vision and Pattern Recognition · Computer Science 2017-08-11 Chang Liu, Fuchun Sun, Changhu Wang, Feng Wang, Alan Yuille

DiffCap: Exploring Continuous Diffusion on Image Captioning

Current image captioning works usually focus on generating descriptions in an autoregressive manner. However, there are limited works that focus on generating descriptions non-autoregressively, which brings more decoding diversity. Inspired…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Yufeng He, Zefan Cai, Xu Gan, Baobao Chang

In-Context Learning Unlocked for Diffusion Models

We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. Given a pair of task-specific example images, such as depth from/to image and scribble from/to image, and a text guidance, our…

Computer Vision and Pattern Recognition · Computer Science 2023-10-20 Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou