Related papers: Image Captioning via Dynamic Path Customization

Image Captioning based on Deep Reinforcement Learning

Recently it has been shown that policy-gradient methods for reinforcement learning can be used to train deep end-to-end systems on natural language processing tasks. What's more, with the complexity of understanding image content and…

Computer Vision and Pattern Recognition · Computer Science 2018-09-14 Haichao Shi, Peng Li, Bo Wang, Zhenyu Wang

An Efficient Technique for Image Captioning using Deep Neural Network

With the huge expansion of the internet and trillions of gigabytes of data generated every single day, the development of various tools has become mandatory in order to maintain system adaptability to rapid changes. One of these…

Computer Vision and Pattern Recognition · Computer Science 2020-09-08 Borneel Bikash Phukan, Amiya Ranjan Panda

Automated Image Captioning with CNNs and Transformers

This project aims to create an automated image captioning system that generates natural language descriptions for input images by integrating techniques from computer vision and natural language processing. We employ various different…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Joshua Adrian Cahyono, Jeremy Nathan Jusuf

Dual-Stream Collaborative Transformer for Image Captioning

Current region feature-based image captioning methods have progressed rapidly and achieved remarkable performance. However, they are still prone to generating irrelevant descriptions due to the lack of contextual information and the…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Jun Wan, Jun Liu, Zhihui Lai, Jie Zhou

Show, Edit and Tell: A Framework for Editing Image Captions

Most image captioning frameworks generate captions directly from images, learning a mapping from visual features to natural language. However, editing existing captions can be easier than generating new ones from scratch. Intuitively, when…

Computer Vision and Pattern Recognition · Computer Science 2020-03-09 Fawaz Sammani, Luke Melas-Kyriazi

Learning Dynamic Routing for Semantic Segmentation

Recently, numerous handcrafted and searched networks have been applied for semantic segmentation. However, previous works intend to handle inputs with various scales in pre-defined static architectures, such as FCN, U-Net, and DeepLab…

Computer Vision and Pattern Recognition · Computer Science 2020-03-24 Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, Jian Sun

CoDiNet: Path Distribution Modeling with Consistency and Diversity for Dynamic Routing

Dynamic routing networks, aimed at finding the best routing paths in the networks, have achieved significant improvements to neural networks in terms of accuracy and efficiency. In this paper, we see dynamic routing networks in a fresh…

Computer Vision and Pattern Recognition · Computer Science 2021-05-27 Huanyu Wang, Zequn Qin, Songyuan Li, Xi Li

Semantic-Conditional Diffusion Networks for Image Captioning

Recent advances on text-to-image generation have witnessed the rise of diffusion models which act as powerful generative models. Nevertheless, it is not trivial to exploit such latent variable models to capture the dependency among discrete…

Computer Vision and Pattern Recognition · Computer Science 2022-12-07 Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Jianlin Feng, Hongyang Chao, Tao Mei

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance…

Computer Vision and Pattern Recognition · Computer Science 2017-04-14 Zhou Ren, Xiaoyu Wang, Ning Zhang, Xutao Lv, Li-Jia Li

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Image Captioning, or the automatic generation of descriptions for images, is one of the core problems in Computer Vision and has seen considerable progress using Deep Learning Techniques. We propose to use Inception-ResNet Convolutional…

Computer Vision and Pattern Recognition · Computer Science 2021-02-23 Sulabh Katiyar, Samir Kumar Borgohain

Image Captioning through Image Transformer

Automatic captioning of images is a task that combines the challenges of image analysis and text generation. One important aspect in captioning is the notion of attention: How to decide what to describe and in which order. Inspired by the…

Computer Vision and Pattern Recognition · Computer Science 2020-10-06 Sen He, Wentong Liao, Hamed R. Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault

DiT: Efficient Vision Transformers with Dynamic Token Routing

Recently, the tokens of images share the same static data flow in many dense networks. However, challenges arise from the variance among the objects in images, such as large variations in the spatial scale and difficulties of recognition…

Computer Vision and Pattern Recognition · Computer Science 2023-08-14 Yuchen Ma, Zhengcong Fei, Junshi Huang

Image Representations and New Domains in Neural Image Captioning

We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying image representation quality produced by a convolutional neural network, we find that a…

Computation and Language · Computer Science 2015-08-11 Jack Hessel, Nicolas Savva, Michael J. Wilber

Pre-Trained CNN Architecture for Transformer-Based Image Caption Generation Model

Automatic image captioning, a multifaceted task bridging computer vision and natural language processing, aims to generate descriptive textual content from visual input. While Convolutional Neural Networks (CNNs) and Long Short-Term Memory…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Amanuel Tafese Dufera

DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection

The prosperity of deep learning contributes to the rapid progress in scene text detection. Among all the methods with convolutional networks, segmentation-based ones have drawn extensive attention due to their superiority in detecting text…

Computer Vision and Pattern Recognition · Computer Science 2022-08-23 Jingyu Lin, Jie Jiang, Yan Yan, Chunchao Guo, Hongfa Wang, Wei Liu, Hanzi Wang

Dual-Level Collaborative Transformer for Image Captioning

Descriptive region features extracted by object detection networks have played an important role in the recent advancements of image captioning. However, they are still criticized for the lack of contextual information and fine-grained…

Computer Vision and Pattern Recognition · Computer Science 2021-08-04 Yunpeng Luo, Jiayi Ji, Xiaoshuai Sun, Liujuan Cao, Yongjian Wu, Feiyue Huang, Chia-Wen Lin, Rongrong Ji

Dynamic Slimmable Denoising Network

Recently, tremendous human-designed and automatically searched neural networks have been applied to image denoising. However, previous works intend to handle all noisy images in a pre-defined static network architecture, which inevitably…

Computer Vision and Pattern Recognition · Computer Science 2021-10-19 Zutao Jiang, Changlin Li, Xiaojun Chang, Jihua Zhu, Yi Yang

Towards Local Visual Modeling for Image Captioning

In this paper, we study the local visual modeling with grid features for image captioning, which is critical for generating accurate and detailed captions. To achieve this target, we propose a Locality-Sensitive Transformer Network (LSTNet)…

Computer Vision and Pattern Recognition · Computer Science 2023-02-14 Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Rongrong Ji

DCT-Net: Domain-Calibrated Translation for Portrait Stylization

This paper introduces DCT-Net, a novel image translation architecture for few-shot portrait stylization. Given limited style exemplars ($\sim$100), the new architecture can produce high-quality style transfer results with advanced ability…

Computer Vision and Pattern Recognition · Computer Science 2022-07-07 Yifang Men, Yuan Yao, Miaomiao Cui, Zhouhui Lian, Xuansong Xie

Boosting Video Captioning with Dynamic Loss Network

Video captioning is one of the challenging problems at the intersection of vision and language, having many real-life applications in video retrieval, video surveillance, assisting visually challenged people, Human-machine interface, and…

Computer Vision and Pattern Recognition · Computer Science 2022-02-03 Nasib Ullah, Partha Pratim Mohanta