Related papers: Captioning Visualizations with Large Language Mode…

Learning Visual Representations with Caption Annotations

Pretraining general-purpose visual features has become a crucial part of tackling many computer vision tasks. While one can learn such features on the extensively-annotated ImageNet dataset, recent approaches have looked at ways to allow…

Computer Vision and Pattern Recognition · Computer Science 2020-08-05 Mert Bulent Sariyildiz, Julien Perez, Diane Larlus

Visualization in the Era of Artificial Intelligence: Experiments for Creating Structural Visualizations by Prompting Large Language Models

Large Language Models (LLMs) have revolutionized natural language processing by generating human-like text and images from textual input. However, their potential to generate complex 2D/3D visualizations has been largely unexplored. We…

Software Engineering · Computer Science 2023-05-12 Hans-Georg Fill, Fabian Muff

Using Large Language Models to Generate Engaging Captions for Data Visualizations

Creating compelling captions for data visualizations has been a longstanding challenge. Visualization researchers are typically untrained in journalistic reporting and hence the captions that are placed below data visualizations tend to be…

Computation and Language · Computer Science 2023-01-02 Ashley Liew, Klaus Mueller

Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis

The task of image captioning demands an algorithm to generate natural language descriptions of visual inputs. Recent advancements have seen a convergence between image captioning research and the development of Large Language Models (LLMs)…

Computer Vision and Pattern Recognition · Computer Science 2024-12-06 Davide Bucciarelli, Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Evaluating LLMs for Visualization Tasks

Information Visualization has been utilized to gain insights from complex data. In recent times, Large Language Models (LLMs) have performed very well in many tasks. In this paper, we showcase the capabilities of different popular LLMs to…

Software Engineering · Computer Science 2025-06-16 Saadiq Rauf Khan, Vinit Chandak, Sougata Mukherjea

Improving Visual Storytelling with Multimodal Large Language Models

Visual storytelling is an emerging field that combines images and narratives to create engaging and contextually rich stories. Despite its potential, generating coherent and emotionally resonant visual stories remains challenging due to the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-04 Xiaochuan Lin, Xiangyong Chen

AnimatedLLM: Explaining LLMs with Interactive Visualizations

Large language models (LLMs) are becoming central to natural language processing education, yet materials showing their mechanics are sparse. We present AnimatedLLM, an interactive web application that provides step-by-step visualizations…

Computation and Language · Computer Science 2026-02-02 Zdeněk Kasner, Ondřej Dušek

A Multimodal Recaptioning Framework to Account for Perceptual Diversity Across Languages in Vision-Language Modeling

When captioning an image, people describe objects in diverse ways, such as by using different terms and/or including details that are perceptually noteworthy to them. Descriptions can be especially unique across languages and cultures.…

Computer Vision and Pattern Recognition · Computer Science 2025-11-12 Kyle Buettner, Jacob T. Emmerson, Adriana Kovashka

An Introduction to Vision-Language Modeling

Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models…

Machine Learning · Computer Science 2024-05-28 Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, Pietro Astolfi, Reyhane Askari Hemmat, Jun Chen, Kushal Tirumala, Rim Assouel, Mazda Moayeri, Arjang Talattof, Kamalika Chaudhuri, Zechun Liu, Xilun Chen, Quentin Garrido, Karen Ullrich, Aishwarya Agrawal, Kate Saenko, Asli Celikyilmaz, Vikas Chandra

Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models

Large Vision-Language Models (LVLMs) integrate image encoders with Large Language Models (LLMs) to process multi-modal inputs and perform complex visual tasks. However, they often generate hallucinations by describing non-existent objects…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Yaqi Sun, Kyohei Atarashi, Koh Takeuchi, Hisashi Kashima

Scenario Understanding of Traffic Scenes Through Large Visual Language Models

Deep learning models for autonomous driving, encompassing perception, planning, and control, depend on vast datasets to achieve their high performance. However, their generalization often suffers due to domain-specific data distributions,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Esteban Rivera, Jannik Lübberstedt, Nico Uhlemann, Markus Lienkamp

Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding

Large language models (LLMs) have made significant advancements in natural language understanding. However, through that enormous semantic representation that the LLM has learnt, is it somehow possible for it to understand images as well?…

Computer Vision and Pattern Recognition · Computer Science 2024-07-12 Mu Cai, Zeyi Huang, Yuheng Li, Utkarsh Ojha, Haohan Wang, Yong Jae Lee

Large Language Models Meet Computer Vision: A Brief Survey

Recently, the intersection of Large Language Models (LLMs) and Computer Vision (CV) has emerged as a pivotal area of research, driving significant advancements in the field of Artificial Intelligence (AI). As transformers have become the…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Raby Hamadi

Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

Humans tend to decompose a sentence into different parts like "sth do sth at someplace" and then fill each part with certain content. Inspired by this, we follow the principle of modular design to propose a novel image…

Computer Vision and Pattern Recognition · Computer Science 2023-04-25 Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai

Visual Large Language Models for Generalized and Specialized Applications

Visual-language models (VLM) have emerged as a powerful tool for learning a unified embedding space for vision and language. Inspired by large language models, which have demonstrated strong reasoning and multi-task capabilities, visual…

Computer Vision and Pattern Recognition · Computer Science 2025-01-07 Yifan Li, Zhixin Lai, Wentao Bao, Zhen Tan, Anh Dao, Kewei Sui, Jiayi Shen, Dong Liu, Huan Liu, Yu Kong

Video Summarization with Large Language Models

The exponential increase in video content poses significant challenges in terms of efficient navigation, search, and retrieval, thus requiring advanced video summarization techniques. Existing video summarization methods, which heavily rely…

Computer Vision and Pattern Recognition · Computer Science 2025-06-06 Min Jung Lee, Dayoung Gong, Minsu Cho

How Can Objects Help Video-Language Understanding?

Do we still need to represent objects explicitly in multimodal large language models (MLLMs)? To one extreme, pre-trained encoders convert images into visual tokens, with which objects and spatiotemporal relationships may be implicitly…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Zitian Tang, Shijie Wang, Junho Cho, Jaewook Yoo, Chen Sun

BLUEX Revisited: Enhancing Benchmark Coverage with Automatic Captioning

With the growing capabilities of Large Language Models (LLMs), there is an increasing need for robust evaluation methods, especially in multilingual and non-English contexts. We present an updated version of the BLUEX dataset, now including…

Computation and Language · Computer Science 2025-09-01 João Guilherme Alves Santos, Giovana Kerche Bonás, Thales Sales Almeida

Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions

Image captioning models tend to describe images in an object-centric way, emphasising visible objects. But image descriptions can also abstract away from objects and describe the type of scene depicted. In this paper, we explore the…

Computation and Language · Computer Science 2022-11-11 Michele Cafagna, Kees van Deemter, Albert Gatt

Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions

The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily adept at processing textual information. To address this…

Computer Vision and Pattern Recognition · Computer Science 2025-10-15 Akash Ghosh, Arkadeep Acharya, Sriparna Saha, Vinija Jain, Aman Chadha