Related papers: Captioning Visualizations with Large Language Mode…

200 papers

Pretraining general-purpose visual features has become a crucial part of tackling many computer vision tasks. While one can learn such features on the extensively-annotated ImageNet dataset, recent approaches have looked at ways to allow…

Computer Vision and Pattern Recognition · Computer Science 2020-08-05 Mert Bulent Sariyildiz, Julien Perez, Diane Larlus

Large Language Models (LLMs) have revolutionized natural language processing by generating human-like text and images from textual input. However, their potential to generate complex 2D/3D visualizations has been largely unexplored. We…

Software Engineering · Computer Science 2023-05-12 Hans-Georg Fill, Fabian Muff

Creating compelling captions for data visualizations has been a longstanding challenge. Visualization researchers are typically untrained in journalistic reporting and hence the captions that are placed below data visualizations tend to be…

Computation and Language · Computer Science 2023-01-02 Ashley Liew, Klaus Mueller

The task of image captioning demands an algorithm to generate natural language descriptions of visual inputs. Recent advancements have seen a convergence between image captioning research and the development of Large Language Models (LLMs)…

Computer Vision and Pattern Recognition · Computer Science 2024-12-06 Davide Bucciarelli, Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Information Visualization has been utilized to gain insights from complex data. In recent times, Large Language Models (LLMs) have performed very well in many tasks. In this paper, we showcase the capabilities of different popular LLMs to…

Software Engineering · Computer Science 2025-06-16 Saadiq Rauf Khan, Vinit Chandak, Sougata Mukherjea

Visual storytelling is an emerging field that combines images and narratives to create engaging and contextually rich stories. Despite its potential, generating coherent and emotionally resonant visual stories remains challenging due to the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-04 Xiaochuan Lin, Xiangyong Chen

Large language models (LLMs) are becoming central to natural language processing education, yet materials showing their mechanics are sparse. We present AnimatedLLM, an interactive web application that provides step-by-step visualizations…

Computation and Language · Computer Science 2026-02-02 Zdeněk Kasner, Ondřej Dušek

When captioning an image, people describe objects in diverse ways, such as by using different terms and/or including details that are perceptually noteworthy to them. Descriptions can be especially unique across languages and cultures.…

Computer Vision and Pattern Recognition · Computer Science 2025-11-12 Kyle Buettner, Jacob T. Emmerson, Adriana Kovashka

Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models…

Large Vision-Language Models (LVLMs) integrate image encoders with Large Language Models (LLMs) to process multi-modal inputs and perform complex visual tasks. However, they often generate hallucinations by describing non-existent objects…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Yaqi Sun, Kyohei Atarashi, Koh Takeuchi, Hisashi Kashima

Deep learning models for autonomous driving, encompassing perception, planning, and control, depend on vast datasets to achieve their high performance. However, their generalization often suffers due to domain-specific data distributions,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Esteban Rivera, Jannik Lübberstedt, Nico Uhlemann, Markus Lienkamp

Large language models (LLMs) have made significant advancements in natural language understanding. However, through that enormous semantic representation that the LLM has learnt, is it somehow possible for it to understand images as well?…

Computer Vision and Pattern Recognition · Computer Science 2024-07-12 Mu Cai, Zeyi Huang, Yuheng Li, Utkarsh Ojha, Haohan Wang, Yong Jae Lee

Recently, the intersection of Large Language Models (LLMs) and Computer Vision (CV) has emerged as a pivotal area of research, driving significant advancements in the field of Artificial Intelligence (AI). As transformers have become the…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Raby Hamadi

Humans tend to decompose a sentence into different parts like "sth do sth at someplace" and then fill each part with certain content. Inspired by this, we follow the principle of modular design to propose a novel image…

Computer Vision and Pattern Recognition · Computer Science 2023-04-25 Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai

Visual-language models (VLM) have emerged as a powerful tool for learning a unified embedding space for vision and language. Inspired by large language models, which have demonstrated strong reasoning and multi-task capabilities, visual…

Computer Vision and Pattern Recognition · Computer Science 2025-01-07 Yifan Li, Zhixin Lai, Wentao Bao, Zhen Tan, Anh Dao, Kewei Sui, Jiayi Shen, Dong Liu, Huan Liu, Yu Kong

The exponential increase in video content poses significant challenges in terms of efficient navigation, search, and retrieval, thus requiring advanced video summarization techniques. Existing video summarization methods, which heavily rely…

Computer Vision and Pattern Recognition · Computer Science 2025-06-06 Min Jung Lee, Dayoung Gong, Minsu Cho

Do we still need to represent objects explicitly in multimodal large language models (MLLMs)? To one extreme, pre-trained encoders convert images into visual tokens, with which objects and spatiotemporal relationships may be implicitly…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Zitian Tang, Shijie Wang, Junho Cho, Jaewook Yoo, Chen Sun

With the growing capabilities of Large Language Models (LLMs), there is an increasing need for robust evaluation methods, especially in multilingual and non-English contexts. We present an updated version of the BLUEX dataset, now including…

Computation and Language · Computer Science 2025-09-01 João Guilherme Alves Santos, Giovana Kerche Bonás, Thales Sales Almeida

Image captioning models tend to describe images in an object-centric way, emphasising visible objects. But image descriptions can also abstract away from objects and describe the type of scene depicted. In this paper, we explore the…

Computation and Language · Computer Science 2022-11-11 Michele Cafagna, Kees van Deemter, Albert Gatt

The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily adept at processing textual information. To address this…

Computer Vision and Pattern Recognition · Computer Science 2025-10-15 Akash Ghosh, Arkadeep Acharya, Sriparna Saha, Vinija Jain, Aman Chadha