Related papers: MSG-Chart: Multimodal Scene Graph for ChartQA

200 papers

Chart question answering (ChartQA) is challenged by the heterogeneous composition of chart elements and the subtle data patterns they encode. This work introduces a novel joint multimodal scene graph framework that explicitly models the…

Computation and Language · Computer Science 2025-04-08 Yue Dai , Soyeon Caren Han , Wei Liu

In the fields of computer vision and natural language processing, multimodal chart question-answering, especially involving color, structure, and textless charts, poses significant challenges. Traditional methods, which typically involve…

Computer Vision and Pattern Recognition · Computer Science 2024-04-03 Jingxuan Wei , Nan Xu , Guiyong Chang , Yin Luo , BiHui Yu , Ruifeng Guo

Visual Question Answering (VQA) is of tremendous interest to the research community with important applications such as aiding visually impaired users and image-based search. In this work, we explore the use of scene graphs for solving the…

Computer Vision and Pattern Recognition · Computer Science 2021-01-19 Vinay Damodaran , Sharanya Chakravarthy , Akshay Kumar , Anjana Umapathy , Teruko Mitamura , Yuta Nakashima , Noa Garcia , Chenhui Chu

Chart comprehension presents significant challenges for machine learning models due to the diverse and intricate shapes of charts. Existing multimodal methods often overlook these visual features or fail to integrate them effectively for…

Computation and Language · Computer Science 2024-08-01 Hanwen Zheng , Sijia Wang , Chris Thomas , Lifu Huang

Visual question answering (Visual QA) has attracted significant attention in recent years. While a variety of algorithms have been proposed, most of them are built upon different combinations of image and language features as well as…

Computer Vision and Pattern Recognition · Computer Science 2019-07-30 Cheng Zhang , Wei-Lun Chao , Dong Xuan

The intersection of vision and language is of major interest due to the increased focus on seamless integration between recognition and reasoning. Scene graphs (SGs) have emerged as a useful tool for multimodal image analysis, showing…

Computer Vision and Pattern Recognition · Computer Science 2023-10-04 Bruno Souza , Marius Aasan , Helio Pedrini , Adín Ramírez Rivera

3D multimodal question answering (MQA) plays a crucial role in scene understanding by enabling intelligent agents to comprehend their surroundings in 3D environments. While existing research has primarily focused on indoor household tasks…

Computer Vision and Pattern Recognition · Computer Science 2024-07-25 Penglei Sun , Yaoxian Song , Xiang Liu , Xiaofei Yang , Qiang Wang , Tiefeng Li , Yang Yang , Xiaowen Chu

Understanding infographic charts with design-driven visual elements (e.g., pictograms, icons) requires both visual recognition and reasoning, posing challenges for multimodal large language models (MLLMs). However, existing visual-question…

Computer Vision and Pattern Recognition · Computer Science 2025-10-30 Tianchi Xie , Minzhi Lin , Mengchen Liu , Yilin Ye , Changjian Chen , Shixia Liu

This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it. The proposed model is based on…

Computer Vision and Pattern Recognition · Computer Science 2020-06-26 Lluís Gómez , Ali Furkan Biten , Rubèn Tito , Andrés Mafla , Marçal Rusiñol , Ernest Valveny , Dimosthenis Karatzas

Multimodal Large Language Models (MLLMs) have demonstrated impressive abilities across various tasks, including visual question answering and chart comprehension, yet existing benchmarks for chart-related tasks fall short in capturing the…

Computation and Language · Computer Science 2025-02-11 Zifeng Zhu , Mengzhao Jia , Zhihan Zhang , Lang Li , Meng Jiang

Scientific literature charts often contain complex visual elements, including multi-plot figures, flowcharts, structural diagrams, etc. Evaluating multimodal models using these authentic and intricate charts provides a more accurate…

Computation and Language · Computer Science 2024-12-18 Lingdong Shen , Qigqi , Kun Ding , Gaofeng Meng , Shiming Xiang

Visual question answering (VQA) requires systems to perform concept-level reasoning by unifying unstructured (e.g., the context in question and answer; "QA context") and structured (e.g., knowledge graph for the QA context and scene;…

Computer Vision and Pattern Recognition · Computer Science 2023-09-18 Yanan Wang , Michihiro Yasunaga , Hongyu Ren , Shinya Wada , Jure Leskovec

One of the key issues of Visual Question Answering (VQA) is to reason with semantic clues in the visual content under the guidance of the question; how to model relational semantics remains a great challenge. To fully capture…

Multimedia · Computer Science 2019-08-22 Zhuoqian Yang , Zengchang Qin , Jing Yu , Yue Hu

Visual question answering is a challenging problem requiring a combination of concepts from Computer Vision and Natural Language Processing. Most existing approaches use a two-stream strategy, computing image and question features that are…

Computer Vision and Pattern Recognition · Computer Science 2018-11-02 Will Norcliffe-Brown , Efstathios Vafeias , Sarah Parisot

Charts are widely used to present complex information. Deriving meaningful insights in real-world contexts often requires interpreting multiple related charts together. Research on understanding multi-chart images has not been extensively…

Computation and Language · Computer Science 2026-04-24 Azher Ahmed Efat , Seok Hwan Song , Wallapak Tavanapong

Large language models (LLMs) have shown promise in table Question Answering (Table QA). However, extending these capabilities to multi-table QA remains challenging due to unreliable schema linking across complex tables. Existing methods…

Artificial Intelligence · Computer Science 2025-11-25 Xixi Wang , Miguel Costa , Jordanka Kovaceva , Shuai Wang , Francisco C. Pereira

Modeling visual question answering (VQA) through scene graphs can significantly improve reasoning accuracy and interpretability. However, existing models answer poorly for complex reasoning questions with attributes or relations, which…

Computer Vision and Pattern Recognition · Computer Science 2022-05-10 Hao Li , Xu Li , Belhal Karimi , Jie Chen , Mingming Sun

Answering questions that require reading texts in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and…

Computer Vision and Pattern Recognition · Computer Science 2020-04-01 Difei Gao , Ke Li , Ruiping Wang , Shiguang Shan , Xilin Chen

Graph machine learning has made significant strides in recent years, yet the integration of visual information with graph structure and its potential for improving performance in downstream tasks remains an underexplored area. To address…

Machine Learning · Computer Science 2025-04-01 Jing Zhu , Yuhang Zhou , Shengyi Qian , Zhongmou He , Tong Zhao , Neil Shah , Danai Koutra

Despite the success of Transformer models in vision and language tasks, they often learn knowledge from enormous data implicitly and cannot utilize structured input data directly. On the other hand, structured learning approaches such as…

Computer Vision and Pattern Recognition · Computer Science 2023-05-02 Xuehai He , Xin Eric Wang