Related papers: Multimodal Table Understanding

Tabular Data Understanding with LLMs: A Survey of Recent Advances and Challenges

Tables have gained significant attention in large language models (LLMs) and multimodal large language models (MLLMs) due to their complex and flexible structure. Unlike linear text inputs, tables are two-dimensional, encompassing formats…

Computation and Language · Computer Science 2025-08-04 Xiaofeng Wu, Alan Ritter, Wei Xu

Efficient Table Retrieval and Understanding with Multimodal Large Language Models

Tabular data is frequently captured in image form across a wide range of real-world scenarios such as financial reports, handwritten records, and document scans. These visual representations pose unique challenges for machine understanding,…

Artificial Intelligence · Computer Science 2026-02-10 Zhuoyan Xu, Haoyang Fang, Boran Han, Bonan Min, Bernie Wang, Cuixiong Hu, Shuai Zhang

Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning

Recent large language models (LLMs) have advanced table understanding capabilities but rely on converting tables into text sequences. While multimodal large language models (MLLMs) enable direct visual processing, they face limitations in…

Computation and Language · Computer Science 2025-02-26 Bohao Yang, Yingji Zhang, Dong Liu, André Freitas, Chenghua Lin

Enhancing Large Vision-Language Models with Layout Modality for Table Question Answering on Japanese Annual Securities Reports

With recent advancements in Large Language Models (LLMs) and growing interest in retrieval-augmented generation (RAG), the ability to understand table structures has become increasingly important. This is especially critical in financial…

Computation and Language · Computer Science 2025-05-26 Hayato Aida, Kosuke Takahashi, Takahiro Omi

TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains

In this paper, we establish a benchmark for table visual question answering, referred to as the TableVQA-Bench, derived from pre-existing table question-answering (QA) and table structure recognition datasets. It is important to note that…

Computer Vision and Pattern Recognition · Computer Science 2024-05-01 Yoonsik Kim, Moonbin Yim, Ka Yeon Song

TableLlama: Towards Open Large Generalist Models for Tables

Semi-structured tables are ubiquitous. There has been a variety of tasks that aim to automatically interpret, augment, and query tables. Current methods often require pretraining on tables or special model architecture design, are…

Computation and Language · Computer Science 2024-04-08 Tianshu Zhang, Xiang Yue, Yifei Li, Huan Sun

Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis

Large language models (LLMs) have demonstrated immense capabilities in understanding textual data and are increasingly being adopted to help researchers accelerate scientific discovery through knowledge extraction (information retrieval),…

Computer Vision and Pattern Recognition · Computer Science 2025-05-30 Robinson Umeike, Neil Getty, Fangfang Xia, Rick Stevens

Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data

Tables are among the most widely used tools for representing structured data in research, business, medicine, and education. Although LLMs demonstrate strong performance in downstream tasks, their efficiency in processing tabular data…

Computation and Language · Computer Science 2025-08-27 Ekaterina Borisova, Fabio Barth, Nils Feldhus, Raia Abu Ahmad, Malte Ostendorff, Pedro Ortiz Suarez, Georg Rehm, Sebastian Möller

Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks. However, the understanding of their capability to process structured data like tables remains an under-explored area.…

Computation and Language · Computer Science 2024-07-18 Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, Dongmei Zhang

Large Language Model for Table Processing: A Survey

Tables, typically two-dimensional and structured to store large amounts of data, are essential in daily activities like database queries, spreadsheet manipulations, web table question answering, and image table information extraction.…

Artificial Intelligence · Computer Science 2024-11-05 Weizheng Lu, Jing Zhang, Ju Fan, Zihao Fu, Yueguo Chen, Xiaoyong Du

MTabVQA: Evaluating Multi-Tabular Reasoning of Language Models in Visual Space

Vision-Language Models (VLMs) have demonstrated remarkable capabilities in interpreting visual layouts and text. However, a significant challenge remains in their ability to interpret robustly and reason over multi-tabular data presented as…

Computer Vision and Pattern Recognition · Computer Science 2025-06-16 Anshul Singh, Chris Biemann, Jan Strich

RelationVLM: Making Large Vision-Language Models Understand Visual Relations

The development of Large Vision-Language Models (LVLMs) is striving to catch up with the success of Large Language Models (LLMs), yet it faces more challenges to be resolved. Very recent works enable LVLMs to localize object-level visual…

Computer Vision and Pattern Recognition · Computer Science 2024-03-20 Zhipeng Huang, Zhizheng Zhang, Zheng-Jun Zha, Yan Lu, Baining Guo

LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding

This paper proposes LayoutLLM, a more flexible document analysis method for understanding imaged documents. Visually Rich Document Understanding tasks, such as document image classification and information extraction, have gained…

Computation and Language · Computer Science 2024-03-22 Masato Fujitake

Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey

The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with large language models (LLMs) and computer vision (CV) systems driving advancements in natural language understanding and visual processing,…

Computation and Language · Computer Science 2024-12-04 Yunkai Dang, Kaichen Huang, Jiahao Huo, Yibo Yan, Sirui Huang, Dongrui Liu, Mengxi Gao, Jie Zhang, Chen Qian, Kun Wang, Yong Liu, Jing Shao, Hui Xiong, Xuming Hu

Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement

Chart understanding is a quintessential information fusion task, requiring the seamless integration of graphical and textual data to extract meaning. The advent of Multimodal Large Language Models (MLLMs) has revolutionized this domain, yet…

Computer Vision and Pattern Recognition · Computer Science 2026-02-12 Zhihang Yi, Jian Zhao, Jiancheng Lv, Tao Wang

MMTABREAL: Real-World Benchmark for Multimodal Table Understanding

Multimodal tables, i.e., tabular layouts interleaved with charts, maps, icons, and color encodings, are ubiquitous in real applications yet remain difficult for Multimodal Large Language Models (MLLMs). Despite advances in text and image…

Computer Vision and Pattern Recognition · Computer Science 2026-05-28 Prasham Titiya, Jainil Trivedi, Chitta Baral, Vivek Gupta

DocLLM: A layout-aware generative language model for multimodal document understanding

Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a…

Computation and Language · Computer Science 2024-01-03 Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu

TableMaster: A Recipe to Advance Table Understanding with Language Models

Tables serve as a fundamental format for representing structured relational data. While current language models (LMs) excel at many text-based tasks, they still face challenges in table understanding due to the complex characteristics of…

Computation and Language · Computer Science 2026-04-16 Lang Cao, Hanbing Liu

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

Multi-modal large language models (MLLMs) have rapidly advanced in visual tasks, yet their spatial understanding remains limited to single images, leaving them ill-suited for physical-world applications that require multi-frame reasoning.…

Computer Vision and Pattern Recognition · Computer Science 2026-05-25 Runsen Xu, Weiyao Wang, Hao Tang, Xingyu Chen, Xiaodong Wang, Fu-Jen Chu, Matt Feiszli, Kevin J. Liang

A Survey of Table Reasoning with Large Language Models

Table reasoning, which aims to generate the corresponding answer to the question following the user requirement according to the provided table, and optionally a text description of the table, effectively improving the efficiency of…

Computation and Language · Computer Science 2024-02-14 Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Qingfu Zhu, Wanxiang Che