Related papers: Improving Automatic VQA Evaluation Using Large Lan…

Reference-Guided Verdict: LLMs-as-Judges in Automatic Evaluation of Free-Form QA

The emergence of Large Language Models (LLMs) as chat assistants capable of generating human-like conversations has amplified the need for robust evaluation methods, particularly for open-ended tasks. Conventional metrics such as EM and F1,…

Computation and Language · Computer Science · 2025-11-12 · Sher Badshah, Hassan Sajjad

KNVQA: A Benchmark for evaluation knowledge-based VQA

Within the multimodal field, large vision-language models (LVLMs) have made significant progress due to their strong perception and reasoning capabilities in the visual and language systems. However, LVLMs are still plagued by the two…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-14 · Sirui Cheng, Siyu Zhang, Jiayi Wu, Muchen Lan

An Empirical Study of Evaluating Long-form Question Answering

Long-form question answering (LFQA) aims to generate lengthy answers to complex questions. This scenario presents great flexibility as well as significant challenges for evaluation. Most evaluations rely on deterministic metrics that depend on string or n-gram…

Information Retrieval · Computer Science · 2025-04-28 · Ning Xian, Yixing Fan, Ruqing Zhang, Maarten de Rijke, Jiafeng Guo

Evaluating Open-QA Evaluation

This study focuses on the evaluation of the Open Question Answering (Open-QA) task, which can directly estimate the factuality of large language models (LLMs). Current automatic evaluation methods have shown limitations, indicating that…

Computation and Language · Computer Science · 2023-10-24 · Cunxiang Wang, Sirui Cheng, Qipeng Guo, Yuanhao Yue, Bowen Ding, Zhikun Xu, Yidong Wang, Xiangkun Hu, Zheng Zhang, Yue Zhang

VLMs Guided Interpretable Decision Making for Autonomous Driving

Recent advancements in autonomous driving (AD) have explored the use of vision-language models (VLMs) within visual question answering (VQA) frameworks for direct driving decision-making. However, these approaches often depend on…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-19 · Xin Hu, Taotao Jing, Renran Tian, Zhengming Ding

Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts

Zero-shot Visual Question Answering (VQA) is a prominent vision-language task that examines both the visual and textual understanding capability of systems in the absence of training data. Recently, by converting the images into captions,…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-16 · Yunshi Lan, Xiang Li, Xin Liu, Yang Li, Wei Qin, Weining Qian

Aligning Human and LLM Judgments: Insights from EvalAssist on Task-Specific Evaluations and AI-assisted Assessment Strategy Preferences

Evaluation of large language model (LLM) outputs requires users to make critical judgments about the best outputs across various configurations. This process is costly and takes time given the large amounts of data. LLMs are increasingly…

Human-Computer Interaction · Computer Science · 2025-08-07 · Zahra Ashktorab, Michael Desmond, Qian Pan, James M. Johnson, Martin Santillan Cooper, Elizabeth M. Daly, Rahul Nair, Tejaswini Pedapati, Hyo Jin Do, Werner Geyer

IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering

To evaluate Large Language Models (LLMs) for question answering (QA), traditional methods typically focus on assessing single-turn responses to given questions. However, this approach doesn't capture the dynamic nature of human-AI…

Computation and Language · Computer Science · 2024-11-19 · Ruosen Li, Ruochen Li, Barry Wang, Xinya Du

Towards Leveraging Large Language Models for Automated Medical Q&A Evaluation

This paper explores the potential of using Large Language Models (LLMs) to automate the evaluation of responses in medical Question and Answer (Q&A) systems, a crucial form of Natural Language Processing. Traditionally, human evaluation…

Computation and Language · Computer Science · 2024-09-04 · Jack Krolik, Herprit Mahal, Feroz Ahmad, Gaurav Trivedi, Bahador Saket

Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language Models

Large Vision-Language Models (LVLMs) have achieved strong performance on vision-language tasks, particularly Visual Question Answering (VQA). While prior work has explored unimodal biases in VQA, the problem of selection bias in…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-23 · Md. Atabuzzaman, Ali Asgarov, Chris Thomas

Visual question answering: from early developments to recent advances -- a survey

Visual Question Answering (VQA) is an evolving research field aimed at enabling machines to answer questions about visual content by integrating image and language processing techniques such as feature extraction, object detection, text…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-14 · Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Sunil Aryal, Imran Razzak, Hakim Hacid

Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering

Large Language Models (LLMs) have achieved impressive results in knowledge-based Visual Question Answering (VQA). However, existing methods still have challenges: the inability to use external tools autonomously, and the inability to work in…

Computation and Language · Computer Science · 2025-08-08 · Zhongjian Hu, Peng Yang, Bing Li, Zhenqi Wang

Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions

Large Language Models (LLMs) demonstrate impressive reasoning ability and the maintenance of world knowledge not only in natural language tasks, but also in some vision-language tasks such as open-domain knowledge-based visual question…

Computation and Language · Computer Science · 2024-06-11 · Ziyue Wang, Chi Chen, Peng Li, Yang Liu

Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey

Visual Question Answering (VQA) is a challenging task that combines natural language processing and computer vision techniques and has gradually become a benchmark task for multimodal large language models (MLLMs). The goal of our survey is…

Computation and Language · Computer Science · 2024-11-27 · Jiayi Kuang, Jingyou Xie, Haohao Luo, Ronghao Li, Zhe Xu, Xianfeng Cheng, Yinghui Li, Xika Lin, Ying Shen

CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering

Question answering (QA) can only make progress if we know if an answer is correct, but for many of the most challenging and interesting QA examples, current evaluation metrics to determine answer equivalence (AE) often do not align with…

Computation and Language · Computer Science · 2024-07-02 · Zongxia Li, Ishani Mondal, Yijun Liang, Huy Nghiem, Jordan Boyd-Graber

Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective

Accurate diagnosis of ophthalmic diseases relies heavily on the interpretation of multimodal ophthalmic images, a process often time-consuming and expertise-dependent. Visual Question Answering (VQA) presents a potential interdisciplinary…

Image and Video Processing · Electrical Eng. & Systems · 2024-10-23 · Xiaolan Chen, Ruoyu Chen, Pusheng Xu, Weiyi Zhang, Xianwen Shang, Mingguang He, Danli Shi

Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

Large language models (LLMs) have shown impressive results while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM…

Computation and Language · Computer Science · 2023-02-14 · Bernd Bohnet, Vinh Q. Tran, Pat Verga, Roee Aharoni, Daniel Andor, Livio Baldini Soares, Massimiliano Ciaramita, Jacob Eisenstein, Kuzman Ganchev, Jonathan Herzig, Kai Hui, Tom Kwiatkowski, Ji Ma, Jianmo Ni, Lierni Sestorain Saralegui, Tal Schuster, William W. Cohen, Michael Collins, Dipanjan Das, Donald Metzler, Slav Petrov, Kellie Webster

Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks

Having revolutionized natural language processing (NLP) applications, large language models (LLMs) are expanding into the realm of multimodal inputs. Owing to their ability to interpret images, multimodal LLMs (MLLMs) have been primarily…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-14 · Jusung Lee, Sungguk Cha, Younghyun Lee, Cheoljong Yang

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

Automatic evaluation is an integral aspect of dialogue system research. The traditional reference-based NLG metrics are generally found to be unsuitable for dialogue assessment. Consequently, recent studies have suggested various unique,…

Computation and Language · Computer Science · 2024-01-23 · Chen Zhang, Luis Fernando D'Haro, Yiming Chen, Malu Zhang, Haizhou Li

Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy

The evaluation of text-generative vision-language models is a challenging yet crucial endeavor. By addressing the limitations of existing Visual Question Answering (VQA) benchmarks and proposing innovative evaluation methodologies, our…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-07 · Simon Ging, María A. Bravo, Thomas Brox