Related papers: Improving Automatic VQA Evaluation Using Large Lan…

200 papers

The emergence of Large Language Models (LLMs) as chat assistants capable of generating human-like conversations has amplified the need for robust evaluation methods, particularly for open-ended tasks. Conventional metrics such as EM and F1,…

Computation and Language · Computer Science 2025-11-12 Sher Badshah, Hassan Sajjad

Within the multimodal field, large vision-language models (LVLMs) have made significant progress due to their strong perception and reasoning capabilities in the visual and language systems. However, LVLMs are still plagued by the two…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Sirui Cheng, Siyu Zhang, Jiayi Wu, Muchen Lan

Long-form question answering (LFQA) aims to generate lengthy answers to complex questions. This scenario presents great flexibility as well as significant challenges for evaluation. Most evaluations rely on deterministic metrics that depend on string or n-gram…

Information Retrieval · Computer Science 2025-04-28 Ning Xian, Yixing Fan, Ruqing Zhang, Maarten de Rijke, Jiafeng Guo

This study focuses on the evaluation of the Open Question Answering (Open-QA) task, which can directly estimate the factuality of large language models (LLMs). Current automatic evaluation methods have shown limitations, indicating that…

Computation and Language · Computer Science 2023-10-24 Cunxiang Wang, Sirui Cheng, Qipeng Guo, Yuanhao Yue, Bowen Ding, Zhikun Xu, Yidong Wang, Xiangkun Hu, Zheng Zhang, Yue Zhang

Recent advancements in autonomous driving (AD) have explored the use of vision-language models (VLMs) within visual question answering (VQA) frameworks for direct driving decision-making. However, these approaches often depend on…

Computer Vision and Pattern Recognition · Computer Science 2025-11-19 Xin Hu, Taotao Jing, Renran Tian, Zhengming Ding

Zero-shot Visual Question Answering (VQA) is a prominent vision-language task that examines both the visual and textual understanding capability of systems in the absence of training data. Recently, by converting the images into captions,…

Computer Vision and Pattern Recognition · Computer Science 2023-11-16 Yunshi Lan, Xiang Li, Xin Liu, Yang Li, Wei Qin, Weining Qian

Evaluation of large language model (LLM) outputs requires users to make critical judgments about the best outputs across various configurations. This process is costly and takes time given the large amounts of data. LLMs are increasingly…

To evaluate Large Language Models (LLMs) for question answering (QA), traditional methods typically focus on assessing single-turn responses to given questions. However, this approach doesn't capture the dynamic nature of human-AI…

Computation and Language · Computer Science 2024-11-19 Ruosen Li, Ruochen Li, Barry Wang, Xinya Du

This paper explores the potential of using Large Language Models (LLMs) to automate the evaluation of responses in medical Question and Answer (Q&A) systems, a crucial form of Natural Language Processing. Traditionally, human evaluation…

Computation and Language · Computer Science 2024-09-04 Jack Krolik, Herprit Mahal, Feroz Ahmad, Gaurav Trivedi, Bahador Saket

Large Vision-Language Models (LVLMs) have achieved strong performance on vision-language tasks, particularly Visual Question Answering (VQA). While prior work has explored unimodal biases in VQA, the problem of selection bias in…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Md. Atabuzzaman, Ali Asgarov, Chris Thomas

Visual Question Answering (VQA) is an evolving research field aimed at enabling machines to answer questions about visual content by integrating image and language processing techniques such as feature extraction, object detection, text…

Computer Vision and Pattern Recognition · Computer Science 2025-01-14 Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Sunil Aryal, Imran Razzak, Hakim Hacid

Large Language Models (LLMs) have achieved impressive results in knowledge-based Visual Question Answering (VQA). However, existing methods still have challenges: the inability to use external tools autonomously, and the inability to work in…

Computation and Language · Computer Science 2025-08-08 Zhongjian Hu, Peng Yang, Bing Li, Zhenqi Wang

Large Language Models (LLMs) demonstrate impressive reasoning ability and the maintenance of world knowledge not only in natural language tasks, but also in some vision-language tasks such as open-domain knowledge-based visual question…

Computation and Language · Computer Science 2024-06-11 Ziyue Wang, Chi Chen, Peng Li, Yang Liu

Visual Question Answering (VQA) is a challenging task that combines natural language processing and computer vision techniques and gradually becomes a benchmark test task in multimodal large language models (MLLMs). The goal of our survey is…

Computation and Language · Computer Science 2024-11-27 Jiayi Kuang, Jingyou Xie, Haohao Luo, Ronghao Li, Zhe Xu, Xianfeng Cheng, Yinghui Li, Xika Lin, Ying Shen

Question answering (QA) can only make progress if we know if an answer is correct, but for many of the most challenging and interesting QA examples, current evaluation metrics to determine answer equivalence (AE) often do not align with…

Computation and Language · Computer Science 2024-07-02 Zongxia Li, Ishani Mondal, Yijun Liang, Huy Nghiem, Jordan Boyd-Graber

Accurate diagnosis of ophthalmic diseases relies heavily on the interpretation of multimodal ophthalmic images, a process often time-consuming and expertise-dependent. Visual Question Answering (VQA) presents a potential interdisciplinary…

Image and Video Processing · Electrical Eng. & Systems 2024-10-23 Xiaolan Chen, Ruoyu Chen, Pusheng Xu, Weiyi Zhang, Xianwen Shang, Mingguang He, Danli Shi

Large language models (LLMs) have shown impressive results while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM…

Having revolutionized natural language processing (NLP) applications, large language models (LLMs) are expanding into the realm of multimodal inputs. Owing to their ability to interpret images, multimodal LLMs (MLLMs) have been primarily…

Computer Vision and Pattern Recognition · Computer Science 2024-02-14 Jusung Lee, Sungguk Cha, Younghyun Lee, Cheoljong Yang

Automatic evaluation is an integral aspect of dialogue system research. The traditional reference-based NLG metrics are generally found to be unsuitable for dialogue assessment. Consequently, recent studies have suggested various unique,…

Computation and Language · Computer Science 2024-01-23 Chen Zhang, Luis Fernando D'Haro, Yiming Chen, Malu Zhang, Haizhou Li

The evaluation of text-generative vision-language models is a challenging yet crucial endeavor. By addressing the limitations of existing Visual Question Answering (VQA) benchmarks and proposing innovative evaluation methodologies, our…

Computer Vision and Pattern Recognition · Computer Science 2024-05-07 Simon Ging, María A. Bravo, Thomas Brox