Related papers: Using Vision + Language Models to Predict Item Dif…

200 papers

Estimating item difficulty through field-testing is often resource-intensive and time-consuming. As such, there is strong motivation to develop methods that can predict item difficulty at scale using only the item content. Large Language…

Computers and Society · Computer Science 2026-03-10 Pooya Razavi, Sonya Powers

In today's visually dominated social media landscape, predicting the perceived credibility of visual content and understanding what drives human judgment are crucial for countering misinformation. However, these tasks are challenging due to…

Computer Vision and Pattern Recognition · Computer Science 2025-04-16 Yilang Peng, Sijia Qian, Yingdan Lu, Cuihua Shen

Accurate estimates of item difficulty are essential for valid assessment and effective adaptive learning. However, for newly created tasks, response data are typically unavailable. Pretesting and expert judgement can be costly and slow,…

Standardized math assessments require expensive human pilot studies to establish the difficulty of test items. We investigate the predictive value of open-source large language models (LLMs) for evaluating the difficulty of multiple-choice…

Computation and Language · Computer Science 2026-04-22 Christabel Acquaye, Yi Ting Huang, Marine Carpuat, Rachel Rudinger

Large language models (LLMs) have been effectively used for many computer vision tasks, including image classification. In this paper, we present a simple yet effective approach for zero-shot image classification using multimodal LLMs.…

Computer Vision and Pattern Recognition · Computer Science 2025-06-27 Abdelrahman Abdelhamed, Mahmoud Afifi, Alec Go

Large Language Models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks. However, their potential in multimodal representation, particularly for item-to-item (I2I) recommendations, remains…

Information Retrieval · Computer Science 2025-01-22 Chao Zhang, Haoxin Zhang, Shiwei Wu, Di Wu, Tong Xu, Xiangyu Zhao, Yan Gao, Yao Hu, Enhong Chen

Reading comprehension is a key for individual success, yet the assessment of question difficulty remains challenging due to the extensive human annotation and large-scale testing required by traditional methods such as linguistic analysis…

Computation and Language · Computer Science 2025-02-26 Yoshee Jain, John Hollander, Amber He, Sunny Tang, Liang Zhang, John Sabatini

Vision-Language Models (VLMs) have shown strong performance in tasks like visual question answering and multimodal text generation, but their effectiveness in scientific domains such as materials science remains limited. While some machine…

Machine Learning · Computer Science 2025-11-11 An Vuong, Minh-Hao Van, Prateek Verma, Chen Zhao, Xintao Wu

As educational systems evolve, ensuring that assessment items remain aligned with content standards is essential for maintaining fairness and instructional relevance. Traditional human alignment reviews are accurate but slow and…

Artificial Intelligence · Computer Science 2025-11-26 Farzan Karimi-Malekabadi, Pooya Razavi, Sonya Powers

This paper presents several novel findings on the explainability of vision reflection in large multimodal models (LMMs). First, we show that prompting an LMM to verify the prediction of a specialized vision model can improve recognition…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Guoyuan An, JaeYoon Kim, SungEui Yoon

Multimodal large language models (MLLMs) are changing how Blind and Low Vision (BLV) people access visual information. Unlike traditional visual interpretation tools that only provide descriptions, MLLM-enabled applications offer…

Human-Computer Interaction · Computer Science 2026-02-20 Ricardo E. Gonzalez Penuela, Crescentia Jung, Sharon Y Lin, Ruiying Hu, Shiri Azenkot

Achieving deep alignment between vision and language remains a central challenge for Multimodal Large Language Models (MLLMs). These models often fail to fully leverage visual input, defaulting to strong language priors. Our approach first…

Computer Vision and Pattern Recognition · Computer Science 2025-07-03 Aarti Ghatkesar, Ganesh Venkatesh

Item difficulty plays a crucial role in test performance, interpretability of scores, and equity for all test-takers, especially in large-scale assessments. Traditional approaches to item difficulty modeling rely on field testing and…

Computation and Language · Computer Science 2025-09-30 Sydney Peters, Nan Zhang, Hong Jiao, Ming Li, Tianyi Zhou, Robert Lissitz

Visual storytelling is an emerging field that combines images and narratives to create engaging and contextually rich stories. Despite its potential, generating coherent and emotionally resonant visual stories remains challenging due to the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-04 Xiaochuan Lin, Xiangyong Chen

Language and Vision-Language Models (LLMs/VLMs) have revolutionized the field of AI by their ability to generate human-like text and understand images, but ensuring their reliability is crucial. This paper aims to evaluate the ability of…

Computer Vision and Pattern Recognition · Computer Science 2024-05-07 Tobias Groot, Matias Valdenegro-Toro

This study explores the capabilities of multimodal large language models (LLMs) in handling challenging multistep tasks that integrate language and vision, focusing on model steerability, composability, and the application of long-term…

Artificial Intelligence · Computer Science 2023-12-20 David Noever, Samantha Elizabeth Miller Noever

In this study, we use the existing Large Language Models ENnhanced to See Framework (LENS Framework) to test the feasibility of multimodal task-oriented dialogues. The LENS Framework has been proposed as a method to solve computer vision…

Computation and Language · Computer Science 2023-10-03 Tatsuki Kawamoto, Takuma Suzuki, Ko Miyama, Takumi Meguro, Tomohiro Takagi

Multimodal Large Language Models (MLLMs) are increasingly used to interpret visualizations, yet little is known about why they fail. We present the first systematic analysis of barriers to visualization literacy in MLLMs. Using the…

Human-Computer Interaction · Computer Science 2026-01-21 Mengli Duan, Yuhe Jiang, Matthew Varona, Carolina Nobre

Visual Language Models (VLMs) are now increasingly being merged with Large Language Models (LLMs) to enable new capabilities, particularly in terms of improved interactivity and open-ended responsiveness. While these are remarkable…

As Large Language Models (LLMs) are increasingly deployed to generate educational content, a critical safety question arises: can these models reliably estimate the difficulty of the questions they produce? Using Brazil's high-stakes ENEM…

Computers and Society · Computer Science 2026-02-09 Thiago Brant, Julien Kühn, Jun Pang