Related papers: Using Vision + Language Models to Predict Item Dif…

Estimating Item Difficulty Using Large Language Models and Tree-Based Machine Learning Algorithms

Estimating item difficulty through field-testing is often resource-intensive and time-consuming. As such, there is strong motivation to develop methods that can predict item difficulty at scale using only the item content. Large Language…

Computers and Society · Computer Science 2026-03-10 Pooya Razavi, Sonya Powers

Large Language Model-Informed Feature Discovery Improves Prediction and Interpretation of Credibility Perceptions of Visual Content

In today's visually dominated social media landscape, predicting the perceived credibility of visual content and understanding what drives human judgment are crucial for countering misinformation. However, these tasks are challenging due to…

Computer Vision and Pattern Recognition · Computer Science 2025-04-16 Yilang Peng, Sijia Qian, Yingdan Lu, Cuihua Shen

Estimating Item Difficulty with Large Language Models as Experts

Accurate estimates of item difficulty are essential for valid assessment and effective adaptive learning. However, for newly created tasks, response data are typically unavailable. Pretesting and expert judgement can be costly and slow,…

Methodology · Statistics 2026-05-19 Diana Kolesnikova, Kirill Fedyanin, Abe D. Hofman, Matthieu J. S. Brinkhuis, Maria Bolsinova

Take Out Your Calculators: Estimating the Real Difficulty of Question Items with LLM Student Simulations

Standardized math assessments require expensive human pilot studies to establish the difficulty of test items. We investigate the predictive value of open-source large language models (LLMs) for evaluating the difficulty of multiple-choice…

Computation and Language · Computer Science 2026-04-22 Christabel Acquaye, Yi Ting Huang, Marine Carpuat, Rachel Rudinger

What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models

Large language models (LLMs) have been effectively used for many computer vision tasks, including image classification. In this paper, we present a simple yet effective approach for zero-shot image classification using multimodal LLMs.…

Computer Vision and Pattern Recognition · Computer Science 2025-06-27 Abdelrahman Abdelhamed, Mahmoud Afifi, Alec Go

NoteLLM-2: Multimodal Large Representation Models for Recommendation

Large Language Models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks. However, their potential in multimodal representation, particularly for item-to-item (I2I) recommendations, remains…

Information Retrieval · Computer Science 2025-01-22 Chao Zhang, Haoxin Zhang, Shiwei Wu, Di Wu, Tong Xu, Xiangyu Zhao, Yan Gao, Yao Hu, Enhong Chen

Exploring the Potential of Large Language Models for Estimating the Reading Comprehension Question Difficulty

Reading comprehension is a key for individual success, yet the assessment of question difficulty remains challenging due to the extensive human annotation and large-scale testing required by traditional methods such as linguistic analysis…

Computation and Language · Computer Science 2025-02-26 Yoshee Jain, John Hollander, Amber He, Sunny Tang, Liang Zhang, John Sabatini

Fine-Tuning Vision-Language Models for Multimodal Polymer Property Prediction

Vision-Language Models (VLMs) have shown strong performance in tasks like visual question answering and multimodal text generation, but their effectiveness in scientific domains such as materials science remains limited. While some machine…

Machine Learning · Computer Science 2025-11-11 An Vuong, Minh-Hao Van, Prateek Verma, Chen Zhao, Xintao Wu

Scaling Item-to-Standard Alignment with Large Language Models: Accuracy, Limits, and Solutions

As educational systems evolve, ensuring that assessment items remain aligned with content standards is essential for maintaining fairness and instructional relevance. Traditional human alignment reviews are accurate but slow and…

Artificial Intelligence · Computer Science 2025-11-26 Farzan Karimi-Malekabadi, Pooya Razavi, Sonya Powers

Large Language Models Facilitate Vision Reflection in Image Classification

This paper presents several novel findings on the explainability of vision reflection in large multimodal models (LMMs). First, we show that prompting an LMM to verify the prediction of a specialized vision model can improve recognition…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Guoyuan An, JaeYoon Kim, SungEui Yoon

How Multimodal Large Language Models Support Access to Visual Information: A Diary Study With Blind and Low Vision People

Multimodal large language models (MLLMs) are changing how Blind and Low Vision (BLV) people access visual information. Unlike traditional visual interpretation tools that only provide descriptions, MLLM-enabled applications offer…

Human-Computer Interaction · Computer Science 2026-02-20 Ricardo E. Gonzalez Penuela, Crescentia Jung, Sharon Y Lin, Ruiying Hu, Shiri Azenkot

Perceiving Beyond Language Priors: Enhancing Visual Comprehension and Attention in Multimodal Models

Achieving deep alignment between vision and language remains a central challenge for Multimodal Large Language Models (MLLMs). These models often fail to fully leverage visual input, defaulting to strong language priors. Our approach first…

Computer Vision and Pattern Recognition · Computer Science 2025-07-03 Aarti Ghatkesar, Ganesh Venkatesh

Text-Based Approaches to Item Difficulty Modeling in Large-Scale Assessments: A Systematic Review

Item difficulty plays a crucial role in test performance, interpretability of scores, and equity for all test-takers, especially in large-scale assessments. Traditional approaches to item difficulty modeling rely on field testing and…

Computation and Language · Computer Science 2025-09-30 Sydney Peters, Nan Zhang, Hong Jiao, Ming Li, Tianyi Zhou, Robert Lissitz

Improving Visual Storytelling with Multimodal Large Language Models

Visual storytelling is an emerging field that combines images and narratives to create engaging and contextually rich stories. Despite its potential, generating coherent and emotionally resonant visual stories remains challenging due to the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-04 Xiaochuan Lin, Xiangyong Chen

Overconfidence is Key: Verbalized Uncertainty Evaluation in Large Language and Vision-Language Models

Language and Vision-Language Models (LLMs/VLMs) have revolutionized the field of AI by their ability to generate human-like text and understand images, but ensuring their reliability is crucial. This paper aims to evaluate the ability of…

Computer Vision and Pattern Recognition · Computer Science 2024-05-07 Tobias Groot, Matias Valdenegro-Toro

Visual AI and Linguistic Intelligence Through Steerability and Composability

This study explores the capabilities of multimodal large language models (LLMs) in handling challenging multistep tasks that integrate language and vision, focusing on model steerability, composability, and the application of long-term…

Artificial Intelligence · Computer Science 2023-12-20 David Noever, Samantha Elizabeth Miller Noever

Application of frozen large-scale models to multimodal task-oriented dialogue

In this study, we use the existing Large Language Models ENhanced to See Framework (LENS Framework) to test the feasibility of multimodal task-oriented dialogues. The LENS Framework has been proposed as a method to solve computer vision…

Computation and Language · Computer Science 2023-10-03 Tatsuki Kawamoto, Takuma Suzuki, Ko Miyama, Takumi Meguro, Tomohiro Takagi

Do MLLMs See What We See? Analyzing Visualization Literacy Barriers in AI Systems

Multimodal Large Language Models (MLLMs) are increasingly used to interpret visualizations, yet little is known about why they fail. We present the first systematic analysis of barriers to visualization literacy in MLLMs. Using the…

Human-Computer Interaction · Computer Science 2026-01-21 Mengli Duan, Yuhe Jiang, Matthew Varona, Carolina Nobre

Rethinking VLMs and LLMs for Image Classification

Visual Language Models (VLMs) are now increasingly being merged with Large Language Models (LLMs) to enable new capabilities, particularly in terms of improved interactivity and open-ended responsiveness. While these are remarkable…

Machine Learning · Computer Science 2024-10-22 Avi Cooper, Keizo Kato, Chia-Hsien Shih, Hiroaki Yamane, Kasper Vinken, Kentaro Takemoto, Taro Sunagawa, Hao-Wei Yeh, Jin Yamanaka, Ian Mason, Xavier Boix

Estimating Exam Item Difficulty with LLMs: A Benchmark on Brazil's ENEM Corpus

As Large Language Models (LLMs) are increasingly deployed to generate educational content, a critical safety question arises: can these models reliably estimate the difficulty of the questions they produce? Using Brazil's high-stakes ENEM…

Computers and Society · Computer Science 2026-02-09 Thiago Brant, Julien Kühn, Jun Pang