Related papers: Estimating Item Difficulty Using Large Language Mo…

200 papers

Accurate estimates of item difficulty are essential for valid assessment and effective adaptive learning. However, for newly created tasks, response data are typically unavailable. Pretesting and expert judgement can be costly and slow,…

This project investigates the capabilities of large language models (LLMs) to determine the difficulty of data visualization literacy test items. We explore whether features derived from item text (question and answer options), the…

Artificial Intelligence · Computer Science 2026-03-06 Samin Khan

Item difficulty plays a crucial role in test performance, interpretability of scores, and equity for all test-takers, especially in large-scale assessments. Traditional approaches to item difficulty modeling rely on field testing and…

Computation and Language · Computer Science 2025-09-30 Sydney Peters, Nan Zhang, Hong Jiao, Ming Li, Tianyi Zhou, Robert Lissitz

Standardized math assessments require expensive human pilot studies to establish the difficulty of test items. We investigate the predictive value of open-source large language models (LLMs) for evaluating the difficulty of multiple-choice…

Computation and Language · Computer Science 2026-04-22 Christabel Acquaye, Yi Ting Huang, Marine Carpuat, Rachel Rudinger

Educational assessment relies heavily on knowing question difficulty, traditionally determined through resource-intensive pre-testing with students. This creates significant barriers for both classroom teachers and assessment developers. We…

Computers and Society · Computer Science 2026-02-03 Matias Hoyl

As educational systems evolve, ensuring that assessment items remain aligned with content standards is essential for maintaining fairness and instructional relevance. Traditional human alignment reviews are accurate but slow and…

Artificial Intelligence · Computer Science 2025-11-26 Farzan Karimi-Malekabadi, Pooya Razavi, Sonya Powers

Reading comprehension is key to individual success, yet the assessment of question difficulty remains challenging due to the extensive human annotation and large-scale testing required by traditional methods such as linguistic analysis…

Computation and Language · Computer Science 2025-02-26 Yoshee Jain, John Hollander, Amber He, Sunny Tang, Liang Zhang, John Sabatini

Large language models (LLMs) have achieved remarkable performance on diverse benchmarks, yet existing evaluation practices largely rely on coarse summary metrics that obscure underlying reasoning abilities. In this work, we propose novel…

Methodology · Statistics 2026-03-17 Jia Liu, Zhiyu Xu, Yuqi Gu

Large language models (LLMs) have demonstrated rapid progress across a wide array of domains. Owing to the very large number of parameters and training data in LLMs, these models inherently encompass an expansive and comprehensive materials…

Materials Science · Physics 2024-11-20 Siyu Liu, Tongqi Wen, A. S. L. Subrahmanyam Pattamatta, David J. Srolovitz

Estimating the cognitive complexity of reading comprehension (RC) items is crucial for assessing item difficulty before it is administered to learners. Unlike syntactic and semantic features, such as passage length or semantic similarity…

Computation and Language · Computer Science 2026-05-20 Seonjeong Hwang, Hyounghun Kim, Gary Geunbae Lee

Prediction of item difficulty based on its text content is of substantial interest. In this paper, we focus on the related problem of recovering IRT-based difficulty when the data originally reported item p-value (percent correct…

Computation and Language · Computer Science 2026-04-01 Radhika Kapoor, Sang T. Truong, Nick Haber, Maria Araceli Ruiz-Primo, Benjamin W. Domingue

As Large Language Models (LLMs) are increasingly deployed to generate educational content, a critical safety question arises: can these models reliably estimate the difficulty of the questions they produce? Using Brazil's high-stakes ENEM…

Computers and Society · Computer Science 2026-02-09 Thiago Brant, Julien Kühn, Jun Pang

Large Language Models (LLMs) have achieved remarkable success in various fields, prompting several studies to explore their potential in recommendation systems. However, these attempts have so far resulted in only modest improvements over…

Information Retrieval · Computer Science 2024-09-20 Junyi Chen, Lu Chi, Bingyue Peng, Zehuan Yuan

Generative recommendation systems, driven by large language models (LLMs), present an innovative approach to predicting user preferences by modeling items as token sequences and generating recommendations in a generative manner. A critical…

Recent advances in the finetuning of large language models (LLMs) have significantly improved their performance on established benchmarks, emphasizing the need for increasingly difficult, synthetic data. A key step in this data generation…

Machine Learning · Computer Science 2025-12-17 Marthe Ballon, Andres Algaba, Brecht Verbeken, Vincent Ginis

Aligning test items to content standards is a critical step in test development to collect validity evidence based on content. Item alignment has typically been conducted by human experts. This judgmental process can be subjective and…

Computation and Language · Computer Science 2025-10-14 Yanbin Fu, Hong Jiao, Tianyi Zhou, Nan Zhang, Ming Li, Qingshu Xu, Sydney Peters, Robert W. Lissitz

Accurate estimation of project costs and durations remains a pivotal challenge in software engineering, directly impacting budgeting and resource management. Traditional estimation techniques, although widely utilized, often fall short due…

Software Engineering · Computer Science 2024-09-17 Justin Carpenter, Chia-Ying Wu, Nasir U. Eisty

Traditional methods for determining assessment item parameters, such as difficulty and discrimination, rely heavily on expensive field testing to collect student performance data for Item Response Theory (IRT) calibration. This study…

Computation and Language · Computer Science 2026-01-07 Christopher Ormerod

We investigate how well large language models (LLMs) generalize across different task difficulties, a key question for effective data curation and evaluation. Existing research is mixed regarding whether training on easier or harder data…

Computation and Language · Computer Science 2025-11-27 Yeganeh Kordi, Nihal V. Nayak, Max Zuo, Ilana Nguyen, Stephen H. Bach

Material selection is a crucial step in conceptual design due to its significant impact on the functionality, aesthetics, manufacturability, and sustainability impact of the final product. This study investigates the use of Large Language…

Computation and Language · Computer Science 2024-05-08 Daniele Grandi, Yash Patawari Jain, Allin Groom, Brandon Cramer, Christopher McComb