Related papers: An Automatic Question Usability Evaluation Toolkit

200 papers

This paper presents CRACQ, a multi-dimensional evaluation framework tailored to evaluate documents across five specific traits: Coherence, Rigor, Appropriateness, Completeness, and Quality. Building on insights from trait-based Automated…

Computation and Language · Computer Science 2025-10-06 Ishak Soltani, Francisco Belo, Bernardo Tavares

Multiple-choice questions (MCQs) offer the most promising avenue for skill evaluation in the era of virtual education and job recruiting, where traditional performance-based alternatives such as projects and essays have become less viable,…

Computers and Society · Computer Science 2020-12-22 Eric Li, Jingyi Su, Hao Sheng, Lawrence Wai

This research aims to propose an artificial intelligence algorithm comprising an ontology-based design, text mining, and natural language processing for automatically generating gap-fill multiple-choice questions (MCQs). The simulation…

Artificial Intelligence · Computer Science 2021-09-24 Pornpat Sirithumgul, Pimpaka Prasertsilp, Lorne Olfman

Multiple-choice questions (MCQs) are widely used across diverse educational fields and levels. Well-designed MCQs should evaluate knowledge application in real-world situations. However, writing such test items in sufficient numbers is…

Human-Computer Interaction · Computer Science 2026-02-10 Tetiana Krushynska, Jani Ursin, Ville Heilala

As scientific knowledge grows at an unprecedented pace, evaluation benchmarks must evolve to reflect new discoveries and ensure language models are tested on current, diverse literature. We propose a scalable, modular framework for…

Computation and Language · Computer Science 2025-09-16 Ozan Gokdemir, Neil Getty, Robert Underwood, Sandeep Madireddy, Franck Cappello, Arvind Ramanathan, Ian T. Foster, Rick L. Stevens

We introduce SHEET, a multi-purpose open-source toolkit designed to accelerate subjective speech quality assessment (SSQA) research. SHEET stands for the Speech Human Evaluation Estimation Toolkit, which focuses on data-driven deep neural…

Sound · Computer Science 2025-05-22 Wen-Chin Huang, Erica Cooper, Tomoki Toda

We introduce SciEvalKit, a unified benchmarking toolkit designed to evaluate AI models for science across a broad range of scientific disciplines and task capabilities. Unlike general-purpose evaluation platforms, SciEvalKit focuses on the…

Despite their sophisticated capabilities, large language models (LLMs) encounter a major hurdle in effective assessment. This paper first revisits the prevalent evaluation method, multiple-choice question answering (MCQA), which allows for…

Computation and Language · Computer Science 2024-03-13 Fangyun Wei, Xi Chen, Lin Luo

Advances in large language models (LLMs) are rapidly transforming scientific work, yet empirical evidence on how these systems reshape research activities remains limited. We report a mixed-methods pilot evaluation of an AI-orchestrated…

Computers and Society · Computer Science 2026-02-24 Yuan An

Artificial intelligence (AI) is transforming society, making it crucial to prepare the next generation through AI literacy in K-12 education. However, scalable and reliable AI literacy materials and assessment resources are lacking. To…

Human-Computer Interaction · Computer Science 2024-12-03 Jiayi Wang, Ruiwei Xiao, Ying-Jui Tseng

Multiple-choice questions with item-writing flaws can negatively impact student learning and skew analytics. These flaws are often present in student-generated questions, making it difficult to assess their quality and suitability for…

Computation and Language · Computer Science 2023-07-18 Steven Moore, Huy A. Nguyen, Tianying Chen, John Stamper

One of the most widely used tasks for evaluating Large Language Models (LLMs) is Multiple-Choice Question Answering (MCQA). While open-ended question answering tasks are more challenging to evaluate, MCQA tasks are, in principle, easier to…

Computation and Language · Computer Science 2025-06-10 Francesco Maria Molfese, Luca Moroni, Luca Gioffré, Alessandro Scirè, Simone Conia, Roberto Navigli

High-quality test items are essential for educational assessments, particularly within Item Response Theory (IRT). Traditional validation methods rely on resource-intensive pilot testing to estimate item difficulty and discrimination. More…

Computation and Language · Computer Science 2025-08-08 Robin Schmucker, Steven Moore

Automated question quality rating (AQQR) aims to evaluate question quality through computational means, thereby addressing emerging challenges in online learnersourced question repositories. Existing methods for AQQR rely solely on…

Computation and Language · Computer Science 2021-11-22 Lin Ni, Qiming Bao, Xiaoxuan Li, Qianqian Qi, Paul Denny, Jim Warren, Michael Witbrock, Jiamou Liu

Automatic question generation (QG) is essential for AI and NLP, particularly in intelligent tutoring, dialogue systems, and fact verification. Generating multiple-choice questions (MCQG) for professional exams, like the United States…

Computation and Language · Computer Science 2025-02-11 Zonghai Yao, Aditya Parashar, Huixue Zhou, Won Seok Jang, Feiyun Ouyang, Zhichao Yang, Hong Yu

Multiple choice questions (MCQs) are a popular method for evaluating students' knowledge due to their efficiency in administration and grading. Crafting high-quality math MCQs is a labor-intensive process that requires educators to…

Computation and Language · Computer Science 2024-05-03 Jaewook Lee, Digory Smith, Simon Woodhead, Andrew Lan

Evaluation of QA systems is very challenging and expensive, with the most reliable approach being human annotations of correctness of answers for questions. Recent works (AVA, BEM) have shown that transformer LM encoder based similarity…

Computation and Language · Computer Science 2023-09-22 Matteo Gabburo, Siddhant Garg, Rik Koncel Kedziorski, Alessandro Moschitti

Multiple-choice question answering (MCQA) is standard in NLP, but benchmarks lack rigorous quality control. We present BenchMarker, an education-inspired toolkit using LLM judges to flag three common MCQ flaws: 1) contamination: items…

Parsing (also called syntax analysis) techniques cover a substantial portion of any undergraduate Compiler Design course. We present ParseIT, a tool to help students understand the parsing techniques through question-answering. ParseIT…

Programming Languages · Computer Science 2017-02-03 Amey Karkare, Nimisha Agarwal

Automated question generation is an important approach to enable personalisation of English comprehension assessment. Recently, transformer-based pretrained language models have demonstrated the ability to produce appropriate questions from…

Computation and Language · Computer Science 2022-09-27 Vatsal Raina, Mark Gales