Related papers: A LLM-Powered Automatic Grading Framework with Hum…

Language Models are Few-Shot Graders

Providing evaluations to student work is a critical component of effective student learning, and automating its process can significantly reduce the workload on human graders. Automatic Short Answer Grading (ASAG) systems, enabled by…

Computation and Language · Computer Science 2025-02-20 Chenyan Zhao, Mariana Silva, Seth Poulsen

ASAG2024: A Combined Benchmark for Short Answer Grading

Open-ended questions test a more thorough understanding than closed-ended questions and are often a preferred assessment method. However, open-ended questions are tedious to grade and subject to personal bias. Therefore, there have been…

Artificial Intelligence · Computer Science 2024-09-30 Gérôme Meyer, Philip Breuer, Jonathan Fürst

LLM-based Automated Grading with Human-in-the-Loop

The rise of artificial intelligence (AI) technologies, particularly large language models (LLMs), has brought significant advancements to the field of education. Among various applications, automatic short answer grading (ASAG), which…

Computation and Language · Computer Science 2025-12-02 Yucheng Chu, Hang Li, Kaiqi Yang, Yasemin Copur-Gencturk, Jiliang Tang

Towards LLM-based Autograding for Short Textual Answers

Grading exams is an important, labor-intensive, subjective, repetitive, and frequently challenging task. The feasibility of autograding textual responses has greatly increased thanks to the availability of large language models (LLMs) such…

Computation and Language · Computer Science 2024-07-09 Johannes Schneider, Bernd Schenk, Christina Niklaus

Automatic Short Answer Grading via Multiway Attention Networks

Automatic short answer grading (ASAG), which autonomously scores student answers according to reference answers, provides a cost-effective and consistent approach to teaching professionals and can reduce their monotonous and tedious grading…

Artificial Intelligence · Computer Science 2019-09-27 Tiaoqiao Liu, Wenbiao Ding, Zhiwei Wang, Jiliang Tang, Gale Yan Huang, Zitao Liu

Estimating LLM Grading Ability and Response Difficulty in Automatic Short Answer Grading via Item Response Theory

Automated short answer grading (ASAG) with large language models (LLMs) is commonly evaluated with aggregate metrics such as macro-F1 and Cohen's kappa. However, these metrics provide limited insight into how grading performance varies…

Computation and Language · Computer Science 2026-05-14 Longwei Cong, Sonja Hahn, Sebastian Gombert, Leon Camus, Hendrik Drachsler, Ulf Kroehne

A Zero-Shot LLM Framework for Automatic Assignment Grading in Higher Education

Automated grading has become an essential tool in education technology due to its ability to efficiently assess large volumes of student work, provide consistent and unbiased evaluations, and deliver immediate feedback to enhance learning.…

Computers and Society · Computer Science 2025-01-27 Calvin Yeung, Jeff Yu, King Chau Cheung, Tat Wing Wong, Chun Man Chan, Kin Chi Wong, Keisuke Fujii

Confidence Estimation in Automatic Short Answer Grading with LLMs

Automatic Short Answer Grading (ASAG) with generative large language models (LLMs) has recently demonstrated strong performance without task-specific fine-tuning, while also enabling the generation of synthetic feedback for educational…

Computation and Language · Computer Science 2026-05-14 Longwei Cong, Sonja Hahn, Sebastian Gombert, Leon Camus, Hendrik Drachsler, Ulf Kroehne

Grade Guard: A Smart System for Short Answer Automated Grading

The advent of large language models (LLMs) in the education sector has provided impetus to automate grading short answer questions. LLMs make evaluating short answers very efficient, thus addressing issues like staff shortage. However, in…

Computation and Language · Computer Science 2025-04-03 Niharika Dadu, Harsh Vardhan Singh, Romi Banerjee

Can LLMs Grade Short-Answer Reading Comprehension Questions: An Empirical Study with a Novel Dataset

Open-ended questions, which require students to produce multi-word, nontrivial responses, are a popular tool for formative assessment as they provide more specific insights into what students do and don't know. However, grading open-ended…

Computation and Language · Computer Science 2024-05-07 Owen Henkel, Libby Hills, Bill Roberts, Joshua McGrane

Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation

Short answer assessment is a vital component of science education, allowing evaluation of students' complex three-dimensional understanding. Large language models (LLMs) that possess human-like ability in linguistic tasks are increasingly…

Computation and Language · Computer Science 2025-06-05 Yucheng Chu, Peng He, Hang Li, Haoyu Han, Kaiqi Yang, Yu Xue, Tingting Li, Joseph Krajcik, Jiliang Tang

Leveraging LLM Agents for Automated Optimization Modeling for SASP Problems: A Graph-RAG based Approach

Automated optimization modeling (AOM) has evoked considerable interest with the rapid evolution of large language models (LLMs). Existing approaches predominantly rely on prompt engineering, utilizing meticulously designed expert response…

Artificial Intelligence · Computer Science 2025-01-31 Tianpeng Pan, Wenqiang Pu, Licheng Zhao, Rui Zhou

Short Answer Grading Using One-shot Prompting and Text Similarity Scoring Model

In this study, we developed an automated short answer grading (ASAG) model that provided both analytic scores and final holistic scores. Short answer items typically consist of multiple sub-questions, and providing an analytic score and the…

Computation and Language · Computer Science 2023-05-31 Su-Youn Yoon

GradingAttack: Exposing Security Vulnerabilities in LLM Based Educational Grading Agents

Large language models (LLMs) are increasingly deployed as educational agents for automatic short answer grading (ASAG) in real-world educational environments, significantly boosting assessment efficiency and scalability. However, when these…

Cryptography and Security · Computer Science 2026-05-25 Xueyi Li, Zhuoneng Zhou, Zitao Liu, Yongdong Wu

Towards Human-Like Grading: A Unified LLM-Enhanced Framework for Subjective Question Evaluation

Automatic grading of subjective questions remains a significant challenge in examination assessment due to the diversity in question formats and the open-ended nature of student responses. Existing works primarily focus on a specific type…

Computation and Language · Computer Science 2025-10-10 Fanwei Zhua, Jiaxuan He, Xiaoxiao Chen, Zulong Chen, Quan Lu, Chenrui Mei

Improving Retrospective Language Agents via Joint Policy Gradient Optimization

In recent research advancements within the community, large language models (LLMs) have sparked great interest in creating autonomous agents. However, current prompt-based agents often heavily rely on large-scale LLMs. Meanwhile, although…

Computation and Language · Computer Science 2025-03-04 Xueyang Feng, Bo Lan, Quanyu Dai, Lei Wang, Jiakai Tang, Xu Chen, Zhenhua Dong, Ji-Rong Wen

Automatic Question & Answer Generation Using Generative Large Language Model (LLM)

In the realm of education, student evaluation holds equal significance to imparting knowledge. To be evaluated, students usually need to go through text-based academic assessment methods. Instructors need to make a diverse set of questions…

Computation and Language · Computer Science 2025-09-30 Md. Alvee Ehsan, A. S. M Mehedi Hasan, Kefaya Benta Shahnoor, Syeda Sumaiya Tasneem

Auditing an Automatic Grading Model with deep Reinforcement Learning

We explore the use of deep reinforcement learning to audit an automatic short answer grading (ASAG) model. Automatic grading may decrease the time burden of rating open-ended items for educators, but a lack of robust evaluation methods for…

Artificial Intelligence · Computer Science 2024-05-14 Aubrey Condor, Zachary Pardos

Can MLLMs generate human-like feedback in grading multimodal short answers?

In education, the traditional Automatic Short Answer Grading (ASAG) with feedback problem has focused primarily on evaluating text-only responses. However, real-world assessments often include multimodal responses containing both diagrams…

Artificial Intelligence · Computer Science 2026-02-06 Pritam Sil, Pushpak Bhattacharyya, Pawan Goyal, Ganesh Ramakrishnan

SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models

Subjective Answer Grading (SAG) plays a crucial role in education, standardized testing, and automated assessment systems, particularly for evaluating short-form responses in Short Answer Scoring (SAS). However, existing approaches often…

Computation and Language · Computer Science 2025-05-16 Peichao Lai, Kexuan Zhang, Yi Lin, Linyihan Zhang, Feiyang Ye, Jinhao Yan, Yanwei Xu, Conghui He, Yilei Wang, Wentao Zhang, Bin Cui