A LLM-Powered Automatic Grading Framework with Human-Level Guidelines Optimization

Artificial Intelligence 2025-06-05 v2 Computation and Language

Abstract

Open-ended short-answer questions (SAGs) have been widely recognized as a powerful tool for providing deeper insights into learners' responses in the context of learning analytics (LA). However, SAGs often present challenges in practice due to the high grading workload and concerns about inconsistent assessments. With recent advancements in natural language processing (NLP), automatic short-answer grading (ASAG) offers a promising solution to these challenges. Despite this, current ASAG algorithms are often limited in generalizability and tend to be tailored to specific questions. In this paper, we propose a unified multi-agent ASAG framework, GradeOpt, which leverages large language models (LLMs) as graders for SAGs. More importantly, GradeOpt incorporates two additional LLM-based agents, the reflector and the refiner, into the multi-agent system. This enables GradeOpt to automatically optimize the original grading guidelines by performing self-reflection on its errors. Through experiments on a challenging ASAG task, namely the grading of pedagogical content knowledge (PCK) and content knowledge (CK) questions, GradeOpt demonstrates superior performance in grading accuracy and behavior alignment with human graders compared to representative baselines. Finally, comprehensive ablation studies confirm the effectiveness of the individual components designed in GradeOpt.
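
The grader, reflector, and refiner roles described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `LLM` callable interface, the `GRADE`/`REFLECT`/`REFINE` prompt formats, the function names, and the stopping rule are all assumptions made for illustration, and `toy_llm` is a deterministic stand-in for a real model so the loop can be run without any API.

```python
from typing import Callable, List, Tuple

# Hypothetical LLM interface: takes a prompt string, returns the model's text.
LLM = Callable[[str], str]
Example = Tuple[str, str]  # (student answer, human-assigned label)


def grade(llm: LLM, guidelines: str, answer: str) -> str:
    """Grader agent: label one answer under the current guidelines."""
    return llm(f"GRADE\nGuidelines: {guidelines}\nAnswer:\n{answer}").strip()


def optimize_guidelines(
    llm: LLM, guidelines: str, labeled: List[Example], max_rounds: int = 3
) -> str:
    """Refine guidelines until the grader agrees with the human labels.

    Assumed loop: grade, collect disagreements, reflect on them, rewrite.
    """
    for _ in range(max_rounds):
        errors = []
        for answer, gold in labeled:
            predicted = grade(llm, guidelines, answer)
            if predicted != gold:
                errors.append(
                    f"answer={answer!r} predicted={predicted} human={gold}"
                )
        if not errors:
            break  # grader already matches every human label
        # Reflector agent: explain why the guidelines led to these errors.
        reflection = llm(
            "REFLECT\nGuidelines: " + guidelines + "\nErrors:\n" + "\n".join(errors)
        )
        # Refiner agent: rewrite the guidelines using the reflection.
        guidelines = llm(
            "REFINE\nGuidelines: " + guidelines + "\nReflection: " + reflection
        ).strip()
    return guidelines


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for illustration only."""
    if prompt.startswith("GRADE"):
        answer = prompt.splitlines()[-1]
        accepts_synonyms = "synonyms" in prompt
        if answer == "car" or (answer == "automobile" and accepts_synonyms):
            return "correct"
        return "incorrect"
    if prompt.startswith("REFLECT"):
        return "The guidelines reject valid synonyms."
    # REFINE: return a rewritten guideline text.
    return "Mark correct if the answer says 'car'. Accept synonyms."
```

With the toy model, starting from the guideline "Mark correct only if the answer says 'car'." and a labeled set in which "automobile" is marked correct by the human grader, one round of reflection and refinement is enough for the grader to match all human labels. In a real setting the stub would be replaced by calls to an actual LLM, and the labeled set would come from human-graded responses.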

Cite

@article{arxiv.2410.02165,
  title  = {A LLM-Powered Automatic Grading Framework with Human-Level Guidelines Optimization},
  author = {Yucheng Chu and Hang Li and Kaiqi Yang and Harry Shomer and Hui Liu and Yasemin Copur-Gencturk and Jiliang Tang},
  journal= {arXiv preprint arXiv:2410.02165},
  year   = {2025}
}

Comments

EDM 2025 Long Paper
