Related papers: How Consistent Are Humans When Grading Programming…

Program Equivalence for Assisted Grading of Functional Programs (Extended Version)

In courses that involve programming assignments, giving meaningful feedback to students is an important challenge. Human beings can give useful feedback by manually grading the programs but this is a time-consuming, labor intensive, and…

Programming Languages · Computer Science 2020-10-19 Joshua Clune, Vijay Ramamurthy, Ruben Martins, Umut A. Acar

SimGrade: Using Code Similarity Measures for More Accurate Human Grading

While the use of programming problems on exams is a common form of summative assessment in CS courses, grading such exam problems can be a difficult and inconsistent process. Through an analysis of historical grading patterns we show that…

Computers and Society · Computer Science 2024-03-25 Sonja Johnson-Yu, Nicholas Bowman, Mehran Sahami, Chris Piech

Comparison of Three Programming Error Measures for Explaining Variability in CS1 Grades

Programming courses can be challenging for first year university students, especially for those without prior coding experience. Students initially struggle with code syntax, but as more advanced topics are introduced across a semester, the…

Programming Languages · Computer Science 2024-04-10 Valdemar Švábenský, Maciej Pankiewicz, Jiayi Zhang, Elizabeth B. Cloude, Ryan S. Baker, Eric Fouh

Effects of Human vs. Automatic Feedback on Students' Understanding of AI Concepts and Programming Style

The use of automatic grading tools has become nearly ubiquitous in large undergraduate programming courses, and recent work has focused on improving the quality of automatically generated feedback. However, there is a relative lack of data…

Human-Computer Interaction · Computer Science 2020-11-24 Abe Leite, Saúl A. Blanco

Automated Grading and Feedback Tools for Programming Education: A Systematic Review

We conducted a systematic literature review on automated grading and feedback tools for programming education. We analysed 121 research papers from 2017 to 2021 inclusive and categorised them based on skills assessed, approach, language…

Software Engineering · Computer Science 2023-12-11 Marcus Messer, Neil C. C. Brown, Michael Kölling, Miaojing Shi

Generative Grading: Near Human-level Accuracy for Automated Feedback on Richly Structured Problems

Access to high-quality education at scale is limited by the difficulty of providing student feedback on open-ended assignments in structured domains like computer programming, graphics, and short response questions. This problem has proven…

Machine Learning · Computer Science 2021-03-25 Ali Malik, Mike Wu, Vrinda Vasavada, Jinpeng Song, Madison Coots, John Mitchell, Noah Goodman, Chris Piech

Identifying Different Student Clusters in Functional Programming Assignments: From Quick Learners to Struggling Students

Instructors and students alike are often focused on the grade in programming assignments as a key measure of how well a student is mastering the material and whether a student is struggling. This can be, however, misleading. Especially when…

Computers and Society · Computer Science 2023-01-09 Chuqin Geng, Wenwen Xu, Yingjie Xu, Brigitte Pientka, Xujie Si

Grading Scale Impact on LLM-as-a-Judge: Human-LLM Alignment Is Highest on 0-5 Grading Scale

Large language models (LLMs) are increasingly used as automated evaluators, yet prior works demonstrate that these LLM judges often lack consistency in scoring when the prompt is altered. However, the effect of the grading scale itself…

Computation and Language · Computer Science 2026-01-08 Weiyue Li, Minda Zhao, Weixuan Dong, Jiahui Cai, Yuze Wei, Michael Pocress, Yi Li, Wanyan Yuan, Xiaoyue Wang, Ruoyu Hou, Kaiyuan Lou, Wenqi Zeng, Yutong Yang, Yilun Du, Mengyu Wang

Towards Transparent AI Grading: Semantic Entropy as a Signal for Human-AI Disagreement

Automated grading systems can efficiently score short-answer responses, yet they often fail to indicate when a grading decision is uncertain or potentially contentious. We introduce semantic entropy, a measure of variability across multiple…

Artificial Intelligence · Computer Science 2025-08-07 Karrtik Iyer, Manikandan Ravikiran, Prasanna Pendse, Shayan Mohanty

Developing Consistency Among Undergraduate Graders Scoring Open-Ended Statistics Tasks

Undergraduate graders are frequently important contributors to the teaching team in post-secondary education settings. This study set out to investigate agreement for a team of undergraduate graders as they acquired training and experience…

Other Statistics · Statistics 2024-10-24 Matthew D. Beckman, Sean Burke, Jack Fiochetta, Benjamin Fry, Susan E. Lloyd, Luke Patterson, Elle Tang

The application of GPT-4 in grading design university students' assignment and providing feedback: An exploratory study

This study aims to investigate whether GPT-4 can effectively grade assignments for design university students and provide useful feedback. In design education, assignments do not have a single correct answer and often involve solving an…

Artificial Intelligence · Computer Science 2024-09-27 Qian Huang, Thijs Willems, King Wang Poon

Automatic Assessment of the Design Quality of Student Python and Java Programs

Programs are a kind of communication to both computers and people, hence as students are trained to write programs they need to learn to write well-designed, readable code rather than code that simply functions correctly. The difficulty in…

Computers and Society · Computer Science 2022-11-08 J. Walker Orr

Humanizing AI Grading: Student-Centered Insights on Fairness, Trust, Consistency and Transparency

This study investigates students' perceptions of Artificial Intelligence (AI) grading systems in an undergraduate computer science course (n = 27), focusing on a block-based programming final project. Guided by the ethical principles…

Artificial Intelligence · Computer Science 2026-02-24 Bahare Riahi, Viktoriia Storozhevykh, Veronica Catete

Instructional Goals and Grading Practices of Graduate Students after One Semester of Teaching Experience

Teaching assistants (TAs) are often responsible for grading student solutions. Since grading communicates instructors' expectations, TAs' grading decisions play a crucial role in forming students' approaches to problem solving (PS) in…

Physics Education · Physics 2016-01-12 Charles Henderson, Emily Marshman, Alexandru Maries, Edit Yerushalmi, Chandralekha Singh

How to measure consumer's inconsistency in sensory testing?

Standard methods, standard test conditions and the use of good sensory practices are key elements of sensory testing. However, while compliance assessment by trained and expert assessors is well developed, few information is available on…

Applications · Statistics 2025-02-07 László Sipos, Kolos Csaba Ágoston, Péter Biró, Sándor Bozóki, László Csató

CoGrader: Transforming Instructors' Assessment of Project Reports through Collaborative LLM Integration

Grading project reports are increasingly significant in today's educational landscape, where they serve as key assessments of students' comprehensive problem-solving abilities. However, it remains challenging due to the multifaceted…

Human-Computer Interaction · Computer Science 2025-08-19 Zixin Chen, Jiachen Wang, Yumeng Li, Haobo Li, Chuhan Shi, Rong Zhang, Huamin Qu

Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring

Large language models (LLMs) have demonstrated strong potential in performing automatic scoring for constructed response assessments. While constructed responses graded by humans are usually based on given grading rubrics, the methods by…

Computation and Language · Computer Science 2025-02-24 Xuansheng Wu, Padmaja Pravin Saraf, Gyeonggeon Lee, Ehsan Latif, Ninghao Liu, Xiaoming Zhai

Investigating the Essential of Meaningful Automated Formative Feedback for Programming Assignments

This study investigated the essential of meaningful automated feedback for programming assignments. Three different types of feedback were tested, including (a) What's wrong - what test cases were testing and which failed, (b) Gap -…

Human-Computer Interaction · Computer Science 2019-10-09 Qiang Hao, Jack P Wilson, Camille Ottaway, Naitra Iriumi, Kai Arakawa, David H Smith

Characteristics of hand and machine-assigned scores to college students' answers to open-ended tasks

Assessment of learning in higher education is a critical concern to policy makers, educators, parents, and students. And, doing so appropriately is likely to require including constructed response tests in the assessment system. We examined…

Applications · Statistics 2008-12-18 Stephen P. Klein

Grading Practices and Considerations of Graduate Students at the Beginning of their Teaching Assignment

Research shows that expert-like approaches to problem-solving can be promoted by encouraging students to explicate their thought processes and follow a prescribed problem-solving strategy. Since grading communicates instructors'…

Physics Education · Physics 2016-01-12 Edit Yerushalmi, Emily Marshman, Alexandru Maries, Charles R. Henderson, Chandralekha Singh