Related papers: How Consistent Are Humans When Grading Programming…
In courses that involve programming assignments, giving meaningful feedback to students is an important challenge. Human beings can give useful feedback by manually grading the programs but this is a time-consuming, labor intensive, and…
While the use of programming problems on exams is a common form of summative assessment in CS courses, grading such exam problems can be a difficult and inconsistent process. Through an analysis of historical grading patterns we show that…
Programming courses can be challenging for first year university students, especially for those without prior coding experience. Students initially struggle with code syntax, but as more advanced topics are introduced across a semester, the…
The use of automatic grading tools has become nearly ubiquitous in large undergraduate programming courses, and recent work has focused on improving the quality of automatically generated feedback. However, there is a relative lack of data…
We conducted a systematic literature review on automated grading and feedback tools for programming education. We analysed 121 research papers from 2017 to 2021 inclusive and categorised them based on skills assessed, approach, language…
Access to high-quality education at scale is limited by the difficulty of providing student feedback on open-ended assignments in structured domains like computer programming, graphics, and short response questions. This problem has proven…
Instructors and students alike are often focused on the grade in programming assignments as a key measure of how well a student is mastering the material and whether a student is struggling. This can be, however, misleading. Especially when…
Large language models (LLMs) are increasingly used as automated evaluators, yet prior works demonstrate that these LLM judges often lack consistency in scoring when the prompt is altered. However, the effect of the grading scale itself…
Automated grading systems can efficiently score short-answer responses, yet they often fail to indicate when a grading decision is uncertain or potentially contentious. We introduce semantic entropy, a measure of variability across multiple…
Undergraduate graders are frequently important contributors to the teaching team in post-secondary education settings. This study set out to investigate agreement for a team of undergraduate graders as they acquired training and experience…
This study aims to investigate whether GPT-4 can effectively grade assignments for design university students and provide useful feedback. In design education, assignments do not have a single correct answer and often involve solving an…
Programs are a kind of communication to both computers and people, hence as students are trained to write programs they need to learn to write well-designed, readable code rather than code that simply functions correctly. The difficulty in…
This study investigates students' perceptions of Artificial Intelligence (AI) grading systems in an undergraduate computer science course (n = 27), focusing on a block-based programming final project. Guided by the ethical principles…
Teaching assistants (TAs) are often responsible for grading student solutions. Since grading communicates instructors' expectations, TAs' grading decisions play a crucial role in forming students' approaches to problem solving (PS) in…
Standard methods, standard test conditions and the use of good sensory practices are key elements of sensory testing. However, while compliance assessment by trained and expert assessors is well developed, few information is available on…
Grading project reports are increasingly significant in today's educational landscape, where they serve as key assessments of students' comprehensive problem-solving abilities. However, it remains challenging due to the multifaceted…
Large language models (LLMs) have demonstrated strong potential in performing automatic scoring for constructed response assessments. While constructed responses graded by humans are usually based on given grading rubrics, the methods by…
This study investigated the essential of meaningful automated feedback for programming assignments. Three different types of feedback were tested, including (a) What's wrong - what test cases were testing and which failed, (b) Gap -…
Assessment of learning in higher education is a critical concern to policy makers, educators, parents, and students. And, doing so appropriately is likely to require including constructed response tests in the assessment system. We examined…
Research shows that expert-like approaches to problem-solving can be promoted by encouraging students to explicate their thought processes and follow a prescribed problem-solving strategy. Since grading communicates instructors'…