Automated Knowledge Component Generation for Interpretable Knowledge Tracing in Coding Problems

Artificial Intelligence 2026-05-19 v4 Computation and Language Computers and Society Machine Learning Software Engineering

Abstract

Knowledge components (KCs) mapped to problems help model student learning by tracking mastery levels on fine-grained skills, thereby facilitating personalized learning and feedback in online learning platforms. However, crafting KCs and tagging them to problems, traditionally performed by human domain experts, is highly labor-intensive. We present an automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We also develop an LLM-based knowledge tracing (KT) framework, which we refer to as KCGen-KT, to leverage these LLM-generated KCs. We conduct extensive quantitative and qualitative evaluations on two real-world student code submission datasets in different programming languages. We find that KCGen-KT outperforms existing KT methods and human-written KCs on future student response prediction. We investigate the learning curves of generated KCs and show that LLM-generated KCs result in a better fit than human-written KCs under a cognitive model. We also conduct a human evaluation with course instructors to show that our pipeline generates reasonably accurate problem-KC mappings.

Cite

@article{arxiv.2502.18632,
  title  = {Automated Knowledge Component Generation for Interpretable Knowledge Tracing in Coding Problems},
  author = {Zhangqi Duan and Nigel Fernandez and Arun Balajiee Lekshmi Narayanan and Mohammad Hassany and Rafaella Sampaio de Alencar and Peter Brusilovsky and Bita Akram and Andrew Lan},
  journal= {arXiv preprint arXiv:2502.18632},
  year   = {2026}
}

Comments

Findings of ACL 2026: The 64th Annual Meeting of the Association for Computational Linguistics