CodeGENCAT: Generative Computerized Adaptive Testing for Open-ended Coding Problems

Computation and Language 2026-05-28 v2

Abstract

Existing Computerized Adaptive Testing (CAT) frameworks typically select questions based on the predicted likelihood that the student will answer correctly. This design ignores information contained in students' open-ended responses, especially in domains such as programming education, where code structures and bugs contain rich information on student knowledge. In this work, we propose Code GENerative CAT (CodeGENCAT), a generative CAT framework that selects questions using predicted student code responses. First, we develop a Generative Item Response Theory (GIRT) model that generates code responses conditioned on estimated student knowledge, trained with supervised fine-tuning followed by direct preference optimization for knowledge-response alignment. Second, we introduce three question-selection algorithms that measure uncertainty, coding style diversity, and information from predicted student code responses. Experiments on two real-world programming education datasets show that CodeGENCAT outperforms all CAT baselines, achieving an AUC improvement of up to 4.32% over the strongest baseline in the early stages of adaptive testing.
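
For context, the conventional correctness-based selection that the abstract contrasts against can be sketched as follows. This is a minimal illustration of one common CAT baseline (maximum Fisher information under a two-parameter logistic IRT model), not the CodeGENCAT method. The function names, the item bank, and its parameter values are hypothetical and chosen only for the example.

```python
import math


def p_correct(theta, a, b):
    """2PL IRT: probability that a student with ability theta answers correctly.

    a is the item discrimination, b is the item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, items, administered):
    """Pick the unadministered item with the highest Fisher information."""
    candidates = [name for name in items if name not in administered]
    return max(candidates, key=lambda name: fisher_information(theta, *items[name]))


# Hypothetical item bank: name -> (discrimination a, difficulty b).
ITEMS = {"q1": (1.0, -2.0), "q2": (1.0, 0.1), "q3": (1.0, 1.5)}

# With a current ability estimate of 0.0, the item whose difficulty is closest
# to the estimate is the most informative one.
print(select_next_item(0.0, ITEMS, administered=set()))  # q2
```

Note that this criterion depends only on the predicted probability of a correct answer, so it cannot use anything contained in the student's code itself, which is the limitation the abstract points to.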

Cite

@article{arxiv.2602.20020,
  title  = {CodeGENCAT: Generative Computerized Adaptive Testing for Open-ended Coding Problems},
  author = {Wanyong Feng and Alexander Scarlatos and Ruochen Sun and Andrew Lan},
  journal= {arXiv preprint arXiv:2602.20020},
  year   = {2026}
}

Comments

23 pages, 2 figures