EduCoder: An Open-Source Annotation System for Education Transcript Data

Computation and Language 2026-05-06 v5

Abstract

We introduce EduCoder, a domain-specialized tool designed to support utterance-level annotation of educational dialogue. While general-purpose text annotation tools for NLP and qualitative research abound, few address the complexities of coding education dialogue transcripts, which contain diverse teacher-student and peer interactions. Common challenges include defining codebooks for complex pedagogical features, supporting both open-ended and categorical coding, and contextualizing utterances with external features, such as the lesson's purpose and the pedagogical value of the instruction. EduCoder addresses these challenges by providing a platform where researchers and domain experts collaboratively define complex codebooks based on observed data. It supports both categorical and open-ended annotation types and incorporates contextual materials. Additionally, it offers a side-by-side comparison of multiple annotators' responses, so that annotators can compare and calibrate their annotations against one another to improve data reliability. The system is open-source, with a demo video available.

Cite

@article{arxiv.2507.05385,
  title  = {EduCoder: An Open-Source Annotation System for Education Transcript Data},
  author = {Saad Ashraf and James Malamut and Vishal Kumar and Guanzhong Pan and Hyunji Nam and Mei Tan and Lucía Langlois and Liliana Deonizio and Helen Higgins and Dorottya Demszky},
  journal= {arXiv preprint arXiv:2507.05385},
  year   = {2026}
}