Enhancing Code LLM Training with Programmer Attention

Yifan Zhang; Chen Huang; Zachary Karas; Dung Thuy Nguyen; Kevin Leach; Yu Huang

doi:10.1145/3696630.3728510

Subjects: Software Engineering; Human-Computer Interaction; Machine Learning
Date: 2025-04-16 (v2)

Abstract

Human attention provides valuable yet underexploited signals for code LLM training, offering a perspective beyond purely machine-driven attention. Eye-tracking data is complex and costly to collect, and there has been limited progress in systematically using these signals for code LLM training. To address both issues, we propose a cohesive pipeline spanning augmentation and reward-based fine-tuning. Specifically, we introduce (1) an eye-tracking path augmentation method to expand programmer attention datasets, (2) a pattern abstraction step that refines raw fixations into learnable attention motifs, and (3) a reward-guided strategy for integrating these insights directly into a CodeT5 supervised fine-tuning process. Our experiments yield a +7.16 improvement in CodeBLEU on the CodeXGLUE benchmark for code summarization, underscoring how uniting human and machine attention can boost code intelligence. We hope this work encourages broader exploration of human-centric methods in next-generation AI4SE.

Keywords

code generation large language model training attention mechanism

Cite

@article{arxiv.2503.14936,
  title  = {Enhancing Code LLM Training with Programmer Attention},
  author = {Yifan Zhang and Chen Huang and Zachary Karas and Dung Thuy Nguyen and Kevin Leach and Yu Huang},
  journal= {arXiv preprint arXiv:2503.14936},
  year   = {2025}
}
