Advancing Language Models for Code-related Tasks

Software Engineering 2026-01-09 v1 Artificial Intelligence Computation and Language

Abstract

Recent advances in language models (LMs) have driven significant progress in various software engineering tasks. However, existing LMs still struggle with complex programming scenarios due to limitations in data quality, model architecture, and reasoning capability. This research systematically addresses these challenges through three complementary directions: (1) improving code data quality with a code difference-guided adversarial augmentation technique (CODA) and a code denoising technique (CodeDenoise); (2) enhancing model architecture via syntax-guided code LMs (LEAM and LEAM++); and (3) advancing model reasoning with a prompting technique (muFiX) and an agent-based technique (Specine). These techniques aim to promote the practical adoption of LMs in software development and further advance intelligent software engineering.

Cite

@article{arxiv.2601.04526,
  title   = {Advancing Language Models for Code-related Tasks},
  author  = {Zhao Tian},
  journal = {arXiv preprint arXiv:2601.04526},
  year    = {2026}
}

Comments

Accepted by ICSE 2026 (DS)