Advancing Language Models for Code-related Tasks

Zhao Tian

Software Engineering, Artificial Intelligence, Computation and Language | 2026-01-09 | v1

Abstract

Recent advances in language models (LMs) have driven significant progress in various software engineering tasks. However, existing LMs still struggle with complex programming scenarios due to limitations in data quality, model architecture, and reasoning capability. This research systematically addresses these challenges through three complementary directions: (1) improving code data quality with a code difference-guided adversarial augmentation technique (CODA) and a code denoising technique (CodeDenoise); (2) enhancing model architecture via syntax-guided code LMs (LEAM and LEAM++); and (3) advancing model reasoning with a prompting technique (muFiX) and an agent-based technique (Specine). These techniques aim to promote the practical adoption of LMs in software development and further advance intelligent software engineering.

Keywords

code generation, large language model, evolutionary optimization

Cite

@article{arxiv.2601.04526,
  title  = {Advancing Language Models for Code-related Tasks},
  author = {Zhao Tian},
  journal= {arXiv preprint arXiv:2601.04526},
  year   = {2026}
}

Comments

Accepted by ICSE 2026 (DS)
