
Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization

Artificial Intelligence; Software Engineering (v1, 2026-05-14)

Abstract

Large language models (LLMs) have shown considerable potential for code translation, yet they often struggle to ensure both syntactic correctness and semantic consistency. Preference-based learning offers a promising alignment strategy, but it is hindered by unreliable semantic rewards derived from sparse test cases or restrictive reference translations. We argue that a robust semantic reward for code translation must be derived directly from the source code. In this paper, we propose CTO, which improves code translation through syntax-guided and semantic-aware preference optimization. Using contrastive learning, we train a cross-lingual semantic model that directly assesses functional equivalence between source and translated code. By formulating code translation as a multi-objective optimization problem, we unify this semantic signal with compiler-based syntactic feedback within the direct preference optimization (DPO) framework. Extensive experiments on C++, Java, and Python translations show that CTO significantly outperforms existing baselines and alternative preference optimization strategies.
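To make the three ingredients named in the abstract more concrete (a contrastively trained semantic scorer, a syntax plus semantics reward, and DPO), here is a minimal, hypothetical sketch in plain Python. It is not the authors' implementation. The semantic model is reduced to a precomputed source/translation similarity matrix scored with a standard InfoNCE loss over in-batch negatives. The weighted-sum reward (`combined_reward`, `w_syntax`), the best/worst pair selection (`build_preference_pair`), and the default `temperature` and `beta` values are all assumptions for illustration, since the abstract does not say how the two objectives are combined. `dpo_loss` is the standard DPO objective, -log σ(β[(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]).

```python
import math
from typing import List, Sequence, Tuple


def info_nce_loss(sim: Sequence[Sequence[float]], temperature: float = 0.07) -> float:
    """Mean InfoNCE loss for a batch of (source, translation) pairs.

    sim[i][j] is a similarity between source i and translation j. The diagonal
    holds the functionally equivalent (positive) pairs; every off-diagonal
    entry in row i serves as an in-batch negative for source i.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / n


def combined_reward(compiles: bool, semantic_score: float, w_syntax: float = 0.5) -> float:
    """HYPOTHETICAL weighted-sum scalarization of the two objectives:
    a binary compiler signal and a semantic-equivalence score in [0, 1]."""
    syntax = 1.0 if compiles else 0.0
    return w_syntax * syntax + (1.0 - w_syntax) * semantic_score


def build_preference_pair(candidates: List[Tuple[str, bool, float]]) -> Tuple[str, str]:
    """HYPOTHETICAL pair selection: return (chosen, rejected) as the highest-
    and lowest-reward candidates. Each candidate is (code, compiles, semantic_score)."""
    ranked = sorted(candidates, key=lambda c: combined_reward(c[1], c[2]), reverse=True)
    return ranked[0][0], ranked[-1][0]


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, from sequence log-probabilities under
    the policy (pi_*) and the frozen reference model (ref_*)."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), evaluated stably
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    pool = [("a", True, 0.9), ("b", False, 0.2), ("c", True, 0.4)]
    print(build_preference_pair(pool))  # ('a', 'b')
    print(dpo_loss(-1.0, -5.0, -3.0, -3.0))
```

In this toy pool, candidate "a" both compiles and scores highest semantically, so it becomes the chosen translation, while "b" fails to compile and becomes the rejected one. The DPO loss then drops below log 2 whenever the policy raises the chosen translation's likelihood relative to the reference model more than it raises the rejected one's. A weighted sum is only the simplest way to scalarize two objectives. The paper's multi-objective formulation may order or filter candidates differently.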


Cite

@article{arxiv.2605.13229,
  title  = {Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization},
  author = {Yuhan Wu and Huan Zhang and Wei Cheng and Chen Shen and Jingyue Yang and Wei Hu},
  journal= {arXiv preprint arXiv:2605.13229},
  year   = {2026}
}

Comments

Accepted at the 35th International Joint Conference on Artificial Intelligence (IJCAI 2026)