Related papers: AVATAR: A Parallel Corpus for Java-Python Program …

Leveraging Automated Unit Tests for Unsupervised Code Translation

With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method…

Software Engineering · Computer Science 2022-02-17 Baptiste Roziere, Jie M. Zhang, Francois Charton, Mark Harman, Gabriel Synnaeve, Guillaume Lample

Syntax and Domain Aware Model for Unsupervised Program Translation

There is growing interest in software migration as the development of software and society. Manually migrating projects between languages is error-prone and expensive. In recent years, researchers have begun to explore automatic program…

Software Engineering · Computer Science 2023-03-13 Fang Liu, Jia Li, Li Zhang

SAR: Learning Cross-Language API Mappings with Little Knowledge

To save manual effort, developers often translate programs from one programming language to another, instead of implementing it from scratch. Translating application program interfaces (APIs) used in one language to functionally equivalent…

Machine Learning · Computer Science 2019-06-11 Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

Better Together? An Evaluation of AI-Supported Code Translation

Generative machine learning models have recently been applied to source code, for use cases including translating code between programming languages, creating documentation from code, and auto-completing methods. Yet, state-of-the-art…

Human-Computer Interaction · Computer Science 2022-02-17 Justin D. Weisz, Michael Muller, Steven I. Ross, Fernando Martinez, Stephanie Houde, Mayank Agarwal, Kartik Talamadupula, John T. Richards

Automated Python Translation

Python is one of the most commonly used programming languages in industry and education. Its English keywords and built-in functions/modules allow it to come close to pseudo-code in terms of its readability and ease of writing. However,…

Computation and Language · Computer Science 2025-04-17 Joshua Otten, Antonios Anastasopoulos, Kevin Moran

Unsupervised Translation of Programming Languages

A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port…

Computation and Language · Computer Science 2020-09-23 Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample

A Joint Learning Model with Variational Interaction for Multilingual Program Translation

Programs implemented in various programming languages form the foundation of software applications. To alleviate the burden of program migration and facilitate the development of software systems, automated program translation across…

Software Engineering · Computer Science 2024-09-16 Yali Du, Hui Sun, Ming Li

ACT: Bridging the Gap in Code Translation through Synthetic Data Generation & Adaptive Training

Code translation is a crucial process in software development and migration projects, enabling interoperability between different programming languages and enhancing software adaptability and thus longevity. Traditional automated…

Artificial Intelligence · Computer Science 2025-07-23 Shreya Saxena, Siva Prasad, Zishan Ahmad, Vishal Vaddina

J-Parallelio -- automatic parallelization framework for Java virtual machine code

Manual translation of the algorithms from sequential version to its parallel counterpart is time consuming and can be done only with the specific knowledge of hardware accelerator architecture, parallel programming or programming…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-16 Krzysztof Stuglik, Piotr Listkiewicz, Mateusz Kulczyk, Marcin Pietron

Unified Pre-training for Program Understanding and Generation

Code summarization and generation empower conversion between programming language (PL) and natural language (NL), while code translation avails the migration of legacy code from one PL to another. This paper introduces PLBART, a…

Computation and Language · Computer Science 2021-04-13 Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation

Large Language Models (LLMs) have achieved remarkable success in automated code translation. While prior work has focused on improving translation accuracy through advanced prompting and iterative repair, the reliability of the underlying…

Software Engineering · Computer Science 2026-05-11 Fazle Rabbi, Soumit Kanti Saha, Jinqiu Yang

Using Document Similarity Methods to create Parallel Datasets for Code Translation

Translating source code from one programming language to another is a critical, time-consuming task in modernizing legacy applications and codebases. Recent work in this space has drawn inspiration from the software naturalness hypothesis…

Computation and Language · Computer Science 2021-10-12 Mayank Agarwal, Kartik Talamadupula, Fernando Martinez, Stephanie Houde, Michael Muller, John Richards, Steven I Ross, Justin D. Weisz

On the Impact of Language Selection for Training and Evaluating Programming Language Models

The recent advancements in Transformer-based Language Models have demonstrated significant potential in enhancing the multilingual capabilities of these models. The remarkable progress made in this domain not only applies to natural…

Software Engineering · Computer Science 2023-08-28 Jonathan Katzy, Maliheh Izadi, Arie van Deursen

RPT: Effective and Efficient Retrieval of Program Translations from Big Code

Program translation is a growing demand in software engineering. Manual program translation requires programming expertise in source and target language. One way to automate this process is to make use of the big data of programs, i.e., Big…

Software Engineering · Computer Science 2021-03-25 Binger Chen, Ziawasch Abedjan

Towards Automatic Translation of Machine Learning Visual Insights to Analytical Assertions

We present our vision for developing an automated tool capable of translating visual properties observed in Machine Learning (ML) visualisations into Python assertions. The tool aims to streamline the process of manually verifying these…

Software Engineering · Computer Science 2024-01-17 Arumoy Shome, Luis Cruz, Arie van Deursen

One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization

As pre-trained models automate many code intelligence tasks, a widely used paradigm is to fine-tune a model on the task dataset for each programming language. A recent study reported that multilingual fine-tuning benefits a range of tasks…

Software Engineering · Computer Science 2023-03-29 Deze Wang, Boxing Chen, Shanshan Li, Wei Luo, Shaoliang Peng, Wei Dong, Xiangke Liao

JavaBERT: Training a transformer-based model for the Java programming language

Code quality is and will be a crucial factor while developing new software code, requiring appropriate tools to ensure functional and reliable code. Machine learning techniques are still rarely used for software engineering tools, missing…

Software Engineering · Computer Science 2021-10-22 Nelson Tavares de Sousa, Wilhelm Hasselbring

JParaCrawl v3.0: A Large-scale English-Japanese Parallel Corpus

Most current machine translation models are mainly trained with parallel corpora, and their translation accuracy largely depends on the quality and quantity of the corpora. Although there are billions of parallel sentences for a few…

Computation and Language · Computer Science 2022-03-01 Makoto Morishita, Katsuki Chousa, Jun Suzuki, Masaaki Nagata

Algorithm-Based Pipeline for Reliable and Intent-Preserving Code Translation with LLMs

Code translation, the automatic conversion of programs between languages, is a growing use case for Large Language Models (LLMs). However, direct one-shot translation often fails to preserve program intent, leading to errors in control…

Software Engineering · Computer Science 2026-02-19 Shahriar Rumi Dipto, Saikat Mondal, Chanchal K. Roy

A Parallel Evaluation Data Set of Software Documentation with Document Structure Annotation

This paper accompanies the software documentation data set for machine translation, a parallel evaluation data set of data originating from the SAP Help Portal, that we released to the machine translation community for research purposes. It…

Computation and Language · Computer Science 2020-11-13 Bianka Buschbeck, Miriam Exel