Related papers: Beyond Code Pairs: Dialogue-Based Data Generation …

Fortran2CPP: Automating Fortran-to-C++ Translation using LLMs via Multi-Turn Dialogue and Dual-Agent Integration

Translating legacy Fortran code into C++ is a crucial step in modernizing high-performance computing (HPC) applications. However, the scarcity of high-quality, parallel Fortran-to-C++ datasets and the limited domain-specific expertise in…

Machine Learning · Computer Science 2025-02-04 Le Chen, Bin Lei, Dunzhi Zhou, Pei-Hung Lin, Chunhua Liao, Caiwen Ding, Ali Jannesari

Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++

In this study, we present a novel dataset for training machine learning models translating between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is created from a range of representative open-source…

Software Engineering · Computer Science 2023-09-20 Bin Lei, Caiwen Ding, Le Chen, Pei-Hung Lin, Chunhua Liao

Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation

Large Language Models (LLMs) have achieved remarkable success in automated code translation. While prior work has focused on improving translation accuracy through advanced prompting and iterative repair, the reliability of the underlying…

Software Engineering · Computer Science 2026-05-11 Fazle Rabbi, Soumit Kanti Saha, Jinqiu Yang

Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks

Large language models (LLMs) have shown impressive promise in code generation, yet their progress remains limited by the shortage of large-scale datasets that are both diverse and well-aligned with human reasoning. Most existing resources…

Machine Learning · Computer Science 2025-10-28 Amal Abed, Ivan Lukic, Jörg K. H. Franke, Frank Hutter

CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement

Large Language Models (LLMs) have revolutionized code generation but require significant resources and often over-generalize, limiting their task-specific efficiency. Fine-tuning smaller, open-source LLMs provides a cost-effective…

Computation and Language · Computer Science 2025-06-27 Leitian Tao, Xiang Chen, Tong Yu, Tung Mai, Ryan Rossi, Yixuan Li, Saayan Mitra

Evaluating Large Language Models for Code Translation: Effects of Prompt Language and Prompt Design

Large language models (LLMs) have shown promise for automated source-code translation, a capability critical to software migration, maintenance, and interoperability. Yet comparative evidence on how model choice, prompt design, and prompt…

Software Engineering · Computer Science 2025-09-17 Aamer Aljagthami, Mohammed Banabila, Musab Alshehri, Mohammed Kabini, Mohammad D. Alahmadi

CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Recent advancements in Large Language Models (LLMs) have renewed interest in automatic programming language translation. Encoder-decoder transformer models, in particular, have shown promise in translating between different programming…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-29 Ali TehraniJamsaz, Arijit Bhattacharjee, Le Chen, Nesreen K. Ahmed, Amir Yazdanbakhsh, Ali Jannesari

Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code

Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code…

Software Engineering · Computer Science 2024-01-17 Rangeet Pan, Ali Reza Ibrahimzada, Rahul Krishna, Divya Sankar, Lambert Pouguem Wassi, Michele Merler, Boris Sobolev, Raju Pavuluri, Saurabh Sinha, Reyhaneh Jabbarvand

LLM-Assisted Code Cleaning For Training Accurate Code Generators

Natural language to code generation is an important application area of LLMs and has received wide attention from the community. The majority of relevant studies have exclusively concentrated on increasing the quantity and functional…

Machine Learning · Computer Science 2023-11-28 Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica

Developer-LLM Conversations: An Empirical Study of Interactions and Generated Code Quality

Large Language Models (LLMs) are becoming integral to modern software development workflows, assisting developers with code generation, API explanation, and iterative problem-solving through natural language conversations. Despite…

Software Engineering · Computer Science 2025-09-15 Suzhen Zhong, Ying Zou, Bram Adams

LLM-Assisted Translation of Legacy FORTRAN Codes to C++: A Cross-Platform Study

Large Language Models (LLMs) are increasingly being leveraged for generating and translating scientific computer codes by both domain experts and non-domain experts. Fortran has served as one of the go-to programming languages in legacy…

Software Engineering · Computer Science 2025-04-23 Nishath Rajiv Ranasinghe, Shawn M. Jones, Michal Kucer, Ayan Biswas, Daniel O'Malley, Alexander Buschmann Most, Selma Liliane Wanna, Ajay Sreekumar

LLM4DS: Evaluating Large Language Models for Data Science Code Generation

The adoption of Large Language Models (LLMs) for code generation in data science offers substantial potential for enhancing tasks such as data manipulation, statistical analysis, and visualization. However, the effectiveness of these models…

Software Engineering · Computer Science 2024-11-20 Nathalia Nascimento, Everton Guimaraes, Sai Sanjna Chintakunta, Santhosh Anitha Boominathan

Enhancing Cross-Language Code Translation via Task-Specific Embedding Alignment in Retrieval-Augmented Generation

We introduce a novel method to enhance cross-language code translation from Fortran to C++ by integrating task-specific embedding alignment into a Retrieval-Augmented Generation (RAG) framework. Unlike conventional retrieval approaches that…

Artificial Intelligence · Computer Science 2024-12-09 Manish Bhattarai, Minh Vu, Javier E. Santos, Ismael Boureima, Daniel O'Malley

Enhancing Document-level Translation of Large Language Model via Translation Mixed-instructions

Existing large language models (LLMs) for machine translation are typically fine-tuned on sentence-level translation instructions and achieve satisfactory performance at the sentence level. However, when applied to document-level…

Computation and Language · Computer Science 2024-01-17 Yachao Li, Junhui Li, Jing Jiang, Min Zhang

Adapting Large Language Models for Document-Level Machine Translation

Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks. Recent research indicates that moderately-sized LLMs often outperform larger ones after task-specific fine-tuning. This study focuses…

Computation and Language · Computer Science 2024-10-14 Minghao Wu, Thuy-Trang Vu, Lizhen Qu, George Foster, Gholamreza Haffari

Model-Driven Quantum Code Generation Using Large Language Models and Retrieval-Augmented Generation

This paper introduces a novel research direction for model-to-text/code transformations by leveraging Large Language Models (LLMs) that can be enhanced with Retrieval-Augmented Generation (RAG) pipelines. The focus is on quantum and hybrid…

Software Engineering · Computer Science 2025-12-03 Nazanin Siavash, Armin Moin

Tutoring LLM into a Better CUDA Optimizer

Recent leaps in large language models (LLMs) caused a revolution in programming tools (like GitHub Copilot) that can help with code generation, debugging, and even performance optimization. In this paper, we focus on the capabilities of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-21 Matyáš Brabec, Jiří Klepl, Michal Töpfer, Martin Kruliš

How Much Data is Enough Data? Fine-Tuning Large Language Models for In-House Translation: Performance Evaluation Across Multiple Dataset Sizes

Decoder-only LLMs have shown impressive performance in MT due to their ability to learn from extensive datasets and generate high-quality translations. However, LLMs often struggle with the nuances and style required for…

Computation and Language · Computer Science 2024-09-11 Inacio Vieira, Will Allred, Séamus Lankford, Sheila Castilho, Andy Way

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

The rapid growth of deep learning has driven exponential increases in model parameters and computational demands. NVIDIA GPUs and their CUDA-based software ecosystem provide robust support for parallel computing, significantly alleviating…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-08 Jiaqi Lv, Xufeng He, Yanchen Liu, Xu Dai, Aocheng Shen, Yinghao Li, Jiachen Hao, Jianrong Ding, Yang Hu, Shouyi Yin

Multilingual Contextualization of Large Language Models for Document-Level Machine Translation

Large language models (LLMs) have demonstrated strong performance in sentence-level machine translation, but scaling to document-level translation remains challenging, particularly in modeling long-range dependencies and discourse phenomena…

Computation and Language · Computer Science 2025-08-29 Miguel Moura Ramos, Patrick Fernandes, Sweta Agrawal, André F. T. Martins