Related papers: IRCoder: Intermediate Representations Make Languag…

InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation

Code translation aims to convert a program from one programming language (PL) to another. This long-standing software engineering task is crucial for modernizing legacy systems, ensuring cross-platform compatibility, enhancing performance,…

Software Engineering · Computer Science · 2024-11-06 · Marcos Macedo, Yuan Tian, Pengyu Nie, Filipe R. Cogo, Bram Adams

MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations

One of the primary areas of interest in High Performance Computing is the improvement of performance of parallel workloads. Nowadays, compilable source code-based optimization tasks that employ deep learning often exploit LLVM Intermediate…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-07-03 · Akash Dutta, Ali Jannesari

Code Translation with Compiler Representations

In this paper, we leverage low-level compiler intermediate representations (IR) to improve code translation. Traditional transpilers rely on syntactic information and handcrafted rules, which limits their applicability and produces…

Programming Languages · Computer Science · 2023-04-25 · Marc Szafraniec, Baptiste Roziere, Hugh Leather, Francois Charton, Patrick Labatut, Gabriel Synnaeve

LLM Translation of Compiler Intermediate Representation

GCC and LLVM underpin much of modern software infrastructure, relying on distinct Intermediate Representations (IRs) to drive optimizations and code generation. However, the semantic and structural differences between these IRs create…

Programming Languages · Computer Science · 2026-05-12 · Andrea Valenzuela Ramirez, Cristian Gutierrez-Gomez, Marta Barroso, Dario Garcia-Gasulla, Sara Royuela

Meta Large Language Model Compiler: Foundation Models of Compiler Optimization

Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks. However, their application in the domain of code and compiler optimization remains underexplored. Training…

Programming Languages · Computer Science · 2024-07-04 · Chris Cummins, Volker Seeker, Dejan Grubisic, Baptiste Roziere, Jonas Gehring, Gabriel Synnaeve, Hugh Leather

ComPile: A Large IR Dataset from Production Sources

Code is increasingly becoming a core data modality of modern machine learning research impacting not only the way we write code with conversational agents like OpenAI's ChatGPT, Google's Bard, or Anthropic's Claude, the way we translate…

Programming Languages · Computer Science · 2024-05-01 · Aiden Grossman, Ludger Paehler, Konstantinos Parasyris, Tal Ben-Nun, Jacob Hegna, William Moses, Jose M Monsalve Diaz, Mircea Trofin, Johannes Doerfert

Can Large Language Models Understand Intermediate Representations in Compilers?

Intermediate Representations (IRs) play a critical role in compiler design and program analysis, yet their comprehension by Large Language Models (LLMs) remains underexplored. In this paper, we present an explorative empirical study…

Machine Learning · Computer Science · 2025-06-06 · Hailong Jiang, Jianfeng Zhu, Yao Wan, Bo Fang, Hongyu Zhang, Ruoming Jin, Qiang Guan

Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context

Large Language Models (LLMs) have achieved remarkable success in code completion, as evidenced by their essential roles in developing code assistant services such as Copilot. Being trained on in-file contexts, current LLMs are quite…

Software Engineering · Computer Science · 2024-02-20 · Yichen Li, Yun Peng, Yintong Huo, Michael R. Lyu

KnowCoder-X: Boosting Multilingual Information Extraction via Code

Empirical evidence indicates that LLMs exhibit spontaneous cross-lingual alignment. However, although LLMs show promising cross-lingual alignment in Information Extraction (IE), a significant imbalance across languages persists,…

Computation and Language · Computer Science · 2025-06-03 · Yuxin Zuo, Wenxuan Jiang, Wenxuan Liu, Zixuan Li, Long Bai, Hanbin Wang, Yutao Zeng, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

MPCODER: Multi-user Personalized Code Generator with Explicit and Implicit Style Representation Learning

Large Language Models (LLMs) have demonstrated great potential for assisting developers in their daily development. However, most research focuses on generating correct code; how to use LLMs to generate personalized code has seldom been…

Computation and Language · Computer Science · 2024-09-27 · Zhenlong Dai, Chang Yao, WenKang Han, Ying Yuan, Zhipeng Gao, Jingyuan Chen

Beyond Language Barriers: Multi-Agent Coordination for Multi-Language Code Generation

Producing high-quality code across multiple programming languages is increasingly important as today's software systems are built on heterogeneous stacks. Large language models (LLMs) have advanced the state of automated programming, yet…

Software Engineering · Computer Science · 2025-09-25 · Micheline Bénédicte Moumoula, Serge Lionel Nikiema, Albérick Euraste Djire, Abdoul Kader Kabore, Jacques Klein, Tegawendé F. Bissyande

Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings

Neural program embeddings have demonstrated considerable promise in a range of program analysis tasks, including clone identification, program repair, code completion, and program synthesis. However, most existing methods generate neural…

Software Engineering · Computer Science · 2022-04-21 · Zongjie Li, Pingchuan Ma, Huaijin Wang, Shuai Wang, Qiyi Tang, Sen Nie, Shi Wu

Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality

Large language models (LLMs) exhibit strong general intelligence, yet their multilingual performance remains highly imbalanced. Although LLMs encode substantial cross-lingual knowledge in a unified semantic space, they often struggle to…

Computation and Language · Computer Science · 2026-04-17 · Mengyu Bu, Yang Feng

Multilingual Multimodal Software Developer for Code Generation

The rapid advancement of Large Language Models (LLMs) has significantly improved code generation, yet most models remain text-only, neglecting crucial visual aids like diagrams and flowcharts used in real-world software development. To…

Computation and Language · Computer Science · 2025-07-14 · Linzheng Chai, Jian Yang, Shukai Liu, Wei Zhang, Liran Wang, Ke Jin, Tao Sun, Congnan Liu, Chenchen Zhang, Hualei Zhu, Jiaheng Liu, Xianjie Wu, Ge Zhang, Tianyu Liu, Zhoujun Li

Exploring and Unleashing the Power of Large Language Models in Automated Code Translation

Code translation tools (transpilers) are developed for automatic source-to-source translation. Although learning-based transpilers have shown impressive enhancement against rule-based counterparts, owing to their task-specific pre-training…

Software Engineering · Computer Science · 2024-05-14 · Zhen Yang, Fang Liu, Zhongxing Yu, Jacky Wai Keung, Jia Li, Shuo Liu, Yifan Hong, Xiaoxue Ma, Zhi Jin, Ge Li

Specification-Driven Code Translation Powered by Large Language Models: How Far Are We?

Large Language Models (LLMs) are increasingly being applied across various domains, including code-related tasks such as code translation. Previous studies have explored using LLMs for translating code between different programming…

Software Engineering · Computer Science · 2026-05-05 · Soumit Kanti Saha, Fazle Rabbi, Song Wang, Jinqiu Yang

Protocode: Prototype-Driven Interpretability for Code Generation in LLMs

Since the introduction of Large Language Models (LLMs), they have been widely adopted for various tasks such as text summarization, question answering, speech-to-text translation, and more. In recent times, the use of LLMs for code…

Software Engineering · Computer Science · 2026-01-22 · Krishna Vamshi Bodla, Haizhao Yang

UniCoder: Scaling Code Large Language Model via Universal Code

Intermediate reasoning or acting steps have successfully improved large language models (LLMs) for handling various downstream natural language processing (NLP) tasks. When applying LLMs for code generation, recent works mainly focus on…

Computation and Language · Computer Science · 2024-06-25 · Tao Sun, Linzheng Chai, Jian Yang, Yuwei Yin, Hongcheng Guo, Jiaheng Liu, Bing Wang, Liqun Yang, Zhoujun Li

ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation

Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only…

Software Engineering · Computer Science · 2025-09-30 · Minghua He, Yue Chen, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance

Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. Current approaches for obtaining high-quality code data primarily focus on (i) collecting large-scale…

Computation and Language · Computer Science · 2025-02-18 · Yichuan Ma, Yunfan Shao, Peiji Li, Demin Song, Qipeng Guo, Linyang Li, Xipeng Qiu, Kai Chen