Related papers: MonoCoder: Domain-Specific Code Language Model for…

Scope is all you need: Transforming LLMs for HPC Code

With easier access to powerful compute resources, there is a growing trend in the field of AI for software development to develop larger and larger language models (LLMs) to address a variety of programming tasks. Even LLMs applied to tasks…

Computation and Language · Computer Science 2023-10-02 Tal Kadosh, Niranjan Hasabnis, Vy A. Vo, Nadav Schneider, Neva Krien, Abdul Wasay, Nesreen Ahmed, Ted Willke, Guy Tamir, Yuval Pinter, Timothy Mattson, Gal Oren

HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages

Large Language Model (LLM) based coding tools have been tremendously successful as software development assistants, yet they are often designed for general purpose programming tasks and perform poorly for more specialized domains such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-20 Aman Chaturvedi, Daniel Nichols, Siddharth Singh, Abhinav Bhatele

HPC-Coder: Modeling Parallel Programs using Large Language Models

Parallel programs in high performance computing (HPC) continue to grow in complexity and scale in the exascale era. The diversity in hardware and parallel programming models make developing, optimizing, and maintaining parallel software…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-15 Daniel Nichols, Aniruddha Marathe, Harshitha Menon, Todd Gamblin, Abhinav Bhatele

On the Effectiveness of Large Language Models in Domain-Specific Code Generation

Large language models (LLMs) such as ChatGPT have shown remarkable capabilities in code generation. Despite significant achievements, they rely on enormous training data to acquire a broad spectrum of open-domain knowledge. Besides, their…

Software Engineering · Computer Science 2025-02-18 Xiaodong Gu, Meng Chen, Yalan Lin, Yuhan Hu, Hongyu Zhang, Chengcheng Wan, Zhao Wei, Yong Xu, Juhong Wang

SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning

Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text…

Computation and Language · Computer Science 2024-11-04 Yangruibo Ding, Jinjun Peng, Marcus J. Min, Gail Kaiser, Junfeng Yang, Baishakhi Ray

The Landscape and Challenges of HPC Research and LLMs

Recently, language models (LMs), especially large language models (LLMs), have revolutionized the field of deep learning. Both encoder-decoder models and prompt-based techniques have shown immense potential for natural language processing…

Machine Learning · Computer Science 2024-02-08 Le Chen, Nesreen K. Ahmed, Akash Dutta, Arijit Bhattacharjee, Sixing Yu, Quazi Ishtiaque Mahmud, Waqwoya Abebe, Hung Phan, Aishwarya Sarkar, Branden Butler, Niranjan Hasabnis, Gal Oren, Vy A. Vo, Juan Pablo Munoz, Theodore L. Willke, Tim Mattson, Ali Jannesari

ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation

Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only…

Software Engineering · Computer Science 2025-09-30 Minghua He, Yue Chen, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation

Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and…

Machine Learning · Computer Science 2026-04-27 Henrijs Princis, Arindam Sharma, Cristina David

A Systematic Evaluation of Large Language Models of Code

Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not…

Programming Languages · Computer Science 2022-05-05 Frank F. Xu, Uri Alon, Graham Neubig, Vincent J. Hellendoorn

OMPGPT: A Generative Pre-trained Transformer Model for OpenMP

Large language models (LLMs) such as ChatGPT have significantly advanced the field of Natural Language Processing (NLP). This trend led to the development of code-based large language models such as StarCoder, WizardCoder, and CodeLlama,…

Software Engineering · Computer Science 2024-11-08 Le Chen, Arijit Bhattacharjee, Nesreen Ahmed, Niranjan Hasabnis, Gal Oren, Vy Vo, Ali Jannesari

MultiCoder: Multi-Programming-Lingual Pre-Training for Low-Resource Code Completion

Code completion is a valuable topic in both academia and industry. Recently, large-scale mono-programming-lingual (MonoPL) pre-training models have been proposed to boost the performance of code completion. However, the code completion on…

Computation and Language · Computer Science 2022-12-20 Zi Gong, Yinpeng Guo, Pingyi Zhou, Cuiyun Gao, Yasheng Wang, Zenglin Xu

RoboCoder: Robotic Learning from Basic Skills to General Tasks with Large Language Models

The emergence of Large Language Models (LLMs) has improved the prospects for robotic tasks. However, existing benchmarks are still limited to single tasks with limited generalization capabilities. In this work, we introduce a comprehensive…

Robotics · Computer Science 2024-06-07 Jingyao Li, Pengguang Chen, Sitong Wu, Chuanyang Zheng, Hong Xu, Jiaya Jia

MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks

Large Language Models (LLMs) have showcased impressive capabilities in handling straightforward programming tasks. However, their performance tends to falter when confronted with more challenging programming problems. We observe that…

Machine Learning · Computer Science 2025-04-01 Jingyao Li, Pengguang Chen, Bin Xia, Hong Xu, Jiaya Jia

Do Large Language Models Understand Performance Optimization?

Large Language Models (LLMs) have emerged as powerful tools for software development tasks such as code completion, translation, and optimization. However, their ability to generate efficient and correct code, particularly in complex…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-19 Bowen Cui, Tejas Ramesh, Oscar Hernandez, Keren Zhou

To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation

Large Language Models (LLMs) have shown strong potential for code generation, yet they remain limited in private-library-oriented code generation, where the goal is to generate code using APIs from private libraries. Existing approaches…

Software Engineering · Computer Science 2026-03-30 Yitong Zhang, Chengze Li, Ruize Chen, Guowei Yang, Xiaoran Jia, Yijie Ren, Jia Li

UniPar: A Unified LLM-Based Framework for Parallel and Accelerated Code Translation in HPC

Translating programs between various parallel programming languages is an important problem in the high-performance computing (HPC) community. Existing tools for this problem are either too narrow in scope and/or outdated. Recent explosive…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-16 Tomer Bitan, Tal Kadosh, Erel Kaplan, Shira Meiri, Le Chen, Peter Morales, Niranjan Hasabnis, Gal Oren

Exploring and Unleashing the Power of Large Language Models in Automated Code Translation

Code translation tools (transpilers) are developed for automatic source-to-source translation. Although learning-based transpilers have shown impressive enhancement against rule-based counterparts, owing to their task-specific pre-training…

Software Engineering · Computer Science 2024-05-14 Zhen Yang, Fang Liu, Zhongxing Yu, Jacky Wai Keung, Jia Li, Shuo Liu, Yifan Hong, Xiaoxue Ma, Zhi Jin, Ge Li

DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this…

Computation and Language · Computer Science 2024-02-15 Yejie Wang, Keqing He, Guanting Dong, Pei Wang, Weihao Zeng, Muxi Diao, Yutao Mou, Mengdi Zhang, Jingang Wang, Xunliang Cai, Weiran Xu

Showing LLM-Generated Code Selectively Based on Confidence of LLMs

Large Language Models (LLMs) have shown impressive abilities in code generation, but they may generate erroneous programs. Reading a program takes ten times longer than writing it. Showing these erroneous programs to developers will waste…

Software Engineering · Computer Science 2024-10-07 Jia Li, Yuqi Zhu, Yongmin Li, Ge Li, Zhi Jin

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary…

Computation and Language · Computer Science 2025-03-21 Siming Huang, Tianhao Cheng, J. K. Liu, Jiaran Hao, Liuyihan Song, Yang Xu, J. Yang, Jiaheng Liu, Chenchen Zhang, Linzheng Chai, Ruifeng Yuan, Zhaoxiang Zhang, Jie Fu, Qian Liu, Ge Zhang, Zili Wang, Yuan Qi, Yinghui Xu, Wei Chu