Related papers: MultiCoder: Multi-Programming-Lingual Pre-Training…

MultiPL-MoE: Multi-Programming-Lingual Extension of Large Language Models through Hybrid Mixture-of-Experts

Despite LLMs' excellent code creation capabilities, multilingual code generation remains extremely challenging. To address this, we intend to improve the multi-programming-lingual (MultiPL) performance of the base LLMs while retaining the…

Computation and Language · Computer Science 2025-09-09 Qing Wang, Xue Han, Jiahui Wang, Lehao Xing, Qian Hu, Lianlian Zhang, Chao Deng, Junlan Feng

MonoCoder: Domain-Specific Code Language Model for HPC Codes and Tasks

With easier access to powerful compute resources, there is a growing trend in AI for software development to develop large language models (LLMs) to address a variety of programming tasks. Even LLMs applied to tasks from the…

Programming Languages · Computer Science 2024-09-23 Tal Kadosh, Niranjan Hasabnis, Vy A. Vo, Nadav Schneider, Neva Krien, Mihai Capota, Abdul Wasay, Nesreen Ahmed, Ted Willke, Guy Tamir, Yuval Pinter, Timothy Mattson, Gal Oren

Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software…

Programming Languages · Computer Science 2024-09-24 Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, Arjun Guha

MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge…

Machine Learning · Computer Science 2022-12-20 Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, Abhinav Jangda

Mix-of-Language-Experts Architecture for Multilingual Programming

Large language models (LLMs) have demonstrated impressive capabilities in aiding developers with tasks like code comprehension, generation, and translation. Supporting multilingual programming -- i.e., coding tasks across multiple…

Programming Languages · Computer Science 2025-06-25 Yifan Zong, Yuntian Deng, Pengyu Nie

When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages

Multilingual language models are widely used to extend NLP systems to low-resource languages. However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce. Here, we…

Computation and Language · Computer Science 2023-11-16 Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen

A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks

Evaluating the performance of Code Language Models (CLMs) for software engineering tasks, especially in multilingual and low-resource programming language settings, poses significant challenges. These challenges are primarily due to the…

Software Engineering · Computer Science 2024-11-26 Rohit Dandamudi, Gema Rodríguez-Pérez

On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages

A recent study by Ahmed and Devanbu reported that using a corpus of code written in multilingual datasets to fine-tune multilingual Pre-trained Language Models (PLMs) achieves higher performance as opposed to using a corpus of code written…

Programming Languages · Computer Science 2022-04-21 Fuxiang Chen, Fatemeh Fard, David Lo, Timofey Bryksin

McEval: Massively Multilingual Code Evaluation

Code large language models (LLMs) have shown remarkable advances in code understanding, completion, and generation tasks. Programming benchmarks, comprised of a selection of code challenges and corresponding test cases, serve as a standard…

Programming Languages · Computer Science 2024-06-12 Linzheng Chai, Shukai Liu, Jian Yang, Yuwei Yin, Ke Jin, Jiaheng Liu, Tao Sun, Ge Zhang, Changyu Ren, Hongcheng Guo, Zekun Wang, Boyang Wang, Xianjie Wu, Bing Wang, Tongliang Li, Liqun Yang, Sufeng Duan, Zhoujun Li

SynthCoder: A Synthetical Strategy to Tune LLMs for Code Completion

Code completion is a prominent application of Large Language Models (LLMs) in software engineering. Due to the near real-time response requirements of this task, base models with small to medium-sized parameters are typically employed,…

Software Engineering · Computer Science 2025-09-18 Dongjun Yu, Xiao Yan, Zhenrui Li, Jipeng Xiao, Haochuan He, Yongda Yu, Hao Zhang, Guoping Rong, Xiaobo Huang

Scaling Laws for Code: Every Programming Language Matters

Code large language models (Code LLMs) are powerful but costly to train, with scaling laws predicting performance from model size, data, and compute. However, different programming languages (PLs) have varying impacts during pre-training…

Computation and Language · Computer Science 2025-12-16 Jian Yang, Shawn Guo, Lin Jing, Wei Zhang, Aishan Liu, Chuan Hao, Zhoujun Li, Wayne Xin Zhao, Xianglong Liu, Weifeng Lv, Bryan Dai

Multi-Programming Language Ensemble for Code Generation in Large Language Model

Large language models (LLMs) have significantly improved code generation, particularly in one-pass code generation. However, most existing approaches focus solely on generating code in a single programming language, overlooking the…

Computation and Language · Computer Science 2024-09-09 Tengfei Xue, Xuefeng Li, Tahir Azim, Roman Smirnov, Jianhui Yu, Arash Sadrieh, Babak Pahlavan

A Systematic Evaluation of Large Language Models of Code

Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not…

Programming Languages · Computer Science 2022-05-05 Frank F. Xu, Uri Alon, Graham Neubig, Vincent J. Hellendoorn

Curriculum Learning for Small Code Language Models

Code language models have emerged as useful tools for various programming tasks, yet they often struggle when it comes to complex ones. In this paper, we explore the potential of curriculum learning in enhancing the performance of these…

Machine Learning · Computer Science 2024-07-16 Marwa Naïr, Kamel Yamani, Lynda Said Lhadj, Riyadh Baghdadi

Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code

Large Language Models (LLMs) demonstrate strong proficiency in generating code for high-resource programming languages (HRPLs) like Python but struggle significantly with low-resource programming languages (LRPLs) such as Racket or D. This…

Computation and Language · Computer Science 2024-10-25 Jipeng Zhang, Jianshu Zhang, Yuanzhe Li, Renjie Pi, Rui Pan, Runtao Liu, Ziqiang Zheng, Tong Zhang

MORepair: Teaching LLMs to Repair Code via Multi-Objective Fine-tuning

Within the realm of software engineering, specialized tasks on code, such as program repair, present unique challenges, necessitating fine-tuning Large Language Models (LLMs) to unlock state-of-the-art performance. Fine-tuning approaches…

Software Engineering · Computer Science 2025-09-23 Boyang Yang, Haoye Tian, Jiadong Ren, Hongyu Zhang, Jacques Klein, Tegawendé F. Bissyandé, Claire Le Goues, Shunfu Jin

On Pretraining for Project-Level Code Completion

Repository-level pretraining is commonly used to enable large language models for code to leverage codebase-wide context. This enhances their ability to generate accurate and context-aware code completions. In this work, we investigate how…

Software Engineering · Computer Science 2025-10-16 Maksim Sapronov, Evgeniy Glukhov

MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing

Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data. Enhancing non-English language capabilities through post-pretraining often results in catastrophic…

Computation and Language · Computer Science 2024-08-22 Hao Zhou, Zhijun Wang, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Weihua Luo, Jiajun Chen

Language Models for Code Completion: A Practical Evaluation

Transformer-based language models for automatic code completion have shown great promise so far, yet the evaluation of these models rarely uses real data. This study provides both quantitative and qualitative assessments of three public…

Software Engineering · Computer Science 2024-02-27 Maliheh Izadi, Jonathan Katzy, Tim van Dam, Marc Otten, Razvan Mihai Popescu, Arie van Deursen

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

Repository-level code completion has drawn great attention in software engineering, and several benchmark datasets have been introduced. However, existing repository-level code completion benchmarks usually focus on a limited number of…

Computation and Language · Computer Science 2024-10-29 Jiaheng Liu, Ken Deng, Congnan Liu, Jian Yang, Shukai Liu, He Zhu, Peng Zhao, Linzheng Chai, Yanan Wu, Ke Jin, Ge Zhang, Zekun Wang, Guoan Zhang, Bangyu Xiang, Wenbo Su, Bo Zheng