Related papers: MultiCoder: Multi-Programming-Lingual Pre-Training…

200 papers

Despite LLMs' excellent code creation capabilities, multilingual code generation remains extremely challenging. To address this, we intend to improve the multi-programming-lingual (MultiPL) performance of the base LLMs while retaining the…

Computation and Language · Computer Science 2025-09-09 Qing Wang, Xue Han, Jiahui Wang, Lehao Xing, Qian Hu, Lianlian Zhang, Chao Deng, Junlan Feng

With easier access to powerful compute resources, there is a growing trend in AI for software development to build large language models (LLMs) that address a variety of programming tasks. Even LLMs applied to tasks from the…

Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software…

Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge…

Large language models (LLMs) have demonstrated impressive capabilities in aiding developers with tasks like code comprehension, generation, and translation. Supporting multilingual programming -- i.e., coding tasks across multiple…

Programming Languages · Computer Science 2025-06-25 Yifan Zong, Yuntian Deng, Pengyu Nie

Multilingual language models are widely used to extend NLP systems to low-resource languages. However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce. Here, we…

Computation and Language · Computer Science 2023-11-16 Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen

Evaluating the performance of Code Language Models (CLMs) for software engineering tasks, especially in multilingual and low-resource programming language settings, poses significant challenges. These challenges are primarily due to the…

Software Engineering · Computer Science 2024-11-26 Rohit Dandamudi, Gema Rodríguez-Pérez

A recent study by Ahmed and Devanbu reported that using a corpus of code written in multilingual datasets to fine-tune multilingual Pre-trained Language Models (PLMs) achieves higher performance than using a corpus of code written…

Programming Languages · Computer Science 2022-04-21 Fuxiang Chen, Fatemeh Fard, David Lo, Timofey Bryksin

Code large language models (LLMs) have shown remarkable advances in code understanding, completion, and generation tasks. Programming benchmarks, composed of a selection of code challenges and corresponding test cases, serve as a standard…

Code completion is a prominent application of Large Language Models (LLMs) in software engineering. Due to the near real-time response requirements of this task, base models with small to medium-sized parameters are typically employed,…

Software Engineering · Computer Science 2025-09-18 Dongjun Yu, Xiao Yan, Zhenrui Li, Jipeng Xiao, Haochuan He, Yongda Yu, Hao Zhang, Guoping Rong, Xiaobo Huang

Code large language models (Code LLMs) are powerful but costly to train, with scaling laws predicting performance from model size, data, and compute. However, different programming languages (PLs) have varying impacts during pre-training…

Computation and Language · Computer Science 2025-12-16 Jian Yang, Shawn Guo, Lin Jing, Wei Zhang, Aishan Liu, Chuan Hao, Zhoujun Li, Wayne Xin Zhao, Xianglong Liu, Weifeng Lv, Bryan Dai

Large language models (LLMs) have significantly improved code generation, particularly in one-pass code generation. However, most existing approaches focus solely on generating code in a single programming language, overlooking the…

Computation and Language · Computer Science 2024-09-09 Tengfei Xue, Xuefeng Li, Tahir Azim, Roman Smirnov, Jianhui Yu, Arash Sadrieh, Babak Pahlavan

Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not…

Programming Languages · Computer Science 2022-05-05 Frank F. Xu, Uri Alon, Graham Neubig, Vincent J. Hellendoorn

Code language models have emerged as useful tools for various programming tasks, yet they often struggle when it comes to complex ones. In this paper, we explore the potential of curriculum learning in enhancing the performance of these…

Machine Learning · Computer Science 2024-07-16 Marwa Naïr, Kamel Yamani, Lynda Said Lhadj, Riyadh Baghdadi

Large Language Models (LLMs) demonstrate strong proficiency in generating code for high-resource programming languages (HRPLs) like Python but struggle significantly with low-resource programming languages (LRPLs) such as Racket or D. This…

Computation and Language · Computer Science 2024-10-25 Jipeng Zhang, Jianshu Zhang, Yuanzhe Li, Renjie Pi, Rui Pan, Runtao Liu, Ziqiang Zheng, Tong Zhang

Within the realm of software engineering, specialized tasks on code, such as program repair, present unique challenges, necessitating the fine-tuning of large language models (LLMs) to unlock state-of-the-art performance. Fine-tuning approaches…

Software Engineering · Computer Science 2025-09-23 Boyang Yang, Haoye Tian, Jiadong Ren, Hongyu Zhang, Jacques Klein, Tegawendé F. Bissyandé, Claire Le Goues, Shunfu Jin

Repository-level pretraining is commonly used to enable large language models for code to leverage codebase-wide context. This enhances their ability to generate accurate and context-aware code completions. In this work, we investigate how…

Software Engineering · Computer Science 2025-10-16 Maksim Sapronov, Evgeniy Glukhov

Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data. Enhancing non-English language capabilities through post-pretraining often results in catastrophic…

Computation and Language · Computer Science 2024-08-22 Hao Zhou, Zhijun Wang, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Weihua Luo, Jiajun Chen

Transformer-based language models for automatic code completion have shown great promise so far, yet the evaluation of these models rarely uses real data. This study provides both quantitative and qualitative assessments of three public…

Software Engineering · Computer Science 2024-02-27 Maliheh Izadi, Jonathan Katzy, Tim van Dam, Marc Otten, Razvan Mihai Popescu, Arie van Deursen

Repository-level code completion has drawn great attention in software engineering, and several benchmark datasets have been introduced. However, existing repository-level code completion benchmarks usually focus on a limited number of…

Computation and Language · Computer Science 2024-10-29 Jiaheng Liu, Ken Deng, Congnan Liu, Jian Yang, Shukai Liu, He Zhu, Peng Zhao, Linzheng Chai, Yanan Wu, Ke Jin, Ge Zhang, Zekun Wang, Guoan Zhang, Bangyu Xiang, Wenbo Su, Bo Zheng