Related papers: PanGu-Coder: Program Synthesis with Function-Level…

PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback

Large Language Models for Code (Code LLM) are flourishing. New and powerful models are released on a weekly basis, demonstrating remarkable performance on the code generation task. Various approaches have been proposed to boost the code…

Computation and Language · Computer Science · 2023-07-28 · Bo Shen, Jiaxin Zhang, Taihong Chen, Daoguang Zan, Bing Geng, An Fu, Muhan Zeng, Ailun Yu, Jichuan Ji, Jingyang Zhao, Yuenan Guo, Qianxiang Wang

PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

Large-scale Pretrained Language Models (PLMs) have become the new paradigm for Natural Language Processing (NLP). PLMs with hundreds of billions of parameters such as GPT-3 have demonstrated strong performance on natural language…

Computation and Language · Computer Science · 2021-04-27 · Wei Zeng, Xiaozhe Ren, Teng Su, Hui Wang, Yi Liao, Zhiwei Wang, Xin Jiang, ZhenZhang Yang, Kaisheng Wang, Xiaoda Zhang, Chen Li, Ziyan Gong, Yifan Yao, Xinjing Huang, Jun Wang, Jianfeng Yu, Qi Guo, Yue Yu, Yan Zhang, Jin Wang, Hengtao Tao, Dasen Yan, Zexuan Yi, Fang Peng, Fangqing Jiang, Han Zhang, Lingfeng Deng, Yehong Zhang, Zhe Lin, Chao Zhang, Shaojie Zhang, Mingyue Guo, Shanzhi Gu, Gaojun Fan, Yaowei Wang, Xuefeng Jin, Qun Liu, Yonghong Tian

SynthCoder: A Synthetical Strategy to Tune LLMs for Code Completion

Code completion is a prominent application of Large Language Models (LLMs) in software engineering. Due to the near real-time response requirements of this task, base models with small to medium-sized parameters are typically employed,…

Software Engineering · Computer Science · 2025-09-18 · Dongjun Yu, Xiao Yan, Zhenrui Li, Jipeng Xiao, Haochuan He, Yongda Yu, Hao Zhang, Guoping Rong, Xiaobo Huang

MapCoder: Multi-Agent Code Generation for Competitive Problem Solving

Code synthesis, which requires a deep understanding of complex natural language problem descriptions, generation of code instructions for complex algorithms and data structures, and the successful execution of comprehensive unit tests,…

Computation and Language · Computer Science · 2024-05-21 · Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez

PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

The scaling of large language models has greatly improved natural language understanding, generation, and reasoning. In this work, we develop a system that trained a trillion-parameter language model on a cluster of Ascend 910 AI processors…

Computation and Language · Computer Science · 2023-03-21 · Xiaozhe Ren, Pingyi Zhou, Xinfan Meng, Xinjing Huang, Yadao Wang, Weichao Wang, Pengfei Li, Xiaoda Zhang, Alexander Podolskiy, Grigory Arshinov, Andrey Bout, Irina Piontkovskaya, Jiansheng Wei, Xin Jiang, Teng Su, Qun Liu, Jun Yao

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder…

Computation and Language · Computer Science · 2021-08-19 · Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model

In this paper, we introduce PanGu-Bot, a Chinese pre-trained open-domain dialogue generation model based on a large pre-trained language model (PLM), PanGu-α (Zeng et al., 2021). Different from other pre-trained dialogue models trained…

Computation and Language · Computer Science · 2022-07-06 · Fei Mi, Yitong Li, Yulong Zeng, Jingyan Zhou, Yasheng Wang, Chuanfei Xu, Lifeng Shang, Xin Jiang, Shiqi Zhao, Qun Liu

Function-constrained Program Synthesis

This work introduces (1) a technique that allows large language models (LLMs) to leverage user-provided code when solving programming tasks and (2) a method to iteratively generate modular sub-functions that can aid future code generation…

Machine Learning · Computer Science · 2023-12-05 · Patrick Hajali, Ignas Budvytis

Automatic Code Generation using Pre-Trained Language Models

Recent advancements in natural language processing (GPT-2, BERT) have led to near-human performance in multiple natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly…

Computation and Language · Computer Science · 2021-02-23 · Luis Perez, Lizi Ottens, Sudharshan Viswanathan

UniCoder: Scaling Code Large Language Model via Universal Code

Intermediate reasoning or acting steps have successfully improved large language models (LLMs) for handling various downstream natural language processing (NLP) tasks. When applying LLMs for code generation, recent works mainly focus on…

Computation and Language · Computer Science · 2024-06-25 · Tao Sun, Linzheng Chai, Jian Yang, Yuwei Yin, Hongcheng Guo, Jiaheng Liu, Bing Wang, Liqun Yang, Zhoujun Li

PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback

Large Language Models (LLMs) are widely adopted for assisting in software development tasks, yet their performance evaluations have narrowly focused on the functional correctness of generated code. Human programmers, however, require…

Software Engineering · Computer Science · 2024-12-06 · Yun Peng, Akhilesh Deepak Gotmare, Michael Lyu, Caiming Xiong, Silvio Savarese, Doyen Sahoo

Fixing Large Language Models' Specification Misunderstanding for Better Code Generation

Code generation aims to automatically generate source code conforming to a given programming specification, and it has received extensive attention, especially with the development of large language models (LLMs). Due to the inherent difficulty…

Software Engineering · Computer Science · 2024-12-20 · Zhao Tian, Junjie Chen, Xiangyu Zhang

Planning with Large Language Models for Code Generation

Existing large language model-based code generation pipelines typically use beam search or sampling algorithms during the decoding process. Although the programs they generate achieve high token-matching-based scores, they often fail to…

Machine Learning · Computer Science · 2023-03-10 · Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, Chuang Gan

LLM-Assisted Code Cleaning For Training Accurate Code Generators

Natural language to code generation is an important application area of LLMs and has received wide attention from the community. The majority of relevant studies have exclusively concentrated on increasing the quantity and functional…

Machine Learning · Computer Science · 2023-11-28 · Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica

CAT-LM: Training Language Models on Aligned Code And Tests

Testing is an integral part of the software development process. Yet, writing tests is time-consuming and therefore often neglected. Classical test generation tools such as EvoSuite generate behavioral test suites by optimizing for…

Software Engineering · Computer Science · 2023-10-04 · Nikitha Rao, Kush Jain, Uri Alon, Claire Le Goues, Vincent J. Hellendoorn

Jigsaw: Large Language Models meet Program Synthesis

Large pre-trained language models such as GPT-3, Codex, and Google's language model are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and…

Software Engineering · Computer Science · 2021-12-07 · Naman Jain, Skanda Vaidyanath, Arun Iyer, Nagarajan Natarajan, Suresh Parthasarathy, Sriram Rajamani, Rahul Sharma

A Comprehensive Review of State-of-The-Art Methods for Java Code Generation from Natural Language Text

Java code generation consists of automatically generating Java code from natural language text. This NLP task helps increase programmers' productivity by providing them with immediate solutions to the simplest and most repetitive…

Computation and Language · Computer Science · 2023-06-13 · Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan, El Mehdi Chouham, Walid Dahhane, El Hassane Ettifouri

Fine-grained Pseudo-code Generation Method via Code Feature Extraction and Transformer

Pseudo-code written in natural language is helpful for novice developers' program comprehension. However, writing such pseudo-code is time-consuming and laborious. Motivated by the research advancements of sequence-to-sequence learning and…

Software Engineering · Computer Science · 2021-09-22 · Guang Yang, Yanlin Zhou, Xiang Chen, Chi Yu

Type-Constrained Code Generation with Language Models

Large language models (LLMs) have achieved notable success in code generation. However, they still frequently produce uncompilable output because their next-token inference procedure does not model formal aspects of code. Although…

Machine Learning · Computer Science · 2025-05-09 · Niels Mündler, Jingxuan He, Hao Wang, Koushik Sen, Dawn Song, Martin Vechev

AdaCoder: An Adaptive Planning and Multi-Agent Framework for Function-Level Code Generation

Recently, researchers have proposed many multi-agent frameworks for function-level code generation, which aim to improve software development productivity by automatically generating function-level source code based on task descriptions. A…

Software Engineering · Computer Science · 2025-04-08 · Yueheng Zhu, Chao Liu, Xuan He, Xiaoxue Ren, Zhongxin Liu, Ruwei Pan, Hongyu Zhang