Related papers

Related papers: CodeGen: An Open Large Language Model for Code wit…

200 papers

Large pre-trained code generation models, such as OpenAI Codex, can generate syntactically and functionally correct code, making programmers more productive and bringing our pursuit of artificial general intelligence closer. In this paper, we…

Machine Learning · Computer Science · 2024-07-11 · Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, Jie Tang

Large language models (LLMs) have demonstrated remarkable abilities in representation learning for program synthesis and understanding tasks. The quality of the learned representations appears to be dictated by the neural scaling laws as a…

Machine Learning · Computer Science · 2023-07-13 · Erik Nijkamp, Hiroaki Hayashi, Caiming Xiong, Silvio Savarese, Yingbo Zhou

The rapid development of large language models has revolutionized code intelligence in software development. However, the predominance of closed-source models has restricted extensive research and development. To address this, we introduce…

Software Engineering · Computer Science · 2024-01-29 · Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, Wenfeng Liang

Large pre-trained language models such as GPT-3, Codex, and Google's language model are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and…

Software Engineering · Computer Science · 2021-12-07 · Naman Jain, Skanda Vaidyanath, Arun Iyer, Nagarajan Natarajan, Suresh Parthasarathy, Sriram Rajamani, Rahul Sharma

Program synthesis is the process of generating a computer program following a set of specifications, which can be a high-level description of the problem and/or a set of input-output examples. The synthesis can be modeled as a search…

Neural and Evolutionary Computing · Computer Science · 2023-04-07 · Matheus Campos Fernandes, Fabrício Olivetti de França, Emilio Francesquini

In 2023, we use the latest GPT-4 models to advance program synthesis. Large language models have significantly improved the state of the art for this purpose. To make these advancements more accessible, we have created a…

Computation and Language · Computer Science · 2024-02-26 · Daniel Li, Lincoln Murr

Multi-modal program synthesis refers to the task of synthesizing programs (code) from their specification given in different forms, such as a combination of natural language and examples. Examples provide a precise but incomplete…

Artificial Intelligence · Computer Science · 2021-09-08 · Kia Rahmani, Mohammad Raza, Sumit Gulwani, Vu Le, Daniel Morris, Arjun Radhakrishna, Gustavo Soares, Ashish Tiwari

This paper explores the limits of the current generation of large language models for program synthesis in general purpose programming languages. We evaluate a collection of such models (with between 244M and 137B parameters) on two new…

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, with code generation emerging as a key area of focus. While numerous benchmarks have been proposed to evaluate their code generation abilities,…

Synthetic data is a standard component in training large language models, yet systematic comparisons across design dimensions, including rephrasing strategy, generator model, and source data, remain absent. We conduct extensive controlled…

Large Language Models (LLMs) have achieved remarkable success in code generation tasks, powering various applications like code completion, debugging, and programming assistance. However, existing benchmarks such as HumanEval, MBPP, and…

Machine Learning · Computer Science · 2025-05-09 · Manik Sheokand, Parth Sawant

Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. Current approaches for obtaining high-quality code data primarily focus on (i) collecting large-scale…

Computation and Language · Computer Science · 2025-02-18 · Yichuan Ma, Yunfan Shao, Peiji Li, Demin Song, Qipeng Guo, Linyang Li, Xipeng Qiu, Kai Chen

Large language models have demonstrated impressive capabilities in generating code, yet they often produce programs with flaws or deviations from intended behavior, limiting their suitability for safety-critical applications. To address…

Software Engineering · Computer Science · 2025-04-08 · Merlijn Sevenhuijsen, Khashayar Etemadi, Mattias Nyberg

In today's society, we are becoming increasingly dependent on software systems. However, we also constantly witness the negative impacts of buggy software. Program synthesis aims to improve software correctness by automatically generating…

Software Engineering · Computer Science · 2023-12-11 · Sanyogita Piya, Allison Sullivan

Recently, program synthesis driven by large language models (LLMs) has become increasingly popular. However, program synthesis for machine learning (ML) tasks still poses significant challenges. This paper explores a novel form of program…

Software Engineering · Computer Science · 2024-09-10 · Jinglue Xu, Jialong Li, Zhen Liu, Nagar Anthel Venkatesh Suryanarayanan, Guoyuan Zhou, Jia Guo, Hitoshi Iba, Kenji Tei

Large language models (LLMs) have shown impressive promise in code generation, yet their progress remains limited by the shortage of large-scale datasets that are both diverse and well-aligned with human reasoning. Most existing resources…

Machine Learning · Computer Science · 2025-10-28 · Amal Abed, Ivan Lukic, Jörg K. H. Franke, Frank Hutter

We develop an approach to estimate the probability that a program sampled from a large language model is correct. Given a natural language description of a programming problem, our method samples both candidate programs and candidate…

Software Engineering · Computer Science · 2023-10-11 · Darren Key, Wen-Ding Li, Kevin Ellis

The size and complexity of software applications are increasing at an accelerating pace. Source code repositories (along with their dependencies) require vast amounts of labor to keep them tested, maintained, and up to date. As the…

Software Engineering · Computer Science · 2024-06-14 · Ivan R. Ivanov, Joachim Meyer, Aiden Grossman, William S. Moses, Johannes Doerfert

Recent development of large language models (LLMs) for code, such as Codex and CodeT5+, demonstrates tremendous promise in achieving code intelligence. Their ability to synthesize code that completes a program for performing a pre-defined task…

Computation and Language · Computer Science · 2023-10-10 · Weimin Xiong, Yiwen Guo, Hao Chen

Documentation debt hinders the effective utilization of open-source software. Although code summarization tools have been helpful for developers, most would prefer a detailed account of each parameter in a function rather than a high-level…

Software Engineering · Computer Science · 2023-11-21 · Vatsal Venkatkrishna, Durga Shree Nagabushanam, Emmanuel Iko-Ojo Simon, Melina Vidoni