
Related papers: CodePAD: Sequence-based Code Generation with Pushd…

200 papers

The utilization of programming language (PL) models, pre-trained on large-scale code corpora, as a means of automating software engineering processes has demonstrated considerable potential in streamlining various code generation tasks such…

Machine Learning · Computer Science 2023-07-21 Parshin Shojaee, Aneesh Jain, Sindhu Tipirneni, Chandan K. Reddy

Code generation models have shown significant potential for programming tasks. However, existing training methods like supervised fine-tuning face key limitations: they do not effectively teach models to prioritize correct over incorrect…

Software Engineering · Computer Science 2025-06-04 Kechi Zhang, Ge Li, Yihong Dong, Jingjing Xu, Jun Zhang, Jing Su, Yongfei Liu, Zhi Jin

Code generation, the task of producing source code from prompts, has seen significant advancements with the advent of pre-trained large language models (PLMs). Despite these achievements, there is no comprehensive taxonomy of weaknesses…

Software Engineering · Computer Science 2024-07-18 Xiaoli Lian, Shuaisong Wang, Jieping Ma, Fang Liu, Xin Tan, Li Zhang, Lin Shi, Cuiyun Gao

Automated source code refactoring, particularly extract method refactoring, is a crucial and frequently employed technique during software development. Despite its importance and frequent use by practitioners, current automated techniques…

Software Engineering · Computer Science 2024-12-25 Indranil Palit, Tushar Sharma

Recent advancements in natural language processing (GPT-2, BERT) have led to near-human performance in multiple natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly…

Computation and Language · Computer Science 2021-02-23 Luis Perez, Lizi Ottens, Sudharshan Viswanathan

Code generation models have shown significant potential for automating programming tasks. However, the challenge of generating accurate and reliable code persists due to the highly complex and long-reasoning nature of the task. Even…

Software Engineering · Computer Science 2025-06-04 Kechi Zhang, Ge Li, Jia Li, Yihong Dong, Jia Li, Zhi Jin

Language Models (LMs) are increasingly being used for code generation, but ensuring the correctness of generated programs remains a significant challenge. Although imperfect code may be acceptable during software development with human…

Programming Languages · Computer Science 2025-08-25 Lingxiao Li, Salar Rahili, Yiwei Zhao

Training datasets for semantic parsing are typically small due to the higher expertise required for annotation than most other NLP tasks. As a result, models for this application usually need additional prior knowledge to be built into the…

Computation and Language · Computer Science 2021-06-11 Sajad Norouzi, Keyi Tang, Yanshuai Cao

Adversarial examples are important to test and enhance the robustness of deep code models. As source code is discrete and has to strictly stick to complex grammar and semantics constraints, the adversarial example generation techniques in…

Cryptography and Security · Computer Science 2023-08-22 Zhao Tian, Junjie Chen, Zhi Jin

Program synthesis or code generation aims to generate a program that satisfies a problem specification. Recent approaches using large-scale pretrained language models (LMs) have shown promising results, yet they have some critical…

Machine Learning · Computer Science 2022-11-04 Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven C. H. Hoi

Automatic code generation aims to generate program code from a given natural language description. The current mainstream approach uses neural networks to encode natural language descriptions, and output abstract syntax trees…

Software Engineering · Computer Science 2022-02-16 Maosheng Zhong, Gen Liu, Hongwei Li, Jiangling Kuang, Jinshan Zeng, Mingwen Wang

Autoregressive next token prediction language models offer powerful capabilities but face significant challenges in practical deployment due to the high computational and memory costs of inference, particularly during the decoding stage. We…

We present CoDa (Constrained Generation based Data Augmentation), a controllable, effective, and training-free data augmentation technique for low-resource (data-scarce) NLP. Our approach is based on prompting off-the-shelf…

Computation and Language · Computer Science 2024-04-02 Chandra Kiran Reddy Evuru, Sreyan Ghosh, Sonal Kumar, Ramaneswaran S, Utkarsh Tyagi, Dinesh Manocha

The increasing adoption of large language models (LLMs) for code-related tasks has raised concerns about the security of their training datasets. One critical threat is dead code poisoning, where syntactically valid but functionally…

Computation and Language · Computer Science 2025-03-03 Chi-Chien Tsai, Chia-Mu Yu, Ying-Dar Lin, Yu-Sung Wu, Wei-Bin Lee

Visual programming languages (VPLs) allow users to create programs through graphical interfaces, which results in easier accessibility and their widespread usage in various domains. To further enhance this accessibility, recent research has…

Computation and Language · Computer Science 2025-05-26 Deokhyung Kang, Jeonghun Cho, Yejin Jeon, Sunbin Jang, Minsub Lee, Jawoon Cho, Gary Geunbae Lee

The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is…

Recently, deep-learning-based Natural Language Processing (NLP) models have shown great potential in the modeling of source code. However, a major limitation of these approaches is that they take source code as simple tokens of text and…

Neural and Evolutionary Computing · Computer Science 2020-07-15 Yasir Hussain, Zhiqiu Huang, Yu Zhou, Senzhang Wang

This study investigates the reliability of code generation by Large Language Models (LLMs), focusing on identifying and analyzing defects in the generated code. Despite the advanced capabilities of LLMs in automating code generation,…

Software Engineering · Computer Science 2024-08-27 Ali Mohammadi Esfahani, Nafiseh Kahani, Samuel A. Ajila

We introduce the Scratchpad Mechanism, a novel addition to the sequence-to-sequence (seq2seq) neural network architecture and demonstrate its effectiveness in improving the overall fluency of seq2seq models for natural language generation…

Computation and Language · Computer Science 2019-06-14 Ryan Y. Benmalek, Madian Khabsa, Suma Desu, Claire Cardie, Michele Banko

Code generation stands as a powerful technique in modern software development, improving development efficiency, reducing errors, and fostering standardization and consistency. Recently, ChatGPT has exhibited immense potential in automatic…

Software Engineering · Computer Science 2023-12-22 Youjia Li, Jianjun Shi, Zheng Zhang