Related papers: CodeFusion: A Pre-trained Diffusion Model for Code…

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science 2025-01-24 Zezhou Yang , Sirong Chen , Cuiyun Gao , Zhenhao Li , Xing Hu , Kui Liu , Xin Xia

Automatic Code Generation using Pre-Trained Language Models

Recent advancements in natural language processing \cite{gpt2} \cite{BERT} have led to near-human performance in multiple natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly…

Computation and Language · Computer Science 2021-02-23 Luis Perez , Lizi Ottens , Sudharshan Viswanathan

Constrained Code Generation with Discrete Diffusion

Discrete diffusion models are a powerful, emerging paradigm for code generation. They construct programs through iterative refinement of partially corrupted token sequences and enable parallel token refinement. Importantly, this paradigm…

Computation and Language · Computer Science 2026-05-19 Lize Shao , Michael Cardei , Zichen Xie , Ferdinando Fioretto , Wenxi Wang

CodeDSI: Differentiable Code Search

Reimplementing solutions to previously solved software engineering problems is not only inefficient but also introduces inadequate and error-prone code. Many existing methods achieve impressive performance on this issue by using…

Software Engineering · Computer Science 2022-10-04 Usama Nadeem , Noah Ziems , Shaoen Wu

Code Execution with Pre-trained Language Models

Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pre-trained models for code intelligence ignore the execution trace and only rely on source code and…

Programming Languages · Computer Science 2023-05-10 Chenxiao Liu , Shuai Lu , Weizhu Chen , Daxin Jiang , Alexey Svyatkovskiy , Shengyu Fu , Neel Sundaresan , Nan Duan

Improving Tree-Structured Decoder Training for Code Generation via Mutual Learning

Code generation aims to automatically generate a piece of code given an input natural language utterance. Currently, among dominant models, it is treated as a sequence-to-tree task, where a decoder outputs a sequence of actions…

Artificial Intelligence · Computer Science 2021-06-01 Binbin Xie , Jinsong Su , Yubin Ge , Xiang Li , Jianwei Cui , Junfeng Yao , Bin Wang

Beyond Autoregression: An Empirical Study of Diffusion Large Language Models for Code Generation

LLMs have become the mainstream approaches to code generation. Existing LLMs mainly employ autoregressive generation, i.e. generating code token-by-token from left to right. However, the underlying autoregressive generation has two…

Software Engineering · Computer Science 2025-11-04 Chengze Li , Yitong Zhang , Jia Li , Liyi Cai , Ge Li

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV)…

Computation and Language · Computer Science 2026-03-06 Jia-Nan Li , Jian Guan , Wei Wu , Chongxuan Li

Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code

Few-shot learning with large-scale, pre-trained language models is a powerful way to answer questions about code, e.g., how to complete a given code example, or even generate code snippets from scratch. The success of these models raises…

Software Engineering · Computer Science 2022-06-14 Patrick Bareiß , Beatriz Souza , Marcelo d'Amorim , Michael Pradel

AdvFusion: Adapter-based Knowledge Transfer for Code Summarization on Code Language Models

Programming languages can benefit from one another by utilizing a pre-trained model for software engineering tasks such as code summarization and method name prediction. While full fine-tuning of Code Language Models (Code-LMs) has been…

Software Engineering · Computer Science 2024-12-24 Iman Saberi , Amirreza Esmaeili , Fatemeh Fard , Fuxiang Chen

CodeFort: Robust Training for Code Generation Models

Code generation models are not robust to small perturbations, which often lead to incorrect generations and significantly degrade the performance of these models. Although improving the robustness of code generation models is crucial to…

Software Engineering · Computer Science 2024-10-30 Yuhao Zhang , Shiqi Wang , Haifeng Qian , Zijian Wang , Mingyue Shang , Linbo Liu , Sanjay Krishna Gouda , Baishakhi Ray , Murali Krishna Ramanathan , Xiaofei Ma , Anoop Deoras

Diffusion is a code repair operator and generator

Code diffusion models generate code by iteratively removing noise from the latent representation of a code snippet. During later steps of the diffusion process, when the code snippet has almost converged, differences between discrete…

Software Engineering · Computer Science 2025-08-18 Mukul Singh , Gust Verbruggen , Vu Le , Sumit Gulwani

Generative Image Coding with Diffusion Prior

As generative technologies advance, visual content has evolved into a complex mix of natural and AI-generated images, driving the need for more efficient coding techniques that prioritize perceptual quality. Traditional codecs and learned…

Computer Vision and Pattern Recognition · Computer Science 2025-09-18 Jianhui Chang

Breaking the Factorization Barrier in Diffusion Language Models

Diffusion language models theoretically allow for efficient parallel generation but are practically hindered by the "factorization barrier": the assumption that simultaneously predicted tokens are independent. This limitation forces a…

Machine Learning · Computer Science 2026-03-11 Ian Li , Zilei Shao , Benjie Wang , Rose Yu , Guy Van den Broeck , Anji Liu

CodeGen-Test: An Automatic Code Generation Model Integrating Program Test Information

Automatic code generation is to generate the program code according to the given natural language description. The current mainstream approach uses neural networks to encode natural language descriptions, and output abstract syntax trees…

Software Engineering · Computer Science 2022-02-16 Maosheng Zhong , Gen Liu , Hongwei Li , Jiangling Kuang , Jinshan Zeng , Mingwen Wang

Self-conditioned Embedding Diffusion for Text Generation

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as…

Computation and Language · Computer Science 2022-11-09 Robin Strudel , Corentin Tallec , Florent Altché , Yilun Du , Yaroslav Ganin , Arthur Mensch , Will Grathwohl , Nikolay Savinov , Sander Dieleman , Laurent Sifre , Rémi Leblond

GANFusion: Feed-Forward Text-to-3D with Diffusion in GAN Space

We train a feed-forward text-to-3D diffusion generator for human characters using only single-view 2D data for supervision. Existing 3D generative models cannot yet match the fidelity of image or video generative models. State-of-the-art 3D…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Souhaib Attaiki , Paul Guerrero , Duygu Ceylan , Niloy J. Mitra , Maks Ovsjanikov

Rectified Diffusion Guidance for Conditional Generation

Classifier-Free Guidance (CFG), which combines the conditional and unconditional score functions with two coefficients summing to one, serves as a practical technique for diffusion model sampling. Theoretically, however, denoising with CFG…

Computer Vision and Pattern Recognition · Computer Science 2025-10-02 Mengfei Xia , Nan Xue , Yujun Shen , Ran Yi , Tieliang Gong , Yong-Jin Liu

CoreGen: Contextualized Code Representation Learning for Commit Message Generation

Automatic generation of high-quality commit messages for code commits can substantially facilitate software developers' works and coordination. However, the semantic gap between source code and natural language poses a major challenge for…

Computation and Language · Computer Science 2021-06-22 Lun Yiu Nie , Cuiyun Gao , Zhicong Zhong , Wai Lam , Yang Liu , Zenglin Xu

Analysis of AdvFusion: Adapter-based Multilingual Learning for Code Large Language Models

Programming languages can benefit from one another by utilizing a language model for software engineering tasks. Full fine-tuning and Parameter Efficient Fine-Tuning (PEFT) of Code Language Models (Code-LMs) has been explored for…

Software Engineering · Computer Science 2025-11-06 Amirreza Esmaeili , Fahd Seddik , Yongyi Ji , Fatemeh Fard , Fuxiang Chen