Related papers: Improving Tree-Structured Decoder Training for Cod…

UniGenCoder: Merging Seq2Seq and Seq2Tree Paradigms for Unified Code Generation

Deep learning-based code generation has completely transformed the way developers write programs today. Existing approaches to code generation have focused either on the Sequence-to-Sequence paradigm, which generates target code as a…

Computation and Language · Computer Science · 2025-02-27 · Liangying Shao, Yanfu Yan, Denys Poshyvanyk, Jinsong Su

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets in a specific programming language from natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science · 2025-01-24 · Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia

Text-to-Code Generation with Modality-relative Pre-training

Large pre-trained language models have recently been expanded and applied to programming language tasks with great success, often through further pre-training of a strictly natural-language model, where training sequences typically contain…

Computation and Language · Computer Science · 2024-02-13 · Fenia Christopoulou, Guchun Zhang, Gerasimos Lampouras

Program Language Translation Using a Grammar-Driven Tree-to-Tree Model

The task of translating between programming languages differs from the challenge of translating natural languages in that programming languages are designed with a far more rigid set of structural and grammatical rules. Previous work has…

Machine Learning · Computer Science · 2018-07-06 · Mehdi Drissi, Olivia Watkins, Aditya Khant, Vivaswat Ojha, Pedro Sandoval, Rakia Segev, Eric Weiner, Robert Keller

Learning to Decode Collaboratively with Multiple Language Models

We propose a method to teach multiple large language models (LLMs) to collaborate by interleaving their generations at the token level. We model the decision of which LLM generates the next token as a latent variable. By optimizing the…

Computation and Language · Computer Science · 2024-08-28 · Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, David Sontag

StructCoder: Structure-Aware Transformer for Code Generation

There has been a recent surge of interest in automating software engineering tasks using deep learning. This paper addresses the problem of code generation, where the goal is to generate target code given source code in a different language…

Machine Learning · Computer Science · 2024-02-01 · Sindhu Tipirneni, Ming Zhu, Chandan K. Reddy

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network

Despite the impressive progress of vision-language (VL) pretraining with BERT-based encoders for VL understanding, pretraining a universal encoder-decoder for both VL understanding and generation remains challenging. The difficulty originates…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-28 · Yehao Li, Yingwei Pan, Ting Yao, Jingwen Chen, Tao Mei

Inducing Constituency Trees through Neural Machine Translation

Latent tree learning (LTL) methods learn to parse sentences using only indirect supervision from a downstream task. Recent advances in latent tree learning have made it possible to recover moderately high quality tree structures by training…

Computation and Language · Computer Science · 2019-09-24 · Phu Mon Htut, Kyunghyun Cho, Samuel R. Bowman

Divide and Rule: Effective Pre-Training for Context-Aware Multi-Encoder Translation Models

Multi-encoder models are a broad family of context-aware neural machine translation systems that aim to improve translation quality by encoding document-level contextual information alongside the current sentence. The context encoding is…

Computation and Language · Computer Science · 2022-10-25 · Lorenzo Lupo, Marco Dinarelli, Laurent Besacier

Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning

Code summarization generates a brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query. Since both tasks aim to model the association between natural…

Information Retrieval · Computer Science · 2020-02-26 · Wei Ye, Rui Xie, Jinglei Zhang, Tianxiang Hu, Xiaoyin Wang, Shikun Zhang

Neural Text Generation: A Practical Guide

Deep learning methods have recently achieved great empirical success on machine translation, dialogue response generation, summarization, and other text generation tasks. At a high level, the technique has been to train end-to-end neural…

Computation and Language · Computer Science · 2017-11-28 · Ziang Xie

Deep Learning Based Code Generation Methods: Literature Review

This paper focuses on the code generation task, which aims to generate relevant code fragments from given natural language descriptions. In the process of software development, developers often encounter two scenarios. One is requested…

Software Engineering · Computer Science · 2024-04-19 · Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Ge Li, Michael Lyu

Incorporating Domain Knowledge through Task Augmentation for Front-End JavaScript Code Generation

Code generation aims to generate a code snippet automatically from natural language descriptions. Generally, the mainstream code generation methods rely on a large amount of paired training data, including both the natural language…

Software Engineering · Computer Science · 2022-08-24 · Sijie Shen, Xiang Zhu, Yihong Dong, Qizhi Guo, Yankun Zhen, Ge Li

Retrieval-Based Neural Code Generation

In models that generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to…

Computation and Language · Computer Science · 2018-08-31 · Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig

Mutual Information and Diverse Decoding Improve Neural Machine Translation

Sequence-to-sequence neural translation models learn semantic and syntactic relations between sentence pairs by optimizing the likelihood of the target given the source, i.e., $p(y|x)$, an objective that ignores other potentially useful…

Computation and Language · Computer Science · 2016-03-24 · Jiwei Li, Dan Jurafsky

Code Representation Learning At Scale

Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, e.g., code generation. However, most of the existing works on code representation learning train models at a hundred…

Computation and Language · Computer Science · 2024-02-06 · Dejiao Zhang, Wasi Ahmad, Ming Tan, Hantian Ding, Ramesh Nallapati, Dan Roth, Xiaofei Ma, Bing Xiang

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

Pre-trained models for Natural Languages (NL) like BERT and GPT have been recently shown to transfer well to Programming Languages (PL) and largely benefit a broad set of code-related tasks. Despite their success, most current methods…

Computation and Language · Computer Science · 2021-09-03 · Yue Wang, Weishi Wang, Shafiq Joty, Steven C. H. Hoi

Transformer with Tree-order Encoding for Neural Program Generation

While a considerable number of semantic parsing approaches have employed RNN architectures for code generation tasks, there have been only a few attempts to investigate the applicability of Transformers for this task. Including hierarchical…

Computation and Language · Computer Science · 2022-06-28 · Klaudia-Doris Thellmann, Bernhard Stadler, Ricardo Usbeck, Jens Lehmann

A Grammar-Based Structural CNN Decoder for Code Generation

Code generation maps a program description to executable source code in a programming language. Existing approaches mainly rely on a recurrent neural network (RNN) as the decoder. However, we find that a program contains significantly more…

Machine Learning · Computer Science · 2018-11-19 · Zeyu Sun, Qihao Zhu, Lili Mou, Yingfei Xiong, Ge Li, Lu Zhang

An Expression Tree Decoding Strategy for Mathematical Equation Generation

Generating mathematical equations from natural language requires an accurate understanding of the relations among math expressions. Existing approaches can be broadly categorized into token-level and expression-level generation. The former…

Computation and Language · Computer Science · 2023-10-19 · Wenqi Zhang, Yongliang Shen, Qingpeng Nong, Zeqi Tan, Yanna Ma, Weiming Lu