Related papers: EpiCoder: Encompassing Diversity and Complexity in…

EgoCoder: Intelligent Program Synthesis with Hierarchical Sequential Neural Network Model

Programming has been an important skill for researchers and practitioners in computer science and other related areas. To learn basic programming skills, beginners usually require long-term systematic training. According to a…

Artificial Intelligence · Computer Science 2018-05-23 Jiawei Zhang, Limeng Cui, Fisher B. Gouza

Autoencoders as Tools for Program Synthesis

Recently there have been many advances in research on language modeling of source code. Applications range from code suggestion and completion to code summarization. However, complete program synthesis of industry-grade programming…

Artificial Intelligence · Computer Science 2021-09-07 Sander de Bruin, Vadim Liventsev, Milan Petković

Retrieval-Based Neural Code Generation

In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to…

Computation and Language · Computer Science 2018-08-31 Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig

AnCoder: Anchored Code Generation via Discrete Diffusion Models

Diffusion language models offer a compelling alternative to autoregressive code generation, enabling global planning and iterative refinement of complex program logic. However, existing approaches fail to respect the rigid structure of…

Machine Learning · Computer Science 2026-02-23 Anton Xue, Litu Rout, Constantine Caramanis, Sanjay Shakkottai

UniGenCoder: Merging Seq2Seq and Seq2Tree Paradigms for Unified Code Generation

Deep learning-based code generation has completely transformed the way developers write programs today. Existing approaches to code generation have focused either on the Sequence-to-Sequence paradigm, which generates target code as a…

Computation and Language · Computer Science 2025-02-27 Liangying Shao, Yanfu Yan, Denys Poshyvanyk, Jinsong Su

PROVCREATOR: Synthesizing Complex Heterogenous Graphs with Node and Edge Attributes

The rise of graph-structured data has driven interest in graph learning and synthetic data generation. While successful in text and image domains, synthetic graph generation remains challenging -- especially for real-world graphs with…

Machine Learning · Computer Science 2025-07-29 Tianhao Wang, Simon Klancher, Kunal Mukherjee, Josh Wiedemeier, Feng Chen, Murat Kantarcioglu, Kangkook Jee

IntelliCode Compose: Code Generation Using Transformer

In software development through integrated development environments (IDEs), code completion is one of the most widely used features. Nevertheless, the majority of integrated development environments only support completion of methods and APIs,…

Computation and Language · Computer Science 2020-11-02 Alexey Svyatkovskiy, Shao Kun Deng, Shengyu Fu, Neel Sundaresan

InCoder: A Generative Model for Code Infilling and Synthesis

Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via…

Software Engineering · Computer Science 2023-04-11 Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis

Barcode Method for Generative Model Evaluation driven by Topological Data Analysis

Evaluating the performance of generative models in image synthesis is a challenging task. Although the Fréchet Inception Distance is a widely accepted evaluation metric, it integrates different aspects (e.g., fidelity and diversity) of…

Computer Vision and Pattern Recognition · Computer Science 2021-06-07 Ryoungwoo Jang, Minjee Kim, Da-in Eun, Kyungjin Cho, Jiyeon Seo, Namkug Kim

SaraCoder: Orchestrating Semantic and Structural Cues for Resource-Optimized Repository-Level Code Completion

Despite Retrieval-Augmented Generation improving code completion, traditional retrieval methods struggle with information redundancy and a lack of diversity within limited context windows. To solve this, we propose a resource-optimized…

Software Engineering · Computer Science 2025-10-14 Xiaohan Chen, Zhongying Pan, Quan Feng, Yu Tian, Shuqun Yang, Mengru Wang, Lina Gong, Yuxia Geng, Piji Li, Xiang Chen

Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation

For a complicated algorithm, its implementation by a human programmer usually starts with outlining a rough control flow followed by iterative enrichments, eventually yielding carefully generated syntactic structures and variables in a…

Programming Languages · Computer Science 2023-07-20 Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, Kevin Wang, Yihan Xi, Dejia Xu, Zhangyang Wang

Feature Maps: A Comprehensible Software Representation for Design Pattern Detection

Design patterns are elegant and well-tested solutions to recurrent software development problems. They are the result of software developers dealing with problems that frequently occur, solving them in the same or a slightly adapted way. A…

Software Engineering · Computer Science 2019-03-25 Hannes Thaller, Lukas Linsbauer, Alexander Egyed

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets in a specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science 2025-01-24 Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia

TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation

Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and…

Machine Learning · Computer Science 2026-04-27 Henrijs Princis, Arindam Sharma, Cristina David

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. While for automated code completion tools, it is difficult to utilize the useful information scattered in…

Computation and Language · Computer Science 2023-10-23 Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen

KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding

We introduce KodCode, a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data across diverse difficulties and domains for training Large Language Models for coding. Existing…

Machine Learning · Computer Science 2025-07-15 Zhangchen Xu, Yang Liu, Yueqin Yin, Mingyuan Zhou, Radha Poovendran

Graphcode: Learning from multiparameter persistent homology using graph neural networks

We introduce graphcodes, a novel multi-scale summary of the topological properties of a dataset that is based on the well-established theory of persistent homology. Graphcodes handle datasets that are filtered along two real-valued scale…

Algebraic Topology · Mathematics 2024-05-24 Michael Kerber, Florian Russold

SkCoder: A Sketch-based Approach for Automatic Code Generation

Recently, deep learning techniques have shown great success in automatic code generation. Inspired by code reuse, some researchers propose copy-based approaches that can copy the content from similar code snippets to obtain better…

Software Engineering · Computer Science 2023-09-08 Jia Li, Yongmin Li, Ge Li, Zhi Jin, Yiyang Hao, Xing Hu

Rethinking complexity for software code structures: A pioneering study on Linux kernel code repository

The recent progress of artificial intelligence (AI) has shown great potential for alleviating human burden in various complex tasks. From the view of software engineering, AI techniques can be seen in many fundamental aspects of…

Software Engineering · Computer Science 2021-03-02 Wenhe Zhang, Jin He, Kevin Song

SynthFormer: Equivariant Pharmacophore-based Generation of Synthesizable Molecules for Ligand-Based Drug Design

Drug discovery is a complex, resource-intensive process requiring significant time and cost to bring new medicines to patients. Many generative models aim to accelerate drug discovery, but few produce synthetically accessible molecules.…

Machine Learning · Computer Science 2025-01-30 Zygimantas Jocys, Zhanxing Zhu, Henriette M. G. Willems, Katayoun Farrahi