Related papers: Structural Language Models of Code

A Structural Model for Contextual Code Changes

We address the problem of predicting edit completions based on a learned model that was trained on past edits. Given a code snippet that is partially edited, our goal is to predict a completion of the edit for the rest of the snippet. We…

Programming Languages · Computer Science 2020-10-13 Shaked Brody, Uri Alon, Eran Yahav

Improve Language Modelling for Code Completion through Statement Level Language Model based on Statement Embedding Generated by BiLSTM

Language models such as RNN, LSTM or other variants have been widely used as generative models in natural language processing. In the last few years, taking source code as natural languages, parsing source code into a token sequence and using a…

Software Engineering · Computer Science 2019-10-28 Yixiao Yang

Automatic Source Code Summarization with Extended Tree-LSTM

Neural machine translation models are used to automatically generate a document from given source code since this can be regarded as a machine translation task. Source code summarization is one of the components for automatic document…

Machine Learning · Computer Science 2019-06-24 Yusuke Shido, Yasuaki Kobayashi, Akihiro Yamamoto, Atsushi Miyamoto, Tadayuki Matsumura

Assessing Small Language Models for Code Generation: An Empirical Study with Benchmarks

The recent advancements of Small Language Models (SLMs) have opened new possibilities for efficient code generation. SLMs offer lightweight and cost-effective alternatives to Large Language Models (LLMs), making them attractive for use in…

Software Engineering · Computer Science 2026-01-21 Md Mahade Hasan, Muhammad Waseem, Kai-Kristian Kemell, Jussi Rasku, Juha Ala-Rantala, Pekka Abrahamsson

Bringing Structure to Naturalness: On the Naturalness of ASTs

Source code comes in different shapes and forms. Previous research has already shown code to be more predictable than natural language as well as highlighted its statistical predictability at the token level: source code can be natural.…

Software Engineering · Computer Science 2025-04-14 Profir-Petru Pârţachi, Mahito Sugiyama

A Neural Model for Generating Natural Language Summaries of Program Subroutines

Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly-growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance.…

Software Engineering · Computer Science 2019-02-07 Alexander LeClair, Siyuan Jiang, Collin McMillan

Structured Language Generation Model: Loss Calibration and Formatted Decoding for Robust Structure Prediction and Knowledge Retrieval

Modern generative pre-trained language models excel at open-ended text generation, yet continue to underperform on structure-related tasks such as NER, relation extraction, and semantic role labeling, especially when compared to…

Computation and Language · Computer Science 2025-12-23 Minho Lee, Junghyun Min, Yerang Kim, Woochul Lee, Yeonsoo Lee

code2seq: Generating Sequences from Structured Representations of Code

The ability to generate natural language sequences from source code snippets has a variety of applications such as code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine…

Machine Learning · Computer Science 2019-02-22 Uri Alon, Shaked Brody, Omer Levy, Eran Yahav

TSLM: Tree-Structured Language Modeling for Divergent Thinking

Language models generate reasoning sequentially, preventing them from decoupling irrelevant exploration paths during search. We introduce Tree-Structured Language Modeling (TSLM), which uses special tokens to encode branching structure,…

Computation and Language · Computer Science 2026-02-02 Doyoung Kim, Jaehyeok Doo, Minjoon Seo

Structure Language Models for Protein Conformation Generation

Proteins adopt multiple structural conformations to perform their diverse biological functions, and understanding these conformations is crucial for advancing drug discovery. Traditional physics-based simulation methods often struggle with…

Biomolecules · Quantitative Biology 2025-03-14 Jiarui Lu, Xiaoyin Chen, Stephen Zhewen Lu, Chence Shi, Hongyu Guo, Yoshua Bengio, Jian Tang

Ain't Nobody Got Time For Coding: Structure-Aware Program Synthesis From Natural Language

Program synthesis from natural language (NL) is practical for humans and, once technically feasible, would significantly facilitate software development and revolutionize end-user programming. We present SAPS, an end-to-end neural network…

Machine Learning · Computer Science 2019-02-19 Jakub Bednarek, Karol Piaskowski, Krzysztof Krawiec

Spiral Language Modeling

In almost all text generation applications, word sequences are constructed in a left-to-right (L2R) or right-to-left (R2L) manner, as natural language sentences are written either L2R or R2L. However, we find that the natural language…

Computation and Language · Computer Science 2021-12-21 Yong Cao, Yukun Feng, Shaohui Kuang, Gu Xu

TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation

Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and…

Machine Learning · Computer Science 2026-04-27 Henrijs Princis, Arindam Sharma, Cristina David

Post-Incorporating Code Structural Knowledge into Pretrained Models via ICL for Code Translation

Code translation migrates codebases across programming languages. Recently, large language models (LLMs) have achieved significant advancements in software mining. However, handling the syntactic structure of source code remains a…

Software Engineering · Computer Science 2025-10-14 Yali Du, Hui Sun, Ming Li

Towards Full-line Code Completion with Neural Language Models

A code completion system suggests future code elements to developers given a partially-complete code snippet. Code completion is one of the most useful features in Integrated Development Environments (IDEs). Currently, most code completion…

Software Engineering · Computer Science 2020-09-21 Wenhan Wang, Sijie Shen, Ge Li, Zhi Jin

Measuring LLM Code Generation Stability via Structural Entropy

Assessing the stability of code generation from large language models (LLMs) is essential for judging their reliability in real-world development. We extend prior "structural-entropy concepts" to the program domain by pairing entropy with…

Software Engineering · Computer Science 2025-08-21 Yewei Song, Tiezhu Sun, Xunzhu Tang, Prateek Rajput, Tegawende F. Bissyande, Jacques Klein

Structural Code Search using Natural Language Queries

Searching code is a common task that developers perform to understand APIs, learn common code patterns, and navigate code. Currently, developers most commonly search using keywords and regular expressions that are easy to use and widely…

Software Engineering · Computer Science 2025-07-04 Ben Limpanukorn, Yanjun Wang, Zach Patterson, Pranav Garg, Murali Krishna Ramanathan, Xiaofei Ma, Anoop Deoras, Miryung Kim

Function-constrained Program Synthesis

This work introduces (1) a technique that allows large language models (LLMs) to leverage user-provided code when solving programming tasks and (2) a method to iteratively generate modular sub-functions that can aid future code generation…

Machine Learning · Computer Science 2023-12-05 Patrick Hajali, Ignas Budvytis

SpectraLLM: Uncovering the Ability of LLMs for Molecular Structure Elucidation from Multi-Spectral Data

Automated molecular structure elucidation remains challenging, as existing approaches often depend on pre-compiled databases or restrict themselves to single spectroscopic modalities. Here we introduce SpectraLLM, a large language model…

Quantitative Methods · Quantitative Biology 2026-05-12 Yunyue Su, Jiahui Chen, Zao Jiang, Zhenyi Zhong, Liang Wang, Qiang Liu, Zhaoxiang Zhang

Towards Realistic Project-Level Code Generation via Multi-Agent Collaboration and Semantic Architecture Modeling

In recent years, Large Language Models (LLMs) have achieved remarkable progress in automated code generation. In real-world software engineering, the growing demand for rapid iteration and continuous delivery underscores the importance of…

Software Engineering · Computer Science 2025-11-06 Qianhui Zhao, Li Zhang, Fang Liu, Junhang Cheng, Chengru Wu, Junchen Ai, Qiaoyuanhe Meng, Lichen Zhang, Xiaoli Lian, Shubin Song, Yuanping Guo