Related papers: DocCGen: Document-based Controlled Code Generation

A framework for assessing the capabilities of code generation of constraint domain-specific languages with large language models

Large language models (LLMs) can be used to support software development tasks, e.g., through code completion or code generation. However, their effectiveness drops significantly when considering less popular programming languages such as…

Software Engineering · Computer Science · 2026-03-06 · David Delgado, Lola Burgueño, Robert Clarisó

ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages

Recent work shows Large Language Models (LLMs) struggle to understand natural language constraints for various text generation tasks in zero- and few-shot settings. While, in the code domain, there is wide usage of constraints in code…

Software Engineering · Computer Science · 2025-03-25 · Mehant Kammakomati, Sameer Pimparkhede, Srikanth Tamilselvam, Prince Kumar, Pushpak Bhattacharyya

A Survey on LLM-based Code Generation for Low-Resource and Domain-Specific Programming Languages

Large Language Models (LLMs) have shown impressive capabilities in code generation for popular programming languages. However, their performance on Low-Resource Programming Languages (LRPLs) and Domain-Specific Languages (DSLs) remains a…

Software Engineering · Computer Science · 2025-09-29 · Sathvik Joel, Jie JW Wu, Fatemeh H. Fard

Evaluating LLM-generated code for domain-specific languages: molecular dynamics with LAMMPS

Large language models (LLMs) are changing the way researchers interact with code and data in scientific computing. While their ability to generate general-purpose code is well established, their effectiveness in producing scientifically…

Software Engineering · Computer Science · 2026-05-25 · Ethan Holbrook, Juan C. Verduzco, Alejandro Strachan

On the Effectiveness of Large Language Models in Domain-Specific Code Generation

Large language models (LLMs) such as ChatGPT have shown remarkable capabilities in code generation. Despite significant achievements, they rely on enormous training data to acquire a broad spectrum of open-domain knowledge. Besides, their…

Software Engineering · Computer Science · 2025-02-18 · Xiaodong Gu, Meng Chen, Yalan Lin, Yuhan Hu, Hongyu Zhang, Chengcheng Wan, Zhao Wei, Yong Xu, Juhong Wang

CoDet-M4: Detecting Machine-Generated Code in Multi-Lingual, Multi-Generator and Multi-Domain Settings

Large language models (LLMs) have revolutionized code generation, automating programming with remarkable efficiency. However, these advancements challenge programming skills, ethics, and assessment integrity, making the detection of…

Computation and Language · Computer Science · 2025-07-18 · Daniil Orel, Dilshod Azizov, Preslav Nakov

Guiding Large Language Models to Generate Computer-Parsable Content

We propose a method to guide Large Language Models (LLMs) in generating structured content adhering to specific conventions without fine-tuning. By utilizing coroutine-based content generation constraints through a pre-agreed context-free…

Software Engineering · Computer Science · 2024-04-23 · Jiaye Wang

DocPrompting: Generating Code by Retrieving the Docs

Publicly available source-code libraries are continuously growing and changing. This makes it impossible for models of code to keep current with all available APIs by simply training these models on existing code repositories. Thus,…

Computation and Language · Computer Science · 2023-02-21 · Shuyan Zhou, Uri Alon, Frank F. Xu, Zhiruo Wang, Zhengbao Jiang, Graham Neubig

Empowering AI to Generate Better AI Code: Guided Generation of Deep Learning Projects with LLMs

While large language models (LLMs) have been widely applied to code generation, they struggle with generating entire deep learning projects, which are characterized by complex structures, longer functions, and stronger reliance on domain…

Software Engineering · Computer Science · 2025-04-22 · Chen Xie, Mingsheng Jiao, Xiaodong Gu, Beijun Shen

A Survey on Large Language Models for Code Generation

Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This…

Computation and Language · Computer Science · 2025-10-28 · Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, Sunghun Kim

DataGen: Unified Synthetic Dataset Generation via Large Language Models

Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets. Despite this, challenges…

Computation and Language · Computer Science · 2025-11-18 · Yue Huang, Siyuan Wu, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Jianfeng Gao, Chaowei Xiao, Lichao Sun, Xiangliang Zhang

Draft-Conditioned Constrained Decoding for Structured Generation in LLMs

Large language models (LLMs) are increasingly used to generate executable outputs, JSON objects, and API calls, where a single syntax error can make the output unusable. Constrained decoding enforces validity token-by-token via masking and…

Computation and Language · Computer Science · 2026-03-05 · Avinash Reddy, Thayne T. Walker, James S. Ide, Amrit Singh Bedi

S3LLM: Large-Scale Scientific Software Understanding with LLMs using Source, Metadata, and Document

The understanding of large-scale scientific software poses significant challenges due to its diverse codebase, extensive code length, and target computing architectures. The emergence of generative AI, specifically large language models…

Software Engineering · Computer Science · 2024-03-19 · Kareem Shaik, Dali Wang, Weijian Zheng, Qinglei Cao, Heng Fan, Peter Schwartz, Yunhe Feng

Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study

Large language models (LLMs) perform strongly on general-purpose code generation, yet their applicability to enterprise domain-specific languages (DSLs) remains underexplored, especially for repository-scale change generation spanning…

Software Engineering · Computer Science · 2026-04-28 · Sivajeet Chand, Kevin Nguyen, Peter Kuntz, Alexander Pretschner

Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, introduces…

Software Engineering · Computer Science · 2023-10-26 · Jiexin Wang, Liuwen Cao, Xitong Luo, Zhiping Zhou, Jiayuan Xie, Adam Jatowt, Yi Cai

Automatically Generating Dockerfiles via Deep Learning: Challenges and Promises

Containerization allows developers to define the execution environment in which their software needs to be installed. Docker is the leading platform in this field, and developers that use it are required to write a Dockerfile for their…

Software Engineering · Computer Science · 2023-03-29 · Giovanni Rosa, Antonio Mastropaolo, Simone Scalabrino, Gabriele Bavota, Rocco Oliveto

Type-Constrained Code Generation with Language Models

Large language models (LLMs) have achieved notable success in code generation. However, they still frequently produce uncompilable output because their next-token inference procedure does not model formal aspects of code. Although…

Machine Learning · Computer Science · 2025-05-09 · Niels Mündler, Jingxuan He, Hao Wang, Koushik Sen, Dawn Song, Martin Vechev

An optimizing multi-platform source-to-source compiler framework for the NEURON MODeling Language

Domain-specific languages (DSLs) play an increasingly important role in the generation of high performing software. They allow the user to exploit specific knowledge encoded in the constructs for the generation of code adapted to a…

Mathematical Software · Computer Science · 2019-05-08 · Pramod Kumbhar, Omar Awile, Liam Keegan, Jorge Blanco Alonso, James King, Michael Hines, Felix Schürmann

Exploration of Masked and Causal Language Modelling for Text Generation

Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and have achieved state-of-the-art performance in practically every task in this field. However, the prevalent approach used in text generation,…

Computation and Language · Computer Science · 2024-08-12 · Nicolo Micheletti, Samuel Belkadi, Lifeng Han, Goran Nenadic

Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks

Large Language Models (LLMs) have shown remarkable capabilities in code generation tasks, yet they face significant limitations in handling complex, long-context programming challenges and demonstrating complex compositional reasoning…

Artificial Intelligence · Computer Science · 2025-01-14 · Amr Almorsi, Mohanned Ahmed, Walid Gomaa