English
Related papers

Related papers: Synchromesh: Reliable code generation from pre-tra…

200 papers

Target similarity tuning (TST) is a method of selecting relevant examples in natural language (NL) to code generation through large language models (LLMs) to improve performance. Its goal is to adapt a sentence embedding model to have the…

Artificial Intelligence · Computer Science 2023-10-31 Anirudh Khatry , Sumit Gulwani , Priyanshu Gupta , Vu Le , Ananya Singha , Mukul Singh , Gust Verbruggen

This survey reviews how large language models (LLMs) are transforming synthetic training data generation in both natural language and code domains. By producing artificial but task-relevant examples, these models can significantly augment…

Computation and Language · Computer Science 2025-11-21 Mihai Nadas , Laura Diosan , Andreea Tomescu

Program synthesis or code generation aims to generate a program that satisfies a problem specification. Recent approaches using large-scale pretrained language models (LMs) have shown promising results, yet they have some critical…

Machine Learning · Computer Science 2022-11-04 Hung Le , Yue Wang , Akhilesh Deepak Gotmare , Silvio Savarese , Steven C. H. Hoi

Current language models tailored for code tasks often adopt the pre-training-then-fine-tuning paradigm from natural language processing, modeling source code as plain text. This approach, however, overlooks the unambiguous structures…

Computation and Language · Computer Science 2024-01-22 Mayank Agarwal , Yikang Shen , Bailin Wang , Yoon Kim , Jie Chen

Pre-trained language models have demonstrated impressive performance in both natural language processing and program understanding, which represent the input as a token sequence without explicitly modeling its structure. Some prior works…

Computation and Language · Computer Science 2022-10-27 Da Shen , Xinyun Chen , Chenguang Wang , Koushik Sen , Dawn Song

Large pre-trained language models have recently been expanded and applied to programming language tasks with great success, often through further pre-training of a strictly-natural language model--where training sequences typically contain…

Computation and Language · Computer Science 2024-02-13 Fenia Christopoulou , Guchun Zhang , Gerasimos Lampouras

In many sequence learning tasks, such as program synthesis and document summarization, a key problem is searching over a large space of possible output sequences. We propose to learn representations of the outputs that are specifically…

Machine Learning · Computer Science 2021-08-09 Joey Hong , David Dohan , Rishabh Singh , Charles Sutton , Manzil Zaheer

Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science 2025-01-24 Zezhou Yang , Sirong Chen , Cuiyun Gao , Zhenhao Li , Xing Hu , Kui Liu , Xin Xia

Large pre-trained language models such as GPT-3, Codex, and Google's language model are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and…

Software Engineering · Computer Science 2021-12-07 Naman Jain , Skanda Vaidyanath , Arun Iyer , Nagarajan Natarajan , Suresh Parthasarathy , Sriram Rajamani , Rahul Sharma

Many applications today use large language models for code generation; however, production systems have strict latency requirements that can be difficult to meet with large models. Small language models with a few billion parameters are…

Machine Learning · Computer Science 2026-04-14 Renjini R. Nair , Damian K. Kowalczyk , Marco Gaudesi , Chhaya Methani

Code completion is a prominent application of Large Language Models (LLMs) in software engineering. Due to the near real-time response requirements of this task, base models with small to medium-sized parameters are typically employed,…

Software Engineering · Computer Science 2025-09-18 Dongjun Yu , Xiao Yan , Zhenrui Li , Jipeng Xiao , Haochuan He , Yongda Yu , Hao Zhang , Guoping Rong , Xiaobo Huang

Multi-modal program synthesis refers to the task of synthesizing programs (code) from their specification given in different forms, such as a combination of natural language and examples. Examples provide a precise but incomplete…

Artificial Intelligence · Computer Science 2021-09-08 Kia Rahmani , Mohammad Raza , Sumit Gulwani , Vu Le , Daniel Morris , Arjun Radhakrishna , Gustavo Soares , Ashish Tiwari

We present a novel framework, SoftSRV, that is used to generate targeted synthetic fine-tuning data for improving task-specific model performance. Given a sample from a target distribution, our proposed framework uses a data-driven loss…

Machine Learning · Computer Science 2025-02-06 Giulia DeSalvo , Jean-Fracois Kagy , Lazaros Karydas , Afshin Rostamizadeh , Sanjiv Kumar

Instruction tuning has emerged as the key in aligning large language models (LLMs) with specific task instructions, thereby mitigating the discrepancy between the next-token prediction objective and users' actual goals. To reduce the labor…

Computation and Language · Computer Science 2024-04-10 Zifeng Wang , Chun-Liang Li , Vincent Perot , Long T. Le , Jin Miao , Zizhao Zhang , Chen-Yu Lee , Tomas Pfister

Synthetic data has the potential to improve the performance, training efficiency, and privacy of real training examples. Nevertheless, existing approaches for synthetic text generation are mostly heuristics and cannot generate…

Machine Learning · Computer Science 2025-06-10 Dang Nguyen , Zeman Li , Mohammadhossein Bateni , Vahab Mirrokni , Meisam Razaviyayn , Baharan Mirzasoleiman

This work introduces (1) a technique that allows large language models (LLMs) to leverage user-provided code when solving programming tasks and (2) a method to iteratively generate modular sub-functions that can aid future code generation…

Machine Learning · Computer Science 2023-12-05 Patrick Hajali , Ignas Budvytis

Training code-switched language models is difficult due to lack of data and complexity in the grammatical structure. Linguistic constraint theories have been used for decades to generate artificial code-switching sentences to cope with this…

Computation and Language · Computer Science 2019-09-19 Genta Indra Winata , Andrea Madotto , Chien-Sheng Wu , Pascale Fung

Large language models (LLMs) have achieved notable success in code generation. However, they still frequently produce uncompilable output because their next-token inference procedure does not model formal aspects of code. Although…

Machine Learning · Computer Science 2025-05-09 Niels Mündler , Jingxuan He , Hao Wang , Koushik Sen , Dawn Song , Martin Vechev

Large language models pre-trained for code generation can generate high-quality short code but often struggle with generating coherent long code and understanding higher-level or system-level specifications. This issue is also observed in…

Computation and Language · Computer Science 2023-04-18 Swapnil Sharma , Nikita Anand , Kranthi Kiran G.

We present a framework that takes a concurrent program composed of unsynchronized processes, along with a temporal specification of their global concurrent behaviour, and automatically generates a concurrent program with synchronization…

Logic in Computer Science · Computer Science 2012-07-05 Roopsha Samanta
‹ Prev 1 2 3 10 Next ›