Related papers: LLM-Assisted Code Cleaning For Training Accurate C…

Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning

Recent work targeting large language models (LLMs) for code generation demonstrated that increasing the amount of training data through synthetic code generation often leads to exceptional performance. In this paper we explore data pruning…

Software Engineering · Computer Science 2024-07-09 Yun-Da Tsai, Mingjie Liu, Haoxing Ren

Data-efficient LLM Fine-tuning for Code Generation

Large language models (LLMs) have demonstrated significant potential in code generation tasks. However, there remains a performance gap between open-source and closed-source models. To address this gap, existing approaches typically…

Computation and Language · Computer Science 2025-04-18 Weijie Lv, Xuan Xia, Sheng-Jun Huang

CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement

Large Language Models (LLMs) have revolutionized code generation but require significant resources and often over-generalize, limiting their task-specific efficiency. Fine-tuning smaller, open-source LLMs provides a cost-effective…

Computation and Language · Computer Science 2025-06-27 Leitian Tao, Xiang Chen, Tong Yu, Tung Mai, Ryan Rossi, Yixuan Li, Saayan Mitra

Performance-Aligned LLMs for Generating Fast Code

Optimizing scientific software is a difficult task because codebases are often large and complex, and performance can depend upon several factors including the algorithm, its implementation, and hardware among others. Causes of poor…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-30 Daniel Nichols, Pranav Polasam, Harshitha Menon, Aniruddha Marathe, Todd Gamblin, Abhinav Bhatele

Enhancing Code Generation for Low-Resource Languages: No Silver Bullet

The advent of Large Language Models (LLMs) has significantly advanced the field of automated code generation. LLMs rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages. For low-resource…

Software Engineering · Computer Science 2025-02-03 Alessandro Giagnorio, Alberto Martin-Lopez, Gabriele Bavota

Personality-Guided Code Generation Using Large Language Models

Code generation, the automatic creation of source code from natural language descriptions, has garnered significant attention due to its potential to streamline software development. Inspired by research that links task-personality…

Software Engineering · Computer Science 2025-05-30 Yaoqi Guo, Zhenpeng Chen, Jie M. Zhang, Yang Liu, Yun Ma

Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications

Large Language Models (LLMs) have demonstrated their remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate…

Software Engineering · Computer Science 2025-04-03 Nam Huynh, Beiyu Lin

Improving the Ability of Pre-trained Language Model by Imparting Large Language Model's Experience

Large Language Models (LLMs) and pre-trained Language Models (LMs) have achieved impressive success on many software engineering tasks (e.g., code completion and code generation). By leveraging huge existing code corpora (e.g., GitHub),…

Software Engineering · Computer Science 2025-01-16 Xin Yin, Chao Ni, Xiaodan Xu, Xinrui Li, Xiaohu Yang

Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering

Code generation problems differ from common natural language problems - they require matching the exact syntax of the target language, identifying happy paths and edge cases, paying attention to numerous small details in the problem spec,…

Machine Learning · Computer Science 2024-01-17 Tal Ridnik, Dedy Kredo, Itamar Friedman

LLM4EFFI: Leveraging Large Language Models to Enhance Code Efficiency and Correctness

Large Language Models (LLMs), particularly Code LLMs, have demonstrated impressive performance in code generation. Current research primarily focuses on the correctness of generated code, while efficiency remains less explored. Recent works…

Software Engineering · Computer Science 2025-02-27 Tong Ye, Weigang Huang, Xuhong Zhang, Tengfei Ma, Peiyu Liu, Jianwei Yin, Wenhai Wang

SLM Finetuning for Natural Language to Domain Specific Code Generation in Production

Many applications today use large language models for code generation; however, production systems have strict latency requirements that can be difficult to meet with large models. Small language models with a few billion parameters are…

Machine Learning · Computer Science 2026-04-14 Renjini R. Nair, Damian K. Kowalczyk, Marco Gaudesi, Chhaya Methani

Is LLM-Generated Code More Maintainable & Reliable than Human-Written Code?

Background: The rise of Large Language Models (LLMs) in software development has opened new possibilities for code generation. Despite the widespread use of this technology, it remains unclear how well LLMs generate code solutions in terms…

Software Engineering · Computer Science 2025-08-04 Alfred Santa Molison, Marcia Moraes, Glaucia Melo, Fabio Santos, Wesley K. G. Assuncao

Enhancing High-Quality Code Generation in Large Language Models with Comparative Prefix-Tuning

Large Language Models (LLMs) have been widely adopted in commercial code completion engines, significantly enhancing coding efficiency and productivity. However, LLMs may generate code with quality issues that violate coding standards and…

Software Engineering · Computer Science 2025-03-20 Yuan Jiang, Yujian Zhang, Liang Lu, Christoph Treude, Xiaohong Su, Shan Huang, Tiantian Wang

Fine-Tuning LLMs for Code Mutation: A New Era of Cyber Threats

Recent advancements in Large Language Models (LLMs) have significantly improved their capabilities in natural language processing and code synthesis, enabling more complex applications across different fields. This paper explores the…

Cryptography and Security · Computer Science 2024-10-30 Mohammad Setak, Pooria Madani

Enhancing Automated Program Repair through Fine-tuning and Prompt Engineering

Sequence-to-sequence models have been used to transform erroneous programs into correct ones when trained with a large enough dataset. Some recent studies also demonstrated strong empirical evidence that code review could improve the…

Machine Learning · Computer Science 2023-07-25 Rishov Paul, Md. Mohib Hossain, Mohammed Latif Siddiq, Masum Hasan, Anindya Iqbal, Joanna C. S. Santos

LLM-Aided Customizable Profiling of Code Data Based On Programming Language Concepts

Data profiling is critical in machine learning for generating descriptive statistics, supporting both deeper understanding and downstream tasks like data valuation and curation. This work addresses profiling specifically in the context of…

Software Engineering · Computer Science 2025-03-21 Pankaj Thorat, Adnan Qidwai, Adrija Dhar, Aishwariya Chakraborty, Anand Eswaran, Hima Patel, Praveen Jayachandran

Large Language Models in Computer Science Education: A Systematic Literature Review

Large language models (LLMs) are becoming increasingly better at a wide range of Natural Language Processing tasks (NLP), such as text generation and understanding. Recently, these models have extended their capabilities to coding tasks,…

Machine Learning · Computer Science 2024-10-23 Nishat Raihan, Mohammed Latif Siddiq, Joanna C. S. Santos, Marcos Zampieri

Fine-Tuning Multilingual Language Models for Code Review: An Empirical Study on Industrial C# Projects

Code review is essential for maintaining software quality but often time-consuming and cognitively demanding, especially in industrial environments. Recent advancements in language models (LMs) have opened new avenues for automating core…

Software Engineering · Computer Science 2025-10-24 Igli Begolli, Meltem Aksoy, Daniel Neider

Sustainable Code Generation Using Large Language Models: A Systematic Literature Review

Large Language Models (LLMs) are widely used in software engineering to generate, complete, translate, and fix code, improving developer productivity. While most research focuses on the energy consumption and carbon emissions of model…

Software Engineering · Computer Science 2026-04-15 Sabiya Banu Masthan Ali, Oussema Kirmani, Aroosa Hameed, Syed Muhammad Danish, Gautam Srivastava

On the Effectiveness of Training Data Optimization for LLM-based Code Generation: An Empirical Study

Large language models (LLMs) have achieved remarkable progress in code generation, largely driven by the availability of high-quality code datasets for effective training. To further improve data quality, numerous training data optimization…

Software Engineering · Computer Science 2026-01-01 Shiqi Kuang, Zhao Tian, Tao Xiao, Dong Wang, Junjie Chen