English
Related papers

Related papers: InstructCoder: Instruction Tuning Large Language M…

200 papers

Large Language Models (LLMs) have transformed software development by enabling code generation, automated debugging, and complex reasoning. However, their continued advancement is constrained by the scarcity of high-quality, publicly…

Software Engineering · Computer Science 2025-08-11 Wasi Uddin Ahmad , Aleksander Ficek , Mehrzad Samadi , Jocelyn Huang , Vahid Noroozi , Somshubra Majumdar , Boris Ginsburg

Knowledge editing for large language models can offer an efficient solution to alter a model's behavior without negatively impacting the overall performance. However, the current approaches encounter issues with limited generalizability…

Computation and Language · Computer Science 2024-04-30 Ningyu Zhang , Bozhong Tian , Siyuan Cheng , Xiaozhuan Liang , Yi Hu , Kouying Xue , Yanjie Gou , Xi Chen , Huajun Chen

The evolution of web applications relies on iterative code modifications, a process that is traditionally manual and time-consuming. While Large Language Models (LLMs) can generate UI code, their ability to edit existing code from new…

Software Engineering · Computer Science 2025-10-31 Truong Hai Dang , Jingyu Xiao , Yintong Huo

A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language, synthesizing tests from code, and synthesizing…

Recent work demonstrates that, after instruction tuning, Code Large Language Models (Code LLMs) can obtain impressive capabilities to address a wide range of code-related tasks. However, current instruction tuning methods for Code LLMs…

Computation and Language · Computer Science 2024-06-10 Zhaojian Yu , Xin Zhang , Ning Shang , Yangyu Huang , Can Xu , Yishujie Zhao , Wenxiang Hu , Qiufeng Yin

Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. Current approaches for obtaining high-quality code data primarily focus on (i) collecting large-scale…

Computation and Language · Computer Science 2025-02-18 Yichuan Ma , Yunfan Shao , Peiji Li , Demin Song , Qipeng Guo , Linyang Li , Xipeng Qiu , Kai Chen

Recent advancements in open-source code large language models (LLMs) have been driven by fine-tuning on the data generated from powerful closed-source LLMs, which are expensive to obtain. This paper explores whether it is possible to use a…

Computation and Language · Computer Science 2024-12-17 Yutong Wu , Di Huang , Wenxuan Shi , Wei Wang , Lingzhe Gao , Shihao Liu , Ziyuan Nan , Kaizhao Yuan , Rui Zhang , Xishan Zhang , Zidong Du , Qi Guo , Yewen Pu , Dawei Yin , Xing Hu , Yunji Chen

Code data in large language model (LLM) pretraining is recognized crucial not only for code-related tasks but also for enhancing general intelligence of LLMs. Current open-source LLMs often heavily rely on human effort to produce their code…

Instruction tuning is crucial for enabling Large Language Models (LLMs) to solve real-world tasks. Prior work has shown the effectiveness of instruction-tuning data synthesized solely from LLMs, raising a fundamental question: Do we still…

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with…

Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this…

Computation and Language · Computer Science 2024-02-15 Yejie Wang , Keqing He , Guanting Dong , Pei Wang , Weihao Zeng , Muxi Diao , Yutao Mou , Mengdi Zhang , Jingang Wang , Xunliang Cai , Weiran Xu

Large language models (LLMs) are initially pretrained for broad capabilities and then finetuned with instruction-following datasets to improve their performance in interacting with humans. Despite advances in finetuning, a standardized…

Computation and Language · Computer Science 2024-07-30 Yihan Cao , Yanbin Kang , Chi Wang , Lichao Sun

Instruction tuning has emerged as the key in aligning large language models (LLMs) with specific task instructions, thereby mitigating the discrepancy between the next-token prediction objective and users' actual goals. To reduce the labor…

Computation and Language · Computer Science 2024-04-10 Zifeng Wang , Chun-Liang Li , Vincent Perot , Long T. Le , Jin Miao , Zizhao Zhang , Chen-Yu Lee , Tomas Pfister

Modern language models (LMs) have gained widespread acceptance in everyday and professional contexts, particularly in programming. An essential procedure enabling this adoption is instruction tuning, which substantially enhances LMs'…

Cryptography and Security · Computer Science 2024-07-15 Jingxuan He , Mark Vero , Gabriela Krasnopolska , Martin Vechev

Large language models (LLMs) for code are increasingly used in software development, but they remain static after pretraining while APIs and software libraries continue to evolve. Model editing offers a lightweight alternative to retraining…

Software Engineering · Computer Science 2026-05-11 Vinaik Chhetri , Moghis Fereidouni , A. B Siddique , Umar Farooq

Code refactoring is a fundamental software engineering practice aimed at improving code quality and maintainability. Despite its importance, developers often neglect refactoring due to the significant time, effort, and resources it…

Large language models (LLMs) have demonstrated outstanding performance in natural language processing tasks. However, in the field of recommender systems, due to the inherent structural discrepancy between user behavior data and natural…

Information Retrieval · Computer Science 2026-01-01 Zekun Liu , Xiaowen Huang , Jitao Sang

Recently, there has been a growing interest in studying how to construct better code instruction tuning data. However, we observe Code models trained with these datasets exhibit high performance on HumanEval but perform worse on other…

Due to limited supervised training data, large language models (LLMs) are typically pre-trained via a self-supervised "predict the next word" objective on a vast amount of unstructured text data. To make the resulting model useful to users,…

Computation and Language · Computer Science 2026-01-30 Ajay Patel , Colin Raffel , Chris Callison-Burch

Instructed code editing, where LLMs directly modify a developer's existing code based on a user instruction, is becoming a widely used interaction mode in AI coding assistants. However, few benchmarks directly evaluate this capability and…

‹ Prev 1 2 3 10 Next ›