Related papers: InterCode: Standardizing and Benchmarking Interact…

Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping

Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance on the design-to-code task, i.e., generating UI code from UI mock-ups. However, existing benchmarks only contain static web pages for evaluation and ignore…

Software Engineering · Computer Science 2026-03-03 Jingyu Xiao, Yuxuan Wan, Yintong Huo, Zixin Wang, Xinyi Xu, Wenxuan Wang, Zhiyao Xu, Yuhang Wang, Michael R. Lyu

Low-code LLM: Graphical User Interface over Large Language Models

Utilizing Large Language Models (LLMs) for complex tasks is challenging, often involving a time-consuming and uncontrollable prompt engineering process. This paper introduces a novel human-LLM interaction framework, Low-code LLM. It…

Computation and Language · Computer Science 2024-04-02 Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei

InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation

Large Language Models (LLMs) are increasingly capable of generating complete applications from natural language instructions, creating new opportunities in science and education. In these domains, interactive scientific demonstrations are…

Software Engineering · Computer Science 2026-05-21 Qiaosheng Chen, Yang Liu, Lei Li, Kai Chen, Qipeng Guo, Gong Cheng, Fei Yuan

ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments

Large language models (LLMs) have proven invaluable for code generation, particularly in interactive settings. However, existing code generation benchmarks fail to capture the diverse feedback encountered in multi-turn interactions,…

Software Engineering · Computer Science 2025-02-28 Hojae Han, Seung-won Hwang, Rajhans Samdani, Yuxiong He

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding, safety concerns, such as generating or executing risky code, have become significant barriers to the real-world deployment of these agents. To…

Software Engineering · Computer Science 2024-11-13 Chengquan Guo, Xun Liu, Chulin Xie, Andy Zhou, Yi Zeng, Zinan Lin, Dawn Song, Bo Li

DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation

The rapid advancement of large language models (LLMs) has significantly improved their performance in code generation tasks. However, existing code benchmarks remain static, consisting of fixed datasets with predefined problems. This makes…

Computation and Language · Computer Science 2025-05-30 Wenhao Hu, Jinhao Duan, Chunchen Wei, Li Zhang, Yue Zhang, Kaidi Xu

MapCoder: Multi-Agent Code Generation for Competitive Problem Solving

Code synthesis, which requires a deep understanding of complex natural language problem descriptions, generation of code instructions for complex algorithms and data structures, and the successful execution of comprehensive unit tests,…

Computation and Language · Computer Science 2024-05-21 Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez

From Solitary Directives to Interactive Encouragement! LLM Secure Code Generation by Natural Language Prompting

Large Language Models (LLMs) have shown remarkable potential in code generation, making them increasingly important in the field. However, the security issues of generated code have not been fully addressed, and the usability of LLMs in…

Cryptography and Security · Computer Science 2024-10-21 Shigang Liu, Bushra Sabir, Seung Ick Jang, Yuval Kansal, Yansong Gao, Kristen Moore, Alsharif Abuadbba, Surya Nepal

UniCoder: Scaling Code Large Language Model via Universal Code

Intermediate reasoning or acting steps have successfully improved large language models (LLMs) for handling various downstream natural language processing (NLP) tasks. When applying LLMs for code generation, recent works mainly focus on…

Computation and Language · Computer Science 2024-06-25 Tao Sun, Linzheng Chai, Jian Yang, Yuwei Yin, Hongcheng Guo, Jiaheng Liu, Bing Wang, Liqun Yang, Zhoujun Li

When Benchmarks Talk: Re-Evaluating Code LLMs with Interactive Feedback

Programming is a fundamentally interactive process, yet coding assistants are often evaluated using static benchmarks that fail to measure how well models collaborate with users. We introduce an interactive evaluation pipeline to examine…

Human-Computer Interaction · Computer Science 2025-02-26 Jane Pan, Ryan Shar, Jacob Pfau, Ameet Talwalkar, He He, Valerie Chen

InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation

Code translation aims to convert a program from one programming language (PL) to another. This long-standing software engineering task is crucial for modernizing legacy systems, ensuring cross-platform compatibility, enhancing performance,…

Software Engineering · Computer Science 2024-11-06 Marcos Macedo, Yuan Tian, Pengyu Nie, Filipe R. Cogo, Bram Adams

MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

Large Language Models (LLMs) demonstrate strong capabilities in general coding tasks but encounter two key challenges when optimizing code: (i) the complexity of writing optimized code (such as performant CUDA kernels and competition-level…

Machine Learning · Computer Science 2026-01-12 Jiefu Ou, Sapana Chaudhary, Kaj Bostrom, Nathaniel Weir, Shuai Zhang, Huzefa Rangwala, George Karypis

Instruct or Interact? Exploring and Eliciting LLMs' Capability in Code Snippet Adaptation Through Prompt Engineering

Code snippet adaptation is a fundamental activity in the software development process. Unlike code generation, code snippet adaptation is not a "free creation", which requires developers to tailor a given code snippet in order to fit…

Software Engineering · Computer Science 2024-11-26 Tanghaoran Zhang, Yue Yu, Xinjun Mao, Shangwen Wang, Kang Yang, Yao Lu, Zhang Zhang, Yuxin Zhao

RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback

Large language models (LLMs) show promise in supporting scientific research implementation, yet their ability to generate correct and executable code remains limited. Existing works largely adopt one-shot settings, ignoring the…

Computation and Language · Computer Science 2025-10-27 Chunyu Miao, Henry Peng Zou, Yangning Li, Yankai Chen, Yibo Wang, Fangxin Wang, Yifan Li, Wooseong Yang, Bowei He, Xinni Zhang, Dianzhi Yu, Hanchen Yang, Hoang H Nguyen, Yue Zhou, Jie Yang, Jizhou Guo, Wenzhe Fan, Chin-Yuan Yeh, Panpan Meng, Liancheng Fang, Jinhu Qi, Wei-Chieh Huang, Zhengyao Gu, Yuwei Han, Langzhou He, Yuyao Yang, Yinghui Li, Hai-Tao Zheng, Xue Liu, Irwin King, Philip S. Yu

LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation

Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, given NL is informal, it does not lend easily to checking…

Software Engineering · Computer Science 2024-10-04 Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty, Shuvendu K. Lahiri

Interactive Code Generation via Test-Driven User-Intent Formalization

Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, when interacting with LLMs, users have no guarantees that the…

Software Engineering · Computer Science 2023-10-05 Shuvendu K. Lahiri, Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty, Madanlal Musuvathi, Piali Choudhury, Curtis von Veh, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao

ReCode: Updating Code API Knowledge with Reinforcement Learning

Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation, stemming from reliance on outdated API knowledge from their…

Computation and Language · Computer Science 2025-11-25 Haoze Wu, Yunzhi Yao, Wenhao Yu, Ningyu Zhang

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback for exploring the output space of LLMs to enhance code…

Software Engineering · Computer Science 2024-02-06 Shihan Dou, Yan Liu, Haoxiang Jia, Limao Xiong, Enyu Zhou, Wei Shen, Junjie Shan, Caishuang Huang, Xiao Wang, Xiaoran Fan, Zhiheng Xi, Yuhao Zhou, Tao Ji, Rui Zheng, Qi Zhang, Xuanjing Huang, Tao Gui

IntentCoding: Amplifying User Intent in Code Generation

Large Language Models (LLMs) have shown strong capabilities in code generation, but their adherence to fine-grained user intent with multiple constraints remains a significant challenge. Our empirical analysis reveals two key observations:…

Software Engineering · Computer Science 2026-02-03 Zheng Fang, Yihong Dong, Lili Mou, Dongming Jin, Zhi Jin, Ge Li

AI-assisted Code Authoring at Scale: Fine-tuning, deploying, and mixed methods evaluation

Generative LLMs have been shown to effectively power AI-based code authoring tools that can suggest entire statements or blocks of code during code authoring. In this paper we present CodeCompose, an AI-assisted code authoring tool…

Software Engineering · Computer Science 2024-02-20 Vijayaraghavan Murali, Chandra Maddila, Imad Ahmad, Michael Bolin, Daniel Cheng, Negar Ghorbani, Renuka Fernandez, Nachiappan Nagappan, Peter C. Rigby