Related papers: SIMCODE: A Benchmark for Natural Language to ns-3 …

Holistic Evaluation of State-of-the-Art LLMs for Code Generation

This study presents a comprehensive empirical evaluation of six state-of-the-art large language models (LLMs) for code generation, including both general-purpose and code-specialized models. Using a dataset of 944 real-world LeetCode…

Software Engineering · Computer Science 2025-12-23 Le Zhang, Suresh Kothari

QCoder Benchmark: Bridging Language Generation and Quantum Hardware through Simulator-Based Feedback

Large language models (LLMs) have increasingly been applied to automatic programming code generation. This task can be viewed as a language generation task that bridges natural language, human knowledge, and programming logic. However, it…

Computation and Language · Computer Science 2025-11-04 Taku Mikuriya, Tatsuya Ishigaki, Masayuki Kawarada, Shunya Minami, Tadashi Kadowaki, Yohichi Suzuki, Soshun Naito, Shunya Takata, Takumi Kato, Tamotsu Basseda, Reo Yamada, Hiroya Takamura

SIMCOPILOT: Evaluating Large Language Models for Copilot-Style Code Generation

We introduce SIMCOPILOT, a benchmark that simulates the role of large language models (LLMs) as interactive, "copilot"-style coding assistants. Targeting both completion (finishing incomplete methods or code blocks) and infill tasks…

Machine Learning · Computer Science 2025-05-29 Mingchao Jiang, Abhinav Jain, Sophia Zorek, Chris Jermaine

Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies

Large Language Models (LLMs) have revolutionized the field of Natural Language Processing thanks to their ability to reuse knowledge acquired on massive text corpora on a wide variety of downstream tasks, with minimal (if any) tuning steps.…

Computation and Language · Computer Science 2024-07-12 Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti

Large Language Models for Code Generation: The Practitioners Perspective

Large Language Models (LLMs) have emerged as coding assistants, capable of generating source code from natural language prompts. With the increasing adoption of LLMs in software development, academic research and industry based projects are…

Software Engineering · Computer Science 2025-01-29 Zeeshan Rasheed, Muhammad Waseem, Kai Kristian Kemell, Aakash Ahmad, Malik Abdul Sami, Jussi Rasku, Kari Systä, Pekka Abrahamsson

Grounding Data Science Code Generation with Input-Output Specifications

Large language models (LLMs) have recently demonstrated a remarkable ability to generate code from natural language (NL) prompts. However, in the real world, NL is often too ambiguous to capture the true intent behind programming problems,…

Machine Learning · Computer Science 2024-03-18 Yeming Wen, Pengcheng Yin, Kensen Shi, Henryk Michalewski, Swarat Chaudhuri, Alex Polozov

SimuScene: Training and Benchmarking Code Generation to Simulate Physical Scenarios

Large language models (LLMs) have been extensively studied for tasks like math competitions, complex coding, and scientific reasoning, yet their ability to accurately represent and simulate physical scenarios via code remains underexplored.…

Machine Learning · Computer Science 2026-02-12 Yanan Wang, Renxi Wang, Yongxin Wang, Xuezhi Liang, Fajri Koto, Timothy Baldwin, Xiaodan Liang, Haonan Li

Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications

Large Language Models (LLMs) have demonstrated their remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate…

Software Engineering · Computer Science 2025-04-03 Nam Huynh, Beiyu Lin

NoCode-bench: A Benchmark for Evaluating Natural Language-Driven Feature Addition

Natural language-driven no-code development allows users to specify software functionality using natural language (NL) instead of editing source code, promising increased productivity and democratized development. Large language models…

Software Engineering · Computer Science 2025-08-19 Le Deng, Zhonghao Jiang, Jialun Cao, Michael Pradel, Zhongxin Liu

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

Generative AI has made rapid advancements in recent years, achieving unprecedented capabilities in multimodal understanding and code generation. This can enable a new paradigm of front-end development in which multimodal large language…

Computation and Language · Computer Science 2025-02-11 Chenglei Si, Yanzhe Zhang, Ryan Li, Zhengyuan Yang, Ruibo Liu, Diyi Yang

SymCode: A Neurosymbolic Approach to Mathematical Reasoning via Verifiable Code Generation

Large Language Models (LLMs) often struggle with complex mathematical reasoning, where prose-based generation leads to unverified and arithmetically unsound solutions. Current prompting strategies like Chain of Thought still operate within…

Computation and Language · Computer Science 2026-01-27 Sina Bagheri Nezhad, Yao Li, Ameeta Agrawal

See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses

The impressive performance of Large Language Models (LLMs) has consistently surpassed numerous human-designed benchmarks, presenting new challenges in assessing the shortcomings of LLMs. Designing tasks and finding LLMs' limitations are…

Computation and Language · Computer Science 2024-10-02 Yulong Chen, Yang Liu, Jianhao Yan, Xuefeng Bai, Ming Zhong, Yinghao Yang, Ziyi Yang, Chenguang Zhu, Yue Zhang

BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology

The ability to automatically generate accurate protocols for scientific experiments would represent a major step towards the automation of science. Large Language Models (LLMs) have impressive capabilities on a wide range of tasks, such as…

Computation and Language · Computer Science 2023-10-17 Odhran O'Donoghue, Aleksandar Shtedritski, John Ginger, Ralph Abboud, Ali Essa Ghareeb, Justin Booth, Samuel G Rodriques

SimulBench: Evaluating Language Models with Creative Simulation Tasks

We introduce SimulBench, a benchmark designed to evaluate large language models (LLMs) across a diverse collection of creative simulation scenarios, such as acting as a Linux terminal or playing text games with users. While these simulation…

Computation and Language · Computer Science 2024-09-13 Qi Jia, Xiang Yue, Tianyu Zheng, Jie Huang, Bill Yuchen Lin

Automatically Generating CS Learning Materials with Large Language Models

Recent breakthroughs in Large Language Models (LLMs), such as GPT-3 and Codex, now enable software developers to generate code based on a natural language prompt. Within computer science education, researchers are exploring the potential…

Computers and Society · Computer Science 2022-12-13 Stephen MacNeil, Andrew Tran, Juho Leinonen, Paul Denny, Joanne Kim, Arto Hellas, Seth Bernstein, Sami Sarsa

Exploring and Characterizing Large Language Models For Embedded System Development and Debugging

Large language models (LLMs) have shown remarkable abilities to generate code; however, their ability to develop software for embedded systems, which requires cross-domain knowledge of hardware and software, has not been studied. In this…

Software Engineering · Computer Science 2023-11-23 Zachary Englhardt, Richard Li, Dilini Nissanka, Zhihan Zhang, Girish Narayanswamy, Joseph Breda, Xin Liu, Shwetak Patel, Vikram Iyer

VisCoder2: Building Multi-Language Visualization Coding Agents

Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable…

Software Engineering · Computer Science 2026-04-09 Yuansheng Ni, Songcheng Cai, Xiangchao Chen, Jiarong Liang, Zhiheng Lyu, Jiaqi Deng, Kai Zou, Ping Nie, Fei Yuan, Xiang Yue, Wenhu Chen

SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors

Large language model (LLM) simulations of human behavior have the potential to revolutionize the social and behavioral sciences, if and only if they faithfully reflect real human behaviors. Current evaluations of simulation fidelity are…

Computation and Language · Computer Science 2026-04-14 Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger

Enhancing Network Management Using Code Generated by Large Language Models

Analyzing network topologies and communication graphs plays a crucial role in contemporary network management. However, the absence of a cohesive approach leads to a challenging learning curve, heightened errors, and inefficiencies. In this…

Networking and Internet Architecture · Computer Science 2023-08-14 Sathiya Kumaran Mani, Yajie Zhou, Kevin Hsieh, Santiago Segarra, Ranveer Chandra, Srikanth Kandula

Ocassionally Secure: A Comparative Analysis of Code Generation Assistants

Large Language Models (LLMs) are being increasingly utilized in various applications, with code generations being a notable example. While previous research has shown that LLMs have the capability to generate both secure and insecure…

Cryptography and Security · Computer Science 2025-09-30 Ran Elgedawy, Porter Dosch, John Sadik, Senjuti Dutta, Anuj Gautam, Konstantinos Georgiou, Farzin Gholamrezae, Fujiao Ji, Kyungchan Lim, Qian Liu, Scott Ruoti