Related papers: From Code to Play: Benchmarking Program Search for…

Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation

In recent years, large language models (LLMs) have emerged as powerful tools with potential applications in various fields, including software engineering. Within the scope of this research, we evaluate five different state-of-the-art LLMs…

Computation and Language · Computer Science 2024-09-09 Luis Mayer, Christian Heumann, Matthias Aßenmacher

ChatGPT and Other Large Language Models as Evolutionary Engines for Online Interactive Collaborative Game Design

Large language models (LLMs) have taken the scientific world by storm, changing the landscape of natural language processing and human-computer interaction. These powerful tools can answer complex questions and, surprisingly, perform…

Artificial Intelligence · Computer Science 2023-11-14 Pier Luca Lanzi, Daniele Loiacono

Codenames as a Benchmark for Large Language Models

In this paper, we propose the use of the popular word-based board game Codenames as a suitable benchmark for evaluating the reasoning capabilities of Large Language Models (LLMs). Codenames presents a highly interesting challenge for…

Artificial Intelligence · Computer Science 2025-04-23 Matthew Stephenson, Matthew Sidji, Benoît Ronval

Code World Models for General Game Playing

Large Language Models (LLMs) reasoning abilities are increasingly being applied to classical board and card games, but the dominant approach -- involving prompting for direct move generation -- has significant drawbacks. It relies on the…

Artificial Intelligence · Computer Science 2025-10-07 Wolfgang Lehrach, Daniel Hennes, Miguel Lazaro-Gredilla, Xinghua Lou, Carter Wendelken, Zun Li, Antoine Dedieu, Jordi Grau-Moya, Marc Lanctot, Atil Iscen, John Schultz, Marcus Chiam, Ian Gemp, Piotr Zielinski, Satinder Singh, Kevin P. Murphy

Do AI Models Dream of Faster Code? An Empirical Study on LLM-Proposed Performance Improvements in Real-World Software

Large Language Models (LLMs) can generate code, but can they generate fast code for complex, real-world software systems? In this study, we investigate this question using a dataset of 65 tasks mined from performance-critical open-source…

Software Engineering · Computer Science 2026-04-10 Lirong Yi, Gregory Gay, Philipp Leitner

Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard

We introduce a novel and extensible benchmark for large language models (LLMs) through grid-based games such as Tic-Tac-Toe, Connect Four, and Gomoku. The open-source game simulation code, available on GitHub, allows LLMs to compete and…

Artificial Intelligence · Computer Science 2024-07-12 Oguzhan Topsakal, Colby Jacob Edell, Jackson Bailey Harper

A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages

Large Language Models (LLMs) are advanced Artificial Intelligence (AI) systems that have undergone extensive training using large datasets in order to understand and produce language that closely resembles that of humans. These models have…

Software Engineering · Computer Science 2023-08-10 Alessio Buscemi

Level Generation Through Large Language Models

Large Language Models (LLMs) are powerful tools, capable of leveraging their training on natural language to write stories, generate code, and answer questions. But can they generate functional video game levels? Game levels, with their…

Artificial Intelligence · Computer Science 2023-06-02 Graham Todd, Sam Earle, Muhammad Umair Nasir, Michael Cerny Green, Julian Togelius

Distilling Game Code World Model Generation into Lightweight Large Language Models

Large Language Models (LLMs) have shown great ability in generating executable code from natural language, opening the possibility of automatically constructing environments for AI agents. Recent work on Code World Models (CWMs)…

Artificial Intelligence · Computer Science 2026-05-26 Tyrone Serapio, Arjun Prakash, Haoyang Xu, Kevin Wang, Amy Greenwald

Evaluating LLMs in Open-Source Games

Large Language Models' (LLMs) programming capabilities enable their participation in open-source games: a game-theoretic setting in which players submit computer programs in lieu of actions. These programs offer numerous advantages,…

Computer Science and Game Theory · Computer Science 2025-12-02 Swadesh Sistla, Max Kleiman-Weiner

Code Evolution Graphs: Understanding Large Language Model Driven Design of Algorithms

Large Language Models (LLMs) have demonstrated great promise in generating code, especially when used inside an evolutionary computation framework to iteratively optimize the generated algorithms. However, in some cases they fail to…

Neural and Evolutionary Computing · Computer Science 2025-03-24 Niki van Stein, Anna V. Kononova, Lars Kotthoff, Thomas Bäck

Language Models Can Teach Themselves to Program Better

Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems, even solving some competitive-programming problems. Self-play has proven useful in games such as Go, and thus it is…

Machine Learning · Computer Science 2023-04-13 Patrick Haluptzok, Matthew Bowers, Adam Tauman Kalai

AI-Powered, But Power-Hungry? Energy Efficiency of LLM-Generated Code

Large language models (LLMs) are used in software development to assist in various tasks, e.g., code generation and code completion, but empirical evaluations of the quality of the results produced by these models focus on correctness and…

Software Engineering · Computer Science 2025-02-05 Lola Solovyeva, Sophie Weidmann, Fernando Castor

Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications

Large Language Models (LLMs) have demonstrated their remarkable capabilities in numerous fields. This survey focuses on how LLMs empower users, regardless of their technical background, to use human languages to automatically generate…

Software Engineering · Computer Science 2025-04-03 Nam Huynh, Beiyu Lin

Reasoning Capabilities of Large Language Models. Lessons Learned from General Game Playing

This paper examines the reasoning capabilities of Large Language Models (LLMs) from a novel perspective, focusing on their ability to operate within formally specified, rule-governed environments. We evaluate four LLMs (Gemini 2.5 Pro and…

Artificial Intelligence · Computer Science 2026-02-24 Maciej Świechowski, Adam Żychowski, Jacek Mańdziuk

Boardwalk: Towards a Framework for Creating Board Games with LLMs

Implementing board games in code can be a time-consuming task. However, Large Language Models (LLMs) have been proven effective at generating code for domain-specific tasks with simple contextual information. We aim to investigate whether…

Machine Learning · Computer Science 2025-11-10 Álvaro Guglielmin Becker, Gabriel Bauer de Oliveira, Lana Bertoldo Rossato, Anderson Rocha Tavares

Using Games to Learn How Large Language Models Work

While artificial intelligence (AI) technology is becoming increasingly popular, its underlying mechanisms tend to remain opaque to most people. To address this gap, the field of AI literacy aims to develop various resources to teach people…

Computers and Society · Computer Science 2026-03-31 Allison Chen, Isabella Pu

Usando LLMs para Programar Jogos de Tabuleiro e Variações

Creating programs to represent board games can be a time-consuming task. Large Language Models (LLMs) arise as appealing tools to expedite this process, given their capacity to efficiently generate code from simple contextual information.…

Machine Learning · Computer Science 2025-11-10 Álvaro Guglielmin Becker, Lana Bertoldo Rossato, Anderson Rocha Tavares

Game Generation via Large Language Models

Recently, the emergence of large language models (LLMs) has unlocked new opportunities for procedural content generation. However, recent attempts mainly focus on level generation for specific games with defined game rules such as Super…

Artificial Intelligence · Computer Science 2024-05-31 Chengpeng Hu, Yunlong Zhao, Jialin Liu

Examination of Code generated by Large Language Models

Large language models (LLMs), such as ChatGPT and Copilot, are transforming software development by automating code generation and, arguably, enable rapid prototyping, support education, and boost productivity. Therefore, correctness and…

Software Engineering · Computer Science 2024-08-30 Robin Beer, Alexander Feix, Tim Guttzeit, Tamara Muras, Vincent Müller, Maurice Rauscher, Florian Schäffler, Welf Löwe