Related papers: Factorio Learning Environment

GLEE: A Unified Framework and Benchmark for Language-based Economic Environments

Large Language Models (LLMs) show significant potential in economic and strategic interactions, where communication via natural language is often prevalent. This raises key questions: Do LLMs behave rationally? How do they perform compared…

Computation and Language · Computer Science · 2026-03-03 · Eilam Shapira, Omer Madmon, Itamar Reinman, Samuel Joseph Amouyal, Roi Reichart, Moshe Tennenholtz

MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering

We introduce MLE-Dojo, a Gym-style framework for systematically training (via reinforcement learning), evaluating, and improving autonomous large language model (LLM) agents in iterative machine learning engineering (MLE) workflows. Unlike existing…

Machine Learning · Computer Science · 2025-05-13 · Rushi Qiang, Yuchen Zhuang, Yinghao Li, Dingu Sagar V K, Rongzhi Zhang, Changhao Li, Ian Shu-Hei Wong, Sherry Yang, Percy Liang, Chao Zhang, Bo Dai

clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents

It has been established in recent work that Large Language Models (LLMs) can be prompted to "self-play" conversational games that probe certain capabilities (general instruction following, strategic goal orientation, language understanding…

Computation and Language · Computer Science · 2024-06-03 · Anne Beyer, Kranti Chalamalasetti, Sherzod Hakimov, Brielen Madureira, Philipp Sadler, David Schlangen

Don't Just Fine-tune the Agent, Tune the Environment

Large Language Model (LLM) agents show great promise for complex, multi-turn tool-use tasks, but their development is often hampered by the extreme scarcity of high-quality training data. Supervised fine-tuning (SFT) on synthetic data leads…

Artificial Intelligence · Computer Science · 2026-02-02 · Siyuan Lu, Zechuan Wang, Hongxuan Zhang, Qintong Wu, Leilei Gan, Chenyi Zhuang, Jinjie Gu, Tao Lin

LLM-Guided Task- and Affordance-Level Exploration in Reinforcement Learning

Reinforcement learning (RL) is a promising approach for robotic manipulation, but it can suffer from low sample efficiency and requires extensive exploration of large state-action spaces. Recent methods leverage the commonsense knowledge…

Robotics · Computer Science · 2026-04-15 · Jelle Luijkx, Runyu Ma, Zlatan Ajanović, Jens Kober

Training-Free Active Learning Framework in Materials Science with Large Language Models

Active learning (AL) accelerates scientific discovery by prioritizing the most informative experiments, but traditional machine learning (ML) models used in AL suffer from cold-start limitations and domain-specific feature engineering,…

Machine Learning · Computer Science · 2025-12-05 · Hongchen Wang, Rafael Espinosa Castañeda, Jay R. Werber, Yao Fehlis, Edward Kim, Jason Hattrick-Simpers

AI Playing Business Games: Benchmarking Large Language Models on Managerial Decision-Making in Dynamic Simulations

The rapid advancement of LLMs has sparked significant interest in their potential to augment or automate managerial functions. One of the most recent trends in AI benchmarking is the performance of Large Language Models (LLMs) over longer time…

Artificial Intelligence · Computer Science · 2025-10-01 · Berdymyrat Ovezmyradov

A Survey on Large Language Model-Based Game Agents

Game environments provide rich, controllable settings that simulate many aspects of real-world complexity. As such, game agents offer a valuable testbed for exploring capabilities relevant to Artificial General Intelligence. Recently, the…

Artificial Intelligence · Computer Science · 2025-11-05 · Sihao Hu, Tiansheng Huang, Gaowen Liu, Ramana Rao Kompella, Fatih Ilhan, Selim Furkan Tekin, Yichang Xu, Zachary Yahn, Ling Liu

Large Language Model Agent as a Mechanical Designer

Conventional mechanical design follows an iterative process in which initial concepts are refined through cycles of expert assessment and resource-intensive Finite Element Method (FEM) analysis to meet performance goals. While machine…

Machine Learning · Computer Science · 2025-05-02 · Yayati Jadhav, Amir Barati Farimani

FATE-LLM: A Industrial Grade Federated Learning Framework for Large Language Models

Large Language Models (LLMs), such as ChatGPT, LLaMA, GLM, and PaLM, have exhibited remarkable performances across various tasks in recent years. However, LLMs face two main challenges in real-world applications. One challenge is that…

Machine Learning · Computer Science · 2023-10-17 · Tao Fan, Yan Kang, Guoqiang Ma, Weijing Chen, Wenbin Wei, Lixin Fan, Qiang Yang

On the Modeling Capabilities of Large Language Models for Sequential Decision Making

Large pretrained models are showing increasingly better performance in reasoning and planning tasks across different modalities, opening the possibility to leverage them for complex sequential decision making problems. In this paper, we…

Artificial Intelligence · Computer Science · 2024-10-10 · Martin Klissarov, Devon Hjelm, Alexander Toshev, Bogdan Mazoure

Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents

Recent studies have uncovered the potential of Large Language Models (LLMs) in addressing complex sequential decision-making tasks through the provision of high-level instructions. However, LLM-based agents lack specialization in tackling…

Artificial Intelligence · Computer Science · 2024-05-28 · Zihao Zhou, Bin Hu, Chenyang Zhao, Pu Zhang, Bin Liu

FARE: Fast-Slow Agentic Robotic Exploration

This work advances autonomous robot exploration by integrating agent-level semantic reasoning with fast local control. We introduce FARE, a hierarchical autonomous exploration framework that integrates a large language model (LLM) for…

Robotics · Computer Science · 2026-01-22 · Shuhao Liao, Xuxin Lv, Jeric Lew, Shizhe Zhang, Jingsong Liang, Peizhuo Li, Yuhong Cao, Wenjun Wu, Guillaume Sartoretti

The Factory Must Grow: Automation in Factorio

Efficient optimization of resources is paramount to success in many problems faced today. In the field of operational research the efficient scheduling of employees; packing of vans; routing of vehicles; logistics of airlines and transport…

Artificial Intelligence · Computer Science · 2021-02-10 · Kenneth N. Reid, Iliya Miralavy, Stephen Kelly, Wolfgang Banzhaf, Cedric Gondro

PBE Meets LLM: When Few Examples Aren't Few-Shot Enough

Large language models (LLMs) can generate code from natural language descriptions. Their performance is typically evaluated using programming benchmarks that simulate real-world tasks. These benchmarks provide specifications in the form of…

Databases · Computer Science · 2025-07-09 · Shuning Zhang, Yongjoo Park

Large Language Models for Spreadsheets: Benchmarking Progress and Evaluating Performance with FLARE

Large Language Models (LLMs) have demonstrated significant capabilities across various domains; however, their effectiveness in spreadsheet-related tasks remains underexplored. This study introduces a foundation for a comprehensive…

Software Engineering · Computer Science · 2025-06-24 · Simon Thorne

Structured In-context Environment Scaling for Large Language Model Reasoning

Large language models (LLMs) have achieved significant advancements in reasoning capabilities through reinforcement learning (RL) via environmental exploration. As the intrinsic properties of the environment determine the abilities that…

Computation and Language · Computer Science · 2026-05-04 · Peng Yu, Zeyuan Zhao, Shao Zhang, Luoyi Fu, Xinbing Wang, Ying Wen

Learning to Reason in LLMs by Expectation Maximization

Large language models (LLMs) solve reasoning problems by first generating a rationale and then answering. We formalize reasoning as a latent variable model and derive a reward-based filtered expectation-maximization (FEM) objective for…

Machine Learning · Computer Science · 2026-02-03 · Junghyun Lee, Branislav Kveton, Anup Rao, Subhojyoti Mukherjee, Ryan A. Rossi, Sunav Choudhary, Alexa Siu

Large Legislative Models: Towards Efficient AI Policymaking in Economic Simulations

The improvement of economic policymaking presents an opportunity for broad societal benefit, a notion that has inspired research towards AI-driven policymaking tools. AI policymaking holds the potential to surpass human performance through…

Artificial Intelligence · Computer Science · 2024-10-14 · Henry Gasztowtt, Benjamin Smith, Vincent Zhu, Qinxun Bai, Edwin Zhang

Scaling Autonomous Agents via Automatic Reward Modeling And Planning

Large language models (LLMs) have demonstrated remarkable capabilities across a range of text-generation tasks. However, LLMs still struggle with problems requiring multi-step decision-making and environmental feedback, such as online…

Artificial Intelligence · Computer Science · 2025-02-18 · Zhenfang Chen, Delin Chen, Rui Sun, Wenjun Liu, Chuang Gan