Gistify! Codebase-Level Understanding via Runtime Execution

Hyunji Lee; Minseon Kim; Chinmay Singh; Matheus Pereira; Atharv Sonwane; Isadora White; Elias Stengel-Eskin; Mohit Bansal; Zhengyan Shi; Alessandro Sordoni; Marc-Alexandre Côté; Xingdi Yuan; Lucas Caccia

Computation and Language 2025-10-31 v1 Artificial Intelligence

Abstract

As coding agents are increasingly deployed in large codebases, automatically designing challenging, codebase-level evaluations has become a central need. We propose Gistify, a task where a coding LLM must create a single, minimal, self-contained file that can reproduce a specific functionality of a codebase. The coding LLM is given full access to a codebase along with a specific entrypoint (e.g., a Python command), and the generated file must replicate the output of the same command run under the full codebase, while containing only the essential components necessary to execute the provided command. Success on Gistify requires structural understanding of the codebase, accurate modeling of its execution flow, and the ability to produce potentially large code patches. Our findings show that current state-of-the-art models struggle to reliably solve Gistify tasks, especially ones with long execution traces.
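
The output-equivalence requirement described in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation harness: the function names `run_command` and `outputs_match`, the choice to compare exit code and standard output only, and the timeout value are all assumptions made for illustration, and the sketch does not attempt to check the minimality requirement.

```python
import subprocess
import sys


def run_command(cmd, cwd=None, timeout=60):
    """Run a command and return its (exit code, stdout) pair."""
    result = subprocess.run(
        cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout
    )
    return result.returncode, result.stdout


def outputs_match(original_cmd, gist_cmd, original_cwd=None, gist_cwd=None):
    """Hypothetical check: does the gistified file reproduce the original run?

    original_cmd is the entrypoint executed inside the full codebase;
    gist_cmd is the equivalent command executed against the single
    generated file. Comparing exit code and stdout is an assumption here.
    """
    original = run_command(original_cmd, cwd=original_cwd)
    gist = run_command(gist_cmd, cwd=gist_cwd)
    return original == gist


if __name__ == "__main__":
    # Stand-in commands: two different programs that print the same text.
    full = [sys.executable, "-c", "print(sum([1, 2, 3]))"]
    gist = [sys.executable, "-c", "print(6)"]
    print(outputs_match(full, gist))
```

In practice the two commands would be run in separate working directories, one containing the full codebase and one containing only the generated file, so that the gistified file cannot silently import from the original code.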

Keywords

code generation program analysis software refactoring

Cite

@article{arxiv.2510.26790,
  title  = {Gistify! Codebase-Level Understanding via Runtime Execution},
  author = {Hyunji Lee and Minseon Kim and Chinmay Singh and Matheus Pereira and Atharv Sonwane and Isadora White and Elias Stengel-Eskin and Mohit Bansal and Zhengyan Shi and Alessandro Sordoni and Marc-Alexandre Côté and Xingdi Yuan and Lucas Caccia},
  journal= {arXiv preprint arXiv:2510.26790},
  year   = {2025}
}
