RECODE: Reasoning Through Code Generation for Visual Question Answering

Junhong Shen; Mu Cai; Bo Hu; Ameet Talwalkar; David A Ross; Cordelia Schmid; Alireza Fathi

Computer Vision and Pattern Recognition | 2026-03-11 | v2 | Artificial Intelligence | Machine Learning

Abstract

Multimodal Large Language Models (MLLMs) struggle with precise reasoning for structured visuals like charts and diagrams, as pixel-based perception lacks a mechanism for verification. To address this, we propose to leverage derendering -- the process of reverse-engineering visuals into executable code -- as a new modality for verifiable visual reasoning. Specifically, we propose RECODE, an agentic framework that first generates multiple candidate programs to reproduce the input image. It then uses a critic to select the most faithful reconstruction and iteratively refines the code. This process not only transforms an ambiguous perceptual task into a verifiable, symbolic problem, but also enables precise calculations and logical inferences later on. On various visual reasoning benchmarks such as CharXiv, ChartQA, and Geometry3K, RECODE significantly outperforms methods that do not leverage code or only use code for drawing auxiliary lines or cropping. Our work demonstrates that grounding visual perception in executable code provides a new path toward more accurate and verifiable multimodal reasoning.
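
The generate, select, and refine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `derender_and_refine`, its parameters, and the toy stand-ins below are all hypothetical, chosen only to make the control flow concrete. Here the "image" is a list of bar heights, a "program" is a string holding a Python list literal, and the critic is a simple negative absolute error; in the setting the abstract describes, an MLLM would play the generator, critic, and refiner roles.

```python
import ast
from typing import Callable, Tuple


def derender_and_refine(
    image,
    generate: Callable,   # hypothetical: image -> candidate program (str)
    render: Callable,     # hypothetical: program -> reconstructed image
    score: Callable,      # hypothetical critic: (image, rendering) -> fidelity
    refine: Callable,     # hypothetical: (image, program, rendering) -> new program
    num_candidates: int = 3,
    num_rounds: int = 2,
) -> Tuple[str, float]:
    """Generate candidates, keep the most faithful one, then refine it."""
    # Step 1: produce several candidate programs for the same image.
    candidates = [generate(image) for _ in range(num_candidates)]

    # Step 2: the critic selects the candidate whose rendering best matches.
    best_code = max(candidates, key=lambda c: score(image, render(c)))
    best_score = score(image, render(best_code))

    # Step 3: refine iteratively, accepting a revision only if it scores higher.
    for _ in range(num_rounds):
        new_code = refine(image, best_code, render(best_code))
        new_score = score(image, render(new_code))
        if new_score > best_score:
            best_code, best_score = new_code, new_score
    return best_code, best_score


# --- Toy stand-ins (assumptions for illustration only) ---
target = [3, 5, 2]  # the "image": three bar heights
guesses = iter(["[0, 0, 0]", "[3, 4, 0]", "[1, 1, 1]"])


def toy_generate(image):
    # Returns the next pre-written guess instead of calling a model.
    return next(guesses)


def toy_render(code):
    # "Executes" the program by parsing the list literal.
    return ast.literal_eval(code)


def toy_score(image, rendering):
    # Higher is better; 0 means a perfect reconstruction.
    return -sum(abs(a - b) for a, b in zip(image, rendering))


def toy_refine(image, code, rendering):
    # Corrects the single value with the largest error.
    values = list(rendering)
    worst = max(range(len(values)), key=lambda i: abs(image[i] - values[i]))
    values[worst] = image[worst]
    return repr(values)


code, fidelity = derender_and_refine(
    target, toy_generate, toy_render, toy_score, toy_refine
)
tallest_bar = max(ast.literal_eval(code))  # a question answered from the code
```

Once the reconstruction is faithful, a question such as "what is the tallest bar?" is answered by computing over the recovered program rather than by reading pixels, which is the sense in which the abstract calls the task verifiable and symbolic.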

Keywords

visual reasoning; code generation; automated reasoning

Cite

@article{arxiv.2510.13756,
  title  = {RECODE: Reasoning Through Code Generation for Visual Question Answering},
  author = {Junhong Shen and Mu Cai and Bo Hu and Ameet Talwalkar and David A Ross and Cordelia Schmid and Alireza Fathi},
  journal= {arXiv preprint arXiv:2510.13756},
  year   = {2026}
}

Comments

The authors are withdrawing this manuscript temporarily to conduct additional checks of the experimental setup and implementation. We plan to post an updated version after completing these checks.
