Related papers: Synthesizing Multimodal Geometry Datasets from Scr…

GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models

Geometry problem-solving demands advanced reasoning abilities to process multimodal inputs and employ mathematical knowledge effectively. Vision-language models (VLMs) have made significant progress in various multimodal tasks. Yet, they…

Computation and Language · Computer Science 2024-10-18 Aditya Sharma, Aman Dalmia, Mehran Kazemi, Amal Zouaq, Christopher J. Pal

GeoMathCode: Understanding Interleaved Math-Code Reasoning for Geometry Problem Solving

Mathematical reasoning is a hallmark of human intelligence, requiring logical deduction, symbolic manipulation, and abstract thinking. Recent multimodal large language models (MLLMs) have demonstrated strong performance on geometry problems…

Computation and Language · Computer Science 2026-05-26 Yingji Zhang, Yong Dai, André Freitas

MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems

Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models…

Computation and Language · Computer Science 2024-09-27 Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma

GeoSketch: A Neural-Symbolic Approach to Geometric Multimodal Reasoning with Auxiliary Line Construction and Affine Transformation

Geometric Problem Solving (GPS) poses a unique challenge for Multimodal Large Language Models (MLLMs), requiring not only the joint interpretation of text and diagrams but also iterative visuospatial reasoning. While existing approaches…

Artificial Intelligence · Computer Science 2026-03-26 Shichao Weng, Zhiqiang Wang, Yuhua Zhou, Rui Lu, Ting Liu, Zhiyang Teng, Xiaozhang Liu, Hanmeng Liu

Multimodal graph representation learning for website generation based on visual sketch

The Design2Code problem, which involves converting digital designs into functional source code, is a significant challenge in software development due to its complexity and time-consuming nature. Traditional approaches often struggle with…

Machine Learning · Computer Science 2025-04-29 Tung D. Vu, Chung Hoang, Truong-Son Hy

GeoThought: A Dataset for Enhancing Mathematical Geometry Reasoning in Vision-Language Models

Large language models (LLMs) have demonstrated strong reasoning capabilities in text-based mathematical problem solving; however, when adapted to visual reasoning tasks, particularly geometric problem solving, their performance…

Artificial Intelligence · Computer Science 2025-10-28 Nannan Shi, Chuanyu Qin, Shipeng Song, Man Luo

GeoSym127K: Scalable Symbolically-verifiable Synthesis for Multimodal Geometric Reasoning

Large Multimodal Models (LMMs) often struggle with geometric reasoning due to visual hallucinations and a lack of mathematically precise Chain-of-Thought (CoT) data. To address this, we propose the GeoSym Engine, an automated and scalable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Jinhao Jing, Zheng Ma, Jinwei Liang, Qiannian Zhao, Shawn Chen, Jing Yang, Por Lip Yee, Prayag Tiwari, Jingjing Bai, Benyou Wang, Lewei Lu, Zhan Su

Geo-Code: A Code Framework for Reverse Code Generation from Geometric Images Based on Two-Stage Multi-Agent Evolution

Program code serves as a bridge linking vision and logic, providing a feasible supervisory approach for enhancing the multimodal reasoning capability of large models through geometric operations such as auxiliary line construction and…

Artificial Intelligence · Computer Science 2026-02-10 Zhenyu Wu, Yanxi Long, Jian Li, Hua Huang

GeoWeaver: Grounding Visual Tokens with Geometric Evidence before Scene Reasoning

Spatio-temporal reasoning in vision-language models requires visual representations that preserve physical geometry rather than merely semantic appearance. Recent multimodal models incorporate geometric information through structural…

Computer Vision and Pattern Recognition · Computer Science 2026-05-22 Deshui Miao, Xingsen Huang, Yameng Gu, Xin Li, Haijun Zhang, Ming-Hsuan Yang

KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding

We introduce KodCode, a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data across diverse difficulties and domains for training Large Language Models for coding. Existing…

Machine Learning · Computer Science 2025-07-15 Zhangchen Xu, Yang Liu, Yueqin Yin, Mingyuan Zhou, Radha Poovendran

VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning

Predicting program behavior and reasoning about code execution remain significant challenges in software engineering, particularly for large language models (LLMs) designed for code analysis. While these models excel at understanding static…

Software Engineering · Computer Science 2025-02-11 Cuong Chi Le, Hoang-Chau Truong-Vinh, Huy Nhat Phan, Dung Duy Le, Tien N. Nguyen, Nghi D. Q. Bui

Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration

Recent advances in Multimodal Large Language Models (MLLMs) have achieved remarkable progress in general domains and demonstrated promise in multimodal mathematical reasoning. However, applying MLLMs to geometry problem solving (GPS)…

Computation and Language · Computer Science 2025-04-18 Yicheng Pan, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Quan Liu, Jianqing Gao, Feng Ma

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

Natural language image-caption datasets, widely used for training Large Multimodal Models, mainly focus on natural scenarios and overlook the intricate details of mathematical figures that are critical for problem-solving, hindering the…

Computer Vision and Pattern Recognition · Computer Science 2025-05-16 Ke Wang, Junting Pan, Linda Wei, Aojun Zhou, Weikang Shi, Zimu Lu, Han Xiao, Yunqiao Yang, Houxing Ren, Mingjie Zhan, Hongsheng Li

GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View

Multimodal reasoning is a process of understanding, integrating and inferring information across different data modalities. It has recently attracted surging academic attention as a benchmark for Artificial Intelligence (AI). Although there…

Computation and Language · Computer Science 2025-09-16 Fenghua Cheng, Jinxiang Wang, Sen Wang, Zi Huang, Xue Li

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Despite their proficiency in general tasks, Multi-modal Large Language Models (MLLMs) struggle with automatic Geometry Problem Solving (GPS), which demands understanding diagrams, interpreting symbols, and performing complex reasoning. This…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 Renqiu Xia, Mingsheng Li, Hancheng Ye, Wenjie Wu, Hongbin Zhou, Jiakang Yuan, Tianshuo Peng, Xinyu Cai, Xiangchao Yan, Bin Wang, Conghui He, Botian Shi, Tao Chen, Junchi Yan, Bo Zhang

GeoChallenge: A Multi-Answer Multiple-Choice Benchmark for Geometric Reasoning with Diagrams

Evaluating the symbolic reasoning of large language models (LLMs) calls for geometry benchmarks that require multi-step proofs grounded in both text and diagrams. However, existing benchmarks are often limited in scale and rarely provide…

Computation and Language · Computer Science 2026-03-23 Yushun Zhang, Weiping Fu, Zesheng Yang, Bo Zhao, Lingling Zhang, Jian Zhang, Yumeng Fu, Jiaxing Huang, Jun Liu

GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning

Multimodal Large Language Models (MLLMs) have recently demonstrated remarkable perceptual and reasoning abilities. However, they struggle to perceive fine-grained geometric structures, constraining their ability of geometric understanding…

Computer Vision and Pattern Recognition · Computer Science 2026-03-30 Jiayin Sun, Caixia Sun, Boyu Yang, Hailin Li, Xiao Chen, Yi Zhang, Errui Ding, Liang Li, Chao Deng, Junlan Feng

VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models

Multimodal large language models (MLLMs) have significantly advanced the integration of visual and textual understanding. However, their ability to generate code from multimodal inputs remains limited. In this work, we introduce VisCodex, a…

Computation and Language · Computer Science 2025-08-14 Lingjie Jiang, Shaohan Huang, Xun Wu, Yixia Li, Dongdong Zhang, Furu Wei

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remains quite…

Computer Vision and Pattern Recognition · Computer Science 2024-10-04 Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang

GeoCode: Interpretable Shape Programs

The task of crafting procedural programs capable of generating structurally valid 3D shapes easily and intuitively remains an elusive goal in computer vision and graphics. Within the graphics community, generating procedural 3D models has…

Graphics · Computer Science 2025-03-21 Ofek Pearl, Itai Lang, Yuhua Hu, Raymond A. Yeh, Rana Hanocka