Related papers: Benchmarking PhD-Level Coding in 3D Geometric Comp…

GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation

Geometric problem solving constitutes a critical branch of mathematical reasoning, requiring precise analysis of shapes and spatial relationships. Current evaluations of geometric reasoning in vision-language models (VLMs) face limitations,…

Computer Vision and Pattern Recognition · Computer Science 2026-01-01 Yuan Feng, Yue Yang, Xiaohan He, Jiatong Zhao, Jianlong Chen, Zijun Chen, Daocheng Fu, Qi Liu, Renqiu Xia, Bo Zhang, Junchi Yan

VoxelCodeBench: Benchmarking 3D World Modeling Through Code Generation

Evaluating code generation models for 3D spatial reasoning requires executing generated code in realistic environments and assessing outputs beyond surface-level correctness. We introduce a platform, VoxelCode, for analyzing code generation…

Machine Learning · Computer Science 2026-04-06 Yan Zheng, Florian Bordes

GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs

Geometric spatial reasoning forms the foundation of many applications in artificial intelligence, yet the ability of large language models (LLMs) to operate over geometric spatial information expressed in procedural code remains…

Artificial Intelligence · Computer Science 2026-02-11 Shixian Luo, Zezhou Zhu, Yu Yuan, Yuncheng Yang, Lianlei Shan, Yong Wu

BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD

Industrial Computer-Aided Design (CAD) code generation requires models to produce executable parametric programs from visual or textual inputs. Beyond recognizing the outer shape of a part, this task involves understanding its 3D structure,…

Artificial Intelligence · Computer Science 2026-05-13 Haozhe Zhang, Kaichen Liu, Miaomiao Chen, Lei Li, Shaojie Yang, Cheng Peng, Hanjie Chen

CADBench: A Multimodal Benchmark for AI-Assisted CAD Program Generation

Recovering editable CAD programs from images or 3D observations is central to AI-assisted design, but progress is difficult to measure because existing evaluations are fragmented across datasets, modalities, and metrics. We introduce…

Computer Vision and Pattern Recognition · Computer Science 2026-05-12 Anna C. Doris, Jacob Thomas Sony, Ghadi Nehme, Era Syla, Amin Heyrani Nobari, Faez Ahmed

GeoBuildBench: A Benchmark for Interactive and Executable Geometry Construction from Natural Language

We introduce GeoBuildBench, a benchmark designed to evaluate whether large language models and multimodal agents can ground informal natural-language plane geometry problems into executable geometric constructions. Unlike existing geometry…

Computation and Language · Computer Science 2026-05-14 Jinwoong Kim, Rui Yang, Huishuai Zhang

GEditBench v2: A Human-Aligned Benchmark for General Image Editing

Recent advances in image editing have enabled models to handle complex instructions with impressive realism. However, existing evaluation frameworks lag behind: current benchmarks suffer from narrow task coverage, while standard metrics…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Zhangqi Jiang, Zheng Sun, Xianfang Zeng, Yufeng Yang, Xuanyang Zhang, Yongliang Wu, Wei Cheng, Gang Yu, Xu Yang, Bihan Wen

Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities

This paper introduces Code-Vision, a benchmark designed to evaluate the logical understanding and code generation capabilities of Multimodal Large Language Models (MLLMs). It challenges MLLMs to generate a correct program that fulfills…

Computation and Language · Computer Science 2025-02-18 Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu, Keyuan Cheng, Yuxin Zuo, Kai Tian, Jingwei Song, Junting Lu, Wenhui Hu, Xueyang Liu

GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark

Text-to-3D (T23D) generation has emerged as a crucial visual generation task, aiming at synthesizing 3D content from textual descriptions. Studies of this task are currently shifting from per-scene T23D, which requires optimization of the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-04 Xiao Cai, Sitong Su, Jingkuan Song, Pengpeng Zeng, Ji Zhang, Qinhong Du, Mengqi Li, Heng Tao Shen, Lianli Gao

Graphic-Design-Bench: A Comprehensive Benchmark for Evaluating AI on Graphic Design Tasks

We introduce GraphicDesignBench (GDB), the first comprehensive benchmark suite designed specifically to evaluate AI models on the full breadth of professional graphic design tasks. Unlike existing benchmarks that focus on natural-image…

Computer Vision and Pattern Recognition · Computer Science 2026-04-08 Adrienne Deganutti, Elad Hirsch, Haonan Zhu, Jaejung Seol, Purvanshi Mehta

GeoAnalystBench: A GeoAI benchmark for assessing large language models for spatial analysis workflow and code generation

Recent advances in large language models (LLMs) have fueled growing interest in automating geospatial analysis and GIS workflows, yet their actual capabilities remain uncertain. In this work, we call for rigorous evaluation of LLMs on…

Software Engineering · Computer Science 2025-09-09 Qianheng Zhang, Song Gao, Chen Wei, Yibo Zhao, Ying Nie, Ziru Chen, Shijie Chen, Yu Su, Huan Sun

Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code

Multimodal geometry reasoning requires models to jointly understand visual diagrams and perform structured symbolic inference, yet current vision-language models struggle with complex geometric constructions due to limited training data…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Haobo Lin, Tianyi Bai, Chen Chen, Jiajun Zhang, Bohan Zeng, Wentao Zhang, Binhang Yuan

GeoR-Bench: Evaluating Geoscience Visual Reasoning

Geoscience intelligence is expected to understand, reason about, and predict earth system changes to support human decision-making in critical domains such as disaster response, climate adaptation and environmental protection. Although…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Yushuo Zheng, Zicheng Zhang, Huiyu Duan, Chunyi Li, Zijian Chen, Ziheng Jia, Yue Shi, Ke Gu, Xiongkuo Min, Guangtao Zhai

GeoCode: Interpretable Shape Programs

The task of crafting procedural programs capable of generating structurally valid 3D shapes easily and intuitively remains an elusive goal in computer vision and graphics. Within the graphics community, generating procedural 3D models has…

Graphics · Computer Science 2025-03-21 Ofek Pearl, Itai Lang, Yuhua Hu, Raymond A. Yeh, Rana Hanocka

Is Geometry Enough for Matching in Visual Localization?

In this paper, we propose to go beyond the well-established approach to vision-based localization that relies on visual descriptor matching between a query image and a 3D point cloud. While matching keypoints via visual descriptors makes…

Computer Vision and Pattern Recognition · Computer Science 2022-08-02 Qunjie Zhou, Sérgio Agostinho, Aljosa Osep, Laura Leal-Taixé

GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing

Editing images using natural language instructions has become a natural and expressive way to modify visual content; yet, evaluating the performance of such models remains challenging. Existing evaluation approaches often rely on image-text…

Computer Vision and Pattern Recognition · Computer Science 2025-07-28 Yusu Qian, Jiasen Lu, Tsu-Jui Fu, Xinze Wang, Chen Chen, Yinfei Yang, Wenze Hu, Zhe Gan

CodeEditorBench: Evaluating Code Editing Capability of Large Language Models

Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability. We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing…

Software Engineering · Computer Science 2025-04-09 Jiawei Guo, Ziming Li, Xueling Liu, Kaijing Ma, Tianyu Zheng, Zhouliang Yu, Ding Pan, Yizhi LI, Ruibo Liu, Yue Wang, Shuyue Guo, Xingwei Qu, Xiang Yue, Ge Zhang, Wenhu Chen, Jie Fu

GeoChallenge: A Multi-Answer Multiple-Choice Benchmark for Geometric Reasoning with Diagrams

Evaluating the symbolic reasoning of large language models (LLMs) calls for geometry benchmarks that require multi-step proofs grounded in both text and diagrams. However, existing benchmarks are often limited in scale and rarely provide…

Computation and Language · Computer Science 2026-03-23 Yushun Zhang, Weiping Fu, Zesheng Yang, Bo Zhao, Lingling Zhang, Jian Zhang, Yumeng Fu, Jiaxing Huang, Jun Liu

Generating CAD Code with Vision-Language Models for 3D Designs

Generative AI has transformed the fields of Design and Manufacturing by providing efficient and automated methods for generating and modifying 3D objects. One approach involves using Large Language Models (LLMs) to generate Computer-Aided…

Machine Learning · Computer Science 2025-03-03 Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Zaidi, Megan Langwasser, Wei Xu, Matthew Gombolay

Geo-Code: A Code Framework for Reverse Code Generation from Geometric Images Based on Two-Stage Multi-Agent Evolution

Program code serves as a bridge linking vision and logic, providing a feasible supervisory approach for enhancing the multimodal reasoning capability of large models through geometric operations such as auxiliary line construction and…

Artificial Intelligence · Computer Science 2026-02-10 Zhenyu Wu, Yanxi Long, Jian Li, Hua Huang