Related papers: Geo-Code: A Code Framework for Reverse Code Genera…

GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models

Geometry problem-solving demands advanced reasoning abilities to process multimodal inputs and employ mathematical knowledge effectively. Vision-language models (VLMs) have made significant progress in various multimodal tasks. Yet, they…

Computation and Language · Computer Science 2024-10-18 Aditya Sharma, Aman Dalmia, Mehran Kazemi, Amal Zouaq, Christopher J. Pal

GeoCode: Interpretable Shape Programs

The task of crafting procedural programs capable of generating structurally valid 3D shapes easily and intuitively remains an elusive goal in computer vision and graphics. Within the graphics community, generating procedural 3D models has…

Graphics · Computer Science 2025-03-21 Ofek Pearl, Itai Lang, Yuhua Hu, Raymond A. Yeh, Rana Hanocka

Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code

Multimodal geometry reasoning requires models to jointly understand visual diagrams and perform structured symbolic inference, yet current vision-language models struggle with complex geometric constructions due to limited training data…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Haobo Lin, Tianyi Bai, Chen Chen, Jiajun Zhang, Bohan Zeng, Wentao Zhang, Binhang Yuan

Reverse Browser: Vector-Image-to-Code Generator

Automating the conversion of user interface design into code (image-to-code or image-to-UI) is an active area of software engineering research. However, the state-of-the-art solutions do not achieve high fidelity to the original design, as…

Software Engineering · Computer Science 2025-09-09 Zoltan Toth-Czifra

RECODE: Reasoning Through Code Generation for Visual Question Answering

Multimodal Large Language Models (MLLMs) struggle with precise reasoning for structured visuals like charts and diagrams, as pixel-based perception lacks a mechanism for verification. To address this, we propose to leverage derendering --…

Computer Vision and Pattern Recognition · Computer Science 2026-03-11 Junhong Shen, Mu Cai, Bo Hu, Ameet Talwalkar, David A Ross, Cordelia Schmid, Alireza Fathi

Real2Code: Reconstruct Articulated Objects via Code Generation

We present Real2Code, a novel approach to reconstructing articulated objects via code generation. Given visual observations of an object, we first reconstruct its part geometry using an image segmentation model and a shape completion model.…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Zhao Mandi, Yijia Weng, Dominik Bauer, Shuran Song

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Automating the transformation of user interface (UI) designs into front-end code holds significant promise for accelerating software development and democratizing design workflows. While multimodal large language models (MLLMs) can…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, Jiaming Han, Qunzhong Wang, Michael R. Lyu, Xiangyu Yue

Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach

The past decades have witnessed the rapid development of image and video coding techniques in the era of big data. However, the signal fidelity-driven coding pipeline design limits the capability of the existing image/video coding…

Computer Vision and Pattern Recognition · Computer Science 2020-01-13 Yueyu Hu, Shuai Yang, Wenhan Yang, Ling-Yu Duan, Jiaying Liu

Image-Plane Geometric Decoding for View-Invariant Indoor Scene Reconstruction

Volume-based indoor scene reconstruction methods offer superior generalization capability and real-time deployment potential. However, existing methods rely on multi-view pixel back-projection ray intersections as weak geometric constraints…

Computer Vision and Pattern Recognition · Computer Science 2025-10-28 Mingyang Li, Yimeng Fan, Changsong Liu, Lixue Xu, Xin Wang, Yanyan Liu, Wei Zhang

MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems

Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models…

Computation and Language · Computer Science 2024-09-27 Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma

Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Ajay Vikram Periasami, Junlin Wang, Bhuwan Dhingra

Facilitating the Parametric Definition of Geometric Properties in Programming-Based CAD

Parametric Computer-aided design (CAD) enables the creation of reusable models by integrating variables into geometric properties, facilitating customization without a complete redesign. However, creating parametric designs in…

Human-Computer Interaction · Computer Science 2024-08-06 J. Felipe Gonzalez, Thomas Pietrzak, Audrey Girouard, Géry Casiez

InCoder: A Generative Model for Code Infilling and Synthesis

Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via…

Software Engineering · Computer Science 2023-04-11 Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis

Learning Geometry-Dependent and Physics-Based Inverse Image Reconstruction

Deep neural networks have shown great potential in image reconstruction problems in Euclidean space. However, many reconstruction problems involve imaging physics that are dependent on the underlying non-Euclidean geometry. In this paper,…

Image and Video Processing · Electrical Eng. & Systems 2022-10-07 Xiajun Jiang, Sandesh Ghimire, Jwala Dhamala, Zhiyuan Li, Prashnna Kumar Gyawali, Linwei Wang

GeoMathCode: Understanding Interleaved Math-Code Reasoning for Geometry Problem Solving

Mathematical reasoning is a hallmark of human intelligence, requiring logical deduction, symbolic manipulation, and abstract thinking. Recent multimodal large language models (MLLMs) have demonstrated strong performance on geometry problems…

Computation and Language · Computer Science 2026-05-26 Yingji Zhang, Yong Dai, André Freitas

GA-VisAgent: A Multi-Agent application for code generation and visualization in interactive learning

Geometric Algebra (GA) presents challenges to learners due to its highly abstract mathematical structure and complex operational rules, as translating algebraic manipulations into concrete geometric interpretations is a non-intuitive…

Machine Learning · Computer Science 2026-05-05 Wang Jian, Zhou Jianbo, Xiong Yuhao, Liu Zhenxia, Luo Wen, Yuan LinWang, Yu ZhaoYuan

Benchmarking PhD-Level Coding in 3D Geometric Computer Vision

AI-assisted coding has rapidly reshaped software practice and research workflows, yet today's models still struggle to produce correct code for complex 3D geometric vision. If models could reliably write such code, the research of our…

Computer Vision and Pattern Recognition · Computer Science 2026-04-01 Wenyi Li, Renkai Luo, Yue Yu, Huan-ang Gao, Mingju Gao, Li Yuan, Chaoyou Fu, Hao Zhao

GACO-CAD: Geometry-Augmented and Conciseness-Optimized CAD Model Generation from Single Image

Generating editable, parametric CAD models from a single image holds great potential to lower the barriers of industrial concept design. However, current multi-modal large language models (MLLMs) still struggle with accurately inferring 3D…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Yinghui Wang, Xinyu Zhang, Peng Du

Re-Thinking Inverse Graphics With Large Language Models

Inverse graphics (the task of inverting an image into physical variables that, when rendered, enable reproduction of the observed scene) is a fundamental challenge in computer vision and graphics. Successfully disentangling an image…

Computer Vision and Pattern Recognition · Computer Science 2024-08-27 Peter Kulits, Haiwen Feng, Weiyang Liu, Victoria Abrevaya, Michael J. Black

Image Processing Using Multi-Code GAN Prior

Despite the success of Generative Adversarial Networks (GANs) in image synthesis, applying trained GAN models to real image processing remains challenging. Previous methods typically invert a target image back to the latent space either by…

Computer Vision and Pattern Recognition · Computer Science 2020-04-01 Jinjin Gu, Yujun Shen, Bolei Zhou