Related papers: Synthesizing Multimodal Geometry Datasets from Scr…

200 papers

Geometry problem-solving demands advanced reasoning abilities to process multimodal inputs and employ mathematical knowledge effectively. Vision-language models (VLMs) have made significant progress in various multimodal tasks. Yet, they…

Computation and Language · Computer Science · 2024-10-18 · Aditya Sharma, Aman Dalmia, Mehran Kazemi, Amal Zouaq, Christopher J. Pal

Mathematical reasoning is a hallmark of human intelligence, requiring logical deduction, symbolic manipulation, and abstract thinking. Recent multimodal large language models (MLLMs) have demonstrated strong performance on geometry problems…

Computation and Language · Computer Science · 2026-05-26 · Yingji Zhang, Yong Dai, André Freitas

Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models…

Computation and Language · Computer Science · 2024-09-27 · Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma

Geometric Problem Solving (GPS) poses a unique challenge for Multimodal Large Language Models (MLLMs), requiring not only the joint interpretation of text and diagrams but also iterative visuospatial reasoning. While existing approaches…

Artificial Intelligence · Computer Science · 2026-03-26 · Shichao Weng, Zhiqiang Wang, Yuhua Zhou, Rui Lu, Ting Liu, Zhiyang Teng, Xiaozhang Liu, Hanmeng Liu

The Design2Code problem, which involves converting digital designs into functional source code, is a significant challenge in software development due to its complexity and time-consuming nature. Traditional approaches often struggle with…

Machine Learning · Computer Science · 2025-04-29 · Tung D. Vu, Chung Hoang, Truong-Son Hy

Large language models (LLMs) have demonstrated strong reasoning capabilities in text-based mathematical problem solving; however, when adapted to visual reasoning tasks, particularly geometric problem solving, their performance…

Artificial Intelligence · Computer Science · 2025-10-28 · Nannan Shi, Chuanyu Qin, Shipeng Song, Man Luo

Large Multimodal Models (LMMs) often struggle with geometric reasoning due to visual hallucinations and a lack of mathematically precise Chain-of-Thought (CoT) data. To address this, we propose the GeoSym Engine, an automated and scalable…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-19 · Jinhao Jing, Zheng Ma, Jinwei Liang, Qiannian Zhao, Shawn Chen, Jing Yang, Por Lip Yee, Prayag Tiwari, Jingjing Bai, Benyou Wang, Lewei Lu, Zhan Su

Program code serves as a bridge linking vision and logic, providing a feasible supervisory approach for enhancing the multimodal reasoning capability of large models through geometric operations such as auxiliary line construction and…

Artificial Intelligence · Computer Science · 2026-02-10 · Zhenyu Wu, Yanxi Long, Jian Li, Hua Huang

Spatio-temporal reasoning in vision-language models requires visual representations that preserve physical geometry rather than merely semantic appearance. Recent multimodal models incorporate geometric information through structural…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-22 · Deshui Miao, Xingsen Huang, Yameng Gu, Xin Li, Haijun Zhang, Ming-Hsuan Yang

We introduce KodCode, a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data across diverse difficulties and domains for training Large Language Models for coding. Existing…

Machine Learning · Computer Science · 2025-07-15 · Zhangchen Xu, Yang Liu, Yueqin Yin, Mingyuan Zhou, Radha Poovendran

Predicting program behavior and reasoning about code execution remain significant challenges in software engineering, particularly for large language models (LLMs) designed for code analysis. While these models excel at understanding static…

Software Engineering · Computer Science · 2025-02-11 · Cuong Chi Le, Hoang-Chau Truong-Vinh, Huy Nhat Phan, Dung Duy Le, Tien N. Nguyen, Nghi D. Q. Bui

Recent advances in Multimodal Large Language Models (MLLMs) have achieved remarkable progress in general domains and demonstrated promise in multimodal mathematical reasoning. However, applying MLLMs to geometry problem solving (GPS)…

Computation and Language · Computer Science · 2025-04-18 · Yicheng Pan, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Quan Liu, Jianqing Gao, Feng Ma

Natural language image-caption datasets, widely used for training Large Multimodal Models, mainly focus on natural scenarios and overlook the intricate details of mathematical figures that are critical for problem-solving, hindering the…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-16 · Ke Wang, Junting Pan, Linda Wei, Aojun Zhou, Weikang Shi, Zimu Lu, Han Xiao, Yunqiao Yang, Houxing Ren, Mingjie Zhan, Hongsheng Li

Multimodal reasoning is a process of understanding, integrating and inferring information across different data modalities. It has recently attracted surging academic attention as a benchmark for Artificial Intelligence (AI). Although there…

Computation and Language · Computer Science · 2025-09-16 · Fenghua Cheng, Jinxiang Wang, Sen Wang, Zi Huang, Xue Li

Despite their proficiency in general tasks, Multi-modal Large Language Models (MLLMs) struggle with automatic Geometry Problem Solving (GPS), which demands understanding diagrams, interpreting symbols, and performing complex reasoning. This…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-13 · Renqiu Xia, Mingsheng Li, Hancheng Ye, Wenjie Wu, Hongbin Zhou, Jiakang Yuan, Tianshuo Peng, Xinyu Cai, Xiangchao Yan, Bin Wang, Conghui He, Botian Shi, Tao Chen, Junchi Yan, Bo Zhang

Evaluating the symbolic reasoning of large language models (LLMs) calls for geometry benchmarks that require multi-step proofs grounded in both text and diagrams. However, existing benchmarks are often limited in scale and rarely provide…

Computation and Language · Computer Science · 2026-03-23 · Yushun Zhang, Weiping Fu, Zesheng Yang, Bo Zhao, Lingling Zhang, Jian Zhang, Yumeng Fu, Jiaxing Huang, Jun Liu

Multimodal Large Language Models (MLLMs) have recently demonstrated remarkable perceptual and reasoning abilities. However, they struggle to perceive fine-grained geometric structures, constraining their ability of geometric understanding…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-30 · Jiayin Sun, Caixia Sun, Boyu Yang, Hailin Li, Xiao Chen, Yi Zhang, Errui Ding, Liang Li, Chao Deng, Junlan Feng

Multimodal large language models (MLLMs) have significantly advanced the integration of visual and textual understanding. However, their ability to generate code from multimodal inputs remains limited. In this work, we introduce VisCodex, a…

Computation and Language · Computer Science · 2025-08-14 · Lingjie Jiang, Shaohan Huang, Xun Wu, Yixia Li, Dongdong Zhang, Furu Wei

Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remains quite…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-04 · Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang

The task of crafting procedural programs capable of generating structurally valid 3D shapes easily and intuitively remains an elusive goal in computer vision and graphics. Within the graphics community, generating procedural 3D models has…

Graphics · Computer Science · 2025-03-21 · Ofek Pearl, Itai Lang, Yuhua Hu, Raymond A. Yeh, Rana Hanocka