Related papers: RECODE: Reasoning Through Code Generation for Visu…

200 papers

While Large Language Models (LLMs) excel at algorithmic code generation, they struggle with front-end development, where correctness is judged on rendered pixels and interaction. We present ReLook, an agentic, vision-grounded reinforcement…

Machine Learning · Computer Science · 2025-10-14 · Yuhang Li, Chenchen Zhang, Ruilin Lv, Ao Liu, Ken Deng, Yuanxing Zhang, Jiaheng Liu, Wiggin Zhou, Bo Zhou

Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models…

Computation and Language · Computer Science · 2024-09-27 · Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma

Code has emerged as a precise and executable medium for reasoning and action in the agent era. Yet, progress has largely focused on language-centric tasks such as program synthesis and debugging, leaving visual-centric coding underexplored.…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-05 · Kevin Qinghong Lin, Yuhao Zheng, Hangyu Ran, Dantong Zhu, Dongxing Mao, Linjie Li, Philip Torr, Alex Jinpeng Wang

Humans possess the remarkable skill of Visual Perception, the ability to see and understand the seen, helping them make sense of the visual world and, in turn, reason. Multimodal Large Language Models (MLLM) have recently achieved…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-25 · Jitesh Jain, Jianwei Yang, Humphrey Shi

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases…

Computer Vision and Pattern Recognition · Computer Science · 2017-05-11 · Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

When MLLMs fail at Science, Technology, Engineering, and Mathematics (STEM) visual reasoning, a fundamental question arises: is it due to perceptual deficiencies or reasoning limitations? Through systematic scaling analysis that…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-12 · Tongkun Guan, Zhibo Yang, Jianqiang Wan, Mingkun Yang, Zhengtao Guo, Zijian Hu, Ruilin Luo, Ruize Chen, Songtao Jiang, Peng Wang, Wei Shen, Junyang Lin, Xiaokang Yang

Large Language Models (LLMs) have achieved remarkable progress in code-related tasks. Despite their advancement, empirical evidence reveals that they still struggle with deductive code reasoning, the ability to reason about the…

Programming Languages · Computer Science · 2025-11-04 · Jun Gao, Yun Peng, Xiaoxue Ren

Large Vision-Language Models (LVLMs) can reason from image-text inputs and perform well in various multimodal tasks. Despite this success, they are affected by language priors and often produce hallucinations. Hallucinations denote…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-25 · Xinrong Chen, Xu Chu, Yingmin Qiu, Hengyuan Zhang, Jing Xiong, Shiyu Tang, Shuai Liu, Shaokang Yang, Cheng Yang, Hayden Kwok-Hay So, Ngai Wong

Despite the remarkable success of Multimodal Large Language Models (MLLMs) across diverse tasks, the internal mechanisms governing how they encode and ground distinct visual concepts remain poorly understood. To bridge this gap, we propose…

Artificial Intelligence · Computer Science · 2026-05-08 · Zehao Deng, Tianjie Ju, Zheng Wu, Liangbo He, Jun Lan, Huijia Zhu, Weiqiang Wang, Zhuosheng Zhang

Coding agents powered by large language models (LLMs) have gained traction for automating code generation through iterative problem-solving with minimal human involvement. Despite the emergence of various frameworks, e.g., LangChain,…

Machine Learning · Computer Science · 2025-08-19 · Junpeng Wang, Yuzhong Chen, Menghai Pan, Chin-Chia Michael Yeh, Mahashweta Das

Predicting program behavior and reasoning about code execution remain significant challenges in software engineering, particularly for large language models (LLMs) designed for code analysis. While these models excel at understanding static…

Software Engineering · Computer Science · 2025-02-11 · Cuong Chi Le, Hoang-Chau Truong-Vinh, Huy Nhat Phan, Dung Duy Le, Tien N. Nguyen, Nghi D. Q. Bui

Multimodal Large Language Models (MLLMs) have achieved notable gains in various tasks by incorporating Chain-of-Thought (CoT) reasoning in language spaces. Recent work extends this direction by leveraging external tools for visual editing,…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-07 · Bangzheng Li, Ximeng Sun, Jiang Liu, Ze Wang, Jialian Wu, Xiaodong Yu, Hao Chen, Emad Barsoum, Muhao Chen, Zicheng Liu

Multimodal Large Language Models (MLLMs) have recently been applied to universal multimodal retrieval, where Chain-of-Thought (CoT) reasoning improves candidate reranking. However, existing approaches remain largely language-driven, relying…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-26 · Dongyang Chen, Chaoyang Wang, Dezhao Su, Xi Xiao, Zeyu Zhang, Jing Xiong, Qing Li, Yuzhang Shang, Shichao Kan

In this work, we address the task of table image to LaTeX code generation, with the goal of automating the reconstruction of high-quality, publication-ready tables from visual inputs. A central challenge of this task lies in accurately…

Artificial Intelligence · Computer Science · 2025-09-23 · Jun Ling, Yao Qi, Tao Huang, Shibo Zhou, Yanqin Huang, Jiang Yang, Ziqi Song, Ying Zhou, Yang Yang, Heng Tao Shen, Peng Wang

Data visualizations are central to scientific communication, journalism, and everyday decision-making, yet they are frequently prone to errors that can distort interpretation or mislead audiences. Rule-based visualization linters can flag…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-25 · Valentin Bonas, Martin Sinnona, Viviana Siless, Emmanuel Iarussi

Multimodal Large Language Models (MLLMs) exhibit impressive performance across various visual tasks. Subsequent investigations into enhancing their visual reasoning abilities have significantly expanded their performance envelope. However,…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-08 · Yang Chen, Yufan Shen, Wenxuan Huang, Sheng Zhou, Qunshu Lin, Xinyu Cai, Zhi Yu, Jiajun Bu, Botian Shi, Yu Qiao

Automating the transformation of user interface (UI) designs into front-end code holds significant promise for accelerating software development and democratizing design workflows. While multimodal large language models (MLLMs) can…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-21 · Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, Jiaming Han, Qunzhong Wang, Michael R. Lyu, Xiangyu Yue

Multimodal large language models (MLLMs) that think with images can interactively use tools to reason about visual inputs, but current approaches often rely on a narrow set of tools with limited real-world necessity and scalability. In this…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-04 · Zirun Guo, Minjie Hong, Feng Zhang, Kai Jia, Tao Jin

Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable…

Software Engineering · Computer Science · 2026-04-09 · Yuansheng Ni, Songcheng Cai, Xiangchao Chen, Jiarong Liang, Zhiheng Lyu, Jiaqi Deng, Kai Zou, Ping Nie, Fei Yuan, Xiang Yue, Wenhu Chen

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-13 · Ajay Vikram Periasami, Junlin Wang, Bhuwan Dhingra