
Related papers: PlotGen-Bench: Evaluating VLMs on Generating Visua…

200 papers

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To…

Scientific data visualization is pivotal for transforming raw data into comprehensible visual representations, enabling pattern recognition, forecasting, and the presentation of data-driven insights. However, novice users often face…

Computation and Language · Computer Science · 2025-02-04 · Kanika Goswami, Puneet Mathur, Ryan Rossi, Franck Dernoncourt

Despite the rapid advancements in Vision-Language Models (VLMs), a critical gap remains in their ability to handle structured, controllable diagrammatic tasks essential for professional workflows. Existing methods predominantly rely on…

Computation and Language · Computer Science · 2026-05-18 · Xiaoyan Su, Peijie Dong, Zhenheng Tang, Song Tang, Yuyao Zhai, Kaitao Lin, Liang Chen, Gai Yuhang, Yuyu Luo, Qiang Wang, Xiaowen Chu

This paper introduces the human-curated PandasPlotBench dataset, designed to evaluate language models' effectiveness as assistants in visual data exploration. Our benchmark focuses on generating code for visualizing tabular data - such as a…

Software Engineering · Computer Science · 2025-02-27 · Timur Galimzyanov, Sergey Titov, Yaroslav Golubev, Egor Bogomolov

The remarkable progress of Multi-modal Large Language Models (MLLMs) has attracted significant attention due to their superior performance in visual contexts. However, their capabilities in turning visual figures into executable code have not…

Computation and Language · Computer Science · 2024-05-14 · Chengyue Wu, Yixiao Ge, Qiushan Guo, Jiahao Wang, Zhixuan Liang, Zeyu Lu, Ying Shan, Ping Luo

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-13 · Ajay Vikram Periasami, Junlin Wang, Bhuwan Dhingra

Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable…

Software Engineering · Computer Science · 2026-04-09 · Yuansheng Ni, Songcheng Cai, Xiangchao Chen, Jiarong Liang, Zhiheng Lyu, Jiaqi Deng, Kai Zou, Ping Nie, Fei Yuan, Xiang Yue, Wenhu Chen

Chart-to-code reconstruction -- the task of recovering executable plotting scripts from chart images -- provides important insights into a model's ability to ground data visualizations in precise, machine-readable form. Yet many existing…

Multimodal large language models (MLLMs) have significantly advanced the integration of visual and textual understanding. However, their ability to generate code from multimodal inputs remains limited. In this work, we introduce VisCodex, a…

Computation and Language · Computer Science · 2025-08-14 · Lingjie Jiang, Shaohan Huang, Xun Wu, Yixia Li, Dongdong Zhang, Furu Wei

In the realm of vision models, the primary mode of representation is using pixels to rasterize the visual world. Yet this is not always the best or unique way to represent visual content, especially for designers and artists who depict the…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-30 · Bocheng Zou, Mu Cai, Jianrui Zhang, Yong Jae Lee

Vision-Language Models (VLMs) have achieved impressive performance in cross-modal understanding across textual and visual inputs, yet existing benchmarks predominantly focus on pure-text queries. In real-world scenarios, language also…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-19 · Qing'an Liu, Juntong Feng, Yuhao Wang, Xinzhe Han, Yujie Cheng, Yue Zhu, Haiwen Diao, Yunzhi Zhuge, Huchuan Lu

Recent Large Language Models (LLMs) have demonstrated remarkable proficiency in code generation. However, their ability to create complex visualizations for scaled and structured data remains largely unevaluated and underdeveloped. To…

Computation and Language · Computer Science · 2026-01-16 · Jiajun Zhang, Jianke Zhang, Zeyu Cui, Jiaxi Yang, Lei Zhang, Binyuan Hui, Qiang Liu, Zilei Wang, Liang Wang, Junyang Lin

Generative models have received a lot of attention in many areas of academia and the industry. Their capabilities span many areas, from the invention of images given a prompt to the generation of concrete code to solve a certain programming…

Human-Computer Interaction · Computer Science · 2024-03-12 · Pere-Pau Vázquez

This paper introduces Code-Vision, a benchmark designed to evaluate the logical understanding and code generation capabilities of Multimodal Large Language Models (MLLMs). It challenges MLLMs to generate a correct program that fulfills…

Computation and Language · Computer Science · 2025-02-18 · Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu, Keyuan Cheng, Yuxin Zuo, Kai Tian, Jingwei Song, Junting Lu, Wenhui Hu, Xueyang Liu

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, with code generation emerging as a key area of focus. While numerous benchmarks have been proposed to evaluate their code generation abilities,…

Although large visual-language models (LVLMs) have demonstrated strong performance in multimodal tasks, errors may occasionally arise due to biases during the reasoning process. Recently, reward models (RMs) have become increasingly pivotal…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-11 · Jiacheng Ruan, Wenzhen Yuan, Xian Gao, Ye Guo, Daoxin Zhang, Zhe Xu, Yao Hu, Ting Liu, Yuzhuo Fu

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in automated front-end engineering, e.g., generating UI code from visual designs. However, existing front-end UI code generation benchmarks have the…

Software Engineering · Computer Science · 2026-03-17 · Jingyu Xiao, Ming Wang, Man Ho Lam, Yuxuan Wan, Junliang Liu, Yintong Huo, Michael R. Lyu

Code large language models have demonstrated remarkable capabilities in programming tasks, yet current benchmarks primarily focus on single modality rather than visual game development. Most existing code-related benchmarks evaluate syntax…

Software Engineering · Computer Science · 2025-09-25 · Wei Zhang, Jack Yang, Renshuai Tao, Lingzheng Chai, Shawn Guo, Jiajun Wu, Xiaoming Chen, Ganqu Cui, Ning Ding, Xander Xu, Hu Wei, Bowen Zhou

Significant research efforts have been made to scale and improve vision-language model (VLM) training approaches. Yet, with an ever-growing number of benchmarks, researchers are tasked with the heavy burden of implementing each protocol,…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-12 · Haider Al-Tahan, Quentin Garrido, Randall Balestriero, Diane Bouchacourt, Caner Hazirbas, Mark Ibrahim

Multimodal large language models (MLLMs) have enabled a wide range of advanced vision-language applications, including fine-grained object recognition and contextual understanding. When querying specific regions or objects in an image,…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-17 · Mingjie Xu, Jinpeng Chen, Yuzhi Zhao, Jason Chun Lok Li, Yue Qiu, Zekang Du, Mengyang Wu, Pingping Zhang, Kun Li, Hongzheng Yang, Wenao Ma, Jiaheng Wei, Qinbin Li, Kangcheng Liu, Wenqiang Lei