Related papers: An Online Reference-Free Evaluation Framework for …

Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Ajay Vikram Periasami, Junlin Wang, Bhuwan Dhingra

FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding

Flowcharts are graphical tools for representing complex concepts in concise visual representations. This paper introduces the FlowLearn dataset, a resource tailored to enhance the understanding of flowcharts. FlowLearn contains complex…

Computer Vision and Pattern Recognition · Computer Science 2024-07-11 Huitong Pan, Qi Zhang, Cornelia Caragea, Eduard Dragut, Longin Jan Latecki

Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding

Flowcharts are indispensable tools in software design and business-process analysis, yet current vision-language models (VLMs) frequently misinterpret the directional arrows and graph topology that set these diagrams apart from natural…

Artificial Intelligence · Computer Science 2025-05-14 Takamitsu Omasa, Ryo Koshihara, Masumi Morishige

Flowchart2Mermaid: A Vision-Language Model Powered System for Converting Flowcharts into Editable Diagram Code

Flowcharts are common tools for communicating processes but are often shared as static images that cannot be easily edited or reused. We present Flowchart2Mermaid, a lightweight web system that converts flowchart images into editable…

Artificial Intelligence · Computer Science 2025-12-04 Pritam Deka, Barry Devereux

Employing Vision-Language Models for Face Image Quality Assessment

Face Image Quality Assessment (FIQA) is a crucial control step in biometric pipelines. It ensures only reliable samples are processed to maintain system accuracy. State-of-the-art FIQA methods achieve high utility but typically operate as…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Erdi Sarıtaş, Eren Onaran, Vitomir Štruc, Hazım Kemal Ekenel

Flow2Code: Evaluating Large Language Models for Flowchart-based Code Generation Capability

While large language models (LLMs) show promise in code generation, existing benchmarks neglect flowchart-based code generation. To promote further research on flowchart-based code generation, this work presents Flow2Code, a novel…

Software Engineering · Computer Science 2025-06-04 Mengliang He, Jiayi Zeng, Yankai Jiang, Wei Zhang, Zeming Liu, Xiaoming Shi, Aimin Zhou

Beyond End-to-End VLMs: Leveraging Intermediate Text Representations for Superior Flowchart Understanding

Flowcharts are typically presented as images, driving the trend of using vision-language models (VLMs) for end-to-end flowchart understanding. However, two key challenges arise: (i) Limited controllability--users have minimal influence over…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Junyi Ye, Ankan Dash, Wenpeng Yin, Guiling Wang

R4-CGQA: Retrieval-based Vision Language Models for Computer Graphics Image Quality Assessment

Immersive Computer Graphics (CGs) rendering has become ubiquitous in modern daily life. However, comprehensively evaluating CG quality remains challenging for two reasons: First, existing CG datasets lack systematic descriptions of…

Computer Vision and Pattern Recognition · Computer Science 2026-03-12 Zhuangzi Li, Jian Jin, Shilv Cai, Weisi Lin

Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments

This paper introduces an open-source benchmark for evaluating Vision-Language Models (VLMs) on Optical Character Recognition (OCR) tasks in dynamic video environments. We present a curated dataset containing 1,477 manually annotated frames…

Computer Vision and Pattern Recognition · Computer Science 2025-02-11 Sankalp Nagaonkar, Augustya Sharma, Ashish Choithani, Ashutosh Trivedi

Predictive No-Reference Assessment of Video Quality

Among the various means to evaluate the quality of video streams, No-Reference (NR) methods have low computation and may be executed on thin clients. Thus, NR algorithms would be perfect candidates in cases of real-time quality assessment,…

Multimedia · Computer Science 2016-04-28 Maria Torres Vega, Decebal Constantin Mocanu, Antonio Liotta

DocVLM: Make Your VLM an Efficient Reader

Vision-Language Models (VLMs) excel in diverse visual tasks but face challenges in document understanding, which requires fine-grained text processing. While typical visual tasks perform well with low-resolution inputs, reading-intensive…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz, Elad Ben Avraham, Alona Golts, Yair Kittenplon, Shai Mazor, Ron Litman

QualiRAG: Retrieval-Augmented Generation for Visual Quality Understanding

Visual quality assessment (VQA) is increasingly shifting from scalar score prediction toward interpretable quality understanding -- a paradigm that demands fine-grained spatiotemporal perception and auxiliary contextual…

Computer Vision and Pattern Recognition · Computer Science 2026-01-27 Linhan Cao, Wei Sun, Weixia Zhang, Xiangyang Zhu, Kaiwei Zhang, Jun Jia, Dandan Zhu, Guangtao Zhai, Xiongkuo Min

Towards Making Flowchart Images Machine Interpretable

Computer programming textbooks and software documentations often contain flowcharts to illustrate the flow of an algorithm or procedure. Modern OCR engines often tag these flowcharts as graphics and ignore them in further processing. In…

Computer Vision and Pattern Recognition · Computer Science 2025-01-30 Shreya Shukla, Prajwal Gatti, Yogesh Kumar, Vikash Yadav, Anand Mishra

EdgeFlow: Edge-Map Augmented VLM-Based Flowchart Processing for Industrial Requirements Engineering

Flowcharts are widely used in industrial requirements, but usually remain embedded as static images. Vision Language Models (VLMs) show promise in the conversion of these flowcharts into machine-readable models for RE activities, yet, when…

Software Engineering · Computer Science 2026-05-27 Zhifei Dou, Shabnam Hassani, Ou Wei

Believing without Seeing: Quality Scores for Contextualizing Vision-Language Model Explanations

When people query Vision-Language Models (VLMs) but cannot see the accompanying visual context (e.g. for blind and low-vision users), augmenting VLM predictions with natural language explanations can signal which model predictions are…

Computation and Language · Computer Science 2026-04-23 Keyu He, Tejas Srinivasan, Brihi Joshi, Xiang Ren, Jesse Thomason, Swabha Swayamdipta

No-Reference Rendered Video Quality Assessment: Dataset and Metrics

Quality assessment of videos is crucial for many computer graphics applications, including video games, virtual reality, and augmented reality, where visual performance has a significant impact on user experience. When test videos cannot be…

Computer Vision and Pattern Recognition · Computer Science 2025-10-16 Sipeng Yang, Jiayu Ji, Qingchuan Zhu, Zhiyao Yang, Xiaogang Jin

Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models

Vision-language models (VLMs) frequently generate hallucinated content: plausible but incorrect claims about image content. We propose a training-free self-correction framework enabling VLMs to iteratively refine responses through…

Computer Vision and Pattern Recognition · Computer Science 2025-12-11 Kassoum Sanogo, Renzo Ardiccioni

Full Reference Video Quality Assessment for Machine Learning-Based Video Codecs

Machine learning-based video codecs have made significant progress in the past few years. A critical area in the development of ML-based video codecs is an accurate evaluation metric that does not require an expensive and slow subjective…

Image and Video Processing · Electrical Eng. & Systems 2023-09-06 Abrar Majeedi, Babak Naderi, Yasaman Hosseinkashi, Juhee Cho, Ruben Alvarez Martinez, Ross Cutler

RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To…

Computation and Language · Computer Science 2026-03-30 Jiajun Zhang, Yuying Li, Zhixun Li, Xingyu Guo, Jingzhuo Wu, Leqi Zheng, Yiran Yang, Jianke Zhang, Qingbin Li, Shannan Yan, Zhetong Li, Changguo Jia, Junfei Wu, Zilei Wang, Qiang Liu, Liang Wang

Vision-Guided Iterative Refinement for Frontend Code Generation

Code generation with large language models often relies on multi-stage human-in-the-loop refinement, which is effective but very costly - particularly in domains such as frontend web development where the solution quality depends on…

Artificial Intelligence · Computer Science 2026-04-08 Hannah Sansford, Derek H. C. Law, Wei Liu, Abhishek Tripathi, Niresh Agarwal, Gerrit J. J. van den Burg