Related papers: An Online Reference-Free Evaluation Framework for …

200 papers

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Ajay Vikram Periasami, Junlin Wang, Bhuwan Dhingra

Flowcharts are graphical tools for representing complex concepts in concise visual representations. This paper introduces the FlowLearn dataset, a resource tailored to enhance the understanding of flowcharts. FlowLearn contains complex…

Computer Vision and Pattern Recognition · Computer Science 2024-07-11 Huitong Pan, Qi Zhang, Cornelia Caragea, Eduard Dragut, Longin Jan Latecki

Flowcharts are indispensable tools in software design and business-process analysis, yet current vision-language models (VLMs) frequently misinterpret the directional arrows and graph topology that set these diagrams apart from natural…

Artificial Intelligence · Computer Science 2025-05-14 Takamitsu Omasa, Ryo Koshihara, Masumi Morishige

Flowcharts are common tools for communicating processes but are often shared as static images that cannot be easily edited or reused. We present Flowchart2Mermaid, a lightweight web system that converts flowchart images into editable…

Artificial Intelligence · Computer Science 2025-12-04 Pritam Deka, Barry Devereux

Face Image Quality Assessment (FIQA) is a crucial control step in biometric pipelines. It ensures only reliable samples are processed to maintain system accuracy. State-of-the-art FIQA methods achieve high utility but typically operate as…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Erdi Sarıtaş, Eren Onaran, Vitomir Štruc, Hazım Kemal Ekenel

While large language models (LLMs) show promise in code generation, existing benchmarks neglect flowchart-based code generation. To promote further research on flowchart-based code generation, this work presents Flow2Code, a novel…

Software Engineering · Computer Science 2025-06-04 Mengliang He, Jiayi Zeng, Yankai Jiang, Wei Zhang, Zeming Liu, Xiaoming Shi, Aimin Zhou

Flowcharts are typically presented as images, driving the trend of using vision-language models (VLMs) for end-to-end flowchart understanding. However, two key challenges arise: (i) Limited controllability--users have minimal influence over…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Junyi Ye, Ankan Dash, Wenpeng Yin, Guiling Wang

Immersive Computer Graphics (CGs) rendering has become ubiquitous in modern daily life. However, comprehensively evaluating CG quality remains challenging for two reasons: First, existing CG datasets lack systematic descriptions of…

Computer Vision and Pattern Recognition · Computer Science 2026-03-12 Zhuangzi Li, Jian Jin, Shilv Cai, Weisi Lin

This paper introduces an open-source benchmark for evaluating Vision-Language Models (VLMs) on Optical Character Recognition (OCR) tasks in dynamic video environments. We present a curated dataset containing 1,477 manually annotated frames…

Computer Vision and Pattern Recognition · Computer Science 2025-02-11 Sankalp Nagaonkar, Augustya Sharma, Ashish Choithani, Ashutosh Trivedi

Among the various means to evaluate the quality of video streams, No-Reference (NR) methods have low computation and may be executed on thin clients. Thus, NR algorithms would be perfect candidates in cases of real-time quality assessment,…

Multimedia · Computer Science 2016-04-28 Maria Torres Vega, Decebal Constantin Mocanu, Antonio Liotta

Vision-Language Models (VLMs) excel in diverse visual tasks but face challenges in document understanding, which requires fine-grained text processing. While typical visual tasks perform well with low-resolution inputs, reading-intensive…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz, Elad Ben Avraham, Alona Golts, Yair Kittenplon, Shai Mazor, Ron Litman

Visual quality assessment (VQA) is increasingly shifting from scalar score prediction toward interpretable quality understanding, a paradigm that demands fine-grained spatiotemporal perception and auxiliary contextual…

Computer Vision and Pattern Recognition · Computer Science 2026-01-27 Linhan Cao, Wei Sun, Weixia Zhang, Xiangyang Zhu, Kaiwei Zhang, Jun Jia, Dandan Zhu, Guangtao Zhai, Xiongkuo Min

Computer programming textbooks and software documentation often contain flowcharts to illustrate the flow of an algorithm or procedure. Modern OCR engines often tag these flowcharts as graphics and ignore them in further processing. In…

Computer Vision and Pattern Recognition · Computer Science 2025-01-30 Shreya Shukla, Prajwal Gatti, Yogesh Kumar, Vikash Yadav, Anand Mishra

Flowcharts are widely used in industrial requirements, but usually remain embedded as static images. Vision Language Models (VLMs) show promise in the conversion of these flowcharts into machine-readable models for RE activities, yet, when…

Software Engineering · Computer Science 2026-05-27 Zhifei Dou, Shabnam Hassani, Ou Wei

When people query Vision-Language Models (VLMs) but cannot see the accompanying visual context (e.g. for blind and low-vision users), augmenting VLM predictions with natural language explanations can signal which model predictions are…

Computation and Language · Computer Science 2026-04-23 Keyu He, Tejas Srinivasan, Brihi Joshi, Xiang Ren, Jesse Thomason, Swabha Swayamdipta

Quality assessment of videos is crucial for many computer graphics applications, including video games, virtual reality, and augmented reality, where visual performance has a significant impact on user experience. When test videos cannot be…

Computer Vision and Pattern Recognition · Computer Science 2025-10-16 Sipeng Yang, Jiayu Ji, Qingchuan Zhu, Zhiyao Yang, Xiaogang Jin

Vision-language models (VLMs) frequently generate hallucinated content: plausible but incorrect claims about image content. We propose a training-free self-correction framework enabling VLMs to iteratively refine responses through…

Computer Vision and Pattern Recognition · Computer Science 2025-12-11 Kassoum Sanogo, Renzo Ardiccioni

Machine learning-based video codecs have made significant progress in the past few years. A critical area in the development of ML-based video codecs is an accurate evaluation metric that does not require an expensive and slow subjective…

Image and Video Processing · Electrical Eng. & Systems 2023-09-06 Abrar Majeedi, Babak Naderi, Yasaman Hosseinkashi, Juhee Cho, Ruben Alvarez Martinez, Ross Cutler

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To…

Code generation with large language models often relies on multi-stage human-in-the-loop refinement, which is effective but very costly - particularly in domains such as frontend web development where the solution quality depends on…

Artificial Intelligence · Computer Science 2026-04-08 Hannah Sansford, Derek H. C. Law, Wei Liu, Abhishek Tripathi, Niresh Agarwal, Gerrit J. J. van den Burg