Related papers: Seeing is Improving: Visual Feedback for Iterative…

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

Scalable Vector Graphics (SVG) offer a powerful format for representing visual designs as interpretable code. Recent advances in vision-language models (VLMs) have enabled high-quality SVG generation by framing the problem as a code…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Juan A. Rodriguez, Haotian Zhang, Abhay Puri, Aarash Feizi, Rishav Pramanik, Pascal Wichmann, Arnab Mondal, Mohammad Reza Samsami, Rabiul Awal, Perouz Taslakian, Spandana Gella, Sai Rajeswar, David Vazquez, Christopher Pal, Marco Pedersoli

Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation

Text-to-image models are powerful for producing high-quality images based on given text prompts, but crafting these prompts often requires specialized vocabulary. To address this, existing methods train rewriting models with supervision…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Hongji Yang, Yucheng Zhou, Wencheng Han, Jianbing Shen

Vision-Guided Iterative Refinement for Frontend Code Generation

Code generation with large language models often relies on multi-stage human-in-the-loop refinement, which is effective but very costly - particularly in domains such as frontend web development where the solution quality depends on…

Artificial Intelligence · Computer Science 2026-04-08 Hannah Sansford, Derek H. C. Law, Wei Liu, Abhishek Tripathi, Niresh Agarwal, Gerrit J. J. van den Burg

Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback

Multimodal Large Language Models (MLLMs) have shown promising capabilities in generating Scalable Vector Graphics (SVG) via direct code synthesis. However, existing paradigms typically adopt an open-loop "blind drawing" approach, where…

Computer Vision and Pattern Recognition · Computer Science 2026-04-24 Guotao Liang, Zhangcheng Wang, Juncheng Hu, Haitao Zhou, Ziteng Xue, Jing Zhang, Dong Xu, Qian Yu

Iterative Self-Improvement of Vision Language Models for Image Scoring and Self-Explanation

Image scoring is a crucial task in numerous real-world applications. To trust a model's judgment, understanding its rationale is essential. This paper proposes a novel training method for Vision Language Models (VLMs) to generate not only…

Computer Vision and Pattern Recognition · Computer Science 2025-06-04 Naoto Tanji, Toshihiko Yamasaki

Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback

Multimodal Large Language Models (MLLMs) exhibit impressive performance across various visual tasks. Subsequent investigations into enhancing their visual reasoning abilities have significantly expanded their performance envelope. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-08-08 Yang Chen, Yufan Shen, Wenxuan Huang, Sheng Zhou, Qunshu Lin, Xinyu Cai, Zhi Yu, Jiajun Bu, Botian Shi, Yu Qiao

Where do Large Vision-Language Models Look at when Answering Questions?

Large Vision-Language Models (LVLMs) have shown promising performance in vision-language understanding and reasoning tasks. However, their visual understanding behaviors remain underexplored. A fundamental question arises: to what extent do…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Xiaoying Xing, Chia-Wen Kuo, Li Fuxin, Yulei Niu, Fan Chen, Ming Li, Ying Wu, Longyin Wen, Sijie Zhu

Large Language Models Facilitate Vision Reflection in Image Classification

This paper presents several novel findings on the explainability of vision reflection in large multimodal models (LMMs). First, we show that prompting an LMM to verify the prediction of a specialized vision model can improve recognition…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Guoyuan An, JaeYoon Kim, SungEui Yoon

RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback

Reward engineering has long been a challenge in Reinforcement Learning (RL) research, as it often requires extensive human effort and iterative processes of trial-and-error to design effective reward functions. In this paper, we propose…

Robotics · Computer Science 2024-06-18 Yufei Wang, Zhanyi Sun, Jesse Zhang, Zhou Xian, Erdem Biyik, David Held, Zackory Erickson

UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback

Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs. Existing approaches to improve generation rely on expensive human feedback or distilling a proprietary model. In…

Computation and Language · Computer Science 2024-06-13 Jason Wu, Eldon Schoop, Alan Leung, Titus Barik, Jeffrey P. Bigham, Jeffrey Nichols

Leveraging Human Revisions for Improving Text-to-Layout Models

Learning from human feedback has shown success in aligning large, pretrained models with human values. Prior works have mostly focused on learning from high-level labels, such as preferences between pairs of model outputs. On the other…

Computation and Language · Computer Science 2024-05-24 Amber Xie, Chin-Yi Cheng, Forrest Huang, Yang Li

Visual Prompting with Iterative Refinement for Design Critique Generation

Feedback is crucial for every design process, such as user interface (UI) design, and automating design critiques can significantly improve the efficiency of the design workflow. Although existing multimodal large language models (LLMs)…

Artificial Intelligence · Computer Science 2025-05-26 Peitong Duan, Chin-Yi Cheng, Bjoern Hartmann, Yang Li

VASCAR: Content-Aware Layout Generation via Visual-Aware Self-Correction

Large language models (LLMs) have proven effective for layout generation due to their ability to produce structure-description languages, such as HTML or JSON. In this paper, we argue that while LLMs can perform reasonably well in certain…

Computer Vision and Pattern Recognition · Computer Science 2025-03-12 Jiahao Zhang, Ryota Yoshihashi, Shunsuke Kitada, Atsuki Osanai, Yuta Nakashima

Reflect to Inform: Boosting Multimodal Reasoning via Information-Gain-Driven Verification

Multimodal Large Language Models (MLLMs) achieve strong multimodal reasoning performance, yet we identify a recurring failure mode in long-form generation: as outputs grow longer, models progressively drift away from image evidence and fall…

Computer Vision and Pattern Recognition · Computer Science 2026-03-30 Shuai Lv, Chang Liu, Feng Tang, Yujie Yuan, Aojun Zhou, Kui Zhang, Xi Yang, Yangqiu Song

GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts

Text logo design heavily relies on the creativity and expertise of professional designers, in which arranging element layouts is one of the most important procedures. However, this specific task has received limited attention, often…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Chenyang Li, Hanyuan Chen, Jin-Peng Lan, Bin Luo, Yifeng Geng

Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models

In recent years, multimodal large language models (MLLMs) have achieved remarkable progress, primarily attributed to effective paradigms for integrating visual and textual information. The dominant connector-based paradigm projects visual…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Xinpeng Dong, Min Zhang, Kairong Han, Xu Tan, Fei Wu, Kun Kuang

IntroSVG: Learning from Rendering Feedback for Text-to-SVG Generation via an Introspective Generator-Critic Framework

Scalable Vector Graphics (SVG) are central to digital design due to their inherent scalability and editability. Despite significant advancements in content generation enabled by Visual Language Models (VLMs), existing text-to-SVG generation…

Computer Vision and Pattern Recognition · Computer Science 2026-03-11 Feiyu Wang, Jiayuan Yang, Zhiyuan Zhao, Da Zhang, Bingyu Li, Peng Liu, Junyu Gao

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

While recent Large Vision-Language Models (LVLMs) have shown remarkable performance in multi-modal tasks, they are prone to generating hallucinatory text responses that do not align with the given visual input, which restricts their…

Computer Vision and Pattern Recognition · Computer Science 2025-09-11 Ce Zhang, Zifu Wan, Zhehan Kan, Martin Q. Ma, Simon Stepputtis, Deva Ramanan, Russ Salakhutdinov, Louis-Philippe Morency, Katia Sycara, Yaqi Xie

Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models

Recent advances in text-only "slow-thinking" reasoning have prompted efforts to transfer this capability to vision-language models (VLMs), for training visual reasoning models (VRMs). However, such transfer faces critical…

Computer Vision and Pattern Recognition · Computer Science 2025-09-16 Pu Jian, Junhong Wu, Wei Sun, Chen Wang, Shuo Ren, Jiajun Zhang

Integrating Visual Interpretation and Linguistic Reasoning for Math Problem Solving

Current large vision-language models (LVLMs) typically employ a connector module to link visual features with text embeddings of large language models (LLMs) and use end-to-end training to achieve multi-modal understanding in a unified…

Artificial Intelligence · Computer Science 2025-08-14 Zixian Guo, Ming Liu, Qilong Wang, Zhilong Ji, Jinfeng Bai, Lei Zhang, Wangmeng Zuo