Related papers: Seeing is Improving: Visual Feedback for Iterative…

200 papers

Scalable Vector Graphics (SVG) offer a powerful format for representing visual designs as interpretable code. Recent advances in vision-language models (VLMs) have enabled high-quality SVG generation by framing the problem as a code…

Text-to-image models are powerful for producing high-quality images based on given text prompts, but crafting these prompts often requires specialized vocabulary. To address this, existing methods train rewriting models with supervision…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Hongji Yang , Yucheng Zhou , Wencheng Han , Jianbing Shen

Code generation with large language models often relies on multi-stage human-in-the-loop refinement, which is effective but very costly, particularly in domains such as frontend web development where the solution quality depends on…

Artificial Intelligence · Computer Science 2026-04-08 Hannah Sansford , Derek H. C. Law , Wei Liu , Abhishek Tripathi , Niresh Agarwal , Gerrit J. J. van den Burg

Multimodal Large Language Models (MLLMs) have shown promising capabilities in generating Scalable Vector Graphics (SVG) via direct code synthesis. However, existing paradigms typically adopt an open-loop "blind drawing" approach, where…

Computer Vision and Pattern Recognition · Computer Science 2026-04-24 Guotao Liang , Zhangcheng Wang , Juncheng Hu , Haitao Zhou , Ziteng Xue , Jing Zhang , Dong Xu , Qian Yu

Image scoring is a crucial task in numerous real-world applications. To trust a model's judgment, understanding its rationale is essential. This paper proposes a novel training method for Vision Language Models (VLMs) to generate not only…

Computer Vision and Pattern Recognition · Computer Science 2025-06-04 Naoto Tanji , Toshihiko Yamasaki

Multimodal Large Language Models (MLLMs) exhibit impressive performance across various visual tasks. Subsequent investigations into enhancing their visual reasoning abilities have significantly expanded their performance envelope. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-08-08 Yang Chen , Yufan Shen , Wenxuan Huang , Sheng Zhou , Qunshu Lin , Xinyu Cai , Zhi Yu , Jiajun Bu , Botian Shi , Yu Qiao

Large Vision-Language Models (LVLMs) have shown promising performance in vision-language understanding and reasoning tasks. However, their visual understanding behaviors remain underexplored. A fundamental question arises: to what extent do…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Xiaoying Xing , Chia-Wen Kuo , Li Fuxin , Yulei Niu , Fan Chen , Ming Li , Ying Wu , Longyin Wen , Sijie Zhu

This paper presents several novel findings on the explainability of vision reflection in large multimodal models (LMMs). First, we show that prompting an LMM to verify the prediction of a specialized vision model can improve recognition…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Guoyuan An , JaeYoon Kim , SungEui Yoon

Reward engineering has long been a challenge in Reinforcement Learning (RL) research, as it often requires extensive human effort and iterative processes of trial-and-error to design effective reward functions. In this paper, we propose…

Robotics · Computer Science 2024-06-18 Yufei Wang , Zhanyi Sun , Jesse Zhang , Zhou Xian , Erdem Biyik , David Held , Zackory Erickson

Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs. Existing approaches to improve generation rely on expensive human feedback or distilling a proprietary model. In…

Computation and Language · Computer Science 2024-06-13 Jason Wu , Eldon Schoop , Alan Leung , Titus Barik , Jeffrey P. Bigham , Jeffrey Nichols

Learning from human feedback has shown success in aligning large, pretrained models with human values. Prior works have mostly focused on learning from high-level labels, such as preferences between pairs of model outputs. On the other…

Computation and Language · Computer Science 2024-05-24 Amber Xie , Chin-Yi Cheng , Forrest Huang , Yang Li

Feedback is crucial for every design process, such as user interface (UI) design, and automating design critiques can significantly improve the efficiency of the design workflow. Although existing multimodal large language models (LLMs)…

Artificial Intelligence · Computer Science 2025-05-26 Peitong Duan , Chin-Yi Cheng , Bjoern Hartmann , Yang Li

Large language models (LLMs) have proven effective for layout generation due to their ability to produce structure-description languages, such as HTML or JSON. In this paper, we argue that while LLMs can perform reasonably well in certain…

Computer Vision and Pattern Recognition · Computer Science 2025-03-12 Jiahao Zhang , Ryota Yoshihashi , Shunsuke Kitada , Atsuki Osanai , Yuta Nakashima

Multimodal Large Language Models (MLLMs) achieve strong multimodal reasoning performance, yet we identify a recurring failure mode in long-form generation: as outputs grow longer, models progressively drift away from image evidence and fall…

Computer Vision and Pattern Recognition · Computer Science 2026-03-30 Shuai Lv , Chang Liu , Feng Tang , Yujie Yuan , Aojun Zhou , Kui Zhang , Xi Yang , Yangqiu Song

Text logo design heavily relies on the creativity and expertise of professional designers, in which arranging element layouts is one of the most important procedures. However, this specific task has received limited attention, often…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Junwen He , Yifan Wang , Lijun Wang , Huchuan Lu , Jun-Yan He , Chenyang Li , Hanyuan Chen , Jin-Peng Lan , Bin Luo , Yifeng Geng

In recent years, multimodal large language models (MLLMs) have achieved remarkable progress, primarily attributed to effective paradigms for integrating visual and textual information. The dominant connector-based paradigm projects visual…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Xinpeng Dong , Min Zhang , Kairong Han , Xu Tan , Fei Wu , Kun Kuang

Scalable Vector Graphics (SVG) are central to digital design due to their inherent scalability and editability. Despite significant advancements in content generation enabled by Visual Language Models (VLMs), existing text-to-SVG generation…

Computer Vision and Pattern Recognition · Computer Science 2026-03-11 Feiyu Wang , Jiayuan Yang , Zhiyuan Zhao , Da Zhang , Bingyu Li , Peng Liu , Junyu Gao

While recent Large Vision-Language Models (LVLMs) have shown remarkable performance in multi-modal tasks, they are prone to generating hallucinatory text responses that do not align with the given visual input, which restricts their…

Computer Vision and Pattern Recognition · Computer Science 2025-09-11 Ce Zhang , Zifu Wan , Zhehan Kan , Martin Q. Ma , Simon Stepputtis , Deva Ramanan , Russ Salakhutdinov , Louis-Philippe Morency , Katia Sycara , Yaqi Xie

Recent advances in text-only "slow-thinking" reasoning have prompted efforts to transfer this capability to vision-language models (VLMs), for training visual reasoning models (VRMs). However, such transfer faces critical…

Computer Vision and Pattern Recognition · Computer Science 2025-09-16 Pu Jian , Junhong Wu , Wei Sun , Chen Wang , Shuo Ren , Jiajun Zhang

Current large vision-language models (LVLMs) typically employ a connector module to link visual features with text embeddings of large language models (LLMs) and use end-to-end training to achieve multi-modal understanding in a unified…

Artificial Intelligence · Computer Science 2025-08-14 Zixian Guo , Ming Liu , Qilong Wang , Zhilong Ji , Jinfeng Bai , Lei Zhang , Wangmeng Zuo