Related papers: Aligning Text, Code, and Vision: A Multi-Objective…

Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text

Automated data visualization plays a crucial role in simplifying data interpretation, enhancing decision-making, and improving efficiency. While large language models (LLMs) have shown promise in generating visualizations from natural…

Computation and Language · Computer Science 2025-07-29 Mizanur Rahman, Md Tahmid Rahman Laskar, Shafiq Joty, Enamul Hoque

Towards Robustness of Text-to-Visualization Translation against Lexical and Phrasal Variability

Text-to-Vis is an emerging task in the natural language processing (NLP) area that aims to automatically generate data visualizations from natural language questions (NLQs). Despite their progress, existing text-to-vis models often heavily…

Computation and Language · Computer Science 2024-04-12 Jinwei Lu, Yuanfeng Song, Haodi Zhang, Chen Zhang, Raymond Chi-Wing Wong

From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning

Reinforcement learning (RL) has emerged as a promising approach for eliciting reasoning chains before generating final answers. However, multimodal large language models (MLLMs) generate reasoning that lacks integration of visual…

Computer Vision and Pattern Recognition · Computer Science 2026-01-05 Omar Sharif, Eftekhar Hossain, Patrick Ng

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

Self-supervised visual foundation models produce powerful embeddings that achieve remarkable performance on a wide range of downstream tasks. However, unlike vision-language models such as CLIP, self-supervised visual features are not…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Cijo Jose, Théo Moutakanni, Dahyun Kang, Federico Baldassarre, Timothée Darcet, Hu Xu, Daniel Li, Marc Szafraniec, Michaël Ramamonjisoa, Maxime Oquab, Oriane Siméoni, Huy V. Vo, Patrick Labatut, Piotr Bojanowski

Table2LaTeX-RL: High-Fidelity LaTeX Code Generation from Table Images via Reinforced Multimodal Language Models

In this work, we address the task of table image to LaTeX code generation, with the goal of automating the reconstruction of high-quality, publication-ready tables from visual inputs. A central challenge of this task lies in accurately…

Artificial Intelligence · Computer Science 2025-09-23 Jun Ling, Yao Qi, Tao Huang, Shibo Zhou, Yanqin Huang, Jiang Yang, Ziqi Song, Ying Zhou, Yang Yang, Heng Tao Shen, Peng Wang

RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To…

Computation and Language · Computer Science 2026-03-30 Jiajun Zhang, Yuying Li, Zhixun Li, Xingyu Guo, Jingzhuo Wu, Leqi Zheng, Yiran Yang, Jianke Zhang, Qingbin Li, Shannan Yan, Zhetong Li, Changguo Jia, Junfei Wu, Zilei Wang, Qiang Liu, Liang Wang

UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization

UI-to-code aims to translate UI screenshots into executable front-end code. Despite progress with vision-language models (VLMs), most existing methods formulate UI-to-code as a single-pass generation, which mismatches real-world UI…

Computer Vision and Pattern Recognition · Computer Science 2026-05-07 Zhen Yang, Wenyi Hong, Mingde Xu, Xinyue Fan, Weihan Wang, Jiale Cheng, Xiaotao Gu, Jie Tang

Text2Grad: Reinforcement Learning from Natural Language Feedback

Traditional RLHF optimizes language models with coarse, scalar rewards that mask the fine-grained reasons behind success or failure, leading to slow and opaque learning. Recent work augments RL with textual critiques through prompting or…

Computation and Language · Computer Science 2026-01-28 Hanyang Wang, Lu Wang, Chaoyun Zhang, Tianjun Mao, Si Qin, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

Visual-ERM: Reward Modeling for Visual Equivalence

Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured representations with high visual fidelity. While recent Large Vision Language Models (LVLMs)…

Computer Vision and Pattern Recognition · Computer Science 2026-05-12 Ziyu Liu, Shengyuan Ding, Xinyu Fang, Xuanlang Dai, Penghui Yang, Jianze Liang, Jiaqi Wang, Kai Chen, Dahua Lin, Yuhang Zang

Reinforcing Video Reasoning with Focused Thinking

Recent advancements in reinforcement learning, particularly through Group Relative Policy Optimization (GRPO), have significantly improved multimodal large language models for complex reasoning tasks. However, two critical limitations…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Jisheng Dang, Jingze Wu, Teng Wang, Xuanhui Lin, Nannan Zhu, Hongbo Chen, Wei-Shi Zheng, Meng Wang, Tat-Seng Chua

ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding

With respect to improving the reasoning accuracy of LLMs, the representative reinforcement learning (RL) method GRPO faces failure due to insignificant reward variance, while verification methods based on process reward models (PRMs) suffer…

Artificial Intelligence · Computer Science 2025-09-09 Sining Zhoubian, Dan Zhang, Jie Tang

Visualization Generation with Large Language Models: An Evaluation

The frequent need for analysts to create visualizations to derive insights from data has driven extensive research into the generation of Natural Language to Visualization (NL2VIS). While recent progress in large language models (LLMs)…

Human-Computer Interaction · Computer Science 2025-12-12 Xinyu Wang, Chenwei Liang, Shunyuan Zheng, Jinyuan Liang, Guozheng Li, Yu Zhang, Chi Harold Liu

From Reasoning to Code: GRPO Optimization for Underrepresented Languages

Generating accurate and executable code using Large Language Models (LLMs) remains a significant challenge for underrepresented programming languages, such as Prolog and Lisp, due to the scarcity of public training data compared to…

Machine Learning · Computer Science 2026-05-26 Federico Pennino, Bianca Raimondi, Massimo Rondelli, Andrea Gurioli, Maurizio Gabbrielli

Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study

The Natural Language to Visualization (NL2Vis) task aims to transform natural-language descriptions into visual representations for a grounded table, enabling users to gain insights from vast amounts of data. Recently, many deep…

Databases · Computer Science 2024-04-29 Yang Wu, Yao Wan, Hongyu Zhang, Yulei Sui, Wucai Wei, Wei Zhao, Guandong Xu, Hai Jin

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending this success to open-source multimodal…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Wenbo Hu, Xin Chen, Yan Gao-Tian, Yihe Deng, Nanyun Peng, Kai-Wei Chang

Visual-RFT: Visual Reinforcement Fine-Tuning

Reinforcement Fine-Tuning (RFT) in Large Reasoning Models like OpenAI o1 learns from feedback on its answers, which is especially useful in applications when fine-tuning data is scarce. Recent open-source work like DeepSeek-R1 demonstrates…

Computer Vision and Pattern Recognition · Computer Science 2025-03-04 Ziyu Liu, Zeyi Sun, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Haodong Duan, Dahua Lin, Jiaqi Wang

Towards Reliable Agentic Progressive Text-to-Visualization with Verification Rules

Text-to-Visualization (Text-to-Vis) translates natural language queries into visualization query languages, enabling non-expert users to perform data analysis. However, most existing methods follow a one-shot paradigm that requires users to…

Databases · Computer Science 2026-05-29 Wenxin Xu, Chen Jason Zhang, Xiaoyong Wei, Haoyang Li, Hwanhee Kim, Yuanfeng Song, Raymond Chi-Wing Wong

ConstrainedSQL: Training LLMs for Text2SQL via Constrained Reinforcement Learning

Reinforcement learning (RL) has demonstrated significant promise in enhancing the reasoning capabilities of Text2SQL LLMs, especially with advanced algorithms such as GRPO and DAPO. However, the performance of these methods is highly…

Machine Learning · Computer Science 2025-11-14 Weiqin Chen, Nhan Huu Pham, Michael Robert Glass, Long Hai Vu, Gaetano Rossiello, Dharmashankar Subramanian, Santiago Paternain

Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback

Multimodal Large Language Models (MLLMs) exhibit impressive performance across various visual tasks. Subsequent investigations into enhancing their visual reasoning abilities have significantly expanded their performance envelope. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-08-08 Yang Chen, Yufan Shen, Wenxuan Huang, Sheng Zhou, Qunshu Lin, Xinyu Cai, Zhi Yu, Jiajun Bu, Botian Shi, Yu Qiao

Text2Reward: Reward Shaping with Language Models for Reinforcement Learning

Designing reward functions is a longstanding challenge in reinforcement learning (RL); it requires specialized knowledge or domain data, leading to high costs for development. To address this, we introduce Text2Reward, a data-free framework…

Machine Learning · Computer Science 2024-05-28 Tianbao Xie, Siheng Zhao, Chen Henry Wu, Yitao Liu, Qian Luo, Victor Zhong, Yanchao Yang, Tao Yu