Related papers: Aligning Text, Code, and Vision: A Multi-Objective…

200 papers

Automated data visualization plays a crucial role in simplifying data interpretation, enhancing decision-making, and improving efficiency. While large language models (LLMs) have shown promise in generating visualizations from natural…

Computation and Language · Computer Science · 2025-07-29 · Mizanur Rahman, Md Tahmid Rahman Laskar, Shafiq Joty, Enamul Hoque

Text-to-Vis is an emerging task in the natural language processing (NLP) field that aims to automatically generate data visualizations from natural language questions (NLQs). Despite their progress, existing text-to-vis models often heavily…

Computation and Language · Computer Science · 2024-04-12 · Jinwei Lu, Yuanfeng Song, Haodi Zhang, Chen Zhang, Raymond Chi-Wing Wong

Reinforcement learning (RL) has emerged as a promising approach for eliciting reasoning chains before generating final answers. However, multimodal large language models (MLLMs) generate reasoning that lacks integration of visual…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-05 · Omar Sharif, Eftekhar Hossain, Patrick Ng

Self-supervised visual foundation models produce powerful embeddings that achieve remarkable performance on a wide range of downstream tasks. However, unlike vision-language models such as CLIP, self-supervised visual features are not…

In this work, we address the task of table image to LaTeX code generation, with the goal of automating the reconstruction of high-quality, publication-ready tables from visual inputs. A central challenge of this task lies in accurately…

Artificial Intelligence · Computer Science · 2025-09-23 · Jun Ling, Yao Qi, Tao Huang, Shibo Zhou, Yanqin Huang, Jiang Yang, Ziqi Song, Ying Zhou, Yang Yang, Heng Tao Shen, Peng Wang

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To…

UI-to-code aims to translate UI screenshots into executable front-end code. Despite progress with vision-language models (VLMs), most existing methods formulate UI-to-code as a single-pass generation, which does not match real-world UI…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-07 · Zhen Yang, Wenyi Hong, Mingde Xu, Xinyue Fan, Weihan Wang, Jiale Cheng, Xiaotao Gu, Jie Tang

Traditional RLHF optimizes language models with coarse, scalar rewards that mask the fine-grained reasons behind success or failure, leading to slow and opaque learning. Recent work augments RL with textual critiques through prompting or…

Computation and Language · Computer Science · 2026-01-28 · Hanyang Wang, Lu Wang, Chaoyun Zhang, Tianjun Mao, Si Qin, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured representations with high visual fidelity. While recent Large Vision Language Models (LVLMs)…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-12 · Ziyu Liu, Shengyuan Ding, Xinyu Fang, Xuanlang Dai, Penghui Yang, Jianze Liang, Jiaqi Wang, Kai Chen, Dahua Lin, Yuhang Zang

Recent advancements in reinforcement learning, particularly through Group Relative Policy Optimization (GRPO), have significantly improved multimodal large language models for complex reasoning tasks. However, two critical limitations…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-10 · Jisheng Dang, Jingze Wu, Teng Wang, Xuanhui Lin, Nannan Zhu, Hongbo Chen, Wei-Shi Zheng, Meng Wang, Tat-Seng Chua

For improving the reasoning accuracy of LLMs, the representative reinforcement learning (RL) method GRPO fails when reward variance is insignificant, while verification methods based on process reward models (PRMs) suffer…

Artificial Intelligence · Computer Science · 2025-09-09 · Sining Zhoubian, Dan Zhang, Jie Tang

The frequent need for analysts to create visualizations to derive insights from data has driven extensive research into natural language to visualization (NL2VIS) generation. While recent progress in large language models (LLMs)…

Human-Computer Interaction · Computer Science · 2025-12-12 · Xinyu Wang, Chenwei Liang, Shunyuan Zheng, Jinyuan Liang, Guozheng Li, Yu Zhang, Chi Harold Liu

Generating accurate and executable code using Large Language Models (LLMs) remains a significant challenge for underrepresented programming languages, such as Prolog and Lisp, due to the scarcity of public training data compared to…

Machine Learning · Computer Science · 2026-05-26 · Federico Pennino, Bianca Raimondi, Massimo Rondelli, Andrea Gurioli, Maurizio Gabbrielli

The Natural Language to Visualization (NL2Vis) task aims to transform natural-language descriptions into visual representations for a grounded table, enabling users to gain insights from vast amounts of data. Recently, many deep…

Databases · Computer Science · 2024-04-29 · Yang Wu, Yao Wan, Hongyu Zhang, Yulei Sui, Wucai Wei, Wei Zhao, Guandong Xu, Hai Jin

Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending this success to open-source multimodal…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-21 · Wenbo Hu, Xin Chen, Yan Gao-Tian, Yihe Deng, Nanyun Peng, Kai-Wei Chang

Reinforcement Fine-Tuning (RFT) in Large Reasoning Models like OpenAI o1 learns from feedback on its answers, which is especially useful in applications where fine-tuning data is scarce. Recent open-source work like DeepSeek-R1 demonstrates…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-04 · Ziyu Liu, Zeyi Sun, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Haodong Duan, Dahua Lin, Jiaqi Wang

Text-to-Visualization (Text-to-Vis) translates natural language queries into visualization query languages, enabling non-expert users to perform data analysis. However, most existing methods follow a one-shot paradigm that requires users to…

Reinforcement learning (RL) has demonstrated significant promise in enhancing the reasoning capabilities of Text2SQL LLMs, especially with advanced algorithms such as GRPO and DAPO. However, the performance of these methods is highly…

Multimodal Large Language Models (MLLMs) exhibit impressive performance across various visual tasks. Subsequent investigations into enhancing their visual reasoning abilities have significantly expanded their performance envelope. However,…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-08 · Yang Chen, Yufan Shen, Wenxuan Huang, Sheng Zhou, Qunshu Lin, Xinyu Cai, Zhi Yu, Jiajun Bu, Botian Shi, Yu Qiao

Designing reward functions is a longstanding challenge in reinforcement learning (RL); it requires specialized knowledge or domain data, leading to high development costs. To address this, we introduce Text2Reward, a data-free framework…

Machine Learning · Computer Science · 2024-05-28 · Tianbao Xie, Siheng Zhao, Chen Henry Wu, Yitao Liu, Qian Luo, Victor Zhong, Yanchao Yang, Tao Yu