Computation and Language · Computer Science
Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training
Mengzhao Jia, Zhihan Zhang, Wenhao Yu, Fangkai Jiao +1
2024-04-29
Computer Vision and Pattern Recognition · Computer Science
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
Junfei Wu, Jian Guan, Kaituo Feng, Qiang Liu +4
2025-06-23
Computation and Language · Computer Science
iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs
Julius Mayer, Mohamad Ballout, Serwan Jassim, Farbod Nosrat Nezami +1
2025-10-01
Computer Vision and Pattern Recognition · Computer Science
Think Twice to See More: Iterative Visual Reasoning in Medical VLMs
Kaitao Chen, Shaohao Rui, Yankai Jiang, Jiamin Wu +5
2025-10-14
Computer Vision and Pattern Recognition · Computer Science
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
Haobo Yuan, Yueyi Sun, Yanwei Li, Tao Zhang +6
2025-12-05
Robotics · Computer Science
Differentiate-and-Inject: Enhancing VLAs via Functional Differentiation Induced by In-Parameter Structural Reasoning
Jingyi Hou, Leyu Zhou, Chenchen Jing, Jinghan Yang +2
2026-02-10
Human-Computer Interaction · Computer Science
VisTR: Visualizations as Representations for Time-series Table Reasoning
Jianing Hao, Zhuowen Liang, Chunting Li, Yuyu Luo +2
2024-12-24
Machine Learning · Computer Science
VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making
Mohamed Salim Aissi, Clemence Grislain, Mohamed Chetouani, Olivier Sigaud +2
2025-09-11
Computer Vision and Pattern Recognition · Computer Science
ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models
Tingshu Mou, Jiabo He, Renying Wang, Ce Liu +4
2026-05-12
Computer Vision and Pattern Recognition · Computer Science
Vision-aligned Latent Reasoning for Multi-modal Large Language Model
Byungwoo Jeon, Yoonwoo Jeong, Hyunseok Lee, Minsu Cho +1
2026-05-13
Computer Vision and Pattern Recognition · Computer Science
VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
Chao Wang, Chunbai Zhang, Yongxiao Tian, Yang Zhou +1
2025-09-03
Computer Vision and Pattern Recognition · Computer Science
ViSTa Dataset: Do vision-language models understand sequential tasks?
Evžen Wybitul, Evan Ryan Gunter, Mikhail Seleznyov, David Lindner
2024-11-22
Computation and Language · Computer Science
Beyond the Black Box: Demystifying Multi-Turn LLM Reasoning with VISTA
Yiran Zhang, Mingyang Lin, Mark Dras, Usman Naseem
2025-11-14
Computer Vision and Pattern Recognition · Computer Science
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
Yucheng Shi, Quanzheng Li, Jin Sun, Xiang Li +1
2025-02-26
Computer Vision and Pattern Recognition · Computer Science
LanteRn: Latent Visual Structured Reasoning
André G. Viveiros, Nuno Gonçalves, Matthias Lindemann, André Martins
2026-03-27
Computer Vision and Pattern Recognition · Computer Science
Beyond Perception: Evaluating Abstract Visual Reasoning through Multi-Stage Task
Yanbei Jiang, Yihao Ding, Chao Lei, Jiayang Ao +2
2025-06-02
Computer Vision and Pattern Recognition · Computer Science
VGR: Visual Grounded Reasoning
Jiacong Wang, Zijian Kang, Haochen Wang, Haiyong Jiang +7
2026-05-04
Computer Vision and Pattern Recognition · Computer Science
STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs
Zongzhao Li, Zongyang Ma, Mingze Li, Songyou Li +5
2025-07-11
Computer Vision and Pattern Recognition · Computer Science
See, Think, Learn: A Self-Taught Multimodal Reasoner
Sourabh Sharma, Sonam Gupta, Sadbhawna
2025-12-03
Databases · Computer Science
H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables
Nikhil Abhyankar, Vivek Gupta, Dan Roth, Chandan K. Reddy
2025-04-08
Computer Vision and Pattern Recognition · Computer Science
VITAL: Visual-Semantic Dual Supervision for Enhanced and Interpretable Latent Reasoning in Medical MLLMs
Qiaoru Li, Shaotian Liang, Jintao Chen, Haoran Sun +3
2026-05-28
Computer Vision and Pattern Recognition · Computer Science
Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
Zhenlong Yuan, Xiangyan Qu, Chengxuan Qian, Rui Chen +7
2026-04-20