Related papers: Perception in Reflection

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

Self-reflection mechanisms that rely on purely text-based rethinking processes perform well in most multimodal tasks. However, when directly applied to long-form video understanding scenarios, they exhibit clear limitations. The fundamental…

Computer Vision and Pattern Recognition · Computer Science 2026-05-15 Jiaze Li, Hao Yin, Wenhui Tan, Jingyang Chen, Boshen Xu, Yuxun Qu, Yijing Chen, Jianzhong Ju, Zhenbo Luo, Jian Luan

DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms

Recently, large language models (LLMs) enhanced by self-reflection have achieved promising performance on machine translation. The key idea is guiding LLMs to generate translation with human-like feedback. However, existing self-reflection…

Computation and Language · Computer Science 2024-06-24 Andong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai, Yang Xiang, Muyun Yang, Tiejun Zhao, Min Zhang

VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making

While Large Language Models (LLMs) excel at reasoning on text and Vision-Language Models (VLMs) are highly effective for visual perception, applying those models for visual instruction-based planning remains a widely open problem. In this…

Machine Learning · Computer Science 2025-09-11 Mohamed Salim Aissi, Clemence Grislain, Mohamed Chetouani, Olivier Sigaud, Laure Soulier, Nicolas Thome

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

Multimodal large language models (MLLMs) have shown promising capabilities in reasoning tasks, yet still struggle with complex problems requiring explicit self-reflection and self-correction, especially compared to their unimodal text-based…

Computation and Language · Computer Science 2025-10-07 Zhongwei Wan, Zhihao Dou, Che Liu, Yu Zhang, Dongfei Cui, Qinjian Zhao, Hui Shen, Jing Xiong, Yi Xin, Yifan Jiang, Chaofan Tao, Yangfan He, Mi Zhang, Shen Yan

ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model

The limited capacity for fine-grained visual perception presents a critical bottleneck for Vision-Language Models (VLMs) in real-world applications. Addressing this is challenging due to the scarcity of high-quality data and the limitations…

Computer Vision and Pattern Recognition · Computer Science 2025-10-29 Juntian Zhang, Song Jin, Chuanqi Cheng, Yuhan Liu, Yankai Lin, Xun Zhang, Yufei Zhang, Fei Jiang, Guojun Yin, Wei Lin, Rui Yan

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in MLLM post-training for perception policy learning. While promising, our initial experiments reveal that incorporating a thinking…

Computer Vision and Pattern Recognition · Computer Science 2025-04-11 En Yu, Kangheng Lin, Liang Zhao, Jisheng Yin, Yana Wei, Yuang Peng, Haoran Wei, Jianjian Sun, Chunrui Han, Zheng Ge, Xiangyu Zhang, Daxin Jiang, Jingyu Wang, Wenbing Tao

Large Language Models Facilitate Vision Reflection in Image Classification

This paper presents several novel findings on the explainability of vision reflection in large multimodal models (LMMs). First, we show that prompting an LMM to verify the prediction of a specialized vision model can improve recognition…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Guoyuan An, JaeYoon Kim, SungEui Yoon

Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning

While large language models (LLMs) have the capability to iteratively reflect on their own outputs, recent studies have observed their struggles with knowledge-rich problems without access to external resources. In addition to the…

Computation and Language · Computer Science 2024-06-25 Hanqi Yan, Qinglin Zhu, Xinyu Wang, Lin Gui, Yulan He

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

We explore a method for improving the performance of large language models through self-reflection and reinforcement learning. By incentivizing the model to generate better self-reflections when it answers incorrectly, we demonstrate that a…

Computation and Language · Computer Science 2025-06-02 Shelly Bensal, Umar Jamil, Christopher Bryant, Melisa Russak, Kiran Kamble, Dmytro Mozolevskyi, Muayad Ali, Waseem AlShikh

Improving Vision-language Models with Perception-centric Process Reward Models

Recent advancements in reinforcement learning with verifiable rewards (RLVR) have significantly improved the complex reasoning ability of vision-language models (VLMs). However, its outcome-level supervision is too coarse to diagnose and…

Computer Vision and Pattern Recognition · Computer Science 2026-04-28 Yingqian Min, Kun Zhou, Yifan Li, Yuhuan Wu, Han Peng, Yifan Du, Wayne Xin Zhao, Min Yang, Ji-Rong Wen

X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation

Large Language Models (LLMs) have been shown to enhance the effectiveness of enriching item descriptions, thereby improving the accuracy of recommendation systems. However, most existing approaches either rely on text-only prompting or…

Information Retrieval · Computer Science 2025-10-24 Hanjia Lyu, Ryan Rossi, Xiang Chen, Md Mehrab Tanjim, Stefano Petrangeli, Somdeb Sarkhel, Jiebo Luo

REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak

While Large Language Models (LLMs) demonstrate remarkable capabilities, they remain susceptible to sophisticated, multi-step jailbreak attacks that circumvent conventional surface-level safety alignment by exploiting the internal generation…

Machine Learning · Computer Science 2026-05-21 Jiachen Ma, Jiawen Zhang, Xiangtian Li, Bo Zou, Chaochao Lu, Chao Yang

Meta-Policy Reflexion: Reusable Reflective Memory and Rule Admissibility for Resource-Efficient LLM Agent

Large language model (LLM) agents achieve impressive single-task performance but commonly exhibit repeated failures, inefficient exploration, and limited cross-task adaptability. Existing reflective strategies (e.g., Reflexion, ReAct)…

Artificial Intelligence · Computer Science 2025-09-09 Chunlong Wu, Ye Luo, Zhibo Qu, Min Wang

Reinforced Visual Perception with Tools

Visual reasoning, a cornerstone of human intelligence, encompasses complex perceptual and logical processes essential for solving diverse visual problems. While advances in computer vision have produced powerful models for various…

Computer Vision and Pattern Recognition · Computer Science 2025-09-03 Zetong Zhou, Dongping Chen, Zixian Ma, Zhihan Hu, Mingyang Fu, Sinan Wang, Yao Wan, Zhou Zhao, Ranjay Krishna

Language Models with Rationality

While large language models (LLMs) are proficient at question-answering (QA), it is not always clear how (or even if) an answer follows from their latent "beliefs". This lack of interpretability is a growing impediment to widespread use of…

Computation and Language · Computer Science 2023-10-31 Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Kyle Richardson, Hinrich Schuetze, Peter Clark

Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models

Representation Engineering (RepE) has emerged as a powerful paradigm for enhancing AI transparency by focusing on high-level representations rather than individual neurons or circuits. It has proven effective in improving interpretability…

Machine Learning · Computer Science 2025-04-01 Bowei Tian, Xuntao Lyu, Meng Liu, Hongyi Wang, Ang Li

REPT: Bridging Language Models and Machine Reading Comprehension via Retrieval-Based Pre-training

Pre-trained Language Models (PLMs) have achieved great success on Machine Reading Comprehension (MRC) over the past few years. Although the general language representation learned from large-scale corpora does benefit MRC, the poor support…

Computation and Language · Computer Science 2021-05-19 Fangkai Jiao, Yangyang Guo, Yilin Niu, Feng Ji, Feng-Lin Li, Liqiang Nie

RRSR: Reciprocal Reference-based Image Super-Resolution with Progressive Feature Alignment and Selection

Reference-based image super-resolution (RefSR) is a promising SR branch and has shown great potential in overcoming the limitations of single image super-resolution. While previous state-of-the-art RefSR methods mainly focus on improving…

Computer Vision and Pattern Recognition · Computer Science 2022-11-09 Lin Zhang, Xin Li, Dongliang He, Fu Li, Yili Wang, Zhaoxiang Zhang

RefBench-PRO: Perceptual and Reasoning Oriented Benchmark for Referring Expression Comprehension

Referring Expression Comprehension (REC) is a vision-language task that localizes a specific image region based on a textual description. Existing REC benchmarks primarily evaluate perceptual capabilities and lack interpretable scoring…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Tianyi Gao, Hao Li, Han Fang, Xin Wei, Xiaodong Dong, Hongbo Sun, Ye Yuan, Zhongjiang He, Jinglin Xu, Jingmin Xin, Hao Sun

ReflectRM: Boosting Generative Reward Models via Self-Reflection within a Unified Judgment Framework

Reward Models (RMs) are critical components in the Reinforcement Learning from Human Feedback (RLHF) pipeline, directly determining the alignment quality of Large Language Models (LLMs). Recently, Generative Reward Models (GRMs) have…

Artificial Intelligence · Computer Science 2026-04-21 Kai Qin, Liangxin Liu, Yu Liang, Longzheng Wang, Yan Wang, Yueyang Zhang, Long Xia, Zhiyuan Sun, Houde Liu, Daiting Shi