Related papers: Code-in-the-Loop Forensics: Agentic Tool Use for I…

ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models

Advances in generative models have led to AI-generated images visually indistinguishable from authentic ones. Despite numerous studies on detecting AI-generated images with classifiers, a gap persists between such methods and human…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Chuangchuang Tan , Jinglu Wang , Xiang Ming , Renshuai Tao , Yunchao Wei , Yao Zhao , Yan Lu

AgentFoX: LLM Agent-Guided Fusion with eXplainability for AI-Generated Image Detection

The increasing realism of AI-Generated Images (AIGI) has created an urgent need for forensic tools capable of reliably distinguishing synthetic content from authentic imagery. Existing detectors are typically tailored to specific forgery…

Computer Vision and Pattern Recognition · Computer Science 2026-03-25 Yangxin Yu , Yue Zhou , Bin Li , Kaiqing Lin , Haodong Li , Jiangqun Ni , Bo Cao

AnomalyAgent: Training-Free Agentic Models for Zero-/Few-Shot Anomaly Detection

Benefiting from generalizability of vision-language models (VLMs) such as CLIP, many zero-/few-shot anomaly detection (AD) approaches have achieved impressive detection performance across various datasets. Nevertheless, they require…

Computer Vision and Pattern Recognition · Computer Science 2026-05-29 Yi Zhang , Jiawen Zhu , Lele Fu , Guansong Pang

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection

Multimodal large language models have unlocked new possibilities for various multimodal tasks. However, their potential in image manipulation detection remains unexplored. When directly applied to the IMD task, M-LLMs often produce…

Computer Vision and Pattern Recognition · Computer Science 2025-12-30 Zhihao Sun , Haoran Jiang , Haoran Chen , Yixin Cao , Xipeng Qiu , Zuxuan Wu , Yu-Gang Jiang

Unlocking the Capabilities of Large Vision-Language Models for Generalizable and Explainable Deepfake Detection

Current Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in understanding multimodal data, but their potential remains underexplored for deepfake detection due to the misalignment of their knowledge and…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Peipeng Yu , Jianwei Fei , Hui Gao , Xuan Feng , Zhihua Xia , Chip Hong Chang

ForgeryGPT: A Multimodal LLM for Interpretable Image Forgery Detection and Localization

Multimodal Large Language Models (MLLMs), such as GPT4o, have shown strong capabilities in visual reasoning and explanation generation. However, despite these strengths, they face significant challenges in the increasingly critical task of…

Computer Vision and Pattern Recognition · Computer Science 2026-04-08 Fanrui Zhang , Jiawei Liu , Jiaying Zhu , Esther Sun , Dong Li , Qiang Zhang , Zheng-Jun Zha

Seeing Before Reasoning: A Unified Framework for Generalizable and Explainable Fake Image Detection

Detecting AI-generated images with multimodal large language models (MLLMs) has gained increasing attention, due to their rich world knowledge, common-sense reasoning, and potential for explainability. However, naively applying those MLLMs…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Kaiqing Lin , Zhiyuan Yan , Ruoxin Chen , Junyan Ye , Ke-Yue Zhang , Yue Zhou , Peng Jin , Bin Li , Taiping Yao , Shouhong Ding

CompAgent: An Agentic Framework for Visual Compliance Verification

Visual compliance verification is a critical yet underexplored problem in computer vision, especially in domains such as media, entertainment, and advertising where content must adhere to complex and evolving policy rules. Existing methods…

Computer Vision and Pattern Recognition · Computer Science 2026-03-23 Rahul Ghosh , Baishali Chaudhury , Hari Prasanna Das , Meghana Ashok , Ryan Razkenari , Long Chen , Sungmin Hong , Chun-Hao Liu

ForgeryVCR: Visual-Centric Reasoning via Efficient Forensic Tools in MLLMs for Image Forgery Detection and Localization

Existing Multimodal Large Language Models (MLLMs) for image forgery detection and localization predominantly operate under a text-centric Chain-of-Thought (CoT) paradigm. However, forcing these models to textually characterize imperceptible…

Computer Vision and Pattern Recognition · Computer Science 2026-02-17 Youqi Wang , Shen Chen , Haowei Wang , Rongxuan Peng , Taiping Yao , Shunquan Tan , Changsheng Chen , Bin Li , Shouhong Ding

FROGENT: An End-to-End Full-process Drug Design Multi-Agent System

Drug discovery is a complex, multi-step pipeline that remains heavily dependent on manual, experience-driven operations; meanwhile, existing customized artificial intelligence tools are fragmented across web applications, desktop software,…

Biomolecules · Quantitative Biology 2026-03-03 Qihua Pan , Dong Xu , Qianwei Yang , Jenna Xinyi Yao , Sisi Yuan , Zexuan Zhu , Jianqiang Li , Junkai Ji

Analyze-Prompt-Reason: A Collaborative Agent-Based Framework for Multi-Image Vision-Language Reasoning

We present a Collaborative Agent-Based Framework for Multi-Image Reasoning. Our approach tackles the challenge of interleaved multimodal reasoning across diverse datasets and task formats by employing a dual-agent system: a language-based…

Computer Vision and Pattern Recognition · Computer Science 2025-08-04 Angelos Vlachos , Giorgos Filandrianos , Maria Lymperaiou , Nikolaos Spanos , Ilias Mitsouras , Vasileios Karampinis , Athanasios Voulodimos

Toward Generalizable Forgery Detection and Reasoning

Accurate and interpretable detection of AI-generated images is essential for mitigating risks associated with AI misuse. However, the substantial domain gap among generative models makes it challenging to develop a generalizable forgery…

Computer Vision and Pattern Recognition · Computer Science 2026-04-08 Yueying Gao , Dongliang Chang , Bingyao Yu , Haotian Qin , Muxi Diao , Lei Chen , Kongming Liang , Zhanyu Ma

IADGPT: Unified LVLM for Few-Shot Industrial Anomaly Detection, Localization, and Reasoning via In-Context Learning

Few-Shot Industrial Anomaly Detection (FS-IAD) has important applications in automating industrial quality inspection. Recently, some FS-IAD methods based on Large Vision-Language Models (LVLMs) have been proposed with some achievements…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Mengyang Zhao , Teng Fu , Haiyang Yu , Ke Niu , Bin Li

Agent4FaceForgery: Multi-Agent LLM Framework for Realistic Face Forgery Detection

Face forgery detection faces a critical challenge: a persistent gap between offline benchmarks and real-world efficacy, which we attribute to the ecological invalidity of training data. This work introduces Agent4FaceForgery to address two…

Computer Vision and Pattern Recognition · Computer Science 2025-09-17 Yingxin Lai , Zitong Yu , Jun Wang , Linlin Shen , Yong Xu , Xiaochun Cao

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Recently, the rapid development of AIGC has significantly boosted the diversity of fake media spread on the Internet, posing unprecedented threats to social security, politics, law, etc. To detect the ever-increasingly diverse…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Jin Wang , Chenghui Lv , Xian Li , Shichao Dong , Huadong Li , kelu Yao , Chao Li , Wenqi Shao , Ping Luo

An LLM-LVLM Driven Agent for Iterative and Fine-Grained Image Editing

Despite the remarkable capabilities of text-to-image (T2I) generation models, real-world applications often demand fine-grained, iterative image editing that existing methods struggle to provide. Key challenges include granular instruction…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Zihan Liang , Jiahao Sun , Haoran Ma

Training Multi-Image Vision Agents via End2End Reinforcement Learning

Recent VLM-based agents aim to replicate OpenAI O3's "thinking with images" via tool use, yet most open-source methods restrict inputs to a single image, limiting their applicability to real-world multi-image QA tasks. To address this gap,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-06 Chengqi Dong , Chuhuai Yue , Hang He , Rongge Mao , Fenghe Tang , S Kevin Zhou , Zekun Xu , Xiaohan Wang , Jiajun Chai , Guojun Yin

MLLM-Enhanced Face Forgery Detection: A Vision-Language Fusion Solution

Reliable face forgery detection algorithms are crucial for countering the growing threat of deepfake-driven disinformation. Previous research has demonstrated the potential of Multimodal Large Language Models (MLLMs) in identifying…

Computer Vision and Pattern Recognition · Computer Science 2025-05-06 Siran Peng , Zipei Wang , Li Gao , Xiangyu Zhu , Tianshuo Zhang , Ajian Liu , Haoyuan Zhang , Zhen Lei

DriveAgent: Multi-Agent Structured Reasoning with LLM and Multimodal Sensor Fusion for Autonomous Driving

We introduce DriveAgent, a novel multi-agent autonomous driving framework that leverages large language model (LLM) reasoning combined with multimodal sensor fusion to enhance situational understanding and decision-making. DriveAgent…

Robotics · Computer Science 2025-05-06 Xinmeng Hou , Wuqi Wang , Long Yang , Hao Lin , Jinglun Feng , Haigen Min , Xiangmo Zhao

IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools

Multimodal large language models (MLLMs) have shown remarkable capability in bridging visual perception and textual reasoning, enabling zero-shot understanding across diverse industrial scenarios. However, their performance in…

Computer Vision and Pattern Recognition · Computer Science 2026-05-21 Rongbin Tan , Fangfang Lin , Zhenlong Yuan , Min Qiu , Kejin Cui , Mengmeng Wang , Yi Wang , Zijian Song , Zhiyuan Wang , Jiyuan Wang , Yue Wang , Shuhan Song , Huawei Cao