Related papers: Red Teaming Visual Language Models

Safety Alignment for Vision Language Models

Benefiting from the powerful capabilities of Large Language Models (LLMs), pre-trained visual encoder models connected to LLMs can realize Vision Language Models (VLMs). However, existing research shows that the visual modality of VLMs…

Computer Vision and Pattern Recognition · Computer Science 2024-05-24 Zhendong Liu, Yuanbi Nie, Yingshui Tan, Xiangyu Yue, Qiushi Cui, Chongjun Wang, Xiaoyong Zhu, Bo Zheng

Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?

Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and revealed the vulnerable safeguards of LLMs. Besides, some methods are not limited to the textual modality and extend the jailbreak attack to…

Machine Learning · Computer Science 2024-12-17 Shuo Chen, Zhen Han, Bailan He, Zifeng Ding, Wenqian Yu, Philip Torr, Volker Tresp, Jindong Gu

Arondight: Red Teaming Large Vision Language Models with Auto-generated Multi-modal Jailbreak Prompts

Large Vision Language Models (VLMs) extend and enhance the perceptual abilities of Large Language Models (LLMs). Despite offering new possibilities for LLM applications, these advancements raise significant security and ethical concerns,…

Machine Learning · Computer Science 2024-07-23 Yi Liu, Chengjun Cai, Xiaoli Zhang, Xingliang Yuan, Cong Wang

VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models

Vision-Language Models (VLMs) extend large language models with visual reasoning, but their multimodal design also introduces new, underexplored vulnerabilities. Existing multimodal red-teaming methods largely rely on brittle templates,…

Cryptography and Security · Computer Science 2026-05-27 Qilin Liao, Anamika Lochab, Ruqi Zhang

Uncovering Linguistic Fragility in Vision-Language-Action Models via Diversity-Aware Red Teaming

Vision-Language-Action (VLA) models have achieved remarkable success in robotic manipulation. However, their robustness to linguistic nuances remains a critical, under-explored safety concern, posing a significant safety risk to real-world…

Robotics · Computer Science 2026-04-08 Baoshun Tong, Haoran He, Ling Pan, Yang Liu, Liang Lin

RRTL: Red Teaming Reasoning Large Language Models in Tool Learning

While tool learning significantly enhances the capabilities of large language models (LLMs), it also introduces substantial security risks. Prior research has revealed various vulnerabilities in traditional LLMs during tool learning.…

Computation and Language · Computer Science 2025-05-26 Yifei Liu, Yu Cui, Haibin Zhang

Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models

Large language models (LLMs) have increased interest in vision language models (VLMs), which process image-text pairs as input. Studies investigating the visual understanding ability of VLMs have been proposed, but such studies are still…

Computation and Language · Computer Science 2024-06-25 Jesse Atuhurra, Iqra Ali, Tatsuya Hiraoka, Hidetaka Kamigaito, Tomoya Iwakura, Taro Watanabe

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment

Larger language models (LLMs) have taken the world by storm with their massive multi-tasking capabilities simply by optimizing over a next-word prediction objective. With the emergence of their properties and encoded knowledge, the risk of…

Computation and Language · Computer Science 2023-08-31 Rishabh Bhardwaj, Soujanya Poria

Red-teaming the Multimodal Reasoning: Jailbreaking Vision-Language Models via Cross-modal Entanglement Attacks

Vision-Language Models (VLMs) with multimodal reasoning capabilities are high-value attack targets, given their potential for handling complex multimodal harmful tasks. Mainstream black-box jailbreak attacks on VLMs work by distributing…

Cryptography and Security · Computer Science 2026-02-12 Yu Yan, Sheng Sun, Shengjia Cheng, Teli Liu, Mingfeng Li, Min Liu

Lost in Translation: When GPT-4V(ision) Can't See Eye to Eye with Text. A Vision-Language-Consistency Analysis of VLLMs and Beyond

Recent advancements in multimodal techniques open exciting possibilities for models excelling in diverse tasks involving text, audio, and image processing. Models like GPT-4V, blending computer vision and language modeling, excel in complex…

Computation and Language · Computer Science 2023-10-20 Xiang Zhang, Senyu Li, Zijun Wu, Ning Shi

Red Teaming Large Reasoning Models

Large Reasoning Models (LRMs) have emerged as a powerful advancement in multi-step reasoning tasks, offering enhanced transparency and logical consistency through explicit chains of thought (CoT). However, these models introduce novel…

Cryptography and Security · Computer Science 2026-04-15 Jiawei Chen, Yang Yang, Chao Yu, Yu Tian, Zhi Cao, Xue Yang, Linghao Li, Hang Su, Zhaoxia Yin

Rec-GPT4V: Multimodal Recommendation with Large Vision-Language Models

The development of large vision-language models (LVLMs) offers the potential to address challenges faced by traditional multimodal recommendations thanks to their proficient understanding of static images and textual dynamics. However, the…

Artificial Intelligence · Computer Science 2024-02-14 Yuqing Liu, Yu Wang, Lichao Sun, Philip S. Yu

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benchmarks are often fragmented, limited to…

Cryptography and Security · Computer Science 2026-01-13 Xin Wang, Yunhao Chen, Juncheng Li, Yixu Wang, Yang Yao, Tianle Gu, Jie Li, Yan Teng, Yingchun Wang, Xia Hu

REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLM

Vision Large Language Models (VLLMs) represent a significant advancement in artificial intelligence by integrating image-processing capabilities with textual understanding, thereby enhancing user interactions and expanding application…

Computation and Language · Computer Science 2025-05-09 Madhur Jindal, Saurabh Deshpande

To Trust Or Not To Trust Your Vision-Language Model's Prediction

Vision-Language Models (VLMs) have demonstrated strong capabilities in aligning visual and textual modalities, enabling a wide range of applications in multimodal understanding and generation. While they excel in zero-shot and transfer…

Computer Vision and Pattern Recognition · Computer Science 2025-09-25 Hao Dong, Moru Liu, Jian Liang, Eleni Chatzi, Olga Fink

Multi-lingual Multi-turn Automated Red Teaming for LLMs

Large Language Models (LLMs) have improved dramatically in the past few years, increasing their adoption and the scope of their capabilities over time. A significant amount of work is dedicated to "model alignment", i.e., preventing LLMs…

Computation and Language · Computer Science 2025-04-07 Abhishek Singhania, Christophe Dupuy, Shivam Mangale, Amani Namboori

VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making

Recent research looks to harness the general knowledge and reasoning of large language models (LLMs) into agents that accomplish user-specified goals in interactive environments. Vision-language models (VLMs) extend LLMs to multi-modal data…

Machine Learning · Computer Science 2025-05-07 Jake Grigsby, Yuke Zhu, Michael Ryoo, Juan Carlos Niebles

Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives

Vision-language modeling (VLM) aims to bridge the information gap between images and natural language. Under the new paradigm of first pre-training on massive image-text pairs and then fine-tuning on task-specific data, VLM in the remote…

Computer Vision and Pattern Recognition · Computer Science 2025-06-11 Xingxing Weng, Chao Pang, Gui-Song Xia

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

We describe our early efforts to red team language models in order to simultaneously discover, measure, and attempt to reduce their potentially harmful outputs. We make three main contributions. First, we investigate scaling behaviors for…

Computation and Language · Computer Science 2022-11-24 Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Dario Amodei, Tom Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, Jack Clark

Rethinking VLMs and LLMs for Image Classification

Visual Language Models (VLMs) are now increasingly being merged with Large Language Models (LLMs) to enable new capabilities, particularly in terms of improved interactivity and open-ended responsiveness. While these are remarkable…

Machine Learning · Computer Science 2024-10-22 Avi Cooper, Keizo Kato, Chia-Hsien Shih, Hiroaki Yamane, Kasper Vinken, Kentaro Takemoto, Taro Sunagawa, Hao-Wei Yeh, Jin Yamanaka, Ian Mason, Xavier Boix