Related papers: Red Teaming Visual Language Models

200 papers

Benefiting from the powerful capabilities of Large Language Models (LLMs), pre-trained visual encoder models connected to an LLM can realize Vision Language Models (VLMs). However, existing research shows that the visual modality of VLMs…

Computer Vision and Pattern Recognition · Computer Science 2024-05-24 Zhendong Liu, Yuanbi Nie, Yingshui Tan, Xiangyu Yue, Qiushi Cui, Chongjun Wang, Xiaoyong Zhu, Bo Zheng

Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and revealed the vulnerable safeguards of LLMs. Besides, some methods are not limited to the textual modality and extend the jailbreak attack to…

Machine Learning · Computer Science 2024-12-17 Shuo Chen, Zhen Han, Bailan He, Zifeng Ding, Wenqian Yu, Philip Torr, Volker Tresp, Jindong Gu

Large Vision Language Models (VLMs) extend and enhance the perceptual abilities of Large Language Models (LLMs). Despite offering new possibilities for LLM applications, these advancements raise significant security and ethical concerns,…

Machine Learning · Computer Science 2024-07-23 Yi Liu, Chengjun Cai, Xiaoli Zhang, Xingliang Yuan, Cong Wang

Vision-Language Models (VLMs) extend large language models with visual reasoning, but their multimodal design also introduces new, underexplored vulnerabilities. Existing multimodal red-teaming methods largely rely on brittle templates,…

Cryptography and Security · Computer Science 2026-05-27 Qilin Liao, Anamika Lochab, Ruqi Zhang

Vision-Language-Action (VLA) models have achieved remarkable success in robotic manipulation. However, their robustness to linguistic nuances remains a critical, under-explored safety concern, posing a significant safety risk to real-world…

Robotics · Computer Science 2026-04-08 Baoshun Tong, Haoran He, Ling Pan, Yang Liu, Liang Lin

While tool learning significantly enhances the capabilities of large language models (LLMs), it also introduces substantial security risks. Prior research has revealed various vulnerabilities in traditional LLMs during tool learning.…

Computation and Language · Computer Science 2025-05-26 Yifei Liu, Yu Cui, Haibin Zhang

Large language models (LLMs) have increased interest in vision language models (VLMs), which process image-text pairs as input. Studies investigating the visual understanding ability of VLMs have been proposed, but such studies are still…

Computation and Language · Computer Science 2024-06-25 Jesse Atuhurra, Iqra Ali, Tatsuya Hiraoka, Hidetaka Kamigaito, Tomoya Iwakura, Taro Watanabe

Large language models (LLMs) have taken the world by storm with their massive multi-tasking capabilities simply by optimizing over a next-word prediction objective. With the emergence of their properties and encoded knowledge, the risk of…

Computation and Language · Computer Science 2023-08-31 Rishabh Bhardwaj, Soujanya Poria

Vision-Language Models (VLMs) with multimodal reasoning capabilities are high-value attack targets, given their potential for handling complex multimodal harmful tasks. Mainstream black-box jailbreak attacks on VLMs work by distributing…

Cryptography and Security · Computer Science 2026-02-12 Yu Yan, Sheng Sun, Shengjia Cheng, Teli Liu, Mingfeng Li, Min Liu

Recent advancements in multimodal techniques open exciting possibilities for models excelling in diverse tasks involving text, audio, and image processing. Models like GPT-4V, blending computer vision and language modeling, excel in complex…

Computation and Language · Computer Science 2023-10-20 Xiang Zhang, Senyu Li, Zijun Wu, Ning Shi

Large Reasoning Models (LRMs) have emerged as a powerful advancement in multi-step reasoning tasks, offering enhanced transparency and logical consistency through explicit chains of thought (CoT). However, these models introduce novel…

Cryptography and Security · Computer Science 2026-04-15 Jiawei Chen, Yang Yang, Chao Yu, Yu Tian, Zhi Cao, Xue Yang, Linghao Li, Hang Su, Zhaoxia Yin

The development of large vision-language models (LVLMs) offers the potential to address challenges faced by traditional multimodal recommendations thanks to their proficient understanding of static images and textual dynamics. However, the…

Artificial Intelligence · Computer Science 2024-02-14 Yuqing Liu, Yu Wang, Lichao Sun, Philip S. Yu

The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benchmarks are often fragmented, limited to…

Cryptography and Security · Computer Science 2026-01-13 Xin Wang, Yunhao Chen, Juncheng Li, Yixu Wang, Yang Yao, Tianle Gu, Jie Li, Yan Teng, Yingchun Wang, Xia Hu

Vision Large Language Models (VLLMs) represent a significant advancement in artificial intelligence by integrating image-processing capabilities with textual understanding, thereby enhancing user interactions and expanding application…

Computation and Language · Computer Science 2025-05-09 Madhur Jindal, Saurabh Deshpande

Vision-Language Models (VLMs) have demonstrated strong capabilities in aligning visual and textual modalities, enabling a wide range of applications in multimodal understanding and generation. While they excel in zero-shot and transfer…

Computer Vision and Pattern Recognition · Computer Science 2025-09-25 Hao Dong, Moru Liu, Jian Liang, Eleni Chatzi, Olga Fink

Large Language Models (LLMs) have improved dramatically in the past few years, increasing their adoption and the scope of their capabilities over time. A significant amount of work is dedicated to "model alignment", i.e., preventing LLMs…

Computation and Language · Computer Science 2025-04-07 Abhishek Singhania, Christophe Dupuy, Shivam Mangale, Amani Namboori

Recent research looks to harness the general knowledge and reasoning of large language models (LLMs) into agents that accomplish user-specified goals in interactive environments. Vision-language models (VLMs) extend LLMs to multi-modal data…

Machine Learning · Computer Science 2025-05-07 Jake Grigsby, Yuke Zhu, Michael Ryoo, Juan Carlos Niebles

Vision-language modeling (VLM) aims to bridge the information gap between images and natural language. Under the new paradigm of first pre-training on massive image-text pairs and then fine-tuning on task-specific data, VLM in the remote…

Computer Vision and Pattern Recognition · Computer Science 2025-06-11 Xingxing Weng, Chao Pang, Gui-Song Xia

We describe our early efforts to red team language models in order to simultaneously discover, measure, and attempt to reduce their potentially harmful outputs. We make three main contributions. First, we investigate scaling behaviors for…

Visual Language Models (VLMs) are now increasingly being merged with Large Language Models (LLMs) to enable new capabilities, particularly in terms of improved interactivity and open-ended responsiveness. While these are remarkable…
