Related papers: Attacks on multimodal models

200 papers

With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared to traditional…

Computer Vision and Pattern Recognition · Computer Science 2024-07-15 Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu

Recent research has shown that Large Language Models (LLMs) are susceptible to a security threat known as Backdoor Attack. The backdoored model will behave well in normal cases but exhibit malicious behaviours on inputs inserted with a…

Cryptography and Security · Computer Science 2024-04-04 Yunzhuo Hao, Wenkai Yang, Yankai Lin

Multi-modal foundation models like OpenFlamingo, LLaVA, and GPT-4 are increasingly used for various real-world tasks. Prior work has shown that these models are highly vulnerable to adversarial attacks on the vision modality. These attacks…

Machine Learning · Computer Science 2024-06-06 Christian Schlarmann, Naman Deep Singh, Francesco Croce, Matthias Hein

Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows. This paper surveys research in the…

Computation and Language · Computer Science 2023-10-18 Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael Abu-Ghazaleh

Multimodal large language models (MLLMs), which bridge the gap between audio-visual and natural language processing, achieve state-of-the-art performance on several audio-visual tasks. Despite the superior performance of MLLMs, the scarcity…

Cryptography and Security · Computer Science 2025-06-16 Jinming Wen, Xinyi Wu, Shuai Zhao, Yanhao Jia, Yuwen Li

Multimodal large language models (MLLMs) integrate information from multiple modalities such as text, images, audio, and video, enabling complex capabilities such as visual question answering and audio translation. While powerful, this…

Cryptography and Security · Computer Science 2026-03-31 Bhavuk Jain, Sercan Ö. Arık, Hardeo K. Thakur

Large language models (LLMs) and LLM-based agents have been widely deployed in a wide range of applications in the real world, including healthcare diagnostics, financial analysis, customer support, robotics, and autonomous driving,…

Cryptography and Security · Computer Science 2025-05-20 Wenrui Xu, Keshab K. Parhi

Multi-Modal Language Models (MLLMs) have transformed artificial intelligence by combining visual and text data, making applications like image captioning, visual question answering, and multi-modal content creation possible. This ability to…

Cryptography and Security · Computer Science 2024-11-11 Pete Janowczyk, Linda Laurier, Ave Giulietta, Arlo Octavia, Meade Cleti

Ensuring the security of large language models (LLMs) is an ongoing challenge despite their widespread popularity. Developers work to enhance LLM security, but vulnerabilities persist, even in advanced versions like GPT-4. Attackers…

Cryptography and Security · Computer Science 2023-12-19 Aysan Esmradi, Daniel Wankit Yip, Chun Fai Chan

The rise of pre-trained unified foundation models breaks down the barriers between different modalities and tasks, providing comprehensive support to users with unified architectures. However, the backdoor attack on pre-trained models poses…

Cryptography and Security · Computer Science 2023-02-27 Zenghui Yuan, Yixin Liu, Kai Zhang, Pan Zhou, Lichao Sun

The growing misuse of Vision-Language Models (VLMs) has led providers to deploy multiple safeguards, including alignment tuning, system prompts, and content moderation. However, the real-world robustness of these defenses against…

Cryptography and Security · Computer Science 2025-11-21 Yijun Yang, Lichao Wang, Jianping Zhang, Chi Harold Liu, Lanqing Hong, Qiang Xu

Multi-modal Large Language Models (MLLMs) excel in vision-language tasks but remain vulnerable to visual adversarial perturbations that can induce hallucinations, manipulate responses, or bypass safety mechanisms. Existing methods seek to…

Computer Vision and Pattern Recognition · Computer Science 2025-02-04 Hashmat Shadab Malik, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar, Fahad Khan, Salman Khan

The rapid development of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has exposed vulnerabilities to various adversarial attacks. This paper provides a comprehensive overview of jailbreaking research targeting…

Computation and Language · Computer Science 2024-06-24 Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei

Despite the substantial advancements in Vision-Language Pre-training (VLP) models, their susceptibility to adversarial attacks poses a significant challenge. Existing work rarely studies the transferability of attacks on VLP models,…

Computer Vision and Pattern Recognition · Computer Science 2024-07-09 Jiyuan Fu, Zhaoyu Chen, Kaixun Jiang, Haijing Guo, Jiafeng Wang, Shuyong Gao, Wenqiang Zhang

Large Vision-Language Models (LVLMs) have shown remarkable capabilities across a wide range of multimodal tasks. However, their integration of visual inputs introduces expanded attack surfaces, thereby exposing them to novel security…

Computation and Language · Computer Science 2025-05-29 Juan Ren, Mark Dras, Usman Naseem

Large language models (LLMs) have achieved record adoption in a short period of time across many different sectors including high importance areas such as education [4] and healthcare [23]. LLMs are open-ended models trained on diverse data…

Cryptography and Security · Computer Science 2024-12-24 Herve Debar, Sven Dietrich, Pavel Laskov, Emil C. Lupu, Eirini Ntoutsi

The widespread use of Vision Language Models (VLMs, e.g. CLIP) has raised concerns about their vulnerability to sophisticated and imperceptible adversarial attacks. These attacks could compromise model performance and system security in…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Xiaowei Fu, Lei Zhang

Vision-Language Models (VLMs) are now a core part of modern AI. Recent work proposed several visual jailbreak attacks using single/holistic images. However, contemporary VLMs demonstrate strong robustness against such attacks due to…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Md Rafi Ur Rashid, MD Sadik Hossain Shanto, Vishnu Asutosh Dasu, Shagufta Mehnaz

Pretrained vision-language models (VLMs) like CLIP exhibit exceptional generalization across diverse downstream tasks. While recent studies reveal their vulnerability to adversarial attacks, research to date has primarily focused on…

Computer Vision and Pattern Recognition · Computer Science 2024-11-13 Wanqi Zhou, Shuanghao Bai, Danilo P. Mandic, Qibin Zhao, Badong Chen

Vision-language models (VLMs) are increasingly used in autonomous driving because they combine visual perception with language-based reasoning, supporting more interpretable decision-making, yet their robustness to physical adversarial…

Computer Vision and Pattern Recognition · Computer Science 2026-05-01 David Fernandez, Pedram MohajerAnsari, Amir Salarpour, Mert D. Pese