Related papers: Extracting Training Data from Document-Based VQA M…

200 papers

Large Language Models (LLMs) have a privacy concern because they memorize training data (including personally identifiable information (PII) like emails and phone numbers) and leak it during inference. A company can train an LLM on its…

Cryptography and Security · Computer Science 2023-07-21 Jaydeep Borkar

Due to the sensitive nature of personally identifiable information (PII), its owners may have the authority to control its inclusion or request its removal from large-language model (LLM) training. Beyond this, PII may be added or removed…

Computation and Language · Computer Science 2025-06-27 Jaydeep Borkar, Matthew Jagielski, Katherine Lee, Niloofar Mireshghallah, David A. Smith, Christopher A. Choquette-Choo

Fine-tuning Large Language Models (LLMs) on sensitive datasets carries a substantial risk of unintended memorization and leakage of Personally Identifiable Information (PII), which can violate privacy regulations and compromise individual…

The increasing use of Online Vision Language Models (OVLMs) for processing images has introduced significant privacy risks, as individuals frequently upload images for various utilities, unaware of the potential for privacy violations.…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Karmesh Siddharam Chaudhari, Youxiang Zhu, Amy Feng, Xiaohui Liang, Honggang Zhang

Language models (LMs) risk inadvertently memorizing and divulging sensitive or personally identifiable information (PII) seen in training data, causing privacy concerns. Current approaches to address this issue involve costly dataset…

Computation and Language · Computer Science 2025-09-09 Tomer Ashuach, Martin Tutek, Yonatan Belinkov

As the deployment of pre-trained language models (PLMs) expands, pressing security concerns have arisen regarding the potential for malicious extraction of training data, posing a threat to data privacy. This study is the first to provide a…

Computation and Language · Computer Science 2023-05-26 Shotaro Ishihara

Large Language Models (LLMs) have been reported to "leak" Personally Identifiable Information (PII), with successful PII reconstruction often interpreted as evidence of memorization. We propose a principled revision of memorization…

Computation and Language · Computer Science 2026-01-08 Xiaoyu Luo, Yiyi Chen, Qiongxiu Li, Johannes Bjerva

Large Language Models (LLMs) are prone to memorizing training data, which poses serious privacy risks. Two of the most prominent concerns are training data extraction and Membership Inference Attacks (MIAs). Prior research has shown that…

Machine Learning · Computer Science 2026-03-02 Ali Al Sahili, Ali Chehab, Razane Tajeddine

Vision-Language Models (VLMs) have emerged as the state-of-the-art representation learning solution, with myriads of downstream applications such as image classification, retrieval and generation. A natural question is whether these models…

Computer Vision and Pattern Recognition · Computer Science 2024-10-30 Bargav Jayaraman, Chuan Guo, Kamalika Chaudhuri

Document Visual Question Answering (DocVQA) has quickly grown into a central task of document understanding. But despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong…

Large language models (LLMs) can store a vast amount of world knowledge, often extractable via question-answering (e.g., "What is Abraham Lincoln's birthday?"). However, do they answer such questions based on exposure to similar questions…

Computation and Language · Computer Science 2024-07-17 Zeyuan Allen-Zhu, Yuanzhi Li

Vision-language models (VLMs) excel at extracting and reasoning about information from images. Yet, their capacity to leverage internal knowledge about specific entities remains underexplored. This work investigates the disparity in model…

Computation and Language · Computer Science 2026-01-06 Ido Cohen, Daniela Gottesman, Mor Geva, Raja Giryes

Large Language Models (LLMs) memorize, and thus, among huge amounts of uncontrolled data, may memorize Personally Identifiable Information (PII), which should not be stored and, consequently, not leaked. In this paper, we introduce Private…

Cryptography and Security · Computer Science 2025-08-22 Elena Sofia Ruzzetti, Giancarlo A. Xompero, Davide Venditti, Fabio Massimo Zanzotto

Memorization in large language models has been studied almost exclusively through prefix-conditioned extraction, a natural choice for autoregressive models. However, diffusion language models (DLMs) can denoise masked tokens at arbitrary…

Computation and Language · Computer Science 2026-05-26 Yihan Wang, N. Asokan

Vision Language Models (VLMs) are increasingly integrated into privacy-critical domains, yet existing evaluations of personally identifiable information (PII) leakage largely treat privacy as a static extraction task and ignore how a…

Artificial Intelligence · Computer Science 2026-01-12 G M Shahariar, Zabir Al Nazi, Md Olid Hasan Bhuiyan, Zhouxing Shi

Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in processing and reasoning over diverse modalities, but their advanced abilities also raise significant privacy concerns, particularly regarding Personally…

Cryptography and Security · Computer Science 2025-10-01 Boyang Zhang, Istemi Ekin Akkus, Ruichuan Chen, Alice Dethise, Klaus Satzke, Ivica Rimac, Yang Zhang

Vision-language models (VLMs) are increasingly adapted through domain-specific fine-tuning, yet it remains unclear whether this improves reasoning beyond superficial visual cues, particularly in high-stakes domains like medicine. We…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Oliver McLaughlin, Daniel Shubin, Carsten Eickhoff, Ritambhara Singh, William Rudman, Michal Golovanevsky

As large language models (LLMs) become ubiquitous in our daily tasks and digital interactions, associated privacy risks are increasingly in focus. While LLM privacy research has primarily focused on the leakage of model training data, it…

Artificial Intelligence · Computer Science 2024-11-05 Batuhan Tömekçe, Mark Vero, Robin Staab, Martin Vechev

Recent research on Vision Language Models (VLMs) suggests that they rely on inherent biases learned during training to respond to questions about visual properties of an image. These biases are exacerbated when VLMs are asked highly…

Computer Vision and Pattern Recognition · Computer Science 2025-09-11 Saurav Sengupta, Nazanin Moradinasab, Jiebei Liu, Donald E. Brown

Visual Question Answering (VQA) is a challenging task that combines natural language processing and computer vision techniques and has gradually become a benchmark task for multimodal large language models (MLLMs). The goal of our survey is…

Computation and Language · Computer Science 2024-11-27 Jiayi Kuang, Jingyou Xie, Haohao Luo, Ronghao Li, Zhe Xu, Xianfeng Cheng, Yinghui Li, Xika Lin, Ying Shen