Related papers: PrivCode: When Code Generation Meets Differential …

200 papers

Large language models specialized for code (CodeLLMs) have demonstrated remarkable capabilities in generating code snippets, documentation, and test cases. However, despite their promising capabilities, CodeLLMs can inadvertently memorize…

Software Engineering · Computer Science · 2025-12-15 · Melih Catal, Pooja Rani, Harald C. Gall

Machine learning (ML) models frequently rely on training data that may include sensitive or personal information, raising substantial privacy concerns. Legislative frameworks such as the General Data Protection Regulation (GDPR) and the…

Machine Learning · Computer Science · 2024-12-31 · Md Mahadi Hasan Nahid, Sadid Bin Hasan

Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data. Generating synthetic versions of such data with a formal privacy guarantee,…

Computation and Language · Computer Science · 2023-07-19 · Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim

We address the challenge of ensuring differential privacy (DP) guarantees in training deep retrieval systems. Training these systems often involves the use of contrastive-style losses, which are typically non-per-example decomposable,…

Computation and Language · Computer Science · 2024-05-24 · Aldo Gael Carranza, Rezsa Farahani, Natalia Ponomareva, Alex Kurakin, Matthew Jagielski, Milad Nasr

Generating tabular data under differential privacy (DP) protection ensures theoretical privacy guarantees but poses challenges for training machine learning models, primarily due to the need to capture complex structures under noisy…

Machine Learning · Computer Science · 2025-04-30 · Tejumade Afonja, Hui-Po Wang, Raouf Kerkouche, Mario Fritz

Large language models (LLMs) have significantly transformed natural language understanding and generation, but they raise privacy concerns due to potential exposure of sensitive information. Studies have highlighted the risk of information…

Machine Learning · Computer Science · 2025-11-20 · Bishnu Bhusal, Manoj Acharya, Ramneet Kaur, Colin Samplawski, Anirban Roy, Adam D. Cobb, Rohit Chadha, Susmit Jha

We present an approach for generating differentially private synthetic text using large language models (LLMs), via private prediction. In the private prediction framework, we only require the output synthetic data to satisfy differential…

Machine Learning · Computer Science · 2024-10-10 · Kareem Amin, Alex Bie, Weiwei Kong, Alexey Kurakin, Natalia Ponomareva, Umar Syed, Andreas Terzis, Sergei Vassilvitskii

Synthetic tabular data generation with differential privacy is a crucial problem to enable data sharing with formal privacy. Despite a rich history of methodological research and development, developing differentially private tabular data…

Machine Learning · Computer Science · 2024-06-05 · Toan V. Tran, Li Xiong

Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy…

Computation and Language · Computer Science · 2024-08-19 · Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Differential Privacy (DP) image data synthesis leverages the DP technique to generate synthetic data that replaces the sensitive data, allowing organizations to share and utilize synthetic images without privacy concerns. Previous…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-10 · Kecen Li, Chen Gong, Zhixiang Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang

High-quality data is needed to unlock the full potential of AI for end users. However, finding new sources of such data is getting harder: most publicly available human-generated data will soon have been used. Additionally, publicly…

Fine-tuning large language models (LLMs) has become an essential strategy for adapting them to specialized tasks; however, this process introduces significant privacy challenges, as sensitive training data may be inadvertently memorized and…

Cryptography and Security · Computer Science · 2025-05-02 · Hao Du, Shang Liu, Yang Cao

Synthetic text generation with Differential Privacy (DP) guarantees emerges as a principled approach that can enable the sharing of sensitive datasets across institutional and regulatory boundaries, while bounding the risks of…

Privacy Preserving Synthetic Data Generation (PP-SDG) has emerged to produce synthetic datasets from personal data while maintaining privacy and utility. Differential privacy (DP) is the property of a PP-SDG mechanism that establishes how…

Cryptography and Security · Computer Science · 2025-07-23 · Frederik Marinus Trudslev, Matteo Lissandrini, Juan Manuel Rodriguez, Martin Bøgsted, Daniele Dell'Aglio

To protect the privacy of individuals whose data is being shared, it is of high importance to develop methods allowing researchers and companies to release textual data while providing formal privacy guarantees to its originators. In the…

Machine Learning · Computer Science · 2022-10-27 · Justus Mattern, Zhijing Jin, Benjamin Weggenmann, Bernhard Schoelkopf, Mrinmaya Sachan

Large language models (LLMs) are commonly adapted to downstream tasks through fine-tuning, but fine-tuning data often contains sensitive information that may be leaked by the resulting model. Differential privacy (DP) offers formal…

Machine Learning · Computer Science · 2026-05-19 · Haichao Sha, Zihao Wang, Yuncheng Wu, Hong Chen, Wei Dong

Large language models (LLMs) are increasingly adapted to proprietary and domain-specific corpora that contain sensitive information, creating a tension between formal privacy guarantees and efficient deployment through model compression.…

Machine Learning · Computer Science · 2026-04-07 · Fatemeh Khadem, Sajad Mousavi, Yi Fang, Yuhong Liu

Sharing health and behavioral data raises significant privacy concerns, as conventional de-identification methods are susceptible to privacy attacks. Differential Privacy (DP) provides formal guarantees against re-identification risks, but…

Differential privacy has become a de facto standard for releasing data in a privacy-preserving way. Creating a differentially private algorithm is a process that often starts with a noise-free (non-private) algorithm. The designer then…

Cryptography and Security · Computer Science · 2021-09-16 · Yuxin Wang, Zeyu Ding, Yingtai Xiao, Daniel Kifer, Danfeng Zhang

Protecting user data privacy can be achieved via many methods, from statistical transformations to generative models. However, all of them have critical drawbacks. For example, creating a transformed data set using traditional techniques is…

Machine Learning · Computer Science · 2024-04-24 · Tânia Carvalho, Nuno Moniz, Luís Antunes, Nitesh Chawla