Related papers: MAPLE: Metadata Augmented Private Language Evoluti…

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to…

Computation and Language · Computer Science 2024-07-25 Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

Is API Access to LLMs Useful for Generating Private Synthetic Tabular Data?

Differentially private (DP) synthetic data is a versatile tool for enabling the analysis of private data. Recent advancements in large language models (LLMs) have inspired a number of algorithm techniques for improving DP synthetic data…

Machine Learning · Computer Science 2025-02-11 Marika Swanberg, Ryan McKenna, Edo Roth, Albert Cheu, Peter Kairouz

Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Model

Differentially private (DP) synthetic data, which closely resembles the original private data while maintaining strong privacy guarantees, has become a key tool for unlocking the value of private data without compromising privacy. Recently,…

Machine Learning · Computer Science 2025-05-21 Zinan Lin, Tadas Baltrusaitis, Wenyu Wang, Sergey Yekhanin

Differentially Private Synthetic Data via Foundation Model APIs 1: Images

Generating differentially private (DP) synthetic data that closely resembles the original private data is a scalable way to mitigate privacy concerns in the current data-driven world. In contrast to current practices that train customized…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Harsha Nori, Sergey Yekhanin

Private Evolution Converges

Private Evolution (PE) is a promising training-free method for differentially private (DP) synthetic data generation. While it achieves strong performance in some domains (e.g., images and text), its behavior in others (e.g., tabular data)…

Machine Learning · Computer Science 2025-11-18 Tomás González, Giulia Fanti, Aaditya Ramdas

DP-2Stage: Adapting Language Models as Differentially Private Tabular Data Generators

Generating tabular data under differential privacy (DP) protection ensures theoretical privacy guarantees but poses challenges for training machine learning models, primarily due to the need to capture complex structures under noisy…

Machine Learning · Computer Science 2025-04-30 Tejumade Afonja, Hui-Po Wang, Raouf Kerkouche, Mario Fritz

Secret-Protected Evolution for Differentially Private Synthetic Text Generation

Text data has become extremely valuable on large language models (LLMs) and even lead to general artificial intelligence (AGI). A lot of high-quality text in the real world is private and cannot be freely used due to privacy concerns.…

Cryptography and Security · Computer Science 2025-10-14 Tianze Wang, Zhaoyu Chen, Jian Du, Yingtai Xiao, Linjun Zhang, Qiang Yan

Differentially Private Training of Mixture of Experts Models

This position paper investigates the integration of Differential Privacy (DP) in the training of Mixture of Experts (MoE) models within the field of natural language processing. As Large Language Models (LLMs) scale to billions of…

Cryptography and Security · Computer Science 2024-02-13 Pierre Tholoniat, Huseyin A. Inan, Janardhan Kulkarni, Robert Sim

MAPLE: A Sub-Agent Architecture for Memory, Learning, and Personalization in Agentic AI Systems

Large language model (LLM) agents have emerged as powerful tools for complex tasks, yet their ability to adapt to individual users remains fundamentally limited. We argue this limitation stems from a critical architectural conflation:…

Artificial Intelligence · Computer Science 2026-02-17 Deepak Babu Piskala

Individualized PATE: Differentially Private Machine Learning with Individual Privacy Guarantees

Applying machine learning (ML) to sensitive domains requires privacy protection of the underlying training data through formal privacy frameworks, such as differential privacy (DP). Yet, usually, the privacy of the training data comes at…

Machine Learning · Computer Science 2022-11-09 Franziska Boenisch, Christopher Mühl, Roy Rinberg, Jannis Ihrig, Adam Dziedzic

PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative APIs

The rise of generative APIs has fueled interest in privacy-preserving synthetic data generation. While the Private Evolution (PE) algorithm generates Differential Privacy (DP) synthetic images using diffusion model APIs, it struggles with…

Cryptography and Security · Computer Science 2025-06-09 Jianqing Zhang, Yang Liu, Jie Fu, Yang Hua, Tianyuan Zou, Jian Cao, Qiang Yang

Evaluating Differentially Private Synthetic Data Generation in High-Stakes Domains

The difficulty of anonymizing text data hinders the development and deployment of NLP in high-stakes domains that involve private data, such as healthcare and social services. Poorly anonymized sensitive data cannot be easily shared with…

Computation and Language · Computer Science 2024-10-14 Krithika Ramesh, Nupoor Gandhi, Pulkit Madaan, Lisa Bauer, Charith Peris, Anjalie Field

Efficient and Private: Memorisation under differentially private parameter-efficient fine-tuning in language models

Fine-tuning large language models (LLMs) for specific tasks introduces privacy risks, as models may inadvertently memorise and leak sensitive training data. While Differential Privacy (DP) offers a solution to mitigate these risks, it…

Machine Learning · Computer Science 2024-11-26 Olivia Ma, Jonathan Passerat-Palmbach, Dmitrii Usynin

DP-RFT: Learning to Generate Synthetic Text via Differentially Private Reinforcement Fine-Tuning

Differentially private (DP) synthetic data generation plays a pivotal role in developing large language models (LLMs) on private data, where data owners cannot provide eyes-on access to individual examples. Generating DP synthetic data…

Computation and Language · Computer Science 2026-02-24 Fangyuan Xu, Sihao Chen, Zinan Lin, Taiwei Shi, Sydney Graham, Pei Zhou, Mengting Wan, Alex Stein, Virginia Estellers, Charles Chen, Morris Sharp, Richard Speyer, Tadas Baltrusaitis, Jennifer Neville, Eunsol Choi, Longqi Yang

PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs

On-device training is currently the most common approach for training machine learning (ML) models on private, distributed user data. Despite this, on-device training has several drawbacks: (1) most user devices are too small to train large…

Machine Learning · Computer Science 2024-10-21 Charlie Hou, Akshat Shrivastava, Hongyuan Zhan, Rylan Conway, Trang Le, Adithya Sagar, Giulia Fanti, Daniel Lazar

MAPLE: A Framework for Active Preference Learning Guided by Large Language Models

The advent of large language models (LLMs) has sparked significant interest in using natural language for preference learning. However, existing methods often suffer from high computational burdens, taxing human supervision, and lack of…

Machine Learning · Computer Science 2024-12-23 Saaduddin Mahmud, Mason Nakamura, Shlomo Zilberstein

Enhancing Data Privacy in Large Language Models through Private Association Editing

Large language models (LLMs) require a significant redesign in solutions to preserve privacy in data-intensive applications due to their text-generation capabilities. Indeed, LLMs tend to memorize and emit private information when…

Computation and Language · Computer Science 2024-10-17 Davide Venditti, Elena Sofia Ruzzetti, Giancarlo A. Xompero, Cristina Giannone, Andrea Favalli, Raniero Romagnoli, Fabio Massimo Zanzotto

A Novel Self-Evolution Framework for Large Language Models

The capabilities of Large Language Models (LLMs) are limited to some extent by pre-training, so some researchers optimize LLMs through post-training. Existing post-training strategies, such as memory-based retrieval or preference…

Computation and Language · Computer Science 2025-07-22 Haoran Sun, Zekun Zhang, Shaoning Zeng

Differentially Private Tabular Data Synthesis using Large Language Models

Synthetic tabular data generation with differential privacy is a crucial problem to enable data sharing with formal privacy. Despite a rich history of methodological research and development, developing differentially private tabular data…

Machine Learning · Computer Science 2024-06-05 Toan V. Tran, Li Xiong

DP-SelFT: Differentially Private Selective Fine-Tuning for Large Language Models

Large language models (LLMs) are commonly adapted to downstream tasks through fine-tuning, but fine-tuning data often contains sensitive information that may be leaked by the resulting model. Differential privacy (DP) offers formal…

Machine Learning · Computer Science 2026-05-19 Haichao Sha, Zihao Wang, Yuncheng Wu, Hong Chen, Wei Dong