Related papers: MAPLE: Metadata Augmented Private Language Evoluti…

200 papers

Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to…

Computation and Language · Computer Science 2024-07-25 Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

Differentially private (DP) synthetic data is a versatile tool for enabling the analysis of private data. Recent advancements in large language models (LLMs) have inspired a number of algorithmic techniques for improving DP synthetic data…

Machine Learning · Computer Science 2025-02-11 Marika Swanberg, Ryan McKenna, Edo Roth, Albert Cheu, Peter Kairouz

Differentially private (DP) synthetic data, which closely resembles the original private data while maintaining strong privacy guarantees, has become a key tool for unlocking the value of private data without compromising privacy. Recently,…

Machine Learning · Computer Science 2025-05-21 Zinan Lin, Tadas Baltrusaitis, Wenyu Wang, Sergey Yekhanin

Generating differentially private (DP) synthetic data that closely resembles the original private data is a scalable way to mitigate privacy concerns in the current data-driven world. In contrast to current practices that train customized…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Harsha Nori, Sergey Yekhanin

Private Evolution (PE) is a promising training-free method for differentially private (DP) synthetic data generation. While it achieves strong performance in some domains (e.g., images and text), its behavior in others (e.g., tabular data)…

Machine Learning · Computer Science 2025-11-18 Tomás González, Giulia Fanti, Aaditya Ramdas

Generating tabular data under differential privacy (DP) protection ensures theoretical privacy guarantees but poses challenges for training machine learning models, primarily due to the need to capture complex structures under noisy…

Machine Learning · Computer Science 2025-04-30 Tejumade Afonja, Hui-Po Wang, Raouf Kerkouche, Mario Fritz

Text data has become extremely valuable for large language models (LLMs) and may even lead to artificial general intelligence (AGI). A lot of high-quality text in the real world is private and cannot be freely used due to privacy concerns.…

Cryptography and Security · Computer Science 2025-10-14 Tianze Wang, Zhaoyu Chen, Jian Du, Yingtai Xiao, Linjun Zhang, Qiang Yan

This position paper investigates the integration of Differential Privacy (DP) in the training of Mixture of Experts (MoE) models within the field of natural language processing. As Large Language Models (LLMs) scale to billions of…

Cryptography and Security · Computer Science 2024-02-13 Pierre Tholoniat, Huseyin A. Inan, Janardhan Kulkarni, Robert Sim

Large language model (LLM) agents have emerged as powerful tools for complex tasks, yet their ability to adapt to individual users remains fundamentally limited. We argue this limitation stems from a critical architectural conflation:…

Artificial Intelligence · Computer Science 2026-02-17 Deepak Babu Piskala

Applying machine learning (ML) to sensitive domains requires privacy protection of the underlying training data through formal privacy frameworks, such as differential privacy (DP). Yet, usually, the privacy of the training data comes at…

Machine Learning · Computer Science 2022-11-09 Franziska Boenisch, Christopher Mühl, Roy Rinberg, Jannis Ihrig, Adam Dziedzic

The rise of generative APIs has fueled interest in privacy-preserving synthetic data generation. While the Private Evolution (PE) algorithm generates Differential Privacy (DP) synthetic images using diffusion model APIs, it struggles with…

Cryptography and Security · Computer Science 2025-06-09 Jianqing Zhang, Yang Liu, Jie Fu, Yang Hua, Tianyuan Zou, Jian Cao, Qiang Yang

The difficulty of anonymizing text data hinders the development and deployment of NLP in high-stakes domains that involve private data, such as healthcare and social services. Poorly anonymized sensitive data cannot be easily shared with…

Computation and Language · Computer Science 2024-10-14 Krithika Ramesh, Nupoor Gandhi, Pulkit Madaan, Lisa Bauer, Charith Peris, Anjalie Field

Fine-tuning large language models (LLMs) for specific tasks introduces privacy risks, as models may inadvertently memorise and leak sensitive training data. While Differential Privacy (DP) offers a solution to mitigate these risks, it…

Machine Learning · Computer Science 2024-11-26 Olivia Ma, Jonathan Passerat-Palmbach, Dmitrii Usynin

Differentially private (DP) synthetic data generation plays a pivotal role in developing large language models (LLMs) on private data, where data owners cannot provide eyes-on access to individual examples. Generating DP synthetic data…

On-device training is currently the most common approach for training machine learning (ML) models on private, distributed user data. Despite this, on-device training has several drawbacks: (1) most user devices are too small to train large…

Machine Learning · Computer Science 2024-10-21 Charlie Hou, Akshat Shrivastava, Hongyuan Zhan, Rylan Conway, Trang Le, Adithya Sagar, Giulia Fanti, Daniel Lazar

The advent of large language models (LLMs) has sparked significant interest in using natural language for preference learning. However, existing methods often suffer from high computational burdens, taxing human supervision, and lack of…

Machine Learning · Computer Science 2024-12-23 Saaduddin Mahmud, Mason Nakamura, Shlomo Zilberstein

Large language models (LLMs) require a significant redesign in solutions to preserve privacy in data-intensive applications due to their text-generation capabilities. Indeed, LLMs tend to memorize and emit private information when…

The capabilities of Large Language Models (LLMs) are limited to some extent by pre-training, so some researchers optimize LLMs through post-training. Existing post-training strategies, such as memory-based retrieval or preference…

Computation and Language · Computer Science 2025-07-22 Haoran Sun, Zekun Zhang, Shaoning Zeng

Synthetic tabular data generation with differential privacy is a crucial problem to enable data sharing with formal privacy. Despite a rich history of methodological research and development, developing differentially private tabular data…

Machine Learning · Computer Science 2024-06-05 Toan V. Tran, Li Xiong

Large language models (LLMs) are commonly adapted to downstream tasks through fine-tuning, but fine-tuning data often contains sensitive information that may be leaked by the resulting model. Differential privacy (DP) offers formal…

Machine Learning · Computer Science 2026-05-19 Haichao Sha, Zihao Wang, Yuncheng Wu, Hong Chen, Wei Dong