Large-Scale Differentially Private BERT

Rohan Anil; Badih Ghazi; Vineet Gupta; Ravi Kumar; Pasin Manurangsi

Subjects: Machine Learning; Computation and Language; Cryptography and Security
Submitted: 2021-08-04 (v1)

Abstract

In this work, we study the large-scale pretraining of BERT-Large with differentially private SGD (DP-SGD). We show that, combined with a careful implementation, scaling up the batch size to millions (i.e., mega-batches) improves the utility of the DP-SGD step for BERT; we also enhance its efficiency by using an increasing batch size schedule. Our implementation builds on the recent work of [SVK20], who demonstrated that the overhead of a DP-SGD step is minimized with effective use of JAX [BFH+18, FJL18] primitives in conjunction with the XLA compiler [XLA17]. Our implementation achieves a masked language model accuracy of 60.5% at a batch size of 2M, for $\epsilon = 5.36$. To put this number in perspective, non-private BERT models achieve an accuracy of $\sim$70%.
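For readers unfamiliar with the DP-SGD step the abstract refers to, the following is a minimal sketch in plain Python, not the authors' JAX implementation. It assumes the standard DP-SGD recipe (clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise, then average) applied to a toy linear model. All function names and parameters (`clip`, `per_example_grad`, `dp_sgd_step`, `noise_std_on_mean`, `clip_norm`, `noise_multiplier`) are hypothetical and chosen for illustration only.

```python
import math
import random


def clip(grad, clip_norm):
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def per_example_grad(w, x, y):
    """Gradient of the toy loss 0.5 * (w.x - y)^2 with respect to w."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def dp_sgd_step(w, batch, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step: clip per-example gradients, sum, add noise, average."""
    total = [0.0] * len(w)
    for x, y in batch:
        g = clip(per_example_grad(w, x, y), clip_norm)
        total = [t + gi for t, gi in zip(total, g)]
    # Assumption: Gaussian noise with std noise_multiplier * clip_norm
    # is added to each coordinate of the summed, clipped gradient.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(t + rng.gauss(0.0, sigma)) / len(batch) for t in total]
    return [wi - lr * gi for wi, gi in zip(w, noisy_mean)]


def noise_std_on_mean(batch_size, clip_norm, noise_multiplier):
    """Per-coordinate std of the noise once the sum is divided by batch size."""
    return noise_multiplier * clip_norm / batch_size


# Usage on a two-example batch with the noise switched off for clarity.
rng = random.Random(0)
batch = [([1.0, 2.0], 3.0), ([0.5, -1.0], 0.0)]
w_new = dp_sgd_step([0.0, 0.0], batch, lr=0.1, clip_norm=1.0,
                    noise_multiplier=0.0, rng=rng)
```

In this sketch the noise on the averaged gradient has standard deviation `noise_multiplier * clip_norm / batch_size`, so for a fixed noise multiplier it shrinks as the batch grows. This is one common intuition for why very large batches can help DP-SGD; the actual privacy accounting also depends on the sampling rate and the number of steps, which this sketch does not model.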

Cite

@article{arxiv.2108.01624,
  title  = {Large-Scale Differentially Private BERT},
  author = {Rohan Anil and Badih Ghazi and Vineet Gupta and Ravi Kumar and Pasin Manurangsi},
  journal= {arXiv preprint arXiv:2108.01624},
  year   = {2021}
}

Comments

12 pages, 6 figures
