
Learning from aggregated data with a maximum entropy model

Machine Learning · Artificial Intelligence · 2022-10-07 · v1

Abstract

Aggregating a dataset, then injecting some noise, is a simple and common way to release differentially private data. However, aggregated data (even without noise) is not an appropriate input for machine learning classifiers. In this work, we show how a new model, similar to a logistic regression, may be learned from aggregated data only, by approximating the unobserved feature distribution with a maximum entropy hypothesis. The resulting model is a Markov Random Field (MRF), and we detail how to apply, modify and scale an MRF training algorithm to our setting. Finally, we present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained on the full unaggregated data.
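To make the idea concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper. There are two categorical features `x1`, `x2` and a binary label `y`, and the released aggregates are count tables over the pairs (x1, x2), (x1, y) and (x2, y), optionally perturbed with Laplace noise. The sketch fits the maximum entropy joint distribution consistent with those tables by iterative proportional fitting (IPF). IPF is a standard technique and not necessarily the training algorithm the authors use. The fitted joint is a pairwise log-linear model (an MRF), and its conditional P(y | x1, x2) has a logit that is additive in one-hot encodings of `x1` and `x2`, i.e. a logistic-regression form. The function names `aggregate`, `fit_maxent` and `predict_proba` and the table layout are hypothetical.

```python
import itertools
import random


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def aggregate(rows, k1, k2, noise_scale=0.0, rng=None):
    """Count tables over (x1, x2), (x1, y), (x2, y), optionally Laplace-noised.

    Hypothetical release format; noise_scale is illustrative, not a calibrated privacy budget.
    """
    rng = rng or random.Random(0)
    tables = {
        "x1x2": {(a, b): 0.0 for a in range(k1) for b in range(k2)},
        "x1y": {(a, y): 0.0 for a in range(k1) for y in (0, 1)},
        "x2y": {(b, y): 0.0 for b in range(k2) for y in (0, 1)},
    }
    for x1, x2, y in rows:
        tables["x1x2"][(x1, x2)] += 1.0
        tables["x1y"][(x1, y)] += 1.0
        tables["x2y"][(x2, y)] += 1.0
    for table in tables.values():
        for key in table:
            if noise_scale > 0:
                table[key] += laplace(rng, noise_scale)
            # Floor at a small positive value so the fitting step never divides by zero.
            table[key] = max(table[key], 1e-6)
    return tables


def fit_maxent(tables, k1, k2, iters=200):
    """Iterative proportional fitting from a uniform start.

    For mutually consistent tables this converges to the maximum entropy joint over
    (x1, x2, y) matching all three pairwise tables: a pairwise log-linear model (MRF).
    """
    cells = list(itertools.product(range(k1), range(k2), (0, 1)))
    q = {c: 1.0 for c in cells}  # uniform start: the unconstrained max-entropy solution
    projections = (
        ("x1x2", lambda c: (c[0], c[1])),
        ("x1y", lambda c: (c[0], c[2])),
        ("x2y", lambda c: (c[1], c[2])),
    )
    for _ in range(iters):
        for name, proj in projections:
            # Current marginal of q on this pair of variables.
            current = {}
            for c in cells:
                key = proj(c)
                current[key] = current.get(key, 0.0) + q[c]
            # Rescale every cell so this marginal matches the released table.
            for c in cells:
                key = proj(c)
                q[c] *= tables[name][key] / current[key]
    return q


def predict_proba(q, x1, x2):
    """P(y = 1 | x1, x2) under the fitted joint; its logit is additive in x1 and x2."""
    pos, neg = q[(x1, x2, 1)], q[(x1, x2, 0)]
    return pos / (pos + neg)
```

Every IPF update multiplies cells by a factor that depends on a single pair of variables, so each iterate stays in the pairwise log-linear family; that is why the resulting conditional has the logistic form. With `noise_scale > 0` the three noisy tables are generally not mutually consistent, so IPF no longer reaches an exact fit and the fixed iteration count simply stops it.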


Cite

@article{arxiv.2210.02450,
  title  = {Learning from aggregated data with a maximum entropy model},
  author = {Alexandre Gilotte and Ahmed Ben Yahmed and David Rohde},
  journal= {arXiv preprint arXiv:2210.02450},
  year   = {2022}
}