Fair and Diverse DPP-based Data Summarization

L. Elisa Celis; Vijay Keswani; Damian Straszak; Amit Deshpande; Tarun Kathuria; Nisheeth K. Vishnoi

Machine Learning | 2018-02-13 | v1 | Computers and Society, Information Retrieval, Machine Learning

Abstract

Sampling methods that choose a subset of the data with probability proportional to its diversity in the feature space are popular for data summarization. However, recent studies have noted bias (under- or over-representation of a particular gender or race) in such data summarization methods. In this paper we initiate a study of the problem of outputting a diverse and fair summary of a given dataset. We work with a well-studied determinantal measure of diversity and the corresponding distributions (DPPs), and present a framework that allows us to incorporate a general class of fairness constraints into such distributions. Designing efficient algorithms to sample from these constrained determinantal distributions, however, runs into a complexity barrier; we present a fast sampler that is provably good when the input vectors satisfy a natural property. Our experimental results on a real-world dataset and an image dataset show that the diversity of the samples produced under fairness constraints is not too far from that of the unconstrained case, and we provide a theoretical explanation for this.
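To make the idea in the abstract concrete, here is a minimal sketch of determinantal sampling under a fairness constraint. It is not the paper's fast sampler: it enumerates every feasible subset, so it takes exponential time and is only usable on toy inputs. It also assumes one particular kind of fairness constraint (an exact quota per group), which the abstract does not specify. The names `gram_det`, `fair_subsets`, `sample_fair_dpp`, `groups`, and `quotas` are hypothetical and chosen for illustration.

```python
import itertools
import random


def gram_det(vectors):
    """Determinant of the Gram matrix of `vectors`.

    This equals the squared volume spanned by the vectors and serves as
    the determinantal diversity score of the subset.
    """
    k = len(vectors)
    g = [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]
    det = 1.0
    for c in range(k):
        # Gaussian elimination with partial pivoting.
        p = max(range(c, k), key=lambda r: abs(g[r][c]))
        if abs(g[p][c]) < 1e-12:
            return 0.0  # linearly dependent vectors: zero volume
        if p != c:
            g[c], g[p] = g[p], g[c]
            det = -det
        det *= g[c][c]
        for r in range(c + 1, k):
            f = g[r][c] / g[c][c]
            for j in range(c, k):
                g[r][j] -= f * g[c][j]
    return max(det, 0.0)


def fair_subsets(groups, quotas):
    """Yield each index subset taking exactly quotas[g] items from group g.

    Assumption: fairness is modeled as an exact per-group quota.
    """
    members = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    per_group = [
        itertools.combinations(members.get(g, []), q)
        for g, q in sorted(quotas.items())
    ]
    for combo in itertools.product(*per_group):
        yield tuple(sorted(i for part in combo for i in part))


def sample_fair_dpp(vectors, groups, quotas, rng=random):
    """Draw a feasible subset with probability proportional to gram_det.

    Brute-force enumeration, for illustration only.
    """
    subsets = list(fair_subsets(groups, quotas))
    weights = [gram_det([vectors[i] for i in s]) for s in subsets]
    if sum(weights) == 0:
        raise ValueError("no feasible subset has positive diversity")
    return rng.choices(subsets, weights=weights, k=1)[0]


# Toy data: two nearly parallel vectors per group.
vectors = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0), (0.1, 1.0)]
groups = ["a", "a", "b", "b"]
summary = sample_fair_dpp(vectors, groups, {"a": 1, "b": 1})
```

Every subset this sketch can return contains one item from each group, and among those feasible subsets the more spread-out ones are more likely to be drawn.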

Keywords

text summarization; randomized algorithm; fairness in machine learning

Cite

@article{arxiv.1802.04023,
  title  = {Fair and Diverse DPP-based Data Summarization},
  author = {L. Elisa Celis and Vijay Keswani and Damian Straszak and Amit Deshpande and Tarun Kathuria and Nisheeth K. Vishnoi},
  journal= {arXiv preprint arXiv:1802.04023},
  year   = {2018}
}

Comments

A short version of this paper appeared in the workshop FAT/ML 2016 - arXiv:1610.07183
