Parity Queries for Binary Classification

Information Theory · 2019-11-11 · v2 · Human-Computer Interaction · Machine Learning · math.IT

Abstract

Consider a query-based data acquisition problem that aims to recover the values of $k$ binary variables from parity (XOR) measurements of chosen subsets of the variables. Assume the response model where only a randomly selected subset of the measurements is received. We propose a method for designing a sequence of queries so that the variables can be identified with high probability using as few ($n$) measurements as possible. We define the query difficulty $\bar{d}$ as the average size of the query subsets and the sample complexity $n$ as the minimum number of measurements required to attain a given recovery accuracy. We obtain fundamental trade-offs between recovery accuracy, query difficulty, and sample complexity. In particular, the necessary and sufficient sample complexity required for recovering all $k$ variables with high probability is $n = c_0 \max\{k, (k \log k)/\bar{d}\}$ and the sample complexity for recovering a fixed proportion $(1-\delta)k$ of the variables for $\delta=o(1)$ is $n = c_1\max\{k, (k \log(1/\delta))/\bar{d}\}$, where $c_0, c_1>0$.
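To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' method. It shows what a single parity (XOR) query computes and evaluates the two sample-complexity scalings for a few query difficulties. The function names are hypothetical, the constants $c_0$ and $c_1$ are not given in the abstract and are set to 1 purely for illustration, and the natural logarithm is assumed (a different base only rescales the constants). The random-erasure response model is not simulated here.

```python
import math
import random


def parity_query(x, subset):
    """Return the XOR (parity) of the bits of x at the indices in subset."""
    result = 0
    for i in subset:
        result ^= x[i]
    return result


def sample_complexity_exact(k, d_bar, c0=1.0):
    """Scaling for recovering all k variables: c0 * max(k, k*log(k)/d_bar).

    c0 is an unspecified positive constant; 1.0 is an illustrative assumption.
    """
    return c0 * max(k, k * math.log(k) / d_bar)


def sample_complexity_partial(k, d_bar, delta, c1=1.0):
    """Scaling for recovering (1 - delta)*k variables:
    c1 * max(k, k*log(1/delta)/d_bar).

    c1 is an unspecified positive constant; 1.0 is an illustrative assumption.
    """
    return c1 * max(k, k * math.log(1.0 / delta) / d_bar)


if __name__ == "__main__":
    rng = random.Random(0)
    k = 1000

    # Hidden binary variables and one parity query over 5 chosen indices.
    x = [rng.randint(0, 1) for _ in range(k)]
    subset = rng.sample(range(k), 5)
    print("parity of chosen subset:", parity_query(x, subset))

    # How the two scalings change with the query difficulty d_bar.
    for d_bar in (1, 2, 5, 10):
        n_all = sample_complexity_exact(k, d_bar)
        n_part = sample_complexity_partial(k, d_bar, delta=0.01)
        print(d_bar, round(n_all), round(n_part))
```

With these illustrative constants, increasing $\bar{d}$ lowers both scalings until the $\log$ term divided by $\bar{d}$ drops below 1, after which the $\max$ is attained by $k$ and larger queries give no further reduction.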

Cite

@article{arxiv.1809.00901,
  title  = {Parity Queries for Binary Classification},
  author = {Hye Won Chung and Ji Oon Lee and Doyeon Kim and Alfred O. Hero},
  journal= {arXiv preprint arXiv:1809.00901},
  year   = {2019}
}

Comments

26 pages, 4 figures