
Deducing Truth from Correlation

Information Theory 2018-05-15 v5 math.IT

Abstract

This work is motivated by a question at the heart of unsupervised learning: suppose we collect K (subjective) opinions about some event E from K different agents. Can we infer E from them? Prima facie this seems impossible, since the agents may be lying. We model the task by letting the events be distributed according to some distribution p; the goal is to estimate p under unknown noise. Again, this is impossible without additional assumptions. We report a very natural set of such assumptions: the availability of multiple copies of the true data, each under independent and invertible (in the sense of matrices) noise, is already sufficient. If the true distribution and the observations are modeled on the same finite alphabet, then the number of such copies needed to determine p to the highest possible precision is exactly three. This result can be seen as a counterpart to independent component analysis; we therefore call our approach 'dependent component analysis'. In addition, we present generalizations of the model to different alphabet sizes at the input and output. A second result is also established: the 'activation' of invertibility through multiple parallel uses.
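
The observation model described in the abstract can be illustrated with a minimal sketch. It is not the authors' estimation procedure; it only builds the forward model under stated assumptions: a binary alphabet, a hypothetical true distribution p = (0.7, 0.3), and three hypothetical binary symmetric channels standing in for the agents' independent, invertible noise. All function and variable names are invented for illustration. The sketch also shows why one copy alone cannot pin down p: a different (p, channel) pair yields the same single-agent marginal.

```python
from itertools import product


def binary_symmetric(e):
    """Column-stochastic 2x2 noise matrix w[y][x]; invertible whenever e != 0.5."""
    return [[1 - e, e], [e, 1 - e]]


def marginal_of_one(p, w):
    """Distribution of a single noisy copy: the matrix-vector product w @ p."""
    return [sum(w[y][x] * p[x] for x in range(len(p))) for y in range(len(w))]


def joint_of_three(p, channels):
    """Joint distribution of three copies of X ~ p, each passed through its own channel.

    The copies are conditionally independent given X, so
    q[i, j, k] = sum_x p[x] * w1[i][x] * w2[j][x] * w3[k][x].
    """
    w1, w2, w3 = channels
    n = len(p)
    return {
        (i, j, k): sum(p[x] * w1[i][x] * w2[j][x] * w3[k][x] for x in range(n))
        for i, j, k in product(range(n), repeat=3)
    }


# Hypothetical ground truth and noise levels (assumptions, not from the paper).
p_true = [0.7, 0.3]
channels = [binary_symmetric(0.1), binary_symmetric(0.2), binary_symmetric(0.3)]
q = joint_of_three(p_true, channels)

# Marginal of the first agent, recovered from the joint distribution.
first_marginal = [
    sum(v for (i, _, _), v in q.items() if i == y) for y in range(2)
]

# A single copy is ambiguous: a noiseless channel applied to a different
# distribution produces exactly the same marginal as agent 1 above.
identity = binary_symmetric(0.0)
p_alternative = marginal_of_one(p_true, channels[0])
same_marginal = marginal_of_one(p_alternative, identity)
```

Here `first_marginal` and `same_marginal` coincide even though `p_alternative` differs from `p_true`, which is the single-copy ambiguity the abstract alludes to. The three-way joint `q` carries the additional correlation structure that the paper's result exploits.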

Cite

@article{arxiv.1412.5831,
  title  = {Deducing Truth from Correlation},
  author = {Janis Nötzel and Walter Swetly},
  journal= {arXiv preprint arXiv:1412.5831},
  year   = {2018}
}

Comments

13 pages, 3 figures. Published in IEEE Transactions on Information Theory. Typo in Theorem 4 and Conjecture 1 corrected in this version.
