An algorithm for quantifying dependence in multivariate data sets

Data Analysis, Statistics and Probability 2012-11-06 v2 High Energy Physics - Experiment

Abstract

We describe an algorithm to quantify dependence in a multivariate data set. The algorithm identifies both linear and non-linear dependence by performing a hypothesis test of the independence of two variables, which yields a reliable measure of dependence. In high energy physics, understanding dependencies is especially important in multidimensional maximum likelihood analyses. We therefore describe the problem of applying a multidimensional maximum likelihood analysis to a multivariate data set whose variables depend on each other. We review common procedures used in high energy physics, discuss their limitations in practical application, and show that general dependence is not the same as linear correlation. Finally, we present the tool CAT, which performs all reviewed methods fully automatically and creates an analysis report document with numeric results and visual review.
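The distinction between general dependence and linear correlation can be illustrated with a small sketch. This is not the algorithm from the paper or the CAT tool, whose details the abstract does not give. It is a generic permutation test on a rank-binned two-dimensional histogram, chosen as an assumption for illustration. The function names (`pearson`, `rank_bins`, `chi2_stat`, `independence_p_value`), the toy data, and the bin and permutation counts are all hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient of two samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def rank_bins(values, n_bins):
    """Assign each value to one of n_bins (near) equal-population bins by rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def chi2_stat(bx, by, n_bins):
    """Chi-square distance of the 2D bin counts from a flat histogram."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    for i, j in zip(bx, by):
        counts[i][j] += 1
    expected = len(bx) / n_bins ** 2  # flat expectation under independence
    return sum((c - expected) ** 2 / expected for row in counts for c in row)


def independence_p_value(xs, ys, n_bins=5, n_perm=200, rng=None):
    """Permutation p-value for the null hypothesis that xs and ys are independent."""
    rng = rng or random.Random(0)
    bx, by = rank_bins(xs, n_bins), rank_bins(ys, n_bins)
    observed = chi2_stat(bx, by, n_bins)
    shuffled = by[:]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffling breaks any x-y dependence
        if chi2_stat(bx, shuffled, n_bins) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Toy data (assumption): y depends on x only through x**2, so the
# linear correlation is expected to be close to zero.
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(500)]
ys = [x * x + rng.gauss(0, 0.05) for x in xs]

r = pearson(xs, ys)
p = independence_p_value(xs, ys, rng=rng)
print(f"Pearson r = {r:.3f}, independence p-value = {p:.4f}")
```

On data like this, one would expect a correlation coefficient near zero together with a small p-value. The variables are then strongly dependent even though a check for linear correlation alone would not flag them.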

Cite

@article{arxiv.1207.0981,
  title  = {An algorithm for quantifying dependence in multivariate data sets},
  author = {Michael Feindt and Michael Prim},
  journal= {arXiv preprint arXiv:1207.0981},
  year   = {2012}
}

Comments

4 pages, 3 figures
