Feature Selection with Distance Correlation

Ranit Das; Gregor Kasieczka; David Shih

High Energy Physics - Phenomenology; 2022-12-02; v1; Machine Learning; High Energy Physics - Experiment; Data Analysis, Statistics and Probability

Abstract

Choosing which properties of the data to use as input to multivariate decision algorithms (a.k.a. feature selection) is an important step in solving any problem with machine learning. While there is a clear trend towards training sophisticated deep networks on large numbers of relatively unprocessed inputs (so-called automated feature engineering), for many tasks in physics, sets of theoretically well-motivated and well-understood features already exist. Working with such features can bring many benefits, including greater interpretability, reduced training and run time, and enhanced stability and robustness. We develop a new feature selection method based on Distance Correlation (DisCo), and demonstrate its effectiveness on the tasks of boosted top- and $W$-tagging. Using our method to select features from a set of over 7,000 energy flow polynomials, we show that we can match the performance of much deeper architectures using only ten features and two orders of magnitude fewer model parameters.
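
For readers unfamiliar with the quantity behind DisCo, the following is a minimal sketch of the standard sample distance correlation (the biased, double-centered estimator), together with a simple univariate ranking of feature columns by their distance correlation with a target. The function names `distance_correlation` and `rank_features` are hypothetical, and the ranking step is an illustrative assumption only; it is not the selection procedure developed in the paper, which this abstract does not spell out.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation (biased estimator) between two samples.

    x and y hold n paired observations each; 1-D inputs are treated as
    n scalar observations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x.reshape(len(x), -1)
    y = y.reshape(len(y), -1)

    # Pairwise Euclidean distance matrices, shape (n, n).
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)

    # Double centering: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov2 = (A * B).mean()   # squared distance covariance
    dvar_x = (A * A).mean()  # squared distance variance of x
    dvar_y = (B * B).mean()  # squared distance variance of y

    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # convention when either sample is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def rank_features(X, y, k):
    """Hypothetical helper: indices of the k columns of X with the largest
    distance correlation to y, with their scores (illustration only)."""
    scores = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1][:k]
    return order, scores[order]
```

Unlike the Pearson coefficient, distance correlation is sensitive to nonlinear dependence, which is why a feature such as `x` can score well against a target like `x**2` even when their linear correlation is close to zero. The pairwise distance matrices make this sketch quadratic in the number of samples, so it is meant for small illustrative inputs.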

Keywords

track reconstruction in high energy physics; neural networks in physics; statistical algorithms

Cite

@article{arxiv.2212.00046,
  title  = {Feature Selection with Distance Correlation},
  author = {Ranit Das and Gregor Kasieczka and David Shih},
  journal= {arXiv preprint arXiv:2212.00046},
  year   = {2022}
}

Comments

14 pages, 8 figures, 3 tables
