Statistical Industry Classification

Portfolio Management 2019-01-01 v4 Statistical Finance

Abstract

We give complete algorithms and source code for constructing (multilevel) statistical industry classifications, including methods for fixing the number of clusters at each level (and the number of levels). Under the hood there are clustering algorithms (e.g., k-means). However, what should we cluster? Correlations? Returns? The answer turns out to be neither, and our backtests suggest that these details make a sizable difference. We also give an algorithm and source code for building "hybrid" industry classifications, which improve off-the-shelf "fundamental" industry classifications by applying our statistical industry classification methods to them. The presentation is intended to be pedagogical and geared toward practical applications in quantitative trading.

Cite

@article{arxiv.1607.04883,
  title  = {Statistical Industry Classification},
  author = {Zura Kakushadze and Willie Yu},
  journal= {arXiv preprint arXiv:1607.04883},
  year   = {2019}
}

Comments

44 pages; trivial misprints corrected
