Agglomerative Likelihood Clustering

Computational Finance 2022-03-22 v4 Data Analysis, Statistics and Probability Machine Learning

Abstract

We consider the problem of fast time-series data clustering. Building on previous work modeling the correlation-based Hamiltonian of spin variables, we present an updated, fast, and computationally inexpensive Agglomerative Likelihood Clustering (ALC) algorithm. The method replaces the optimized genetic-algorithm-based approach (f-SPC) with an agglomerative recursive merging framework inspired by previous work in econophysics and community detection. The method is tested on noisy synthetic correlated time-series data sets with built-in cluster structure to demonstrate that the algorithm produces meaningful, non-trivial results. We apply it to time-series data sets with as many as 20,000 assets, and we argue that ALC can reduce compute time and resource usage costs for large-scale time-series clustering while remaining serial, with no obvious need for parallelization. Because the algorithm requires no prior information about the number of clusters, it can be an effective choice for state detection in online learning in a fast, non-linear data environment.
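To make the agglomerative merging idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the per-cluster log-likelihood takes the Giada-Marsili form from the earlier correlation-based spin-model literature, which the abstract does not state, and it uses a simple "merge the globally best pair" schedule that may differ from the merging order used in ALC. The names `cluster_log_likelihood` and `alc_sketch` are hypothetical.

```python
import numpy as np


def cluster_log_likelihood(n, c):
    """Log-likelihood of one cluster (assumed Giada-Marsili form).

    n: number of series in the cluster.
    c: sum of all entries of the correlation sub-matrix of the cluster,
       diagonal included. Clusters with c <= n contribute nothing.
    """
    if n <= 1 or c <= n:
        return 0.0
    c = min(c, n * n - 1e-9)  # guard against the log divergence at perfect correlation
    return 0.5 * (np.log(n / c) + (n - 1) * np.log((n * n - n) / (n * n - c)))


def alc_sketch(corr):
    """Greedy agglomerative merging on a correlation matrix (illustrative only).

    Starts from singletons and repeatedly merges the pair of clusters with
    the largest positive likelihood gain; stops when no merge helps, so the
    number of clusters is never supplied by the caller.
    """
    n_obj = corr.shape[0]
    clusters = [[i] for i in range(n_obj)]
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for a in range(len(clusters)):
            ia = clusters[a]
            ca = corr[np.ix_(ia, ia)].sum()
            for b in range(a + 1, len(clusters)):
                ib = clusters[b]
                cb = corr[np.ix_(ib, ib)].sum()
                # Internal correlation of the merged cluster.
                cab = ca + cb + 2.0 * corr[np.ix_(ia, ib)].sum()
                gain = (cluster_log_likelihood(len(ia) + len(ib), cab)
                        - cluster_log_likelihood(len(ia), ca)
                        - cluster_log_likelihood(len(ib), cb))
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge increases the likelihood
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(n_obj, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

Calling `alc_sketch(np.corrcoef(X))` on an array `X` with one time series per row returns one integer label per series. This naive version recomputes sub-matrix sums at every step and is far slower than what the abstract describes; it is meant only to show the stopping rule and the absence of a cluster-count parameter.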

Cite

@article{arxiv.1908.00951,
  title  = {Agglomerative Likelihood Clustering},
  author = {Lionel Yelibi and Tim Gebbie},
  journal= {arXiv preprint arXiv:1908.00951},
  year   = {2022}
}

Comments

15 pages, 8 figures