Topologically Regularized Data Embeddings

Machine Learning 2023-11-08 v2

Abstract

Unsupervised representation learning methods are widely used for gaining insight into high-dimensional, unstructured, or structured data. In some cases, users have prior topological knowledge about the data, such as a known cluster structure or the knowledge that the data lies along a tree- or graph-structured topology. However, generic methods for ensuring that such structure is salient in the low-dimensional representation are lacking. This harms the interpretability of low-dimensional embeddings and, plausibly, downstream learning tasks. To address this issue, we introduce topological regularization: a generic approach based on algebraic topology to incorporate topological prior knowledge into low-dimensional embeddings. We introduce a class of topological loss functions, and show that jointly optimizing an embedding loss with such a topological loss function as a regularizer yields embeddings that reflect not only local proximities but also the desired topological structure. We include a self-contained overview of the required foundational concepts in algebraic topology, and provide intuitive guidance on how to design topological loss functions for a variety of shapes, such as clusters, cycles, and bifurcations. We empirically evaluate the proposed approach on computational efficiency, robustness, and versatility in combination with linear and non-linear dimensionality reduction and graph embedding methods.
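
The joint optimization described in the abstract can be sketched as a single regularized objective. The notation here is assumed for illustration and is not taken from the abstract: $X$ denotes the input data, $E$ the low-dimensional embedding, $\mathcal{L}_{\mathrm{emb}}$ the embedding loss of the chosen method, $\mathcal{L}_{\mathrm{top}}$ a topological loss, and $\lambda$ a hypothetical regularization strength.

```latex
\mathcal{L}_{\mathrm{tot}}(E)
  = \mathcal{L}_{\mathrm{emb}}(X, E)
  + \lambda \, \mathcal{L}_{\mathrm{top}}(E),
\qquad \lambda \ge 0
```

Under this reading, $\lambda = 0$ recovers the unregularized embedding, and larger values of $\lambda$ weight the topological prior more heavily against the embedding loss.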

Cite

@article{arxiv.2301.03338,
  title  = {Topologically Regularized Data Embeddings},
  author = {Edith Heiter and Robin Vandaele and Tijl De Bie and Yvan Saeys and Jefrey Lijffijt},
  journal= {arXiv preprint arXiv:2301.03338},
  year   = {2023}
}

Comments

52 pages, preprint, under review