
Supervised Dimensionality Reduction and Visualization using Centroid-encoder

Machine Learning; Computer Vision and Pattern Recognition. Version 2, 2020-03-03.

Abstract

Visualizing high-dimensional data is an essential task in Data Science and Machine Learning. The Centroid-Encoder (CE) method is similar to the autoencoder but incorporates label information to keep objects of a class close together in the reduced visualization space. CE exploits nonlinearity and labels to encode high variance in low dimensions while capturing the global structure of the data. We present a detailed analysis of the method using a wide variety of data sets and compare it with other supervised dimension reduction techniques, including NCA, nonlinear NCA, t-distributed NCA, t-distributed MCML, supervised UMAP, supervised PCA, Colored Maximum Variance Unfolding, supervised Isomap, Parametric Embedding, supervised Neighbor Retrieval Visualizer, and Multiple Relational Embedding. We empirically show that centroid-encoder outperforms most of these techniques. We also show that when the data variance is spread across multiple modalities, centroid-encoder extracts a significant amount of information from the data in a low-dimensional space. This property makes it a valuable tool for data visualization.
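To make the "autoencoder plus labels" idea concrete, the sketch below assumes the usual centroid-encoder formulation: instead of reconstructing each input x_i, the network is trained to output the centroid (mean) of x_i's class, minimizing (1/2N) Σ_i ||c(y_i) − f(x_i)||², and the bottleneck activations serve as the low-dimensional embedding. This is a minimal illustration under those assumptions, not the paper's implementation: the encoder and decoder are single linear layers (the method itself exploits nonlinearity), training is plain full-batch gradient descent, and the names `centroid_targets`, `ce_loss`, and `train_linear_ce`, as well as the synthetic data, are hypothetical.

```python
import numpy as np

def centroid_targets(X, y):
    """Replace each sample by the centroid (mean) of its class."""
    targets = np.empty_like(X, dtype=float)
    for label in np.unique(y):
        mask = y == label
        targets[mask] = X[mask].mean(axis=0)
    return targets

def ce_loss(outputs, targets):
    """Half the mean squared Euclidean distance from outputs to class centroids."""
    return 0.5 * np.mean(np.sum((targets - outputs) ** 2, axis=1))

def train_linear_ce(X, y, dim=2, lr=0.005, steps=1000, seed=0):
    """Hypothetical linear encoder (d -> dim) and decoder (dim -> d), trained
    by full-batch gradient descent to map each sample to its class centroid."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    T = centroid_targets(X, y)
    W1 = rng.normal(scale=0.1, size=(d, dim))  # encoder weights
    W2 = rng.normal(scale=0.1, size=(dim, d))  # decoder weights
    history = []
    for _ in range(steps):
        Z = X @ W1                    # bottleneck: low-dimensional embedding
        out = Z @ W2                  # output in the input space
        history.append(ce_loss(out, T))
        grad_out = (out - T) / n      # dL/d(out) for the loss above
        grad_W2 = Z.T @ grad_out
        grad_W1 = X.T @ (grad_out @ W2.T)
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1, W2, history

# Usage on synthetic data: three Gaussian classes in 5 dimensions (hypothetical).
rng = np.random.default_rng(1)
centers = rng.normal(scale=3.0, size=(3, 5))
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(size=(150, 5))
W1, W2, history = train_linear_ce(X, y)
embedding = X @ W1  # 2-D coordinates that could be plotted for visualization
```

Because every sample of a class shares the same target, the loss pulls same-class points toward a common image, which is how label information keeps classes compact in the bottleneck. Replacing the linear layers with a nonlinear network and a stronger optimizer would be the natural next step toward the method the abstract describes.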


Cite

@article{arxiv.2002.11934,
  title  = {Supervised Dimensionality Reduction and Visualization using Centroid-encoder},
  author = {Tomojit Ghosh and Michael Kirby},
  journal= {arXiv preprint arXiv:2002.11934},
  year   = {2020}
}

Comments

25 pages (including 3 reference pages), 12 figures. I am planning to submit the paper to JMLR very soon. Centroid-encoder was previously applied to biological pathway data (https://www.sciencedirect.com/science/article/pii/S1046202317300439). In this paper we thoroughly analyzed the algorithm and compared it with state-of-the-art techniques on eight data sets including MNIST, USPS