Bounded Simplex-Structured Matrix Factorization: Algorithms, Identifiability and Applications

Machine Learning | 2023-07-26 | v2 | Information Retrieval, Numerical Analysis, Signal Processing

Abstract

In this paper, we propose a new low-rank matrix factorization model dubbed bounded simplex-structured matrix factorization (BSSMF). Given an input matrix $X$ and a factorization rank $r$, BSSMF looks for a matrix $W$ with $r$ columns and a matrix $H$ with $r$ rows such that $X \approx WH$, where the entries in each column of $W$ are bounded, that is, they belong to given intervals, and the columns of $H$ belong to the probability simplex, that is, $H$ is column stochastic. BSSMF generalizes nonnegative matrix factorization (NMF) and simplex-structured matrix factorization (SSMF). BSSMF is particularly well suited when the entries of the input matrix $X$ belong to a given interval; for example, when the rows of $X$ represent images, or when $X$ is a rating matrix such as in the Netflix and MovieLens datasets, where the entries of $X$ belong to the interval $[1,5]$. The simplex-structured matrix $H$ not only leads to an easily understandable decomposition providing a soft clustering of the columns of $X$, but also implies that the entries of each column of $WH$ belong to the same intervals as the columns of $W$. In this paper, we first propose a fast algorithm for BSSMF, even in the presence of missing data in $X$. Then we provide identifiability conditions for BSSMF, that is, we provide conditions under which BSSMF admits a unique decomposition, up to trivial ambiguities. Finally, we illustrate the effectiveness of BSSMF on two applications: extraction of features in a set of images, and the matrix completion problem for recommender systems.
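To make the model concrete, the constraints described in the abstract can be sketched in a few lines of NumPy. This is not the fast algorithm proposed in the paper; it is a plain alternating projected-gradient scheme chosen here only for illustration, and it assumes a fully observed X and a single scalar interval [lo, hi] shared by all entries of W. The names `project_columns_onto_simplex` and `bssmf_sketch` are hypothetical.

```python
import numpy as np


def project_columns_onto_simplex(V):
    """Euclidean projection of each column of V onto the probability simplex."""
    r, n = V.shape
    U = -np.sort(-V, axis=0)                    # columns sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0            # cumulative sums minus the target sum 1
    idx = np.arange(1, r + 1).reshape(-1, 1)
    cond = U - css / idx > 0
    # Last row index (per column) at which the condition holds.
    rho = r - 1 - np.argmax(cond[::-1], axis=0)
    theta = css[rho, np.arange(n)] / (rho + 1)
    return np.maximum(V - theta, 0.0)


def bssmf_sketch(X, r, lo, hi, iters=200, seed=0):
    """Illustrative solver for min ||X - WH||_F^2 with lo <= W <= hi and
    H column stochastic (an assumed simple scheme, not the paper's method)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.uniform(lo, hi, size=(m, r))
    H = project_columns_onto_simplex(rng.random((r, n)))
    for _ in range(iters):
        # Gradient step on W with step 1/L, then projection onto the box [lo, hi].
        L_W = max(np.linalg.norm(H @ H.T, 2), 1e-12)
        W = np.clip(W - ((W @ H - X) @ H.T) / L_W, lo, hi)
        # Gradient step on H with step 1/L, then projection onto the simplex.
        L_H = max(np.linalg.norm(W.T @ W, 2), 1e-12)
        H = project_columns_onto_simplex(H - (W.T @ (W @ H - X)) / L_H)
    return W, H


# Toy rating-like matrix with entries in [1, 5].
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 5.0, size=(20, 15))
W, H = bssmf_sketch(X, r=3, lo=1.0, hi=5.0)
WH = W @ H
```

Because every column of H is a convex combination weight vector, each entry of `WH` is a convex combination of entries of W from the same row, so `WH` stays inside [lo, hi] without any extra clipping. This is the property the abstract highlights for rating matrices.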

Cite

@article{arxiv.2209.12638,
  title  = {Bounded Simplex-Structured Matrix Factorization: Algorithms, Identifiability and Applications},
  author = {Olivier Vu Thanh and Nicolas Gillis and Fabian Lecron},
  journal= {arXiv preprint arXiv:2209.12638},
  year   = {2023}
}

Comments

14 pages, new title, new numerical experiments on synthetic data, clarifications of several parts of the paper, run times added