Related papers: Matrix Tile Analysis
Principal components analysis (PCA) is a well-known technique for approximating a tabular data set by a low rank matrix. Here, we extend the idea of PCA to handle arbitrary data sets consisting of numerical, Boolean, categorical, ordinal,…
The current computer architecture has moved towards the multi/many-core structure. However, the algorithms in the current sequential dense numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multi/many-core…
In this paper, we study the application of sparse principal component analysis (PCA) to clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of…
Partitioning large matrices is an important problem in distributed linear algebra computing (used in ML among others). Briefly, our goal is to perform a sequence of matrix algebra operations in a distributed manner (whenever possible) on…
Matrix completion is a class of machine learning methods that concerns the prediction of missing entries in a partially observed matrix. This paper studies matrix completion for mixed data, i.e., data involving mixed types of variables…
Matrix factorization techniques have been widely used as a method for collaborative filtering for recommender systems. In recent times, different variants of deep learning algorithms have been explored in this setting to improve the task of…
Principal Component Analysis (PCA) is the most widely used tool for linear dimensionality reduction and clustering. Still it is highly sensitive to outliers and does not scale well with respect to the number of data samples. Robust PCA…
Mining and exploring databases should provide users with knowledge and new insights. Tiles of data strive to unveil true underlying structure and distinguish valuable information from various kinds of noise. We propose a novel Boolean…
This work studies the strong duality of non-convex matrix factorization problems: we show that under certain dual conditions, these problems and its dual have the same optimum. This has been well understood for convex optimization, but…
Principal component analysis (PCA) is a widespread technique for data analysis that relies on the covariance-correlation matrix of the analyzed data. However to properly work with high-dimensional data, PCA poses severe mathematical…
Data integration, or the strategic analysis of multiple sources of data simultaneously, can often lead to discoveries that may be hidden in individualistic analyses of a single data source. We develop a new unsupervised data integration…
The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research…
Biclustering numerical data became a popular data-mining task in the beginning of 2000's, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a…
Matrix completion aims to predict missing elements in a partially observed data matrix which in typical applications, such as collaborative filtering, is large and extremely sparsely observed. A standard solution is matrix factorization,…
Sparse Principal Components Analysis (PCA) has been proposed as a way to improve both interpretability and reliability of PCA. However, use of sparse PCA in practice is hindered by the difficulty of tuning the multiple hyperparameters that…
Boolean-type algebra (BTA) is investigated. A BTA is decomposed into Boolean-type lattice (BTL) and a complementation algebra (CA). When the object set is finite, the matrix expressions of BTL and CA (and then BTA) are presented. The…
Consecutive matrix multiplications are commonly used in graph neural networks and sparse linear solvers. These operations frequently access the same matrices for both reading and writing. While reusing these matrices improves data locality,…
In this study, a new $\Delta$-evaluation method is introduced for solving a column permutation problem defined on a sparse binary matrix with the consecutive ones property. This problem models various $\mathcal{NP}$-hard problems in graph…
Machine learning (ML) methods have proved to be a very successful tool in physical sciences, especially when applied to experimental data analysis. Artificial intelligence is particularly good at recognizing patterns in high dimensional…
The nonnegative matrix factorization is a widely used, flexible matrix decomposition, finding applications in biology, image and signal processing and information retrieval, among other areas. Here we present a related matrix factorization.…