Related papers: Robust Bayesian Tensor Factorization with Zero-Inf…
Dimension reduction of high-dimensional microbiome data facilitates subsequent analysis such as regression and clustering. Most existing reduction methods cannot fully accommodate the special features of the data such as count-valued and…
We propose a unified probabilistic framework for sparse count tensors with excess zeros, motivated by single-cell Hi-C data. The observed data are naturally represented as a three-way tensor indexed by genomic loci pairs and cells,…
Tensor factorization has proven to be an efficient unsupervised learning approach for health data analysis, especially for computational phenotyping, where the high-dimensional Electronic Health Records (EHRs) with patients' history of…
In this paper, I introduce a new class of Zero-Inflated Poisson models into the family of Cluster Weighted Models (CWMs), called Zero-Inflated Poisson CWMs (ZIPCWM). ZIPCWM extends Poisson cluster weighted models and other mixture models. I…
How can we capture the hidden properties of tensor and matrix data simultaneously in a fast, accurate, and scalable way? Coupled matrix-tensor factorization (CMTF) is a major tool to extract latent factors from a tensor and matrices…
We present a general framework, the coupled compound Poisson factorization (CCPF), to capture the missing-data mechanism in extremely sparse data sets by coupling a hierarchical Poisson factorization with an arbitrary data-generating model.…
We present a scalable Bayesian model for low-rank factorization of massive tensors with binary observations. The proposed model has the following key properties: (1) in contrast to the models based on the logistic or probit likelihood,…
Tensor decomposition is a popular technique for tensor completion. However, most of the existing methods are based on linear or shallow models; when the data tensor becomes large and the observed data are scarce, they are prone to over…
Probabilistic Temporal Tensor Factorization (PTTF) is an effective algorithm for modeling temporal tensor data. It leverages a time constraint to capture the evolving properties of tensor data. Nowadays the exploding dataset demands a large…
The rapid generation of complex, highly skewed, and zero-inflated multi-source count data poses significant challenges for variable selection, particularly in biomedical domains like tumor development and metabolic dysregulation. To address…
In this manuscript, we introduce a tensor-based approach to Non-Negative Tensor Factorization (NTF). The method entails tensor dimension reduction through the utilization of the Einstein product. To maintain the regularity and sparsity of…
Coupled decompositions are a widely used tool for data fusion. As the volume of data increases, so does the dimensionality of matrices and tensors, highlighting the need for more efficient coupled decomposition algorithms. This paper…
Zero-inflated count data arise in various fields, including health, biology, economics, and the social sciences. These data are often modelled using probabilistic distributions such as zero-inflated Poisson (ZIP), zero-inflated negative…
We propose a generative model for robust tensor factorization in the presence of both missing data and outliers. The objective is to explicitly infer the underlying low-CP-rank tensor capturing the global information and a sparse tensor…
The Poisson distribution is often used as a standard model for count data. Quite often, however, such data sets are not well fit by a Poisson model because they have more zeros than are compatible with this model. For these situations, a…
Low-rank tensor completion has been widely used in computer vision and machine learning. This paper develops a novel multi-modal core tensor factorization (MCTF) method combined with a tensor low-rankness measure and a better nonconvex…
Tensor factorization models offer an effective approach to convert massive electronic health records into meaningful clinical concepts (phenotypes) for data analysis. These models need a large amount of diverse samples to avoid population…
Because of the limitations of matrix factorization, such as losing spatial structure information, the concept of low-rank tensor factorization (LRTF) has been applied for the recovery of a low dimensional subspace from high dimensional…
Classification of multi-dimensional time series from real-world systems requires fine-grained learning of complex features such as cross-dimensional dependencies and intra-class variations, all under the practical challenge of low training…
Understanding the association between dietary patterns and health outcomes, such as cancer risk, is crucial for informing public health guidelines and shaping future dietary interventions. However, dietary intake data present several…