Related papers: Approximating Incomplete Kernel Matrices by the em…
We address the problem of filling missing entries in a kernel Gram matrix, given a related full Gram matrix. We attack this problem from the viewpoint of regression, assuming that the two kernel matrices can be considered as explanatory…
With the huge influx of diverse data today, extracting knowledge from them has become an interesting but tedious task for data scientists, particularly when the data come in heterogeneous form and have missing information. Many data…
Expectation maximisation (EM) is an unsupervised learning method for estimating the parameters of a finite mixture distribution. It works by introducing "hidden" or "latent" variables via Baum's auxiliary function $Q$ that allow the joint…
Many scientific computing problems can be reduced to Matrix-Matrix Multiplications (MMM), making the General Matrix Multiply (GEMM) kernels in the Basic Linear Algebra Subprograms (BLAS) of interest to the high-performance computing…
Multiple datasets containing different types of features may be available for a given task. For instance, users' profiles can be used to group users for recommendation systems. In addition, a model can also use users' historical behaviors…
Partially recorded data are frequently encountered in many applications and usually clustered by first removing incomplete cases or features with missing values, or by imputing missing values, followed by application of a clustering…
Measuring similarity between incomplete data is a fundamental challenge in web mining, recommendation systems, and user behavior analysis. Traditional approaches either discard incomplete data or perform imputation as a preprocessing step,…
By removing irrelevant and redundant features, feature selection aims to find a good representation of the original features. With the prevalence of unlabeled data, unsupervised feature selection has been proven effective in alleviating the…
Finite mixture modelling is a popular method in the field of clustering and is beneficial largely due to its soft cluster membership probabilities. A common method for fitting finite mixture models is to employ spectral clustering, which…
Recently, a so-called E-MS algorithm was developed for model selection in the presence of missing data. Specifically, it performs the Expectation step (E step) and Model Selection step (MS step) alternately to find the minimum point of the…
Unlabeled shape analysis is a rapidly emerging and challenging area of statistics. This has been driven by various novel applications in bioinformatics. We consider here the situation where two configurations are matched under various…
Despite the ubiquity of kernel-based clustering, surprisingly few statistical guarantees exist beyond settings that consider strong structural assumptions on the data generation process. In this work, we take a step towards bridging this…
Missing responses are a missing-data format in which outcomes are not always observed. In this work we develop kernel machines that can handle missing responses. First, we propose a kernel machine family that uses mainly the complete cases.…
Bayesian inference for exponential family random graph models (ERGMs) is a doubly-intractable problem because of the intractability of both the likelihood and posterior normalizing factor. Auxiliary variable based Markov Chain Monte Carlo…
Semi-supervised clustering is the task of clustering data points into clusters where only a fraction of the points are labelled. The true number of clusters in the data is often unknown and most models require this parameter as an input.…
The Expectation-Maximization (EM) algorithm is a simple meta-algorithm that has been used for many years as a methodology for statistical inference when there are missing measurements in the observed data or when the data is composed of…
We construct the genRBF kernel, which generalizes the classical Gaussian RBF kernel to the case of incomplete data. We model the uncertainty contained in missing attributes using the data distribution and associate every point with a…
We study an EM algorithm for estimating product-term regression models with missing data. The study of such problems in the likelihood tradition has thus far been restricted to an EM algorithm method using full numerical integration.…
We propose an efficient family of algorithms to learn the parameters of a Bayesian network from incomplete data. In contrast to textbook approaches such as EM and the gradient method, our approach is non-iterative, yields closed form…
Most machine learning algorithms, such as classification or regression, treat the individual data point as the object of interest. Here we consider extending machine learning algorithms to operate on groups of data points. We suggest…
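Several of the abstracts above (the expectation maximisation, E-MS, and product-term regression entries) rely on EM. As a minimal sketch, assuming a one-dimensional Gaussian mixture with a fixed number of components (an illustrative choice, not the method of any listed paper), the E step computes each point's posterior membership over the latent component labels, and the M step re-estimates weights, means, and variances in closed form, which maximises Baum's auxiliary function $Q$ for this model. The helper `em_gmm_1d` and its parameters (`k`, `iters`, `seed`, `min_var`) are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k=2, iters=100, seed=0, min_var=1e-6):
    """Fit a k-component 1D Gaussian mixture by EM (hypothetical helper).

    Returns (weights, means, variances, loglik_trace).
    """
    rng = random.Random(seed)
    n = len(data)
    # Initialise: uniform weights, means drawn from the data, pooled variance.
    weights = [1.0 / k] * k
    means = rng.sample(list(data), k)
    mu = sum(data) / n
    pooled = max(sum((x - mu) ** 2 for x in data) / n, min_var)
    variances = [pooled] * k
    loglik_trace = []

    for _ in range(iters):
        # E step: resp[i][j] = P(z_i = j | x_i, current parameters).
        resp = []
        loglik = 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            loglik += math.log(total)  # observed-data log-likelihood
            resp.append([p / total for p in joint])
        loglik_trace.append(loglik)

        # M step: closed-form maximisers of Q for a Gaussian mixture.
        for j in range(k):
            nj = sum(r[j] for r in resp)  # effective number of points in component j
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            # Variance uses the freshly updated mean; floor guards against collapse.
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                min_var,
            )
    return weights, means, variances, loglik_trace
```

Because each EM iteration cannot decrease the observed-data log-likelihood, the returned trace should be non-decreasing. The variance floor `min_var` is a common guard against a component collapsing onto a single point; it is an assumption of this sketch, not something stated in the abstracts.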