Related papers: Multi-objective Semi-supervised Clustering for Fin…
Cluster analysis methods seek to partition a data set into homogeneous subgroups. They are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning…
Supervised classification can be effective for prediction but sometimes weak on interpretability or explainability (XAI). Clustering, on the other hand, tends to isolate categories or profiles that can be meaningful but there is no…
We consider the problem of clustering partially labeled data from a minimal number of randomly chosen pairwise comparisons between the items. We introduce an efficient local algorithm based on a power iteration of the non-backtracking…
Clustering is a powerful and extensively used data science tool. While clustering is generally thought of as an unsupervised learning technique, there are also supervised variations such as Späth's clusterwise regression that attempt to…
The immense amount of time series data produced by astronomical surveys has called for the use of machine learning algorithms to discover and classify several million celestial sources. In the case of variable stars, supervised learning…
We investigate the parameter estimation of regression models with fixed group effects, when the group variable is missing while group related variables are available. This problem involves clustering to infer the missing group variable…
This study introduces a general semiparametric clusterwise index distribution model to analyze how latent clusters affect the covariate-response relationships. By employing sufficient dimension reduction to account for the effects of…
Clustering consists of partitioning data objects into subsets called clusters according to some similarity criteria. This paper addresses a generalization called quasi-clustering that allows overlapping of clusters, and which we link to…
Clustering ensemble is one of the most recent advances in unsupervised learning. It aims to combine the clustering results obtained using different algorithms or from different runs of the same clustering algorithm for the same data set,…
We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation. The problem of clustering features arises naturally in text…
Clustering is a well-known unsupervised machine learning approach capable of automatically grouping discrete sets of instances with similar characteristics. Constrained clustering is a semi-supervised extension to this process that can be…
We consider an extension of model-based clustering to the semi-supervised case, where some of the data are pre-labeled. We provide a derivation of the Bayesian Information Criterion (BIC) approximation to the Bayes factor in this setting.…
We introduce a semi-supervised discrete choice model to calibrate discrete choice models when relatively few requests have both choice sets and stated preferences but the majority only have the choice sets. Two classic semi-supervised…
We analyze the clustering problem through a flexible probabilistic model that aims to identify an optimal partition on the sample $X_1, \ldots, X_n$. We perform exact clustering with high probability using a convex semidefinite estimator that…
Clustering and prediction are two primary tasks in the fields of unsupervised and supervised learning, respectively. Although many of the recent advances in machine learning have been centered around those two tasks, the interdependent,…
In structured output learning, obtaining labelled data for real-world applications is usually costly, while unlabelled examples are available in abundance. Semi-supervised structured classification has been developed to handle large amounts…
The goal of clustering is to group similar objects into meaningful partitions. This process is well understood when an explicit similarity measure between the objects is given. However, far less is known when this information is not readily…
The input of most clustering algorithms is a symmetric matrix quantifying similarity within data pairs. Such a matrix is here turned into a quadratic set function measuring cluster score or similarity within data subsets larger than pairs.…
In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions. Our work is motivated by the application of clustering individuals according to their population…
Clustering is a fundamental task in unsupervised learning. The focus of this paper is the Correlation Clustering functional, which combines positive and negative affinities between the data points. The contribution of this paper is twofold:…