Related papers: Scaling pattern mining through non-overlapping var…
Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable…
Biclustering is used for simultaneous clustering of the observations and variables when there is no group structure known a priori. It is being increasingly used in bioinformatics, text analytics, etc. Previously, biclustering has…
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables that were not directly involved in clustering the data. An approach is proposed in the model-based clustering…
Biclustering is an unsupervised machine learning technique that simultaneously clusters rows and columns in a data matrix. Biclustering has emerged as an important approach and plays an essential role in various applications such as…
Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensional data, heterogeneity, and high…
Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying…
Pattern extraction algorithms are enabling insights into the ever-growing amount of today's datasets by translating reoccurring data properties into compact representations. Yet, a practical problem arises: With increasing data volumes and…
Biclustering is an unsupervised data mining technique that aims to unveil patterns (biclusters) from gene expression data matrices. In the framework of this thesis, we propose new biclustering algorithms for microarray data. The latter is…
As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering…
The VARCLUST algorithm is proposed for clustering variables under the assumption that variables in a given cluster are linear combinations of a small number of hidden latent variables, corrupted by random noise. The entire clustering task…
We present a structural clustering algorithm for large-scale datasets of small labeled graphs, utilizing a frequent subgraph sampling strategy. A set of representatives provides an intuitive description of each cluster, supports the…
As the data size in machine learning grows exponentially, it becomes necessary to accelerate computation by using the ever-growing number of cores provided by high-performance computing hardware. However, existing…
Probabilistic clustering models (or equivalently, mixture models) are basic building blocks in countless statistical models and involve latent random variables over discrete spaces. For these models, posterior inference methods can be…
The paper tackles the problem of clustering multiple networks, directed or not, that do not share the same set of vertices, into groups of networks with similar topology. A statistical model-based approach based on a finite mixture of…
Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together, with the aim of constructing well-established clusters whose elements are grouped according to their similarity. The…
With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…
Clustering is an NP-hard problem. Thus, no efficient exact algorithm is known, and heuristics are applied to cluster the data. Heuristics can be very resource-intensive if not applied properly. For substantially large data sets, computational efficiencies…
The rapid development of high-throughput sequencing technologies has led to an explosive increase in biological sequence data, making sequence clustering a fundamental task in large-scale bioinformatics analyses. Unlike traditional…
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar…
Bi-clustering is a technique that allows for the simultaneous clustering of observations and features in a dataset. This technique is often used in bioinformatics, text mining, and time series analysis. An important advantage of…