Related papers: A parallel sampling based clustering
With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…
The problem of dimension reduction is of increasing importance in modern data analysis. In this paper, we consider modeling the collection of points in a high dimensional space as a union of low dimensional subspaces. In particular we…
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…
We introduce and address a novel distributed clustering problem where each participant has a private dataset containing only a subset of all available features, and some features are included in multiple datasets. This scenario occurs in…
Graph clustering has many important applications in computing, but due to growing sizes of graphs, even traditionally fast clustering methods such as spectral partitioning can be computationally expensive for real-world graphs of interest.…
We present a structural clustering algorithm for large-scale datasets of small labeled graphs, utilizing a frequent subgraph sampling strategy. A set of representatives provides an intuitive description of each cluster, supports the…
As a kind of basic machine learning method, clustering algorithms group data points into different categories based on their similarity or distribution. We present a clustering algorithm by finding hyper-planes to distinguish the data…
In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering…
In this paper we propose a new approach for Big Data mining and analysis. This new approach works well on distributed datasets and deals with data clustering task of the analysis. The approach consists of two main phases, the first phase…
A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization…
Clustering has many important applications in computer science, but real-world datasets often contain outliers. Moreover, the presence of outliers can make the clustering problems to be much more challenging. To reduce the complexities,…
Mining Time Series data has a tremendous growth of interest in today's world. To provide an indication various implementations are studied and summarized to identify the different problems in existing applications. Clustering time series is…
Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensionality data, heterogeneity, and high…
Clustering problems have numerous applications and are becoming more challenging as the size of the data increases. In this paper, we consider designing clustering algorithms that can be used in MapReduce, the most popular programming…
Subspace clustering, the task of clustering high dimensional data when the data points come from a union of subspaces is one of the fundamental tasks in unsupervised machine learning. Most of the existing algorithms for this task require…
The analysis of continously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…
Clustering multidimensional points is a fundamental data mining task, with applications in many fields, such as astronomy, neuroscience, bioinformatics, and computer vision. The goal of clustering algorithms is to group similar objects…
We examine the efficiency of clustering a set of points, when the encompassing metric space may be preprocessed in advance. In computational problems of this genre, there is a first stage of preprocessing, whose input is a collection of…
We introduce a simple, intuitive and yet powerful algorithm for clustering analysis. This algorithm is an iterative process on the sample space, which arises as an extension of the iteratively generated correlation matrices. It allows for…
We introduce a modified model of random walk, and then develop two novel clustering algorithms based on it. In the algorithms, each data point in a dataset is considered as a particle which can move at random in space according to the…