Related papers: Random Indexing K-tree
We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means it forms a hierarchy of clusters. It has been extended to address issues with sparse…
This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document…
The k-means clustering algorithm is a popular algorithm that partitions data into k clusters. There are many improvements to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster…
Density peaks clustering has become a nova of clustering algorithm because of its simplicity and practicality. However, there is one main drawback: it is time-consuming due to its high computational complexity. Herein, a density peaks…
Clustering is an important data mining technique that groups similar data records, recently categorical transaction clustering is received more attention. In this research, we study the problem of categorical data clustering for…
Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a…
We introduce a cluster evaluation technique called Tree Index. Our Tree Index algorithm aims at describing the structural information of the clustering rather than the quantitative format of cluster-quality indexes (where the representation…
Advances made to the traditional clustering algorithms solves the various problems such as curse of dimensionality and sparsity of data for multiple attributes. The traditional H-K clustering algorithm can solve the randomness and apriority…
The minimum spanning tree clustering algorithm is capable of detecting clusters with irregular boundaries. In this paper we propose two minimum spanning trees based clustering algorithm. The first algorithm produces k clusters with center…
k-means is a widely used clustering algorithm, but for $k$ clusters and a dataset size of $N$, each iteration of Lloyd's algorithm costs $O(kN)$ time. Although there are existing techniques to accelerate single Lloyd iterations, none of…
Fast and high quality document clustering is an important task in organizing information, search engine results obtaining from user query, enhancing web crawling and information retrieval. With the large amount of data available and with a…
Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their…
Partitioning trees are efficient data structures for $k$-nearest neighbor search. Machine learning libraries commonly use a special type of partitioning trees called $k$d-trees to perform $k$-nn search. Unfortunately, $k$d-trees can be…
With the rising quantity of textual data available in electronic format, the need to organize it become a highly challenging task. In the present paper, we explore a document organization framework that exploits an intelligent hierarchical…
The emerging class of instance-optimized systems has shown potential to achieve high performance by specializing to a specific data and query workloads. Particularly, Machine Learning (ML) techniques have been applied successfully to build…
Efficient index structures for fast approximate nearest neighbor queries are required in many applications such as recommendation systems. In high-dimensional spaces, many conventional methods suffer from excessive usage of memory and slow…
Cluster-based information retrieval is one of the Information retrieval(IR) tools that organize, extract features and categorize the web documents according to their similarity. Unlike traditional approaches, cluster-based IR is fast in…
In this paper, we revisit the problem of indexing multi-dimensional data in memory for the efficient support of multi-dimensional range queries and nearest neighbor queries. This is a classic problem in main-memory databases, where there is…
Reduced k-means clustering is a method for clustering objects in a low-dimensional subspace. The advantage of this method is that both clustering of objects and low-dimensional subspace reflecting the cluster structure are simultaneously…
In this paper Knockout Refinement Algorithm (KRA) is proposed to refine original clusters obtained by applying SOM and K-Means clustering algorithms. KRA Algorithm is based on Contingency Table concepts. Metrics are computed for the…