Related papers: F-tree: an algorithm for clustering transactional …

Scalable Bottom-up Subspace Clustering using FP-Trees for High Dimensional Data

Subspace clustering aims to find groups of similar objects (clusters) that exist in lower dimensional subspaces from a high dimensional dataset. It has a wide range of applications, such as analysing high dimensional sensor data or DNA…

Machine Learning · Computer Science 2018-11-08 Minh Tuan Doan , Jianzhong Qi , Sutharshan Rajasegarar , Christopher Leckie

Data mining has been widely recognized as a powerful tool to explore added value from large-scale databases. Finding frequent item sets in databases is a crucial in data mining process of extracting association rules. Many algorithms were…

Databases · Computer Science 2010-03-23 M. S. Danessh , C. Balasubramanian , K. Duraiswamy

Ultrametric Cluster Hierarchies: I Want 'em All!

Hierarchical clustering is a powerful tool for exploratory data analysis, organizing data into a tree of clusterings from which a partition can be chosen. This paper generalizes these ideas by proving that, for any reasonable hierarchy, one…

Machine Learning · Computer Science 2025-11-13 Andrew Draganov , Pascal Weber , Rasmus Skibdahl Melanchton Jørgensen , Anna Beer , Claudia Plant , Ira Assent

Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes.…

Artificial Intelligence · Computer Science 2007-05-23 Zengyou He , Xiaofei Xu , Shengchun Deng

K-tree: Large Scale Document Clustering

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means it forms a hierarchy of clusters. It has been extended to address issues with sparse…

Information Retrieval · Computer Science 2010-01-07 Christopher M. De Vries , Shlomo Geva

Mining frequency-based sequential trajectory co-clusters

Co-clustering is a specific type of clustering that addresses the problem of finding groups of objects without necessarily considering all attributes. This technique has shown to have more consistent results in high-dimensional sparse data…

Machine Learning · Computer Science 2021-10-28 Yuri Santos , Jônata Tyska , Vania Bogorny

Isotropic Dynamic Hierarchical Clustering

We face a need of discovering a pattern in locations of a great number of points in a high-dimensional space. Goal is to group the close points together. We are interested in a hierarchical structure, like a B-tree. B-Trees are…

Data Structures and Algorithms · Computer Science 2016-07-19 Victor Sadikov , Oliver Rutishauser

An Improved UP-Growth High Utility Itemset Mining

Efficient discovery of frequent itemsets in large datasets is a crucial task of data mining. In recent years, several approaches have been proposed for generating high utility patterns, they arise the problems of producing a large number of…

Databases · Computer Science 2012-12-04 B. Adinarayana Reddy , O. Srinivasa Rao , M. H. M. Krishna Prasad

Tree Index: A New Cluster Evaluation Technique

We introduce a cluster evaluation technique called Tree Index. Our Tree Index algorithm aims at describing the structural information of the clustering rather than the quantitative format of cluster-quality indexes (where the representation…

Machine Learning · Computer Science 2020-03-25 A. H. Beg , Md Zahidul Islam , Vladimir Estivill-Castro

A Fast General-Purpose Clustering Algorithm Based on FPGAs for High-Throughput Data Processing

We present a fast general-purpose algorithm for high-throughput clustering of data "with a two dimensional organization". The algorithm is designed to be implemented with FPGAs or custom electronics. The key feature is a processing time…

Instrumentation and Detectors · Physics 2015-05-14 A. Annovi , M. Beretta

Clustering of Big Data with Mixed Features

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require a priori the unknown number of clusters. We…

Machine Learning · Statistics 2020-11-13 Joshua Tobin , Mimi Zhang

StruClus: Structural Clustering of Large-Scale Graph Databases

We present a structural clustering algorithm for large-scale datasets of small labeled graphs, utilizing a frequent subgraph sampling strategy. A set of representatives provides an intuitive description of each cluster, supports the…

Databases · Computer Science 2016-10-03 Till Schäfer , Petra Mutzel

Contraction Clustering (RASTER): A Very Fast Big Data Algorithm for Sequential and Parallel Density-Based Clustering in Linear Time, Constant Memory, and a Single Pass

Clustering is an essential data mining tool for analyzing and grouping similar objects. In big data applications, however, many clustering algorithms are infeasible due to their high memory requirements and/or unfavorable runtime…

Data Structures and Algorithms · Computer Science 2026-01-27 Gregor Ulm , Simon Smith , Adrian Nilsson , Emil Gustavsson , Mats Jirstrand

A matching based clustering algorithm for categorical data

Cluster analysis is one of the essential tasks in data mining and knowledge discovery. Each type of data poses unique challenges in achieving relatively efficient partitioning of the data into homogeneous groups. While the algorithms for…

Machine Learning · Computer Science 2018-12-11 Ruben A. Gevorgyan , Yenok B. Hakobyan

BETULA: Numerically Stable CF-Trees for BIRCH Clustering

BIRCH clustering is a widely known approach for clustering, that has influenced much subsequent research and commercial products. The key contribution of BIRCH is the Clustering Feature tree (CF-Tree), which is a compressed representation…

Machine Learning · Computer Science 2020-11-30 Andreas Lang , Erich Schubert

Random Indexing K-tree

Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large scale problems exist in document clustering. RI K-tree scales well with large inputs due to its low complexity. It also exhibits features that are…

Information Retrieval · Computer Science 2010-02-02 Christopher M. De Vries , Lance De Vine , Shlomo Geva

Efficient Dynamic Clustering: Capturing Patterns from Historical Cluster Evolution

Clustering aims to group unlabeled objects based on similarity inherent among them into clusters. It is important for many tasks such as anomaly detection, database sharding, record linkage, and others. Some clustering methods are taken as…

Databases · Computer Science 2024-12-02 Binbin Gu , Saeed Kargar , Faisal Nawab

Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging

Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-20 Zihan Wu , Zhaoke Huang , Hong Yan

Clustering categorical data via ensembling dissimilarity matrices

We present a technique for clustering categorical data by generating many dissimilarity matrices and averaging over them. We begin by demonstrating our technique on low dimensional categorical data and comparing it to several other…

Machine Learning · Statistics 2017-09-20 Saeid Amiri , Bertrand Clarke , Jennifer Clarke

Iterative Optimization and Simplification of Hierarchical Clusterings

Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search…

Artificial Intelligence · Computer Science 2014-11-17 D. Fisher