Related papers: High-Dimensional Data Clustering

Subspace Clustering with the Multivariate-t Distribution

Clustering procedures suitable for the analysis of very high-dimensional data are needed for many modern data sets. In model-based clustering, a method called high-dimensional data clustering (HDDC) uses a family of Gaussian mixture models…

Methodology · Statistics · 2017-06-28 · Angelina Pesevski, Brian C. Franczak, Paul D. McNicholas

Subspace clustering of high-dimensional data: a predictive approach

In several application domains, high-dimensional observations are collected and then analysed in search of naturally occurring data clusters which might provide further insights about the nature of the problem. In this paper we describe a…

Machine Learning · Statistics · 2012-03-07 · Brian McWilliams, Giovanni Montana

Hashing-Based Distributed Clustering for Massive High-Dimensional Data

Clustering analysis is of substantial significance for data mining. The properties of big data raise higher demand for more efficient and economical distributed clustering methods. However, existing distributed clustering methods mainly…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-07-03 · Yifeng Xiao, Jiang Xue, Deyu Meng

High-dimensional cluster analysis with the Masked EM Algorithm

Cluster analysis faces two problems in high dimensions: first, the "curse of dimensionality" that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large…

Quantitative Methods · Quantitative Biology · 2013-09-12 · Shabnam N. Kadir, Dan F. M. Goodman, Kenneth D. Harris

Deep Continuous Clustering

Clustering high-dimensional datasets is hard because interpoint distances become less informative in high-dimensional spaces. We present a clustering algorithm that performs nonlinear dimensionality reduction and clustering jointly. The…

Machine Learning · Computer Science · 2018-03-06 · Sohil Atul Shah, Vladlen Koltun

High Dimensional Cluster Analysis Using Path Lengths

A hierarchical scheme for clustering data is presented which applies to spaces with a high number of dimensions ($N_D > 3$). The data set is first reduced to a smaller set of partitions (multi-dimensional bins). Multiple clustering…

Data Analysis, Statistics and Probability · Physics · 2017-10-16 · Kevin McIlhany, Stephen Wiggins

Subspace Clustering through Sub-Clusters

The problem of dimension reduction is of increasing importance in modern data analysis. In this paper, we consider modeling the collection of points in a high dimensional space as a union of low dimensional subspaces. In particular we…

Machine Learning · Statistics · 2020-06-12 · Weiwei Li, Jan Hannig, Sayan Mukherjee

Clustering high dimensional data using subspace and projected clustering algorithms

Problem statement: Clustering has a number of techniques that have been developed in statistics, pattern recognition, data mining, and other fields. Subspace clustering enumerates clusters of objects in all subspaces of a dataset. It tends…

Databases · Computer Science · 2010-09-03 · Rahmat Widia Sembiring, Jasni Mohamad Zain, Abdullah Embong

Robust Clustering using Hyperdimensional Computing

This paper addresses the clustering of data in the hyperdimensional computing (HDC) domain. In prior work, an HDC-based clustering framework, referred to as HDCluster, has been proposed. However, the performance of the existing HDCluster is…

Machine Learning · Computer Science · 2024-04-19 · Lulu Ge, Keshab K. Parhi

Clustering for high-dimension, low-sample size data using distance vectors

In high-dimension, low-sample size (HDLSS) data, it is not always true that closeness of two objects reflects a hidden cluster structure. We point out the important fact that it is not the closeness, but the "values" of distance that…

Machine Learning · Statistics · 2013-12-30 · Yoshikazu Terada

A H-K Clustering Algorithm For High Dimensional Data Using Ensemble Learning

Advances made to the traditional clustering algorithms solve various problems such as the curse of dimensionality and sparsity of data for multiple attributes. The traditional H-K clustering algorithm can solve the randomness and apriority…

Databases · Computer Science · 2015-01-13 · Rashmi Paithankar, Bharat Tidke

Clustering based on Mixtures of Sparse Gaussian Processes

Creating low dimensional representations of a high dimensional data set is an important component in many machine learning applications. How to cluster data using their low dimensional embedded space is still a challenging problem in…

Machine Learning · Computer Science · 2023-03-27 · Zahra Moslehi, Abdolreza Mirzaei, Mehran Safayani

High-dimensional Clustering onto Hamiltonian Cycle

Clustering aims to group unlabelled samples based on their similarities. It has become a significant tool for the analysis of high-dimensional data. However, most of the clustering methods merely generate pseudo labels and thus are unable…

Artificial Intelligence · Computer Science · 2023-06-21 · Tianyi Huang, Shenghui Cheng, Stan Z. Li, Zhengjun Zhang

A new model for natural groupings in high-dimensional data

Clustering aims to divide a set of points into groups. The current paradigm assumes that the grouping is well-defined (unique) given the probability model from which the data is drawn. Yet, recent experiments have uncovered several…

Machine Learning · Statistics · 2024-06-25 · Mireille Boutin, Evzenie Coupkova

Subspace clustering of dimensionality-reduced data

Subspace clustering refers to the problem of clustering unlabeled high-dimensional data points into a union of low-dimensional linear subspaces, assumed unknown. In practice one may have access to dimensionality-reduced observations of the…

Information Theory · Computer Science · 2014-04-29 · Reinhard Heckel, Michael Tschannen, Helmut Bölcskei

Sparse Subspace Clustering: Algorithm, Theory, and Applications

In many real-world problems, we are dealing with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, high-dimensional data lie close to low-dimensional structures…

Computer Vision and Pattern Recognition · Computer Science · 2013-02-06 · Ehsan Elhamifar, Rene Vidal

A Probabilistic $\ell_1$ Method for Clustering High Dimensional Data

In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty, the…

Statistics Theory · Mathematics · 2016-04-26 · Tsvetan Asamov, Adi Ben-Israel

Orthogonal Subspace Clustering: Enhancing High-Dimensional Data Analysis through Adaptive Dimensionality Reduction and Efficient Clustering

This paper presents Orthogonal Subspace Clustering (OSC), an innovative method for high-dimensional data clustering. We first establish a theorem proving that high-dimensional data can be decomposed into orthogonal subspaces in…

Machine Learning · Computer Science · 2026-03-17 · Qing-Yuan Wen, Da-Qing Zhang

Hierarchical Sparse Representation Clustering for High-Dimensional Data Streams

Data stream clustering reveals patterns within continuously arriving, potentially unbounded data sequences. Numerous data stream algorithms have been proposed to cluster data streams. The existing data stream clustering algorithms still…

Machine Learning · Computer Science · 2025-07-02 · Jie Chen, Hua Mao, Yuanbiao Gou, Xi Peng

Clustering High-dimensional Data: Balancing Abstraction and Representation (Tutorial at AAAI 2026)

How to find a natural grouping of a large real data set? Clustering requires a balance between abstraction and representation. To identify clusters, we need to abstract from superfluous details of individual objects. But we also need a rich…

Machine Learning · Computer Science · 2026-01-19 · Claudia Plant, Lena G. M. Bauer, Christian Böhm