Related papers: Mending the Big-Data Missing Information

Fusion Subspace Clustering: Full and Incomplete Data

Modern inference and learning often hinge on identifying low-dimensional structures that approximate large scale data. Subspace clustering achieves this through a union of linear subspaces. However, in contemporary applications data is…

Machine Learning · Computer Science · 2018-08-03 · Daniel L. Pimentel-Alarcón, Usman Mahmood

Fusion Subspace Clustering for Incomplete Data

This paper introduces fusion subspace clustering, a novel method to learn low-dimensional structures that approximate large scale yet highly incomplete data. The main idea is to assign each datum to a subspace of its own, and minimize…

Machine Learning · Computer Science · 2022-05-24 · Usman Mahmood, Daniel Pimentel-Alarcón

Subspace Segmentation by Successive Approximations: A Method for Low-Rank and High-Rank Data with Missing Entries

We propose a method to reconstruct and cluster incomplete high-dimensional data lying in a union of low-dimensional subspaces. Exploring the sparse representation model, we jointly estimate the missing data while imposing the intrinsic…

Computer Vision and Pattern Recognition · Computer Science · 2017-09-06 · João Carvalho, Manuel Marques, João P. Costeira

Clustering small datasets in high-dimension by random projection

Datasets in high-dimension do not typically form clusters in their original space; the issue is worse when the number of points in the dataset is small. We propose a low-computation method to find statistically significant clustering…

Machine Learning · Statistics · 2020-08-24 · Alden Bradford, Tarun Yellamraju, Mireille Boutin

High-Dimensional Data Clustering

Clustering in high-dimensional spaces is a difficult problem which is recurrent in many domains, for example in image analysis. The difficulty is due to the fact that high-dimensional data usually live in different low-dimensional subspaces…

Statistics Theory · Mathematics · 2016-08-16 · Charles Bouveyron, Stéphane Girard, Cordelia Schmid

EPTAS for $k$-means Clustering of Affine Subspaces

We consider a generalization of the fundamental $k$-means clustering for data with incomplete or corrupted entries. When data objects are represented by points in $\mathbb{R}^d$, a data point is said to be incomplete when some of its…

Data Structures and Algorithms · Computer Science · 2020-10-20 · Eduard Eiben, Fedor V. Fomin, Petr A. Golovach, William Lochet, Fahad Panolan, Kirill Simonov

Optimal Data Distribution for Big-Data All-to-All Comparison using Finite Projective and Affine Planes

An All-to-All Comparison problem is where every element of a data set is compared with every other element. This is analogous to projective planes and affine planes where every pair of points share a common line. For large data sets, the…

Combinatorics · Mathematics · 2023-08-30 · Joanne L. Hall, Wayne Kelly, Yu-Chu Tian

Leachable Component Clustering

Clustering attempts to partition data instances into several distinctive groups, while the similarities among data belonging to the common partition can be principally reserved. Furthermore, incomplete data frequently occurs in many…

Machine Learning · Computer Science · 2022-08-30 · Miao Cheng, Xinge You

Deep Continuous Clustering

Clustering high-dimensional datasets is hard because interpoint distances become less informative in high-dimensional spaces. We present a clustering algorithm that performs nonlinear dimensionality reduction and clustering jointly. The…

Machine Learning · Computer Science · 2018-03-06 · Sohil Atul Shah, Vladlen Koltun

Subspace Clustering through Sub-Clusters

The problem of dimension reduction is of increasing importance in modern data analysis. In this paper, we consider modeling the collection of points in a high dimensional space as a union of low dimensional subspaces. In particular we…

Machine Learning · Statistics · 2020-06-12 · Weiwei Li, Jan Hannig, Sayan Mukherjee

Clustering of Data with Missing Entries

The analysis of large datasets is often complicated by the presence of missing entries, mainly because most of the current machine learning algorithms are designed to work with full data. The main focus of this work is to introduce a…

Machine Learning · Computer Science · 2018-01-08 · Sunrita Poddar, Mathews Jacob

Linear-Time Approximation Scheme for k-Means Clustering of Affine Subspaces

In this paper, we present a linear-time approximation scheme for $k$-means clustering of incomplete data points in $d$-dimensional Euclidean space. An incomplete data point with $\Delta>0$ unspecified entries is represented as…

Computational Geometry · Computer Science · 2021-06-29 · Kyungjin Cho, Eunjin Oh

A new model for natural groupings in high-dimensional data

Clustering aims to divide a set of points into groups. The current paradigm assumes that the grouping is well-defined (unique) given the probability model from which the data is drawn. Yet, recent experiments have uncovered several…

Machine Learning · Statistics · 2024-06-25 · Mireille Boutin, Evzenie Coupkova

An Imputation-Consistency Algorithm for High-Dimensional Missing Data Problems and Beyond

Missing data are frequently encountered in high-dimensional problems, but they are usually difficult to deal with using standard algorithms, such as the expectation-maximization (EM) algorithm and its variants. To tackle this difficulty,…

Methodology · Statistics · 2018-02-08 · Faming Liang, Bochao Jia, Jingnan Xue, Qizhai Li, Ye Luo

Clustering of Big Data with Mixed Features

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require a priori the unknown number of clusters. We…

Machine Learning · Statistics · 2020-11-13 · Joshua Tobin, Mimi Zhang

Filtrated Algebraic Subspace Clustering

Subspace clustering is the problem of clustering data that lie close to a union of linear subspaces. In the abstract form of the problem, where no noise or other corruptions are present, the data are assumed to lie in general position…

Computer Vision and Pattern Recognition · Computer Science · 2020-02-13 · Manolis C. Tsakiris, Rene Vidal

Clustering High-dimensional Data: Balancing Abstraction and Representation Tutorial at AAAI 2026

How to find a natural grouping of a large real data set? Clustering requires a balance between abstraction and representation. To identify clusters, we need to abstract from superfluous details of individual objects. But we also need a rich…

Machine Learning · Computer Science · 2026-01-19 · Claudia Plant, Lena G. M. Bauer, Christian Böhm

Relation Strength-Aware Clustering of Heterogeneous Information Networks with Incomplete Attributes

With the rapid development of online social media, online shopping sites and cyber-physical systems, heterogeneous information networks have become increasingly popular and content-rich over time. In many cases, such networks contain…

Databases · Computer Science · 2012-02-01 · Yizhou Sun, Charu C. Aggarwal, Jiawei Han

Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach

Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grouped together aiming to the construction of well-established clusters that their elements are classified according to their similarity. The…

Machine Learning · Statistics · 2023-10-20 · Dimitrios Saligkaras, Vasileios E. Papageorgiou

High-Dimensional Matched Subspace Detection When Data are Missing

We consider the problem of deciding whether a highly incomplete signal lies within a given subspace. This problem, Matched Subspace Detection, is a classical, well-studied problem when the signal is completely observed. High-dimensional…

Information Theory · Computer Science · 2011-01-25 · Laura Balzano, Benjamin Recht, Robert Nowak