Related papers: Affinity Clustering Framework for Data Debiasing U…

Factor Adjusted Spectral Clustering for Mixture Models

This paper studies a factor modeling-based approach for clustering high-dimensional data generated from a mixture of strongly correlated variables. Statistical modeling with correlated structures pervades modern applications in economics,…

Statistics Theory · Mathematics 2024-08-23 Shange Tang, Soham Jana, Jianqing Fan

Face Clustering: Representation and Pairwise Constraints

Clustering face images according to their identity has two important applications: (i) grouping a collection of face images when no external labels are associated with images, and (ii) indexing for efficient large scale face retrieval. The…

Computer Vision and Pattern Recognition · Computer Science 2018-07-30 Yichun Shi, Charles Otto, Anil K. Jain

Joint Debiased Representation Learning and Imbalanced Data Clustering

One of the most promising approaches for unsupervised learning is combining deep representation learning and deep clustering. Some recent works propose to simultaneously learn representation using deep neural networks and perform clustering…

Computer Vision and Pattern Recognition · Computer Science 2022-09-07 Mina Rezaei, Emilio Dorigatti, David Ruegamer, Bernd Bischl

Data Augmentation via Subgroup Mixup for Improving Fairness

In this work, we propose data augmentation via pairwise mixup across subgroups to improve group fairness. Many real-world applications of machine learning systems exhibit biases across certain groups due to under-representation or training…

Machine Learning · Statistics 2023-09-14 Madeline Navarro, Camille Little, Genevera I. Allen, Santiago Segarra

CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

We study the problem of applying spectral clustering to cluster multi-scale data, which is data whose clusters are of various sizes and densities. Traditional spectral clustering techniques discover clusters by processing a similarity…

Machine Learning · Computer Science 2020-06-09 Xiang Li, Ben Kao, Caihua Shan, Dawei Yin, Martin Ester

Discriminative Clustering with Representation Learning with any Ratio of Labeled to Unlabeled Data

We present a discriminative clustering approach in which the feature representation can be learned from data and moreover leverage labeled data. Representation learning can give a similarity-based clustering method the ability to…

Machine Learning · Statistics 2023-02-21 Corinne Jones, Vincent Roulet, Zaid Harchaoui

Doubly Stochastic Subspace Clustering

Many state-of-the-art subspace clustering methods follow a two-step process by first constructing an affinity matrix between data points and then applying spectral clustering to this affinity. Most of the research into these methods focuses…

Machine Learning · Computer Science 2021-04-21 Derek Lim, René Vidal, Benjamin D. Haeffele

Clustering-friendly Representation Learning via Instance Discrimination and Feature Decorrelation

Clustering is one of the most fundamental tasks in machine learning. Recently, deep clustering has become a major trend in clustering techniques. Representation learning often plays an important role in the effectiveness of deep clustering,…

Machine Learning · Computer Science 2021-06-02 Yaling Tao, Kentaro Takagi, Kouta Nakata

Towards Fair Representation: Clustering and Consensus

Consensus clustering, a fundamental task in machine learning and data analysis, aims to aggregate multiple input clusterings of a dataset, potentially based on different non-sensitive attributes, into a single clustering that best…

Machine Learning · Computer Science 2025-06-18 Diptarka Chakraborty, Kushagra Chatterjee, Debarati Das, Tien Long Nguyen, Romina Nobahari

Inv-SENnet: Invariant Self Expression Network for clustering under biased data

Subspace clustering algorithms are used for understanding the cluster structure that explains the dataset well. These methods are extensively used for data-exploration tasks in various areas of Natural Sciences. However, most of these…

Machine Learning · Computer Science 2022-11-15 Ashutosh Singh, Ashish Singh, Aria Masoomi, Tales Imbiriba, Erik Learned-Miller, Deniz Erdogmus

Improved Hierarchical Clustering on Massive Datasets with Broad Guarantees

Hierarchical clustering is a stronger extension of one of today's most influential unsupervised learning methods: clustering. The goal of this method is to create a hierarchy of clusters, thus constructing cluster evolutionary history and…

Data Structures and Algorithms · Computer Science 2021-01-14 MohammadTaghi Hajiaghayi, Marina Knittel

AugDMC: Data Augmentation Guided Deep Multiple Clustering

Clustering aims to group similar objects together while separating dissimilar ones apart. Thereafter, structures hidden in data can be identified to help understand data in an unsupervised manner. Traditional clustering methods such as…

Computer Vision and Pattern Recognition · Computer Science 2023-06-23 Jiawei Yao, Enbei Liu, Maham Rashid, Juhua Hu

Unsupervised Learning of Debiased Representations with Pseudo-Attributes

Dataset bias is a critical challenge in machine learning since it often leads to a negative impact on a model due to the unintended decision rules captured by spurious correlations. Although existing works often handle this issue based on…

Machine Learning · Computer Science 2022-04-05 Seonguk Seo, Joon-Young Lee, Bohyung Han

Learning Debiased Representation via Disentangled Feature Augmentation

Image classification models tend to make decisions based on peripheral attributes of data items that have strong correlation with a target variable (i.e., dataset bias). These biased models suffer from the poor generalization capability…

Machine Learning · Computer Science 2021-10-26 Jungsoo Lee, Eungyeup Kim, Juyoung Lee, Jihyeon Lee, Jaegul Choo

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by…

Machine Learning · Computer Science 2018-09-05 Farshid Rayhan, Sajid Ahmed, Asif Mahbub, Md. Rafsan Jani, Swakkhar Shatabda, Dewan Md. Farid

FLAC: Fairness-Aware Representation Learning by Suppressing Attribute-Class Associations

Bias in computer vision systems can perpetuate or even amplify discrimination against certain populations. Considering that bias is often introduced by biased visual datasets, many recent research efforts focus on training fair models using…

Computer Vision and Pattern Recognition · Computer Science 2024-10-28 Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou

Unsupervised Clustering Approaches for Autism Screening: Achieving 95.31% Accuracy with a Gaussian Mixture Model

Autism spectrum disorder (ASD) remains a challenging condition to diagnose effectively and promptly, despite global efforts in public health, clinical screening, and scientific research. Traditional diagnostic methods, primarily reliant on…

Computers and Society · Computer Science 2025-03-11 Nora Fink

A Unified Framework for Representation-based Subspace Clustering of Out-of-sample and Large-scale Data

Under the framework of spectral clustering, the key of subspace clustering is building a similarity graph which describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and…

Machine Learning · Computer Science 2017-05-17 Xi Peng, Huajin Tang, Lei Zhang, Zhang Yi, Shijie Xiao

Sparse-Dense Subspace Clustering

Subspace clustering refers to the problem of clustering high-dimensional data into a union of low-dimensional subspaces. Current subspace clustering approaches are usually based on a two-stage framework. In the first stage, an affinity…

Machine Learning · Computer Science 2019-10-22 Shuai Yang, Wenqi Zhu, Yuesheng Zhu

Revisiting data augmentation for subspace clustering

Subspace clustering is the classical problem of clustering a collection of data samples that approximately lie around several low-dimensional subspaces. The current state-of-the-art approaches for this problem are based on the…

Machine Learning · Computer Science 2023-01-26 Maryam Abdolali, Nicolas Gillis