Related papers: Isotropy, Clusters, and Classifiers

On Isotropy Calibration of Transformers

Different studies of the embedding space of transformer models suggest that the distribution of contextual representations is highly anisotropic - the embeddings are distributed in a narrow cone. Meanwhile, static word representations…

Computation and Language · Computer Science 2021-09-29 Yue Ding, Karolis Martinkus, Damian Pascual, Simon Clematide, Roger Wattenhofer

Metrics for quantifying isotropy in high dimensional unsupervised clustering tasks in a materials context

Clustering is a common task in machine learning, but clusters of unlabelled data can be hard to quantify. The application of clustering algorithms in chemistry is often dependent on material representation. Ascertaining the effects of…

Machine Learning · Computer Science 2023-05-29 Samantha Durdy, Michael W. Gaultois, Vladimir Gusev, Danushka Bollegala, Matthew J. Rosseinsky

A Critique of Self-Expressive Deep Subspace Clustering

Subspace clustering is an unsupervised clustering technique designed to cluster data that is supported on a union of linear subspaces, with each subspace defining a cluster with dimension lower than the ambient space. Many existing…

Machine Learning · Computer Science 2021-03-23 Benjamin D. Haeffele, Chong You, René Vidal

A Classification Theorem on Non-compact Embeddings between Besov Spaces

We analyze the embedding properties between Besov spaces, defined on the total space $\mathbb R^n$ and on bounded domains. We give a complete classification on whether or not these embedding maps satisfy certain weak compactness…

Functional Analysis · Mathematics 2025-09-26 Chian Yeong Chuah, Jan Lang, Liding Yao

IsoScore: Measuring the Uniformity of Embedding Space Utilization

The recent success of distributed word representations has led to an increased interest in analyzing the properties of their spatial distribution. Several studies have suggested that contextualized word embedding models do not isotropically…

Computation and Language · Computer Science 2023-02-23 William Rudman, Nate Gillman, Taylor Rayne, Carsten Eickhoff

Deep Clustering Evaluation: How to Validate Internal Clustering Validation Measures

Deep clustering, a method for partitioning complex, high-dimensional data using deep neural networks, presents unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic…

Machine Learning · Statistics 2024-03-25 Zeya Wang, Chenglong Ye

Tame Embeddings, Volume Growth, and Complexity of Moduli Spaces

Quantum gravity is expected to impose constraints on the moduli spaces of massless fields that can arise in effective quantum field theories. A recent proposal asserts that the asymptotic volume growth of these spaces is severely…

High Energy Physics - Theory · Physics 2025-04-18 Thomas W. Grimm, David Prieto, Mick van Vliet

A Cluster-based Approach for Improving Isotropy in Contextual Embedding Space

The representation degeneration problem in Contextual Word Representations (CWRs) hurts the expressiveness of the embedding space by forming an anisotropic cone where even unrelated words have excessively positive correlations. Existing…

Computation and Language · Computer Science 2021-06-03 Sara Rajaee, Mohammad Taher Pilehvar

Are Classes Clusters?

Sentence embedding models aim to provide general purpose embeddings for sentences. Most of the models studied in this paper claim to perform well on STS tasks - but they do not report on their suitability for clustering. This paper looks at…

Computation and Language · Computer Science 2021-04-19 Kees Varekamp

To Cluster, or Not to Cluster: An Analysis of Clusterability Methods

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present. As such, the study of clusterability,…

Machine Learning · Statistics 2018-10-30 A. Adolfsson, M. Ackerman, N. C. Brownstein

Scalable Deep $k$-Subspace Clustering

Subspace clustering algorithms are notorious for their scalability issues because building and processing large affinity matrices are demanding. In this paper, we introduce a method that simultaneously learns an embedding space along…

Computer Vision and Pattern Recognition · Computer Science 2018-11-06 Tong Zhang, Pan Ji, Mehrtash Harandi, Richard Hartley, Ian Reid

Learning for Multi-Type Subspace Clustering

Subspace clustering has been extensively studied from the hypothesis-and-test, algebraic, and spectral clustering based perspectives. Most assume that only a single type/class of subspace is present. Generalizations to multiple types are…

Computer Vision and Pattern Recognition · Computer Science 2019-04-04 Xun Xu, Loong-Fah Cheong, Zhuwen Li

On the Dimensionality of Embeddings for Sparse Features and Data

In this note we discuss a common misconception, namely that embeddings are always used to reduce the dimensionality of the item space. We show that when we measure dimensionality in terms of information entropy then the embedding of sparse…

Machine Learning · Computer Science 2019-01-09 Maxim Naumov

Redundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings

Prompt-based text embedding models, which generate task-specific embeddings upon receiving tailored prompts, have recently demonstrated remarkable performance. However, their resulting embeddings often have thousands of dimensions, leading…

Computation and Language · Computer Science 2025-06-03 Hayato Tsukagoshi, Ryohei Sasano

Systematic Analysis of Cluster Similarity Indices: How to Validate Validation Measures

Many cluster similarity indices are used to evaluate clustering algorithms, and choosing the best one for a particular task remains an open problem. We demonstrate that this problem is crucial: there are many disagreements among the…

Discrete Mathematics · Computer Science 2021-08-27 Martijn Gösgens, Alexey Tikhonov, Liudmila Prokhorenkova

Transferable Deep Metric Learning for Clustering

Clustering in high dimension spaces is a difficult task; the usual distance metrics may no longer be appropriate under the curse of dimensionality. Indeed, the choice of the metric is crucial, and it is highly dependent on the dataset…

Machine Learning · Computer Science 2023-02-14 Simo Alami. C, Rim Kaddah, Jesse Read

The isomorphism problem for classes of computable fields

Theories of classification distinguish classes with some good structure theorem from those for which none is possible. Some classes (dense linear orders, for instance) are non-classifiable in general, but are classifiable when we consider…

Logic · Mathematics 2007-05-23 Wesley Calvert

Cluster Purge Loss: Structuring Transformer Embeddings for Equivalent Mutants Detection

Recent pre-trained transformer models achieve superior performance in various code processing objectives. However, although effective at optimizing decision boundaries, common approaches for fine-tuning them for downstream classification…

Machine Learning · Computer Science 2025-07-29 Adelaide Danilov, Aria Nourbakhsh, Christoph Schommer

Isotropy, homogeneity and dipole saturation

A distribution of points that satisfies the property of local isotropy is not necessarily homogeneous: homogeneity is implied by the condition of local isotropy together with the assumption of analyticity or regularity. Here we show that…

Astrophysics · Physics 2017-04-26 Francesco Sylos Labini

Embeddings for anisotropic Besov spaces

We prove embedding theorems for fully anisotropic Besov spaces. More concretely, inequalities between modulus of continuity in different metrics and of Sobolev type are obtained. Our goal is to get sharp estimates for some anisotropic cases…

Functional Analysis · Mathematics 2007-05-23 F. J. Perez Lazaro