Related papers: Soft Uncoupling of Markov Chains for Permeable Lan…

Analyse spectrale des textes: détection automatique des frontières de langue et de discours (Spectral analysis of texts: automatic detection of language and discourse boundaries)

We propose a theoretical framework within which information on the vocabulary of a given corpus can be inferred on the basis of statistical information gathered on that corpus. Inferences can be made on the categories of the words in the…

Computation and Language · Computer Science 2008-10-08 Pascal Vaillant, Richard Nock, Claudia Henry

Computing Word Classes Using Spectral Clustering

Clustering a lexicon of words is a well-studied problem in natural language processing (NLP). Word clusters are used to deal with sparse data in statistical language processing, as well as features for solving various NLP tasks (text…

Computation and Language · Computer Science 2018-08-17 Effi Levi, Saggy Herman, Ari Rappoport

Using Curvature and Markov Clustering in Graphs for Lexical Acquisition and Word Sense Discrimination

We introduce two different approaches for clustering semantically similar words. We accommodate ambiguity by allowing a word to belong to several clusters. Both methods use a graph-theoretic representation of words and their paradigmatic…

Other Condensed Matter · Physics 2009-09-29 Beate Dorow, Dominic Widdows, Katarina Ling, Jean-Pierre Eckmann, Danilo Sergi, Elisha Moses

Rough Sets for Explainability of Spectral Graph Clustering

Graph Spectral Clustering methods (GSC) allow representing clusters of diverse shapes, densities, etc. However, the results of such algorithms, when applied e.g. to text documents, are hard to explain to the user, especially due to…

Machine Learning · Computer Science 2026-03-17 Bartłomiej Starosta, Sławomir T. Wierzchoń, Piotr Borkowski, Dariusz Czerski, Marcin Sydow, Eryk Laskowski, Mieczysław A. Kłopotek

Deep clustering: Discriminative embeddings for segmentation and separation

We address the problem of acoustic source separation in a deep learning framework we call "deep clustering." Rather than directly estimating signals or masking functions, we train a deep network to produce spectrogram embeddings that are…

Neural and Evolutionary Computing · Computer Science 2015-08-19 John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe

Micro-Clustering: Finding Small Clusters in Large Diversity

We address the problem of unsupervised soft-clustering called micro-clustering. The aim of the problem is to enumerate all groups composed of records strongly related to each other, while standard clustering methods separate records at…

Data Structures and Algorithms · Computer Science 2016-06-07 Takeaki Uno, Hiroki Maegawa, Takanobu Nakahara, Yukinobu Hamuro, Ryo Yoshinaka, Makoto Tatsuta

Spectral Probing

Linguistic information is encoded at varying timescales (subwords, phrases, etc.) and communicative levels, such as syntax and semantics. Contextualized embeddings have analogously been found to capture these phenomena at distinctive layers…

Computation and Language · Computer Science 2022-10-24 Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Soft clustering analysis of galaxy morphologies: A worked example with SDSS

Context: The huge and still rapidly growing number of galaxies in modern sky surveys raises the need for an automated and objective classification method. Unsupervised learning algorithms are of particular interest, since they discover…

Cosmology and Nongalactic Astrophysics · Physics 2015-05-18 Rene Andrae, Peter Melchior, Matthias Bartelmann

Delving into Spectral Clustering with Vision-Language Representations

Spectral clustering is known as a powerful technique in unsupervised data analysis. The vast majority of approaches to spectral clustering are driven by a single modality, leaving the rich information in multi-modal representations…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Bo Peng, Yuanwei Hu, Bo Liu, Ling Chen, Jie Lu, Zhen Fang

Interpretable Fair Clustering

Fair clustering has gained increasing attention in recent years, especially in applications involving socially sensitive attributes. However, existing fair clustering methods often lack interpretability, limiting their applicability in…

Machine Learning · Computer Science 2025-11-27 Mudi Jiang, Jiahui Zhou, Xinying Liu, Zengyou He, Zhikui Chen

Controlling Complexity in Part-of-Speech Induction

We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly, because of its weak inductive bias and…

Computation and Language · Computer Science 2014-01-24 João V. Graça, Kuzman Ganchev, Luisa Coheur, Fernando Pereira, Ben Taskar

Dictionary learning for fast classification based on soft-thresholding

Classifiers based on sparse representations have recently been shown to provide excellent results in many visual recognition and classification tasks. However, the high cost of computing sparse representations at test time is a major…

Computer Vision and Pattern Recognition · Computer Science 2014-10-03 Alhussein Fawzi, Mike Davies, Pascal Frossard

Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming

We look at the long-standing problem of segmenting unlabeled speech into word-like segments and clustering these into a lexicon. Several previous methods use a scoring model coupled with dynamic programming to find an optimal segmentation.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-14 Simon Malan, Benjamin van Niekerk, Herman Kamper

Identifiability for Blind Source Separation of Multiple Finite Alphabet Linear Mixtures

We give under weak assumptions a complete combinatorial characterization of identifiability for linear mixtures of finite alphabet sources, with unknown mixing weights and unknown source signals, but known alphabet. This is based on a…

Methodology · Statistics 2017-09-01 Merle Behr, Axel Munk

On Constrained Spectral Clustering and Its Applications

Constrained clustering has been well-studied for algorithms such as $K$-means and hierarchical clustering. However, how to satisfy many constraints in these algorithmic settings has been shown to be intractable. One alternative to encode…

Machine Learning · Computer Science 2012-09-24 Xiang Wang, Buyue Qian, Ian Davidson

Spectral Clustering with Smooth Tiny Clusters

Spectral clustering is one of the most prominent clustering approaches. The distance-based similarity is the most widely used method for spectral clustering. However, people have already noticed that this is not suitable for multi-scale…

Machine Learning · Computer Science 2020-09-11 Hengrui Wang, Yubo Zhang, Mingzhi Chen, Tong Yang

Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective

Constrained decoding enables Language Models (LMs) to produce samples that provably satisfy hard constraints. However, existing constrained-decoding approaches often distort the underlying model distribution, a limitation that is especially…

Artificial Intelligence · Computer Science 2025-06-09 Emmanuel Anaya Gonzalez, Sairam Vaidya, Kanghee Park, Ruyi Ji, Taylor Berg-Kirkpatrick, Loris D'Antoni

Markov Chain Monte-Carlo Phylogenetic Inference Construction in Computational Historical Linguistics

More and more of the world's languages are now under study; as a result, the traditional approach to historical linguistics faces some challenges. For example, the linguistic comparative research among languages needs manual…

Computation and Language · Computer Science 2020-03-17 Tianyi Ni

Sublanguage Terms: Dictionaries, Usage, and Automatic Classification

The use of terms from natural and social scientific titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Different notions of sublanguage distinctiveness are explored. Objective methods…

cmp-lg · Computer Science 2008-02-03 Robert M. Losee, Stephanie W. Haas

A Masked Segmental Language Model for Unsupervised Natural Language Segmentation

Segmentation remains an important preprocessing step both in languages where "words" or other important syntactic/semantic units (like morphemes) are not clearly delineated by white space, as well as when dealing with continuous speech…

Computation and Language · Computer Science 2021-09-07 C. M. Downey, Fei Xia, Gina-Anne Levow, Shane Steinert-Threlkeld