
Related papers: Grouping Words Using Statistical Context

200 papers

Word groupings useful for language processing tasks are increasingly available, as thesauri appear on-line, and as distributional word clustering techniques improve. However, for many tasks, one is interested in relationships among word…

cmp-lg · Computer Science 2008-02-03 Philip Resnik

Considering that words with different characteristics in the text have different importance for classification, grouping them together separately can strengthen the semantic expression of each part. Thus we propose a new text representation…

Computation and Language · Computer Science 2019-06-19 Xiaoye Tan, Rui Yan, Chongyang Tao, Mingrui Wu

Clustering a lexicon of words is a well-studied problem in natural language processing (NLP). Word clusters are used to deal with sparse data in statistical language processing, as well as features for solving various NLP tasks (text…

Computation and Language · Computer Science 2018-08-17 Effi Levi, Saggy Herman, Ari Rappoport

We describe and experimentally evaluate a method for automatically clustering words according to their distribution in particular syntactic contexts. Deterministic annealing is used to find lowest distortion sets of clusters. As the…

cmp-lg · Computer Science 2008-02-03 Fernando Pereira, Naftali Tishby, Lillian Lee

Distributional text clustering delivers semantically informative representations and captures the relevance between each word and semantic clustering centroids. We extend the neural text clustering approach to text classification tasks by…

Computation and Language · Computer Science 2020-11-25 Yekun Chai, Haidong Zhang, Shuo Jin

Words unknown to the lexicon present a substantial problem to part-of-speech tagging. In this paper we present a technique for fully unsupervised statistical acquisition of rules which guess possible parts-of-speech for unknown words. Three…

cmp-lg · Computer Science 2008-02-03 Andrei Mikheev

We compare the performance of different clustering algorithms applied to the task of unsupervised text categorization. We consider agglomerative clustering algorithms, principal direction divisive partitioning and (for the first time)…

Disordered Systems and Neural Networks · Physics 2007-05-23 D. Volk, M. G. Stepanov

To make sense of massive data, we often fit simplified models and then interpret the parameters; for example, we cluster the text embeddings and then interpret the mean parameters of each cluster. However, these parameters are often…

Artificial Intelligence · Computer Science 2025-01-14 Ruiqi Zhong, Heng Wang, Dan Klein, Jacob Steinhardt

Any approach aimed at pasteurizing and quantifying a particular phenomenon must include the use of robust statistical methodologies for data analysis. With this in mind, the purpose of this study is to present statistical approaches that…

Computation and Language · Computer Science 2023-06-29 Anagh Chattopadhyay, Soumya Sankar Ghosh, Samir Karmakar

The advent of online social networks has led to the development of an abundant literature on the study of online social groups and their relationship to individuals' personalities as revealed by their textual productions. Social structures…

Social and Information Networks · Computer Science 2024-06-26 Ixandra Achitouv, David Chavalarias, Bruno Gaume

The deployment of language models brings challenges in generating reliable information, especially when these models are fine-tuned using human preferences. To extract encoded knowledge without (potentially) biased human labels,…

Artificial Intelligence · Computer Science 2024-10-07 Walter Laurito, Sharan Maiya, Grégoire Dhimoïla, Owen Yeung, Kaarel Hänni

Lexical ambiguity makes it difficult to compute various useful statistics of a corpus. A given word form might represent any of several morphological feature bundles. One can, however, use unsupervised learning (as in EM) to fit a model…

Computation and Language · Computer Science 2020-02-26 Ryan Cotterell, Christo Kirov, Sabrina J. Mielke, Jason Eisner

In this paper we introduce a method to detect words or phrases in a given sequence of alphabets without knowing the lexicon. Our linear time unsupervised algorithm relies entirely on statistical relationships among alphabets in the input…

Computation and Language · Computer Science 2013-12-31 Tamal Chowdhury, Rabindra Rakshit, Arko Banerjee

Several methods have been explored for automating parts of Systematic Mapping (SM) and Systematic Review (SR) methodologies. Challenges typically revolve around the gaps in semantic understanding of text, as well as lack of domain and…

Computation and Language · Computer Science 2021-02-10 Xiajing Li, Marios Daoutis

This paper presents a new Bayesian non-parametric model by extending the usage of Hierarchical Dirichlet Allocation to extract tree structured word clusters from text data. The inference algorithm of the model collects words in a cluster if…

Computation and Language · Computer Science 2016-01-22 Halid Ziya Yerebakan, Fitsum Reda, Yiqiang Zhan, Yoshihisa Shinagawa

We introduce two different approaches for clustering semantically similar words. We accommodate ambiguity by allowing a word to belong to several clusters. Both methods use a graph-theoretic representation of words and their paradigmatic…

Other Condensed Matter · Physics 2009-09-29 Beate Dorow, Dominic Widdows, Katarina Ling, Jean-Pierre Eckmann, Danilo Sergi, Elisha Moses

Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together with the aim of constructing well-established clusters whose elements are classified according to their similarity. The…

Machine Learning · Statistics 2023-10-20 Dimitrios Saligkaras, Vasileios E. Papageorgiou

When looking at the structure of natural language, "phrases" and "words" are central notions. We consider the problem of identifying such "meaningful subparts" of language of any length and underlying composition principles in a completely…

Computation and Language · Computer Science 2016-02-19 Stefan Gerdjikov, Klaus U. Schulz

We develop and test a novel unsupervised algorithm for word sense induction and disambiguation which uses topological data analysis. Typical approaches to the problem involve clustering, based on simple low level features of distance in…

Computation and Language · Computer Science 2022-03-02 Michael Rawson, Samuel Dooley, Mithun Bharadwaj, Rishabh Choudhary

Despite the predominance of contextualized embeddings in NLP, approaches to detect semantic change relying on these embeddings and clustering methods underperform simpler counterparts based on static word embeddings. This stems from the…

Computation and Language · Computer Science 2024-02-05 Xianghe Ma, Michael Strube, Wei Zhao