Related papers: Are Classes Clusters?

Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences

Sentence embedding methods offer a powerful approach for working with short textual constructs or sequences of words. By representing sentences as dense numerical vectors, many natural language processing (NLP) applications have improved…

Computation and Language · Computer Science 2021-10-05 Yuan An, Alexander Kalinowski, Jane Greenberg

Is Neural Topic Modelling Better than Clustering? An Empirical Study on Clustering with Contextual Embeddings for Topics

Recent work incorporates pre-trained word embeddings such as BERT embeddings into Neural Topic Models (NTMs), generating highly coherent topics. However, with high-quality contextualized document representations, do we really need…

Computation and Language · Computer Science 2022-04-22 Zihan Zhang, Meng Fang, Ling Chen, Mohammad-Reza Namazi-Rad

Universal Sentence Encoder

We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the…

Computation and Language · Computer Science 2018-04-13 Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil

Improving Language Models by Clustering Training Sentences

Many of the kinds of language model used in speech understanding suffer from imperfect modeling of intra-sentential contextual influences. I argue that this problem can be addressed by clustering the sentences in a training corpus…

cmp-lg · Computer Science 2008-02-03 David Carter

DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations

Sentence embeddings are an important component of many natural language processing (NLP) systems. Like word embeddings, sentence embeddings are typically learned on large text corpora and then transferred to various downstream tasks, such…

Computation and Language · Computer Science 2021-05-28 John Giorgi, Osvald Nitski, Bo Wang, Gary Bader

Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!

Topic models are a useful analysis tool to uncover the underlying themes within document collections. The dominant approach is to use probabilistic topic models that posit a generative story, but in this paper we propose an alternative way…

Computation and Language · Computer Science 2020-10-08 Suzanna Sia, Ayush Dalmia, Sabrina J. Mielke

Revisiting Word Embeddings in the LLM Era

Large Language Models (LLMs) have recently shown remarkable advancement in various NLP tasks. As such, a popular trend has emerged lately where NLP researchers extract word/sentence/document embeddings from these large decoder-only models…

Computation and Language · Computer Science 2025-03-04 Yash Mahajan, Matthew Freestone, Naman Bansal, Sathyanarayanan Aakur, Shubhra Kanti Karmaker Santu

Classification and Clustering of Sentence-Level Embeddings of Scientific Articles Generated by Contrastive Learning

Scientific articles are long text documents organized into sections, each describing aspects of the research. Analyzing scientific production has become progressively challenging due to the increase in the number of available articles.…

Computation and Language · Computer Science 2024-04-02 Gustavo Bartz Guedes, Ana Estela Antunes da Silva

Empirical Evaluation of Embedding Models in the Context of Text Classification in Document Review in Construction Delay Disputes

Text embeddings are numerical representations of text data, where words, phrases, or entire documents are converted into vectors of real numbers. These embeddings capture semantic meanings and relationships between text elements in a…

Information Retrieval · Computer Science 2025-01-20 Fusheng Wei, Robert Neary, Han Qin, Qiang Mao, Jianping Zhang

Language Modeling by Clustering with Word Embeddings for Text Readability Assessment

We present a clustering-based language model using word embeddings for text readability prediction. Presumably, a Euclidean semantic space hypothesis holds true for word embeddings whose training is done by observing word co-occurrences.…

Computation and Language · Computer Science 2017-09-07 Miriam Cha, Youngjune Gwon, H. T. Kung

Extending Multi-Sense Word Embedding to Phrases and Sentences for Unsupervised Semantic Applications

Most unsupervised NLP models represent each word with a single point or single region in semantic space, while the existing multi-sense word embeddings cannot represent longer word sequences like phrases or sentences. We propose a novel…

Computation and Language · Computer Science 2021-12-30 Haw-Shiuan Chang, Amol Agrawal, Andrew McCallum

Finding Meaning in Embeddings: Concept Separation Curves

Sentence embedding techniques aim to encode key concepts of a sentence's meaning in a vector space. However, the majority of evaluation approaches for sentence embedding quality rely on the use of additional classifiers or downstream tasks.…

Computation and Language · Computer Science 2026-04-24 Paul Keuren, Marc Ponsen, Robert Ayoub Bagheri

Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models

We provide the first exploration of sentence embeddings from text-to-text transformers (T5). Sentence embeddings are broadly useful for language processing tasks. While T5 achieves impressive performance on language tasks cast as…

Computation and Language · Computer Science 2021-12-15 Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, Yinfei Yang

Comparison and Combination of Sentence Embeddings Derived from Different Supervision Signals

There have been many successful applications of sentence embedding methods. However, it has not been well understood what properties are captured in the resulting sentence embeddings depending on the supervision signals. In this paper, we…

Computation and Language · Computer Science 2022-06-13 Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda

Using BERT Encoding and Sentence-Level Language Model for Sentence Ordering

Discovering the logical sequence of events is one of the cornerstones in Natural Language Understanding. One approach to learn the sequence of events is to study the order of sentences in a coherent text. Sentence ordering can be applied in…

Computation and Language · Computer Science 2021-08-26 Melika Golestani, Seyedeh Zahra Razavi, Zeinab Borhanifard, Farnaz Tahmasebian, Hesham Faili

Word Embeddings and Their Use In Sentence Classification Tasks

This paper has two parts. In the first part we discuss word embeddings. We discuss the need for them, some of the methods to create them, and some of their interesting properties. We also compare them to image embeddings and see how word…

Machine Learning · Computer Science 2016-10-27 Amit Mandelbaum, Adi Shalev

Improving Medical Short Text Classification with Semantic Expansion Using Word-Cluster Embedding

Automatic text classification (TC) research can be used for real-world problems such as the classification of in-patient discharge summaries and medical text reports, which is beneficial to make medical documents more understandable to…

Computation and Language · Computer Science 2018-12-06 Ying Shen, Qiang Zhang, Jin Zhang, Jiyue Huang, Yuming Lu, Kai Lei

Composition of Sentence Embeddings: Lessons from Statistical Relational Learning

Various NLP problems -- such as the prediction of sentence similarity, entailment, and discourse relations -- are all instances of the same general task: the modeling of semantic relations between a pair of textual elements. A popular model…

Computation and Language · Computer Science 2019-04-05 Damien Sileo, Tim Van-De-Cruys, Camille Pradel, Philippe Muller

Classification and Clustering of Arguments with Contextualized Word Embeddings

We experiment with two recent contextualized word embedding methods (ELMo and BERT) in the context of open-domain argument search. For the first time, we show how to leverage the power of contextualized word embeddings to classify and…

Computation and Language · Computer Science 2019-06-25 Nils Reimers, Benjamin Schiller, Tilman Beck, Johannes Daxenberger, Christian Stab, Iryna Gurevych