Related papers: Unsupervised Contextualized Document Representatio…

Improving Document Classification with Multi-Sense Embeddings

Efficient representation of text documents is an important building block in many NLP tasks. Research on long text categorization has shown that simple weighted averaging of word vectors for sentence representation often outperforms more…

Computation and Language · Computer Science 2019-11-20 Vivek Gupta , Ankit Saw , Pegah Nokhiz , Harshit Gupta , Partha Talukdar

SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations

We present a feature vector formation technique for documents - Sparse Composite Document Vector (SCDV) - which overcomes several shortcomings of the current distributional paragraph vector representations that are widely used for text…

Computation and Language · Computer Science 2017-05-15 Dheeraj Mekala , Vivek Gupta , Bhargavi Paranjape , Harish Karnick

Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings

Contextualized word embeddings (CWE) such as provided by ELMo (Peters et al., 2018), Flair NLP (Akbik et al., 2018), or BERT (Devlin et al., 2019) are a major recent innovation in NLP. CWEs provide semantic vector representations of words…

Computation and Language · Computer Science 2019-10-02 Gregor Wiedemann , Steffen Remus , Avi Chawla , Chris Biemann

Are the Best Multilingual Document Embeddings simply Based on Sentence Embeddings?

Dense vector representations for textual data are crucial in modern NLP. Word embeddings and sentence embeddings estimated from raw texts are key in achieving state-of-the-art results in various tasks requiring semantic understanding.…

Computation and Language · Computer Science 2023-07-06 Sonal Sannigrahi , Josef van Genabith , Cristina Espana-Bonet

An Unsupervised Sentence Embedding Method by Mutual Information Maximization

BERT is inefficient for sentence-pair tasks such as clustering or semantic search as it needs to evaluate combinatorially many sentence pairs which is very time-consuming. Sentence BERT (SBERT) attempted to solve this challenge by learning…

Computation and Language · Computer Science 2021-02-08 Yan Zhang , Ruidan He , Zuozhu Liu , Kwan Hui Lim , Lidong Bing

An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features

Tremendous amounts of multimedia associated with speech information are driving an urgent need to develop efficient and effective automatic summarization methods. To this end, we have seen rapid progress in applying supervised deep neural…

Computation and Language · Computer Science 2020-06-03 Shi-Yan Weng , Tien-Hong Lo , Berlin Chen

Unsupervised Distillation of Syntactic Information from Contextualized Word Representations

Contextualized word representations, such as ELMo and BERT, were shown to perform well on various semantic and syntactic tasks. In this work, we tackle the task of unsupervised disentanglement between semantics and structure in neural…

Computation and Language · Computer Science 2021-03-15 Shauli Ravfogel , Yanai Elazar , Jacob Goldberger , Yoav Goldberg

How Can BERT Help Lexical Semantics Tasks?

Contextualized embeddings such as BERT can serve as strong input representations to NLP tasks, outperforming their static embeddings counterparts such as skip-gram, CBOW and GloVe. However, such embeddings are dynamic, calculated according…

Computation and Language · Computer Science 2020-04-07 Yile Wang , Leyang Cui , Yue Zhang

Extending Multi-Sense Word Embedding to Phrases and Sentences for Unsupervised Semantic Applications

Most unsupervised NLP models represent each word with a single point or single region in semantic space, while the existing multi-sense word embeddings cannot represent longer word sequences like phrases or sentences. We propose a novel…

Computation and Language · Computer Science 2021-12-30 Haw-Shiuan Chang , Amol Agrawal , Andrew McCallum

Classification and Clustering of Arguments with Contextualized Word Embeddings

We experiment with two recent contextualized word embedding methods (ELMo and BERT) in the context of open-domain argument search. For the first time, we show how to leverage the power of contextualized word embeddings to classify and…

Computation and Language · Computer Science 2019-06-25 Nils Reimers , Benjamin Schiller , Tilman Beck , Johannes Daxenberger , Christian Stab , Iryna Gurevych

Sentence Embeddings using Supervised Contrastive Learning

Sentence embeddings encode sentences in fixed dense vectors and have played an important role in various NLP tasks and systems. Methods for building sentence embeddings include unsupervised learning such as Quick-Thoughts and supervised…

Computation and Language · Computer Science 2021-06-10 Danqi Liao

ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

Learning high-quality sentence representations benefits a wide range of natural language processing tasks. Though BERT-based pre-trained language models achieve high performance on many downstream tasks, the native derived sentence…

Computation and Language · Computer Science 2021-05-26 Yuanmeng Yan , Rumei Li , Sirui Wang , Fuzheng Zhang , Wei Wu , Weiran Xu

Is Neural Topic Modelling Better than Clustering? An Empirical Study on Clustering with Contextual Embeddings for Topics

Recent work incorporates pre-trained word embeddings such as BERT embeddings into Neural Topic Models (NTMs), generating highly coherent topics. However, with high-quality contextualized document representations, do we really need…

Computation and Language · Computer Science 2022-04-22 Zihan Zhang , Meng Fang , Ling Chen , Mohammad-Reza Namazi-Rad

Synergizing Unsupervised and Supervised Learning: A Hybrid Approach for Accurate Natural Language Task Modeling

While supervised learning models have shown remarkable performance in various natural language processing (NLP) tasks, their success heavily relies on the availability of large-scale labeled datasets, which can be costly and time-consuming…

Computation and Language · Computer Science 2024-06-04 Wrick Talukdar , Anjanava Biswas

A text autoencoder from transformer for fast encoding language representation

In recent years BERT shows apparent advantages and great potential in natural language processing tasks. However, both training and applying BERT requires intensive time and resources for computing contextual language representations, which…

Computation and Language · Computer Science 2021-11-05 Tan Huang

Correcting the Common Discourse Bias in Linear Representation of Sentences using Conceptors

Distributed representations of words, better known as word embeddings, have become important building blocks for natural language processing tasks. Numerous studies are devoted to transferring the success of unsupervised word embeddings to…

Computation and Language · Computer Science 2018-11-28 Tianlin Liu , João Sedoc , Lyle Ungar

SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models

Sentence embedding is an important research topic in natural language processing (NLP) since it can transfer knowledge to downstream tasks. Meanwhile, a contextualized word representation, called BERT, achieves the state-of-the-art…

Computation and Language · Computer Science 2020-06-02 Bin Wang , C. -C. Jay Kuo

Deeper Text Understanding for IR with Contextual Neural Language Modeling

Neural networks provide new possibilities to automatically learn complex language patterns and query-document relations. Neural IR models have achieved promising results in learning query-document relevance patterns, but few explorations…

Information Retrieval · Computer Science 2019-05-23 Zhuyun Dai , Jamie Callan

From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings

In this paper, we propose a novel approach for text classification based on clustering word embeddings, inspired by the bag of visual words model, which is widely used in computer vision. After each word in a collection of documents is…

Computation and Language · Computer Science 2017-07-26 Andrei M. Butnaru , Radu Tudor Ionescu

Incorporating BERT into Neural Machine Translation

The recently proposed BERT has shown great power on a variety of natural language understanding tasks, such as text classification, reading comprehension, etc. However, how to effectively apply BERT to neural machine translation (NMT) lacks…

Computation and Language · Computer Science 2020-02-18 Jinhua Zhu , Yingce Xia , Lijun Wu , Di He , Tao Qin , Wengang Zhou , Houqiang Li , Tie-Yan Liu