Related papers: Learning Compressed Sentence Representations for O…

Near-lossless Binarization of Word Embeddings

Word embeddings are commonly used as a starting point in many NLP models to achieve state-of-the-art performances. However, with a large vocabulary and many dimensions, these floating-point representations are expensive both in terms of…

Computation and Language · Computer Science 2020-01-23 Julien Tissier , Christophe Gravier , Amaury Habrard

Semantic Sentence Embeddings for Paraphrasing and Text Summarization

This paper introduces a sentence to vector encoding framework suitable for advanced natural language processing. Our latent representation is shown to encode sentences with common semantic information with similar vector representations.…

Computation and Language · Computer Science 2018-09-30 Chi Zhang , Shagan Sah , Thang Nguyen , Dheeraj Peri , Alexander Loui , Carl Salvaggio , Raymond Ptucha

Sentence transition matrix: An efficient approach that preserves sentence semantics

Sentence embedding is a significant research topic in the field of natural language processing (NLP). Generating sentence embedding vectors reflecting the intrinsic meaning of a sentence is a key factor to achieve an enhanced performance in…

Computation and Language · Computer Science 2019-01-17 Myeongjun Jang , Pilsung Kang

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

There is a lot of research interest in encoding variable length sentences into fixed length vectors, in a way that preserves the sentence meanings. Two common methods include representations based on averaging word vectors, and…

Computation and Language · Computer Science 2017-02-10 Yossi Adi , Einat Kermany , Yonatan Belinkov , Ofer Lavi , Yoav Goldberg

Correcting the Common Discourse Bias in Linear Representation of Sentences using Conceptors

Distributed representations of words, better known as word embeddings, have become important building blocks for natural language processing tasks. Numerous studies are devoted to transferring the success of unsupervised word embeddings to…

Computation and Language · Computer Science 2018-11-28 Tianlin Liu , João Sedoc , Lyle Ungar

Hamming Sentence Embeddings for Information Retrieval

In retrieval applications, binary hashes are known to offer significant improvements in terms of both memory and speed. We investigate the compression of sentence embeddings using a neural encoder-decoder architecture, which is trained by…

Information Retrieval · Computer Science 2019-08-16 Felix Hamann , Nadja Kurz , Adrian Ulges

Are the Best Multilingual Document Embeddings simply Based on Sentence Embeddings?

Dense vector representations for textual data are crucial in modern NLP. Word embeddings and sentence embeddings estimated from raw texts are key in achieving state-of-the-art results in various tasks requiring semantic understanding.…

Computation and Language · Computer Science 2023-07-06 Sonal Sannigrahi , Josef van Genabith , Cristina Espana-Bonet

Subspace Representations for Soft Set Operations and Sentence Similarities

In the field of natural language processing (NLP), continuous vector representations are crucial for capturing the semantic meanings of individual words. Yet, when it comes to the representations of sets of words, the conventional…

Computation and Language · Computer Science 2024-04-11 Yoichi Ishibashi , Sho Yokoi , Katsuhito Sudoh , Satoshi Nakamura

Static Word Embeddings for Sentence Semantic Representation

We propose new static word embeddings optimised for sentence semantic representation. We first extract word embeddings from a pre-trained Sentence Transformer, and improve them with sentence-level principal component analysis, followed by…

Computation and Language · Computer Science 2025-10-01 Takashi Wada , Yuki Hirakawa , Ryotaro Shimizu , Takahiro Kawashima , Yuki Saito

Compressing Word Embeddings

Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic. However, these vector space representations (created through large-scale…

Computation and Language · Computer Science 2016-05-17 Martin Andrews

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

A lot of the recent success in natural language processing (NLP) has been driven by distributed vector representations of words trained on large amounts of text in an unsupervised manner. These representations are typically used as general…

Computation and Language · Computer Science 2018-04-03 Sandeep Subramanian , Adam Trischler , Yoshua Bengio , Christopher J Pal

Learning Generic Sentence Representations Using Convolutional Neural Networks

We propose a new encoder-decoder approach to learn distributed sentence representations that are applicable to multiple purposes. The model is learned by using a convolutional neural network as an encoder to map an input sentence into a…

Computation and Language · Computer Science 2017-07-28 Zhe Gan , Yunchen Pu , Ricardo Henao , Chunyuan Li , Xiaodong He , Lawrence Carin

Learning Robust, Transferable Sentence Representations for Text Classification

Despite deep recurrent neural networks (RNNs) demonstrate strong performance in text classification, training RNN models are often expensive and requires an extensive collection of annotated data which may not be available. To overcome the…

Computation and Language · Computer Science 2018-10-02 Wasi Uddin Ahmad , Xueying Bai , Nanyun Peng , Kai-Wei Chang

A Comprehensive Empirical Evaluation of Existing Word Embedding Approaches

Vector-based word representations help countless Natural Language Processing (NLP) tasks capture the language's semantic and syntactic regularities. In this paper, we present the characteristics of existing word embedding approaches and…

Computation and Language · Computer Science 2024-03-05 Obaidullah Zaland , Muhammad Abulaish , Mohd. Fazil

A Bilingual Generative Transformer for Semantic Sentence Embedding

Semantic sentence embedding models encode natural language sentences into vectors, such that closeness in embedding space indicates closeness in the semantics between the sentences. Bilingual data offers a useful signal for learning such…

Computation and Language · Computer Science 2020-11-20 John Wieting , Graham Neubig , Taylor Berg-Kirkpatrick

Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining

Dense vector representations for sentences made significant progress in recent years as can be seen on sentence similarity tasks. Real-world phrase retrieval applications, on the other hand, still encounter challenges for effective use of…

Computation and Language · Computer Science 2024-05-14 Eyal Orbach , Lev Haikin , Nelly David , Avi Faizakof

Vector representations of text data in deep learning

In this dissertation we report results of our research on dense distributed representations of text data. We propose two novel neural models for learning such representations. The first model learns representations at the document level,…

Computation and Language · Computer Science 2019-01-08 Karol Grzegorczyk

Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization

Word embedding methods revolve around learning continuous distributed vector representations of words with neural networks, which can capture semantic and/or syntactic cues, and in turn be used to induce similarity measures among words,…

Computation and Language · Computer Science 2016-07-25 Kuan-Yu Chen , Shih-Hung Liu , Berlin Chen , Hsin-Min Wang , Hsin-Hsi Chen

Empirical Evaluation of Embedding Models in the Context of Text Classification in Document Review in Construction Delay Disputes

Text embeddings are numerical representations of text data, where words, phrases, or entire documents are converted into vectors of real numbers. These embeddings capture semantic meanings and relationships between text elements in a…

Information Retrieval · Computer Science 2025-01-20 Fusheng Wei , Robert Neary , Han Qin , Qiang Mao , Jianping Zhang

Consistent Alignment of Word Embedding Models

Word embedding models offer continuous vector representations that can capture rich contextual semantics based on their word co-occurrence patterns. While these word vectors can provide very effective features used in many NLP tasks such as…

Computation and Language · Computer Science 2017-02-27 Cem Safak Sahin , Rajmonda S. Caceres , Brandon Oselio , William M. Campbell