Related papers: Spherical Paragraph Model

Bilingual Distributed Word Representations from Document-Aligned Comparable Data

We propose a new model for learning bilingual word representations from non-parallel document-aligned data. Following the recent advances in word representation learning, our model learns dense real-valued word vectors, that is, bilingual…

Computation and Language · Computer Science 2016-03-01 Ivan Vulić , Marie-Francine Moens

Towards General Natural Language Understanding with Probabilistic Worldbuilding

We introduce the Probabilistic Worldbuilding Model (PWM), a new fully-symbolic Bayesian model of semantic parsing and reasoning, as a first step in a research program toward more domain- and task-general NLU and AI. Humans create internal…

Computation and Language · Computer Science 2021-12-22 Abulhair Saparov , Tom M. Mitchell

Static Fuzzy Bag-of-Words: a lightweight sentence embedding algorithm

The introduction of embedding techniques has pushed forward significantly the Natural Language Processing field. Many of the proposed solutions have been presented for word-level encoding; anyhow, in the last years, new mechanism to treat…

Computation and Language · Computer Science 2023-04-07 Matteo Muffo , Roberto Tedesco , Licia Sbattella , Vincenzo Scotti

Probabilistic Modelling of Morphologically Rich Languages

This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often…

Computation and Language · Computer Science 2015-08-19 Jan A. Botha

A Structured Distributional Model of Sentence Meaning and Processing

Most compositional distributional semantic models represent sentence meaning with a single vector. In this paper, we propose a Structured Distributional Model (SDM) that combines word embeddings with formal semantics and is based on the…

Computation and Language · Computer Science 2019-06-19 Emmanuele Chersoni , Enrico Santus , Ludovica Pannitto , Alessandro Lenci , Philippe Blache , Chu-Ren Huang

Semantic Composition via Probabilistic Model Theory

Semantic composition remains an open problem for vector space models of semantics. In this paper, we explain how the probabilistic graphical model used in the framework of Functional Distributional Semantics can be interpreted as a…

Computation and Language · Computer Science 2017-09-04 Guy Emerson , Ann Copestake

Sequential Embedding Induced Text Clustering, a Non-parametric Bayesian Approach

Current state-of-the-art nonparametric Bayesian text clustering methods model documents through multinomial distribution on bags of words. Although these methods can effectively utilize the word burstiness representation of documents and…

Machine Learning · Computer Science 2018-12-03 Tiehang Duan , Qi Lou , Sargur N. Srihari , Xiaohui Xie

Compositional Morphology for Word Representations and Language Modelling

This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model. Our approach is evaluated in the context of log-bilinear language models, rendered suitably…

Computation and Language · Computer Science 2014-05-19 Jan A. Botha , Phil Blunsom

Word and Document Embeddings based on Neural Network Approaches

Data representation is a fundamental task in machine learning. The representation of data affects the performance of the whole machine learning system. In a long history, the representation of data is done by feature engineering, and…

Computation and Language · Computer Science 2016-11-21 Siwei Lai

Learning Probabilistic Sentence Representations from Paraphrases

Probabilistic word embeddings have shown effectiveness in capturing notions of generality and entailment, but there is very little work on doing the analogous type of investigation for sentences. In this paper we define probabilistic models…

Computation and Language · Computer Science 2020-05-19 Mingda Chen , Kevin Gimpel

PBoS: Probabilistic Bag-of-Subwords for Generalizing Word Embedding

We look into the task of \emph{generalizing} word embeddings: given a set of pre-trained word vectors over a finite vocabulary, the goal is to predict embedding vectors for out-of-vocabulary words, \emph{without} extra contextual…

Computation and Language · Computer Science 2020-10-22 Zhao Jinman , Shawn Zhong , Xiaomin Zhang , Yingyu Liang

SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models

Sentence embedding is an important research topic in natural language processing (NLP) since it can transfer knowledge to downstream tasks. Meanwhile, a contextualized word representation, called BERT, achieves the state-of-the-art…

Computation and Language · Computer Science 2020-06-02 Bin Wang , C. -C. Jay Kuo

Siamese CBOW: Optimizing Word Embeddings for Sentence Representations

We present the Siamese Continuous Bag of Words (Siamese CBOW) model, a neural network for efficient estimation of high-quality sentence embeddings. Averaging the embeddings of words in a sentence has proven to be a surprisingly successful…

Computation and Language · Computer Science 2016-06-16 Tom Kenter , Alexey Borisov , Maarten de Rijke

Distributed Representations of Sentences and Documents

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features…

Computation and Language · Computer Science 2014-05-26 Quoc V. Le , Tomas Mikolov

Text Classification based on Word Subspace with Term-Frequency

Text classification has become indispensable due to the rapid increase of text in digital form. Over the past three decades, efforts have been made to approach this task using various learning algorithms and statistical models based on…

Machine Learning · Statistics 2018-06-11 Erica K. Shimomoto , Lincon S. Souza , Bernardo B. Gatto , Kazuhiro Fukui

Sequential Document Representations and Simplicial Curves

The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present a continuous and…

Information Retrieval · Computer Science 2012-07-02 Guy Lebanon

hauWE: Hausa Words Embedding for Natural Language Processing

Words embedding (distributed word vector representations) have become an essential component of many natural language processing (NLP) tasks such as machine translation, sentiment analysis, word analogy, named entity recognition and word…

Computation and Language · Computer Science 2020-01-08 Idris Abdulmumin , Bashir Shehu Galadanci

Sparse Lifting of Dense Vectors: Unifying Word and Sentence Representations

As the first step in automated natural language processing, representing words and sentences is of central importance and has attracted significant research attention. Different approaches, from the early one-hot and bag-of-words…

Computation and Language · Computer Science 2019-11-06 Wenye Li , Senyue Hao

Attention Word Embedding

Word embedding models learn semantically rich vector representations of words and are widely used to initialize natural processing language (NLP) models. The popular continuous bag-of-words (CBOW) model of word2vec learns a vector embedding…

Computation and Language · Computer Science 2020-06-02 Shashank Sonkar , Andrew E. Waters , Richard G. Baraniuk

Comparing Text Representations: A Theory-Driven Approach

Much of the progress in contemporary NLP has come from learning representations, such as masked language model (MLM) contextual embeddings, that turn challenging problems into simple classification tasks. But how do we quantify and explain…

Computation and Language · Computer Science 2021-09-16 Gregory Yauney , David Mimno