Related papers: Context Aware Document Embedding

Context Aware Machine Learning

We propose a principle for exploring context in machine learning models. Starting with a simple assumption that each observation may or may not depend on its context, a conditional probability distribution is decomposed into two parts:…

Machine Learning · Computer Science 2019-01-23 Yun Zeng

Utility of General and Specific Word Embeddings for Classifying Translational Stages of Research

Conventional text classification models make a bag-of-words assumption reducing text into word occurrence counts per document. Recent algorithms such as word2vec are capable of learning semantic meaning and similarity between words in an…

Computation and Language · Computer Science 2018-07-11 Vincent Major, Alisa Surkis, Yindalon Aphinyanaphongs

On the Effects of Using word2vec Representations in Neural Networks for Dialogue Act Recognition

Dialogue act recognition is an important component of a large number of natural language processing pipelines. Many research works have been carried out in this area, but relatively few investigate deep neural networks and word embeddings.…

Computation and Language · Computer Science 2020-10-23 Christophe Cerisara, Pavel Kral, Ladislav Lenc

Contextually Propagated Term Weights for Document Representation

Word embeddings predict a word from its neighbours by learning small, dense embedding vectors. In practice, this prediction corresponds to a semantic score given to the predicted word (or term weight). We present a novel model that, given a…

Information Retrieval · Computer Science 2019-06-04 Casper Hansen, Christian Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval

Term frequency is a common method for identifying the importance of a term in a query or document. But it is a weak signal, especially when the frequency distribution is flat, such as in long queries or short documents where the text is of…

Information Retrieval · Computer Science 2019-11-28 Zhuyun Dai, Jamie Callan

Context encoders as a simple but powerful extension of word2vec

With a simple architecture and the ability to learn meaningful word embeddings efficiently from texts containing billions of words, word2vec remains one of the most popular neural language models used today. However, as only a single…

Machine Learning · Statistics 2017-06-09 Franziska Horn

Learning Deep Context-Network Architectures for Image Annotation

Context plays an important role in visual pattern recognition as it provides complementary clues for different learning tasks including image classification and annotation. In the particular scenario of kernel learning, the general recipe…

Computer Vision and Pattern Recognition · Computer Science 2018-03-26 Mingyuan Jiu, Hichem Sahbi

HanoiT: Enhancing Context-aware Translation via Selective Context

Context-aware neural machine translation aims to use the document-level context to improve translation quality. However, not all words in the context are helpful. The irrelevant or trivial words may bring some noise and distract the model…

Computation and Language · Computer Science 2023-04-20 Jian Yang, Yuwei Yin, Shuming Ma, Liqun Yang, Hongcheng Guo, Haoyang Huang, Dongdong Zhang, Yutao Zeng, Zhoujun Li, Furu Wei

Incremental Sense Weight Training for the Interpretation of Contextualized Word Embeddings

We present a novel online algorithm that learns the essence of each dimension in word embeddings by minimizing the within-group distance of contextualized embedding groups. Three state-of-the-art neural-based language models are used,…

Computation and Language · Computer Science 2020-05-26 Xinyi Jiang, Zhengzhe Yang, Jinho D. Choi

Context-Aware Learning for Neural Machine Translation

Interest in larger-context neural machine translation, including document-level and multi-modal translation, has been growing. Multiple works have proposed new network architectures or evaluation schemes, but potentially helpful context is…

Computation and Language · Computer Science 2019-03-13 Sébastien Jean, Kyunghyun Cho

Improving a tf-idf weighted document vector embedding

We examine a number of methods to compute a dense vector embedding for a document in a corpus, given a set of word vectors such as those from word2vec or GloVe. We describe two methods that can improve upon a simple weighted sum, that are…

Computation and Language · Computer Science 2019-02-27 Craig W. Schmidt

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe…

Computation and Language · Computer Science 2016-05-09 Christopher E Moody

Adaptive Region Embedding for Text Classification

Deep learning models such as convolutional neural networks and recurrent networks are widely applied in text classification. In spite of their great success, most deep learning models neglect the importance of modeling context information,…

Computation and Language · Computer Science 2019-06-05 Liuyu Xiang, Xiaoming Jin, Lan Yi, Guiguang Ding

Dynamic Context Selection for Document-level Neural Machine Translation via Reinforcement Learning

Document-level neural machine translation has yielded attractive improvements. However, the majority of existing methods roughly use all context sentences in a fixed scope. They neglect the fact that different source sentences need different…

Computation and Language · Computer Science 2020-10-12 Xiaomian Kang, Yang Zhao, Jiajun Zhang, Chengqing Zong

Deep Context-Aware Kernel Networks

Context plays a crucial role in visual recognition as it provides complementary clues for different learning tasks including image classification and annotation. As the performances of these tasks are currently reaching a plateau, any extra…

Computer Vision and Pattern Recognition · Computer Science 2020-01-01 Mingyuan Jiu, Hichem Sahbi

AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus

In recent years, word embeddings have been surprisingly effective at capturing intuitive characteristics of the words they represent. These vectors achieve the best results when training corpora are extremely large, sometimes billions of…

Computation and Language · Computer Science 2017-12-06 Willie Boag, Hassan Kané

Corpus-level and Concept-based Explanations for Interpretable Document Classification

Using attention weights to identify information that is important for models' decision-making is a popular approach to interpret attention-based neural networks. This is commonly realized in practice through the generation of a heat-map for…

Information Retrieval · Computer Science 2021-06-01 Tian Shi, Xuchao Zhang, Ping Wang, Chandan K. Reddy

Bayesian Paragraph Vectors

Word2vec (Mikolov et al., 2013) has proven to be successful in natural language processing by capturing the semantic relationships between different words. Built on top of single-word embeddings, paragraph vectors (Le and Mikolov, 2014)…

Computation and Language · Computer Science 2017-12-11 Geng Ji, Robert Bamler, Erik B. Sudderth, Stephan Mandt

Word Embeddings for the Construction Domain

We introduce word vectors for the construction domain. Our vectors were obtained by running word2vec on an 11M-word corpus that we created from scratch by leveraging freely-accessible online sources of construction-related text. We first…

Computation and Language · Computer Science 2016-10-31 Antoine J.-P. Tixier, Michalis Vazirgiannis, Matthew R. Hallowell

An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation

Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This…

Computation and Language · Computer Science 2016-12-19 Jey Han Lau, Timothy Baldwin