Related papers: Binary Paragraph Vectors
In this dissertation we report results of our research on dense distributed representations of text data. We propose two novel neural models for learning such representations. The first model learns representations at the document level,…
Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of texts. In their work, the authors showed that the method can learn an embedding of movie review texts which can be…
Word2vec (Mikolov et al., 2013) has proven to be successful in natural language processing by capturing the semantic relationships between different words. Built on top of single-word embeddings, paragraph vectors (Le and Mikolov, 2014)…
Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features…
Just as semantic hashing can accelerate information retrieval, binary valued embeddings can significantly reduce latency in the retrieval of graphical data. We introduce a simple but effective model for learning such binary vectors for…
Traditional network embedding primarily focuses on learning a continuous vector representation for each node, preserving network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily…
Despite the loss of semantic information, bag-of-ngram based methods still achieve state-of-the-art results for tasks such as sentiment classification of long movie reviews. Many document embeddings methods have been proposed to capture…
Existing information retrieval systems are largely constrained by their reliance on vector inner products to assess query-document relevance, which naturally limits the expressiveness of the relevance score they can produce. We propose a…
We consider the problem of learning distributed representations for documents in data streams. The documents are represented as low-dimensional vectors and are jointly learned with distributed vector representations of word tokens using a…
Binary concepts are empirically used by humans to generalize efficiently. And they are based on Bernoulli distribution which is the building block of information. These concepts span both low-level and high-level features such as "large vs…
We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length $\textit{code vector}$, which can be used to predict…
Dense retrieval models use bi-encoder network architectures for learning query and document representations. These representations are often in the form of a vector representation and their similarities are often computed using the dot…
Recently, there have been some breakthroughs in graph analysis by applying the graph neural networks (GNNs) following a neighborhood aggregation scheme, which demonstrate outstanding performance in many tasks. However, we observe that the…
Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Processing and Information Retrieval. We introduce a model that is able to represent the…
Word embeddings are commonly used as a starting point in many NLP models to achieve state-of-the-art performances. However, with a large vocabulary and many dimensions, these floating-point representations are expensive both in terms of…
We propose a new encoder-decoder approach to learn distributed sentence representations that are applicable to multiple purposes. The model is learned by using a convolutional neural network as an encoder to map an input sentence into a…
Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued,…
Binary classification is one of the most common problem in machine learning. It consists in predicting whether a given element belongs to a particular class. In this paper, a new algorithm for binary classification is proposed using a…
Similarity metrics are a core component of many information retrieval and machine learning systems. In this work we propose a method capable of learning a similarity metric from data equipped with a binary relation. By considering only the…
Vectors are universal mathematical objects that can represent text, images, speech, or a mix of these data modalities. That happens regardless of whether data is represented by hand-crafted features or learnt embeddings. Collect a large…