Related papers: Binary Paragraph Vectors

Vector representations of text data in deep learning

In this dissertation we report results of our research on dense distributed representations of text data. We propose two novel neural models for learning such representations. The first model learns representations at the document level,…

Computation and Language · Computer Science 2019-01-08 Karol Grzegorczyk

Document Embedding with Paragraph Vectors

Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of texts. In their work, the authors showed that the method can learn an embedding of movie review texts which can be…

Computation and Language · Computer Science 2015-07-30 Andrew M. Dai , Christopher Olah , Quoc V. Le

Bayesian Paragraph Vectors

Word2vec (Mikolov et al., 2013) has proven to be successful in natural language processing by capturing the semantic relationships between different words. Built on top of single-word embeddings, paragraph vectors (Le and Mikolov, 2014)…

Computation and Language · Computer Science 2017-12-11 Geng Ji , Robert Bamler , Erik B. Sudderth , Stephan Mandt

Distributed Representations of Sentences and Documents

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features…

Computation and Language · Computer Science 2014-05-26 Quoc V. Le , Tomas Mikolov

Bernoulli Embeddings for Graphs

Just as semantic hashing can accelerate information retrieval, binary valued embeddings can significantly reduce latency in the retrieval of graphical data. We introduce a simple but effective model for learning such binary vectors for…

Machine Learning · Computer Science 2018-03-28 Vinith Misra , Sumit Bhatia

Search Efficient Binary Network Embedding

Traditional network embedding primarily focuses on learning a continuous vector representation for each node, preserving network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily…

Social and Information Networks · Computer Science 2023-01-02 Daokun Zhang , Jie Yin , Xingquan Zhu , Chengqi Zhang

Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews

Despite the loss of semantic information, bag-of-ngram based methods still achieve state-of-the-art results for tasks such as sentiment classification of long movie reviews. Many document embeddings methods have been proposed to capture…

Computation and Language · Computer Science 2016-04-26 Bofang Li , Tao Liu , Xiaoyong Du , Deyuan Zhang , Zhe Zhao

Hypencoder: Hypernetworks for Information Retrieval

Existing information retrieval systems are largely constrained by their reliance on vector inner products to assess query-document relevance, which naturally limits the expressiveness of the relevance score they can produce. We propose a…

Information Retrieval · Computer Science 2025-05-02 Julian Killingback , Hansi Zeng , Hamed Zamani

Hierarchical Neural Language Models for Joint Representation of Streaming Documents and their Content

We consider the problem of learning distributed representations for documents in data streams. The documents are represented as low-dimensional vectors and are jointly learned with distributed vector representations of word tokens using a…

Computation and Language · Computer Science 2016-06-29 Nemanja Djuric , Hao Wu , Vladan Radosavljevic , Mihajlo Grbovic , Narayan Bhamidipati

Encoding Binary Concepts in the Latent Space of Generative Models for Enhancing Data Representation

Binary concepts are empirically used by humans to generalize efficiently. And they are based on Bernoulli distribution which is the building block of information. These concepts span both low-level and high-level features such as "large vs…

Machine Learning · Computer Science 2023-03-23 Zizhao Hu , Mohammad Rostami

code2vec: Learning Distributed Representations of Code

We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length $\textit{code vector}$, which can be used to predict…

Machine Learning · Computer Science 2018-10-31 Uri Alon , Meital Zilberstein , Omer Levy , Eran Yahav

Multivariate Representation Learning for Information Retrieval

Dense retrieval models use bi-encoder network architectures for learning query and document representations. These representations are often in the form of a vector representation and their similarities are often computed using the dot…

Information Retrieval · Computer Science 2023-05-01 Hamed Zamani , Michael Bendersky

Binarized Graph Neural Network

Recently, there have been some breakthroughs in graph analysis by applying the graph neural networks (GNNs) following a neighborhood aggregation scheme, which demonstrate outstanding performance in many tasks. However, we observe that the…

Machine Learning · Computer Science 2021-04-13 Hanchen Wang , Defu Lian , Ying Zhang , Lu Qin , Xiangjian He , Yiguang Lin , Xuemin Lin

Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network

Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Processing and Information Retrieval. We introduce a model that is able to represent the…

Computation and Language · Computer Science 2014-06-17 Misha Denil , Alban Demiraj , Nal Kalchbrenner , Phil Blunsom , Nando de Freitas

Near-lossless Binarization of Word Embeddings

Word embeddings are commonly used as a starting point in many NLP models to achieve state-of-the-art performances. However, with a large vocabulary and many dimensions, these floating-point representations are expensive both in terms of…

Computation and Language · Computer Science 2020-01-23 Julien Tissier , Christophe Gravier , Amaury Habrard

Learning Generic Sentence Representations Using Convolutional Neural Networks

We propose a new encoder-decoder approach to learn distributed sentence representations that are applicable to multiple purposes. The model is learned by using a convolutional neural network as an encoder to map an input sentence into a…

Computation and Language · Computer Science 2017-07-28 Zhe Gan , Yunchen Pu , Ricardo Henao , Chunyuan Li , Xiaodong He , Lawrence Carin

Learning Compressed Sentence Representations for On-Device Text Processing

Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued,…

Computation and Language · Computer Science 2019-06-21 Dinghan Shen , Pengyu Cheng , Dhanasekar Sundararaman , Xinyuan Zhang , Qian Yang , Meng Tang , Asli Celikyilmaz , Lawrence Carin

Binary Classification in Unstructured Space With Hypergraph Case-Based Reasoning

Binary classification is one of the most common problem in machine learning. It consists in predicting whether a given element belongs to a particular class. In this paper, a new algorithm for binary classification is proposed using a…

Machine Learning · Computer Science 2019-03-12 Alexandre Quemy

Fast Metric Learning For Deep Neural Networks

Similarity metrics are a core component of many information retrieval and machine learning systems. In this work we propose a method capable of learning a similarity metric from data equipped with a binary relation. By considering only the…

Machine Learning · Computer Science 2016-04-06 Henry Gouk , Bernhard Pfahringer , Michael Cree

Foundations of Vector Retrieval

Vectors are universal mathematical objects that can represent text, images, speech, or a mix of these data modalities. That happens regardless of whether data is represented by hand-crafted features or learnt embeddings. Collect a large…

Data Structures and Algorithms · Computer Science 2024-04-02 Sebastian Bruch