Related papers: Gram2Vec: An Interpretable Document Vectorizer

The Word2vec Graph Model for Author Attribution and Genre Detection in Literary Analysis

Analyzing the writing styles of authors and articles is a key to supporting various literary analyses such as author attribution and genre detection. Over the years, rich sets of features that include stylometry, bag-of-words, n-grams have…

Information Retrieval · Computer Science 2023-10-27 Nafis Irtiza Tripto , Mohammed Eunus Ali

What the Vec? Towards Probabilistically Grounded Embeddings

Word2Vec (W2V) and GloVe are popular, fast and efficient word embedding algorithms. Their embeddings are widely used and perform well on a variety of natural language processing tasks. Moreover, W2V has recently been adopted in the field of…

Computation and Language · Computer Science 2019-11-12 Carl Allen , Ivana Balažević , Timothy Hospedales

Efficient Vector Representation for Documents through Corruption

We present an efficient document representation learning framework, Document Vector through Corruption (Doc2VecC). Doc2VecC represents each document as a simple average of word embeddings. It ensures a representation generated as such…

Computation and Language · Computer Science 2017-07-11 Minmin Chen

word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data

Vector representations of graphs and relational structures, whether hand-crafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of…

Machine Learning · Computer Science 2020-03-31 Martin Grohe

Sound-Word2Vec: Learning Word Representations Grounded in Sounds

To be able to interact better with humans, it is crucial for machines to understand sound - a primary modality of human perception. Previous works have used sound to learn embeddings for improved generic textual similarity assessment. In…

Computation and Language · Computer Science 2017-08-30 Ashwin K Vijayakumar , Ramakrishna Vedantam , Devi Parikh

Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning

This project intends to study the image representation based on attention mechanism and multimodal data. By adding multiple pattern layers to the attribute model, the semantic and hidden layers of image content are integrated. The word…

Computation and Language · Computer Science 2024-06-14 Dan Sun , Yaxin Liang , Yining Yang , Yuhan Ma , Qishi Zhan , Erdi Gao

Automatic Argumentative-Zoning Using Word2vec

In comparison with document summarization on the articles from social media and newswire, argumentative zoning (AZ) is an important task in scientific paper analysis. Traditional methodology to carry on this task relies on feature…

Computation and Language · Computer Science 2017-03-30 Haixia Liu

Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder

We present Tweet2Vec, a novel method for generating general-purpose vector representation of tweets. The model learns tweet embeddings using character-level CNN-LSTM encoder-decoder. We trained our model on 3 million, randomly selected…

Computation and Language · Computer Science 2016-07-27 Soroush Vosoughi , Prashanth Vijayaraghavan , Deb Roy

DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging

Tagging news articles or blog posts with relevant tags from a collection of predefined ones is coined as document tagging in this work. Accurate tagging of articles can benefit several downstream applications such as recommendation and…

Computation and Language · Computer Science 2017-07-18 Sheng Chen , Akshay Soni , Aasish Pappu , Yashar Mehdad

An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation

Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This…

Computation and Language · Computer Science 2016-12-19 Jey Han Lau , Timothy Baldwin

A Comparison of Word2Vec, HMM2Vec, and PCA2Vec for Malware Classification

Word embeddings are often used in natural language processing as a means to quantify relationships between words. More generally, these same word embedding techniques can be used to quantify relationships between features. In this paper, we…

Cryptography and Security · Computer Science 2021-03-11 Aniket Chandak , Wendy Lee , Mark Stamp

Exploration on Grounded Word Embedding: Matching Words and Images with Image-Enhanced Skip-Gram Model

Word embedding is designed to represent the semantic meaning of a word with low dimensional vectors. The state-of-the-art methods of learning word embeddings (word2vec and GloVe) only use the word co-occurrence information. The learned…

Computation and Language · Computer Science 2018-09-11 Ruixuan Luo

Grammar as a Behavioral Biometric: Using Cognitively Motivated Grammar Models for Authorship Verification

Authorship Verification (AV) is a key area of research in digital text forensics, which addresses the fundamental question of whether two texts were written by the same person. Numerous computational approaches have been proposed over the…

Computation and Language · Computer Science 2026-04-16 Andrea Nini , Oren Halvani , Lukas Graner , Sophie Titze , Valerio Gherardi , Shunichi Ishihara

SubGram: Extending Skip-gram Word Representation with Substrings

Skip-gram (word2vec) is a recent method for creating vector representations of words ("distributed word representations") using a neural network. The representation gained popularity in various areas of natural language processing, because…

Computation and Language · Computer Science 2020-07-09 Tom Kocmi , Ondřej Bojar

Utility of General and Specific Word Embeddings for Classifying Translational Stages of Research

Conventional text classification models make a bag-of-words assumption reducing text into word occurrence counts per document. Recent algorithms such as word2vec are capable of learning semantic meaning and similarity between words in an…

Computation and Language · Computer Science 2018-07-11 Vincent Major , Alisa Surkis , Yindalon Aphinyanaphongs

E2Vec: Feature Embedding with Temporal Information for Analyzing Student Actions in E-Book Systems

Digital textbook (e-book) systems record student interactions with textbooks as a sequence of events called EventStream data. In the past, researchers extracted meaningful features from EventStream, and utilized them as inputs for…

Computers and Society · Computer Science 2024-07-19 Yuma Miyazaki , Valdemar Švábenský , Yuta Taniguchi , Fumiya Okubo , Tsubasa Minematsu , Atsushi Shimada

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces a la carte embedding, a…

Computation and Language · Computer Science 2018-05-16 Mikhail Khodak , Nikunj Saunshi , Yingyu Liang , Tengyu Ma , Brandon Stewart , Sanjeev Arora

Towards a Theoretical Understanding of Word and Relation Representation

Representing words by vectors, or embeddings, enables computational reasoning and is foundational to automating natural language tasks. For example, if word embeddings of similar words contain similar values, word similarity can be readily…

Computation and Language · Computer Science 2022-02-02 Carl Allen

Author2Vec: A Framework for Generating User Embedding

Online forums and social media platforms provide noisy but valuable data every day. In this paper, we propose a novel end-to-end neural network-based user embedding system, Author2Vec. The model incorporates sentence representations…

Computation and Language · Computer Science 2020-03-27 Xiaodong Wu , Weizhe Lin , Zhilin Wang , Elena Rastorgueva

Robust and Consistent Estimation of Word Embedding for Bangla Language by fine-tuning Word2Vec Model

Word embedding or vector representation of word holds syntactical and semantic characteristics of a word which can be an informative feature for any machine learning-based models of natural language processing. There are several deep…

Computation and Language · Computer Science 2021-05-05 Rifat Rahman