Related papers: Gram2Vec: An Interpretable Document Vectorizer

200 papers

Analyzing the writing styles of authors and articles is key to supporting various literary analyses such as author attribution and genre detection. Over the years, rich sets of features that include stylometry, bag-of-words, n-grams have…

Information Retrieval · Computer Science · 2023-10-27 · Nafis Irtiza Tripto, Mohammed Eunus Ali

Word2Vec (W2V) and GloVe are popular, fast and efficient word embedding algorithms. Their embeddings are widely used and perform well on a variety of natural language processing tasks. Moreover, W2V has recently been adopted in the field of…

Computation and Language · Computer Science · 2019-11-12 · Carl Allen, Ivana Balažević, Timothy Hospedales

We present an efficient document representation learning framework, Document Vector through Corruption (Doc2VecC). Doc2VecC represents each document as a simple average of word embeddings. It ensures a representation generated as such…

Computation and Language · Computer Science · 2017-07-11 · Minmin Chen

Vector representations of graphs and relational structures, whether hand-crafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of…

Machine Learning · Computer Science · 2020-03-31 · Martin Grohe

To be able to interact better with humans, it is crucial for machines to understand sound, a primary modality of human perception. Previous works have used sound to learn embeddings for improved generic textual similarity assessment. In…

Computation and Language · Computer Science · 2017-08-30 · Ashwin K Vijayakumar, Ramakrishna Vedantam, Devi Parikh

This project intends to study the image representation based on attention mechanism and multimodal data. By adding multiple pattern layers to the attribute model, the semantic and hidden layers of image content are integrated. The word…

Computation and Language · Computer Science · 2024-06-14 · Dan Sun, Yaxin Liang, Yining Yang, Yuhan Ma, Qishi Zhan, Erdi Gao

Compared with document summarization of articles from social media and newswire, argumentative zoning (AZ) is an important task in scientific paper analysis. The traditional methodology for carrying out this task relies on feature…

Computation and Language · Computer Science · 2017-03-30 · Haixia Liu

We present Tweet2Vec, a novel method for generating general-purpose vector representation of tweets. The model learns tweet embeddings using character-level CNN-LSTM encoder-decoder. We trained our model on 3 million, randomly selected…

Computation and Language · Computer Science · 2016-07-27 · Soroush Vosoughi, Prashanth Vijayaraghavan, Deb Roy

Tagging news articles or blog posts with relevant tags from a collection of predefined ones is termed document tagging in this work. Accurate tagging of articles can benefit several downstream applications such as recommendation and…

Computation and Language · Computer Science · 2017-07-18 · Sheng Chen, Akshay Soni, Aasish Pappu, Yashar Mehdad

Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This…

Computation and Language · Computer Science · 2016-12-19 · Jey Han Lau, Timothy Baldwin

Word embeddings are often used in natural language processing as a means to quantify relationships between words. More generally, these same word embedding techniques can be used to quantify relationships between features. In this paper, we…

Cryptography and Security · Computer Science · 2021-03-11 · Aniket Chandak, Wendy Lee, Mark Stamp

Word embedding is designed to represent the semantic meaning of a word with low dimensional vectors. The state-of-the-art methods of learning word embeddings (word2vec and GloVe) only use the word co-occurrence information. The learned…

Computation and Language · Computer Science · 2018-09-11 · Ruixuan Luo

Authorship Verification (AV) is a key area of research in digital text forensics, which addresses the fundamental question of whether two texts were written by the same person. Numerous computational approaches have been proposed over the…

Computation and Language · Computer Science · 2026-04-16 · Andrea Nini, Oren Halvani, Lukas Graner, Sophie Titze, Valerio Gherardi, Shunichi Ishihara

Skip-gram (word2vec) is a recent method for creating vector representations of words ("distributed word representations") using a neural network. The representation gained popularity in various areas of natural language processing, because…

Computation and Language · Computer Science · 2020-07-09 · Tom Kocmi, Ondřej Bojar

Conventional text classification models make a bag-of-words assumption reducing text into word occurrence counts per document. Recent algorithms such as word2vec are capable of learning semantic meaning and similarity between words in an…

Computation and Language · Computer Science · 2018-07-11 · Vincent Major, Alisa Surkis, Yindalon Aphinyanaphongs

Digital textbook (e-book) systems record student interactions with textbooks as a sequence of events called EventStream data. In the past, researchers extracted meaningful features from EventStream, and utilized them as inputs for…

Computers and Society · Computer Science · 2024-07-19 · Yuma Miyazaki, Valdemar Švábenský, Yuta Taniguchi, Fumiya Okubo, Tsubasa Minematsu, Atsushi Shimada

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces a la carte embedding, a…

Computation and Language · Computer Science · 2018-05-16 · Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora

Representing words by vectors, or embeddings, enables computational reasoning and is foundational to automating natural language tasks. For example, if word embeddings of similar words contain similar values, word similarity can be readily…

Computation and Language · Computer Science · 2022-02-02 · Carl Allen

Online forums and social media platforms provide noisy but valuable data every day. In this paper, we propose a novel end-to-end neural network-based user embedding system, Author2Vec. The model incorporates sentence representations…

Computation and Language · Computer Science · 2020-03-27 · Xiaodong Wu, Weizhe Lin, Zhilin Wang, Elena Rastorgueva

Word embeddings, or vector representations of words, hold the syntactic and semantic characteristics of a word, which can be an informative feature for any machine learning-based model of natural language processing. There are several deep…

Computation and Language · Computer Science · 2021-05-05 · Rifat Rahman