Related papers: Matrix Factorization using Window Sampling and Neg…

Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory

In this paper we take a state-of-the-art model for distributed word representation that explicitly factorizes the positive pointwise mutual information (PPMI) matrix using window sampling and negative sampling and address two of its…

Computation and Language · Computer Science 2016-06-07 Alexandre Salle, Marco Idiart, Aline Villavicencio

RankMat : Matrix Factorization with Calibrated Distributed Embedding and Fairness Enhancement

Matrix Factorization is a widely adopted technique in the field of recommender systems. Matrix Factorization techniques include SVD, LDA, pLSA, SVD++, MatRec, Zipf Matrix Factorization and Item2Vec. In recent years, distributed word…

Information Retrieval · Computer Science 2022-04-28 Hao Wang

Paper2vec: Citation-Context Based Document Distributed Representation for Scholar Recommendation

Due to the availability of references of research papers and the rich information contained in papers, various citation analysis approaches have been proposed to identify similar documents for scholar recommendation. Despite the success…

Information Retrieval · Computer Science 2017-03-21 Han Tian, Hankz Hankui Zhuo

Improving Word Representations: A Sub-sampled Unigram Distribution for Negative Sampling

Word2Vec is the most popular model for word representation and has been widely investigated in the literature. However, its noise distribution for negative sampling is decided by empirical trials and the optimality has always been ignored. We…

Computation and Language · Computer Science 2019-10-22 Wenxiang Jiao, Irwin King, Michael R. Lyu

A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution

Most existing word embedding methods can be categorized into Neural Embedding Models and Matrix Factorization (MF)-based methods. However, some models are opaque to probabilistic interpretation, and MF-based methods, typically solved using…

Computation and Language · Computer Science 2015-08-18 Shaohua Li, Jun Zhu, Chunyan Miao

Topic2Vec: Learning Distributed Representations of Topics

Latent Dirichlet Allocation (LDA), which mines the thematic structure of documents, plays an important role in natural language processing and machine learning. However, the probability distribution from LDA only describes the statistical…

Computation and Language · Computer Science 2015-06-30 Li-Qiang Niu, Xin-Yu Dai

Simulated Annealing with Levy Distribution for Fast Matrix Factorization-Based Collaborative Filtering

Matrix factorization is one of the best approaches for collaborative filtering, because of its high accuracy in presenting users and items latent factors. The main disadvantages of matrix factorization are its complexity, and being very…

Machine Learning · Computer Science 2017-08-10 Mostafa A. Shehata, Mohammad Nassef, Amr A. Badr

Word Embeddings via Tensor Factorization

Most popular word embedding techniques involve implicit or explicit factorization of a word co-occurrence based matrix into low rank factors. In this paper, we aim to generalize this trend by using numerical methods to factor higher-order…

Machine Learning · Statistics 2017-09-19 Eric Bailey, Shuchin Aeron

A New Geometric Approach to Latent Topic Modeling and Discovery

A new geometrically-motivated algorithm for nonnegative matrix factorization is developed and applied to the discovery of latent "topics" for text and image "document" corpora. The algorithm is based on robustly finding and clustering…

Machine Learning · Statistics 2016-11-17 Weicong Ding, Mohammad H. Rohban, Prakash Ishwar, Venkatesh Saligrama

Learning Word Embedding with Better Distance Weighting and Window Size Scheduling

Distributed word representation (a.k.a. word embedding) is a key focus in natural language processing (NLP). As a highly successful word embedding model, Word2Vec offers an efficient method for learning distributed word representations on…

Computation and Language · Computer Science 2024-07-30 Chaohao Yang, Chris Ding

Machine Learning Sentiment Prediction based on Hybrid Document Representation

Automated sentiment analysis and opinion mining is a complex process concerning the extraction of useful subjective information from text. The explosion of user generated content on the Web, especially the fact that millions of users, on a…

Computation and Language · Computer Science 2015-12-01 Panagiotis Stalidis, Maria Giatsoglou, Konstantinos Diamantaras, George Sarigiannidis, Konstantinos Ch. Chatzisavvas

Swivel: Improving Embeddings by Noticing What's Missing

We present Submatrix-wise Vector Embedding Learner (Swivel), a method for generating low-dimensional feature embeddings from a feature co-occurrence matrix. Swivel performs approximate factorization of the point-wise mutual information…

Computation and Language · Computer Science 2016-02-09 Noam Shazeer, Ryan Doherty, Colin Evans, Chris Waterson

Tile2Vec: Unsupervised representation learning for spatially distributed data

Geospatial analysis lacks methods like the word vector representations and pre-trained networks that significantly boost performance across a wide range of natural language and computer vision tasks. To fill this gap, we introduce Tile2Vec,…

Computer Vision and Pattern Recognition · Computer Science 2018-05-31 Neal Jean, Sherrie Wang, Anshul Samar, George Azzari, David Lobell, Stefano Ermon

Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings

Since the amount of information on the internet is growing rapidly, it is not easy for a user to find relevant information for his/her query. To tackle this issue, much attention has been paid to Automatic Document Summarization. The key…

Computation and Language · Computer Science 2019-02-05 Kamal Al-Sabahi, Zhang Zuping, Yang Kang

Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation?

Distributed representations of words learned from text have proved to be successful in various natural language processing tasks in recent times. While some methods represent words as vectors computed from text using predictive model…

Computation and Language · Computer Science 2018-02-20 Abhik Jana, Pawan Goyal

Learning nonnegative matrix factorizations from compressed data

We propose a flexible and theoretically supported framework for scalable nonnegative matrix factorization. The goal is to find nonnegative low-rank components directly from compressed measurements, accessing the original data only once or…

Optimization and Control · Mathematics 2026-02-17 Abraar Chaudhry, Elizaveta Rebrova

Fixed versus Dynamic Co-Occurrence Windows in TextRank Term Weights for Information Retrieval

TextRank is a variant of PageRank typically used in graphs that represent documents, and where vertices denote terms and edges denote relations between terms. Quite often the relation between terms is simple term co-occurrence within a…

Information Retrieval · Computer Science 2017-04-07 Wei Lu, Qikai Cheng, Christina Lioma

Global Vectors for Node Representations

Most network embedding algorithms consist in measuring co-occurrences of nodes via random walks then learning the embeddings using Skip-Gram with Negative Sampling. While it has proven to be a relevant choice, there are alternatives, such…

Computation and Language · Computer Science 2019-03-01 Robin Brochier, Adrien Guille, Julien Velcin

A Comprehensive Empirical Evaluation of Existing Word Embedding Approaches

Vector-based word representations help countless Natural Language Processing (NLP) tasks capture the language's semantic and syntactic regularities. In this paper, we present the characteristics of existing word embedding approaches and…

Computation and Language · Computer Science 2024-03-05 Obaidullah Zaland, Muhammad Abulaish, Mohd. Fazil

Word2Vec vs DBnary: Augmenting METEOR using Vector Representations or Lexical Resources?

This paper presents an approach combining lexico-semantic resources and distributed representations of words applied to the evaluation in machine translation (MT). This study is made through the enrichment of a well-known MT evaluation…

Computation and Language · Computer Science 2016-10-06 Christophe Servan, Alexandre Berard, Zied Elloumi, Hervé Blanchon, Laurent Besacier