Related papers: Word2vec Skip-gram Dimensionality Selection via Se…
We describe a method for learning word embeddings with data-dependent dimensionality. Our Stochastic Dimensionality Skip-Gram (SD-SG) and Stochastic Dimensionality Continuous Bag-of-Words (SD-CBOW) are nonparametric analogs of Mikolov et…
To improve the generalization of the representations for natural language processing tasks, words are commonly represented using vectors, where distances among the vectors are related to the similarity of the words. While word2vec, the…
Skip-gram with negative sampling, a popular variant of Word2vec originally designed and tuned to create word embeddings for Natural Language Processing, has been used to create item embeddings with successful applications in recommendation.…
Word2Vec's Skip Gram model is the current state-of-the-art approach for estimating the distributed representation of words. However, it assumes a single vector per word, which is not well-suited for representing words that have multiple…
Effective model selection is critical in symbolic regression (SR) to identify mathematical expressions that balance accuracy and complexity, and have low expected error on unseen data. Many modern implementations of genetic programming (GP)…
Score based learning (SBL) is a promising approach for learning Bayesian networks in the discrete domain. However, when employing SBL in the continuous domain, one is either forced to move the problem to the discrete domain or use metrics…
We present a novel family of language model (LM) estimation techniques named Sparse Non-negative Matrix (SNM) estimation. A first set of experiments empirically evaluating it on the One Billion Word Benchmark shows that SNM $n$-gram LMs…
The skip-gram (SG) model learns word representation by predicting the words surrounding a center word from unstructured text data. However, not all words in the context window contribute to the meaning of the center word. For example, less…
In the problem of selecting variables in a multivariate linear regression model, we derive new Bayesian information criteria based on a prior mixing a smooth distribution and a delta distribution. Each of them can be interpreted as a fusion…
Recently, several works in the domain of natural language processing presented successful methods for word embedding. Among them, the Skip-Gram with negative sampling, known also as word2vec, advanced the state-of-the-art of various…
This paper addresses feature subset selection for Support Vector Machines (SVMs) based on the cross-validation criterion. Unlike statistical criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion…
Feature subset selection, as a special case of the general subset selection problem, has been the topic of a considerable number of studies due to the growing importance of data-mining applications. In the feature subset selection problem…
Neural likelihood estimation methods for simulation-based inference can suffer from performance degradation when the modeled data is very high-dimensional or lies along a lower-dimensional manifold, which is due to the inability of the…
Feature ranking and selection is a widely used approach in various applications of supervised dimensionality reduction in discriminative machine learning. Nevertheless there exists significant evidence on feature ranking and selection…
We tackle the problem of penalty selection of regularization on the basis of the minimum description length (MDL) principle. In particular, we consider that the design space of the penalty function is high-dimensional. In this situation,…
The recently developed semi-parametric generalized linear model (SPGLM) offers more flexibility as compared to the classical GLM by including the baseline or reference distribution of the response as an additional parameter in the model.…
Word2Vec is the most popular model for word representation and has been widely investigated in literature. However, its noise distribution for negative sampling is decided by empirical trials and the optimality has always been ignored. We…
Skip-gram (word2vec) is a recent method for creating vector representations of words ("distributed word representations") using a neural network. The representation gained popularity in various areas of natural language processing, because…
Distributed word representation (a.k.a. word embedding) is a key focus in natural language processing (NLP). As a highly successful word embedding model, Word2Vec offers an efficient method for learning distributed word representations on…
Although the word-popularity based negative sampler has shown superb performance in the skip-gram model, the theoretical motivation behind oversampling popular (non-observed) words as negative samples is still not well understood. In this…