Related papers: Word2vec Skip-gram Dimensionality Selection via Se…

Learning the Dimensionality of Word Embeddings

We describe a method for learning word embeddings with data-dependent dimensionality. Our Stochastic Dimensionality Skip-Gram (SD-SG) and Stochastic Dimensionality Continuous Bag-of-Words (SD-CBOW) are nonparametric analogs of Mikolov et…

Machine Learning · Statistics · 2017-04-14 · Eric Nalisnick, Sachin Ravi

An Analysis on the Learning Rules of the Skip-Gram Model

To improve the generalization of the representations for natural language processing tasks, words are commonly represented using vectors, where distances among the vectors are related to the similarity of the words. While word2vec, the…

Computation and Language · Computer Science · 2020-03-20 · Canlin Zhang, Xiuwen Liu, Daniel Bis

Word2Vec applied to Recommendation: Hyperparameters Matter

Skip-gram with negative sampling, a popular variant of Word2vec originally designed and tuned to create word embeddings for Natural Language Processing, has been used to create item embeddings with successful applications in recommendation.…

Information Retrieval · Computer Science · 2018-08-30 · Hugo Caselles-Dupré, Florian Lesaint, Jimena Royo-Letelier

Distributed representation of multi-sense words: A loss-driven approach

Word2Vec's Skip Gram model is the current state-of-the-art approach for estimating the distributed representation of words. However, it assumes a single vector per word, which is not well-suited for representing words that have multiple…

Computation and Language · Computer Science · 2019-04-16 · Saurav Manchanda, George Karypis

A Comparative Study of Model Selection Criteria for Symbolic Regression

Effective model selection is critical in symbolic regression (SR) to identify mathematical expressions that balance accuracy and complexity, and have low expected error on unseen data. Many modern implementations of genetic programming (GP)…

Machine Learning · Computer Science · 2026-05-13 · Ali Soltani, Gabriel Kronberger, Fabricio Olivetti de Franca, Mattia Billa, Alessandro Lucantonio

Renormalized Normalized Maximum Likelihood and Three-Part Code Criteria For Learning Gaussian Networks

Score based learning (SBL) is a promising approach for learning Bayesian networks in the discrete domain. However, when employing SBL in the continuous domain, one is either forced to move the problem to the discrete domain or use metrics…

Machine Learning · Computer Science · 2024-10-30 · Borzou Alipourfard, Jean X. Gao

Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation

We present a novel family of language model (LM) estimation techniques named Sparse Non-negative Matrix (SNM) estimation. A first set of experiments empirically evaluating it on the One Billion Word Benchmark shows that SNM n-gram LMs…

Machine Learning · Computer Science · 2015-06-30 · Noam Shazeer, Joris Pelemans, Ciprian Chelba

Contextual Skipgram: Training Word Representation Using Context Information

The skip-gram (SG) model learns word representation by predicting the words surrounding a center word from unstructured text data. However, not all words in the context window contribute to the meaning of the center word. For example, less…

Computation and Language · Computer Science · 2021-02-18 · Dongjae Kim, Jong-Kook Kim

Consistent Bayesian Information Criterion Based on a Mixture Prior for Possibly High-Dimensional Multivariate Linear Regression Models

In the problem of selecting variables in a multivariate linear regression model, we derive new Bayesian information criteria based on a prior mixing a smooth distribution and a delta distribution. Each of them can be interpreted as a fusion…

Statistics Theory · Mathematics · 2022-09-29 · Haruki Kono, Tatsuya Kubokawa

Bayesian Neural Word Embedding

Recently, several works in the domain of natural language processing presented successful methods for word embedding. Among them, the Skip-Gram with negative sampling, known also as word2vec, advanced the state-of-the-art of various…

Computation and Language · Computer Science · 2017-02-22 · Oren Barkan

Cross-validation-based optimal feature selection for linear SVM classification

This paper addresses feature subset selection for Support Vector Machines (SVMs) based on the cross-validation criterion. Unlike statistical criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion…

Optimization and Control · Mathematics · 2026-05-11 · Masaharu Mori, Shunnosuke Ikeda, Ryuta Tamura, Yuichi Takano, Ryuhei Miyashiro

A Semidefinite Programming Based Search Strategy for Feature Selection with Mutual Information Measure

Feature subset selection, as a special case of the general subset selection problem, has been the topic of a considerable number of studies due to the growing importance of data-mining applications. In the feature subset selection problem…

Machine Learning · Computer Science · 2014-11-13 · Tofigh Naghibi, Sarah Hoffmann, Beat Pfister

Simulation-based Inference for High-dimensional Data using Surjective Sequential Neural Likelihood Estimation

Neural likelihood estimation methods for simulation-based inference can suffer from performance degradation when the modeled data is very high-dimensional or lies along a lower-dimensional manifold, which is due to the inability of the…

Machine Learning · Statistics · 2025-06-12 · Simon Dirmeier, Carlo Albert, Fernando Perez-Cruz

Stochastic Mutual Information Gradient Estimation for Dimensionality Reduction Networks

Feature ranking and selection is a widely used approach in various applications of supervised dimensionality reduction in discriminative machine learning. Nevertheless there exists significant evidence on feature ranking and selection…

Machine Learning · Computer Science · 2021-05-04 · Ozan Ozdenizci, Deniz Erdogmus

High-dimensional Penalty Selection via Minimum Description Length Principle

We tackle the problem of penalty selection of regularization on the basis of the minimum description length (MDL) principle. In particular, we consider that the design space of the penalty function is high-dimensional. In this situation,…

Machine Learning · Statistics · 2018-04-27 · Kohei Miyaguchi, Kenji Yamanishi

Dir-SPGLM: A Bayesian semiparametric GLM with data-driven reference distribution

The recently developed semi-parametric generalized linear model (SPGLM) offers more flexibility as compared to the classical GLM by including the baseline or reference distribution of the response as an additional parameter in the model.…

Methodology · Statistics · 2024-04-09 · Entejar Alam, Peter Müller, Paul J. Rathouz

Improving Word Representations: A Sub-sampled Unigram Distribution for Negative Sampling

Word2Vec is the most popular model for word representation and has been widely investigated in literature. However, its noise distribution for negative sampling is decided by empirical trials and the optimality has always been ignored. We…

Computation and Language · Computer Science · 2019-10-22 · Wenxiang Jiao, Irwin King, Michael R. Lyu

SubGram: Extending Skip-gram Word Representation with Substrings

Skip-gram (word2vec) is a recent method for creating vector representations of words ("distributed word representations") using a neural network. The representation gained popularity in various areas of natural language processing, because…

Computation and Language · Computer Science · 2020-07-09 · Tom Kocmi, Ondřej Bojar

Learning Word Embedding with Better Distance Weighting and Window Size Scheduling

Distributed word representation (a.k.a. word embedding) is a key focus in natural language processing (NLP). As a highly successful word embedding model, Word2Vec offers an efficient method for learning distributed word representations on…

Computation and Language · Computer Science · 2024-07-30 · Chaohao Yang, Chris Ding

Improving Negative Sampling for Word Representation using Self-embedded Features

Although the word-popularity based negative sampler has shown superb performance in the skip-gram model, the theoretical motivation behind oversampling popular (non-observed) words as negative samples is still not well understood. In this…

Machine Learning · Computer Science · 2018-06-27 · Long Chen, Fajie Yuan, Joemon M. Jose, Weinan Zhang