Related papers: Dialectograms: Machine Learning Differences betwee…

Learning Meta-Embeddings by Using Ensembles of Embedding Sets

Word embeddings -- distributed representations of words -- in deep learning are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and characteristics of the captured…

Computation and Language · Computer Science 2015-12-31 Wenpeng Yin, Hinrich Schütze

Comparative Analysis of Word Embeddings for Capturing Word Similarities

Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning…

Computation and Language · Computer Science 2020-05-11 Martina Toshevska, Frosina Stojanovska, Jovan Kalajdjieski

Dynamic Bernoulli Embeddings for Language Evolution

Word embeddings are a powerful approach for unsupervised analysis of language. Recently, Rudolph et al. (2016) developed exponential family embeddings, which cast word embeddings in a probabilistic framework. Here, we develop dynamic…

Machine Learning · Statistics 2017-03-24 Maja Rudolph, David Blei

Learning Domain-Sensitive and Sentiment-Aware Word Embeddings

Word embeddings have been widely used in sentiment classification because of their efficacy for semantic representations of words. Given reviews from different domains, some existing methods for word embeddings exploit sentiment…

Computation and Language · Computer Science 2018-05-11 Bei Shi, Zihao Fu, Lidong Bing, Wai Lam

Discovering and Interpreting Biased Concepts in Online Communities

Language carries implicit human biases, functioning both as a reflection and a perpetuation of stereotypes that people carry with them. Recently, ML-based NLP methods such as word embeddings have been shown to learn such language biases…

Computation and Language · Computer Science 2022-01-26 Xavier Ferrer-Aran, Tom van Nuenen, Natalia Criado, Jose M. Such

Compositional Demographic Word Embeddings

Word embeddings are usually derived from corpora containing text from many individuals, thus leading to general purpose representations rather than individually personalized representations. While personalized embeddings can be useful to…

Computation and Language · Computer Science 2020-11-22 Charles Welch, Jonathan K. Kummerfeld, Verónica Pérez-Rosas, Rada Mihalcea

Machine Translation with Cross-lingual Word Embeddings

Learning word embeddings using distributional information is a task that has been studied by many researchers, and many studies are reported in the literature. In contrast, fewer studies have addressed the case of multiple languages.…

Computation and Language · Computer Science 2020-04-15 Marco Berlot, Evan Kaplan

The Expressive Power of Word Embeddings

We seek to better understand the difference in quality of the several publicly released embeddings. We propose several tasks that help to distinguish the characteristics of different embeddings. Our evaluation of sentiment polarity and…

Machine Learning · Computer Science 2013-05-31 Yanqing Chen, Bryan Perozzi, Rami Al-Rfou, Steven Skiena

What are the biases in my word embedding?

This paper presents an algorithm for enumerating biases in word embeddings. The algorithm exposes a large number of offensive associations related to sensitive features such as race and gender on publicly available embeddings, including a…

Computation and Language · Computer Science 2019-06-21 Nathaniel Swinger, Maria De-Arteaga, Neil Thomas Heffernan, Mark DM Leiserson, Adam Tauman Kalai

Definition Modeling: Learning to define word embeddings in natural language

Distributed representations of words have been shown to capture lexical semantics, as demonstrated by their effectiveness in word similarity and analogical relation tasks. But these tasks only evaluate lexical semantics indirectly. In this…

Computation and Language · Computer Science 2016-12-02 Thanapon Noraset, Chen Liang, Larry Birnbaum, Doug Downey

A Primer on Word Embeddings: AI Techniques for Text Analysis in Social Work

Word embeddings represent a transformative technology for analyzing text data in social work research, offering sophisticated tools for understanding case notes, policy documents, research literature, and other text-based materials. This…

Computation and Language · Computer Science 2024-11-12 Brian E. Perron, Kelley A. Rivenburgh, Bryan G. Victor, Zia Qi, Hui Luan

Cultural Cartography with Word Embeddings

Using the frequency of keywords is a classic approach in the formal analysis of text, but has the drawback of glossing over the relationality of word meanings. Word embedding models overcome this problem by constructing a standardized and…

Computers and Society · Computer Science 2021-05-05 Dustin S. Stoltz, Marshall A. Taylor

The Geometry of Culture: Analyzing Meaning through Word Embeddings

We demonstrate the utility of a new methodological tool, neural-network word embedding models, for large-scale text analysis, revealing how these models produce richer insights into cultural associations and categories than possible with…

Computation and Language · Computer Science 2019-11-13 Austin C. Kozlowski, Matt Taddy, James A. Evans

Neighbors and relatives: How do speech embeddings reflect linguistic connections across the world?

Investigating linguistic relationships on a global scale requires analyzing diverse features such as syntax, phonology and prosody, which evolve at varying rates influenced by internal diversification, language contact, and sociolinguistic…

Computation and Language · Computer Science 2025-06-11 Tuukka Törö, Antti Suni, Juraj Šimko

Word Embedding for Social Sciences: An Interdisciplinary Survey

To extract essential information from complex data, computer scientists have been developing machine learning models that learn low-dimensional representations. From such advances in machine learning research, not only computer…

Artificial Intelligence · Computer Science 2024-06-18 Akira Matsui, Emilio Ferrara

Morphological Priors for Probabilistic Neural Word Embeddings

Word embeddings allow natural language processing systems to share statistical information across related words. These embeddings are typically based on distributional statistics, making it difficult for them to generalize to rare or unseen…

Computation and Language · Computer Science 2016-09-27 Parminder Bhatia, Robert Guthrie, Jacob Eisenstein

Identity-sensitive Word Embedding through Heterogeneous Networks

Most existing word embedding approaches do not distinguish the same words in different contexts, therefore ignoring their contextual meanings. As a result, the learned embeddings of these words are usually a mixture of multiple meanings. In…

Computation and Language · Computer Science 2016-12-04 Jian Tang, Meng Qu, Qiaozhu Mei

Bio-inspired Structure Identification in Language Embeddings

Word embeddings are a popular way to improve downstream performances in contemporary language modeling. However, the underlying geometric structure of the embedding space is not well understood. We present a series of explorations using…

Computation and Language · Computer Science 2020-09-17 Hongwei Zhou, Oskar Elek, Pranav Anand, Angus G. Forbes

Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

Word embeddings have recently seen a strong increase in interest as a result of strong performance gains on a variety of tasks. However, most of this research also underlined the importance of benchmark datasets, and the difficulty of…

Computation and Language · Computer Science 2016-07-04 Stéphan Tulkens, Chris Emmery, Walter Daelemans

Linear Algebraic Structure of Word Senses, with Applications to Polysemy

Word embeddings are ubiquitous in NLP and information retrieval, but it is unclear what they represent when the word is polysemous. Here it is shown that multiple word senses reside in linear superposition within the word embedding and…

Computation and Language · Computer Science 2018-12-10 Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski