Related papers: Word-level Lexical Normalisation using Context-Dep…

Utilizing Character and Word Embeddings for Text Normalization with Sequence-to-Sequence Models

Text normalization is an important enabling technology for several NLP tasks. Recently, neural-network-based approaches have outperformed well-established models in this task. However, in languages other than English, there has been little…

Computation and Language · Computer Science 2018-09-06 Daniel Watson, Nasser Zalmout, Nizar Habash

Towards Computationally Verifiable Semantic Grounding for Language Models

The paper presents an approach to semantic grounding of language models (LMs) that conceptualizes the LM as a conditional model generating text given a desired semantic message formalized as a set of entity-relationship triples. It embeds…

Computation and Language · Computer Science 2022-11-17 Chris Alberti, Kuzman Ganchev, Michael Collins, Sebastian Gehrmann, Ciprian Chelba

Revisiting Word Embeddings in the LLM Era

Large Language Models (LLMs) have recently shown remarkable advancement in various NLP tasks. As such, a popular trend has emerged lately where NLP researchers extract word/sentence/document embeddings from these large decoder-only models…

Computation and Language · Computer Science 2025-03-04 Yash Mahajan, Matthew Freestone, Naman Bansal, Sathyanarayanan Aakur, Shubhra Kanti Karmaker Santu

A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings

Learning word embeddings has received a significant amount of attention recently. Often, word embeddings are learned in an unsupervised manner from a large collection of text. The genre of the text typically plays an important role in the…

Computation and Language · Computer Science 2019-02-04 Wei Yang, Wei Lu, Vincent W. Zheng

Language Models with Pre-Trained (GloVe) Word Embeddings

In this work we implement the training of a language model (LM) using a recurrent neural network (RNN) and GloVe word embeddings, introduced by Pennington et al. in [1]. The implementation follows the general idea of training RNNs for LM…

Computation and Language · Computer Science 2017-02-07 Victor Makarenkov, Bracha Shapira, Lior Rokach

Deep Learning and Word Embeddings for Tweet Classification for Crisis Response

Traditional tweet classification models for crisis response focus on convolutional layers and domain-specific word embeddings. In this paper, we study the application of different neural networks with general-purpose and domain-specific word…

Computation and Language · Computer Science 2019-03-27 Reem ALRashdi, Simon O'Keefe

GEAR: A Simple GENERATE, EMBED, AVERAGE AND RANK Approach for Unsupervised Reverse Dictionary

Reverse Dictionary (RD) is the task of obtaining the most relevant word or set of words given a textual description or dictionary definition. Effective RD methods have applications in accessibility, translation or writing support systems.…

Computation and Language · Computer Science 2024-12-10 Fatemah Almeman, Luis Espinosa-Anke

Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change

Recent research has revealed that neural language models at scale suffer from poor temporal generalization capability, i.e., the language model pre-trained on static data from past years performs worse over time on emerging data. Existing…

Computation and Language · Computer Science 2022-11-01 Zhaochen Su, Zecheng Tang, Xinyan Guan, Juntao Li, Lijun Wu, Min Zhang

Lexicon Infused Phrase Embeddings for Named Entity Resolution

Most state-of-the-art approaches for named-entity recognition (NER) use semi-supervised information in the form of word clusters and lexicons. Recently, neural network-based language models have been explored, as they as a byproduct generate…

Computation and Language · Computer Science 2014-04-23 Alexandre Passos, Vineet Kumar, Andrew McCallum

Domain Lexical Knowledge-based Word Embedding Learning for Text Classification under Small Data

Pre-trained language models such as BERT have proven powerful in many natural language processing tasks. But in some text classification applications such as emotion recognition and sentiment analysis, BERT may not lead to…

Computation and Language · Computer Science 2025-06-03 Zixiao Zhu, Kezhi Mao

Visual Grounding Helps Learn Word Meanings in Low-Data Regimes

Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension, and their internal representations are remarkably well-aligned with representations of language in the human brain. But to…

Computation and Language · Computer Science 2024-03-27 Chengxu Zhuang, Evelina Fedorenko, Jacob Andreas

LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution

Lexical substitution is the task of generating meaningful substitutes for a word in a given textual context. Contextual word embedding models have achieved state-of-the-art results in the lexical substitution task by relying on contextual…

Machine Learning · Computer Science 2022-04-04 George Michalopoulos, Ian McKillop, Alexander Wong, Helen Chen

Robust Spoken Language Understanding via Paraphrasing

Learning intents and slot labels from user utterances is a fundamental step in all spoken language understanding (SLU) and dialog systems. State-of-the-art neural network based methods, after deployment, often suffer from performance…

Computation and Language · Computer Science 2018-09-19 Avik Ray, Yilin Shen, Hongxia Jin

Word Recovery in Large Language Models Enables Character-Level Tokenization Robustness

Large language models (LLMs) trained with canonical tokenization exhibit surprising robustness to non-canonical inputs such as character-level tokenization, yet the mechanisms underlying this robustness remain unclear. We study this…

Computation and Language · Computer Science 2026-03-12 Zhipeng Yang, Shu Yang, Lijie Hu, Di Wang

Robust Lexical Features for Improved Neural Network Named-Entity Recognition

Neural network approaches to Named-Entity Recognition reduce the need for carefully hand-crafted features. While some features do remain in state-of-the-art systems, lexical features have been mostly discarded, with the exception of…

Computation and Language · Computer Science 2018-06-12 Abbas Ghaddar, Philippe Langlais

Sequence-to-Sequence Lexical Normalization with Multilingual Transformers

Current benchmark tasks for natural language processing contain text that is qualitatively different from the text used in informal day-to-day digital communication. This discrepancy has led to severe performance degradation of…

Computation and Language · Computer Science 2021-10-13 Ana-Maria Bucur, Adrian Cosma, Liviu P. Dinu

Consistent Alignment of Word Embedding Models

Word embedding models offer continuous vector representations that can capture rich contextual semantics based on their word co-occurrence patterns. While these word vectors can provide very effective features used in many NLP tasks such as…

Computation and Language · Computer Science 2017-02-27 Cem Safak Sahin, Rajmonda S. Caceres, Brandon Oselio, William M. Campbell

From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models

Word embeddings and language models have transformed natural language processing (NLP) by facilitating the representation of linguistic elements in continuous vector spaces. This review visits foundational concepts such as the…

Computation and Language · Computer Science 2025-12-03 Charles Zhang, Benji Peng, Xintian Sun, Qian Niu, Junyu Liu, Keyu Chen, Ming Li, Pohsun Feng, Ziqian Bi, Ming Liu, Yichao Zhang, Xinyuan Song, Cheng Fei, Caitlyn Heqi Yin, Lawrence KQ Yan, Hongyang He, Tianyang Wang

Character n-gram Embeddings to Improve RNN Language Models

This paper proposes a novel Recurrent Neural Network (RNN) language model that takes advantage of character information. We focus on character n-grams based on research in the field of word embedding construction (Wieting et al. 2016). Our…

Computation and Language · Computer Science 2019-06-14 Sho Takase, Jun Suzuki, Masaaki Nagata