Related papers: Backpack Language Models

Character-level Chinese Backpack Language Models

The Backpack is a Transformer alternative shown to improve interpretability in English language modeling by decomposing predictions into a weighted sum of token sense components. However, Backpacks' reliance on token-defined meaning raises…

Computation and Language · Computer Science 2023-10-20 Hao Sun, John Hewitt

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space

Understanding how Transformer-based Language Models (LMs) learn and recall information is a key goal of the deep learning community. Recent interpretability methods project weights and hidden states obtained from the forward pass to the…

Computation and Language · Computer Science 2024-02-21 Shahar Katz, Yonatan Belinkov, Mor Geva, Lior Wolf

Linguistic Interpretability of Transformer-based Language Models: a systematic review

Language models based on the Transformer architecture achieve excellent results in many language-related tasks, such as text classification or sentiment analysis. However, despite the architecture of these models being well-defined, little…

Computation and Language · Computer Science 2025-04-14 Miguel López-Otal, Jorge Gracia, Jordi Bernad, Carlos Bobed, Lucía Pitarch-Ballesteros, Emma Anglés-Herrero

Controlling Gender Bias in Retrieval via a Backpack Architecture

The presence of social biases in large language models (LLMs) has become a significant concern in AI research. These biases, often embedded in training data, can perpetuate harmful stereotypes and distort decision-making processes. When…

Information Retrieval · Computer Science 2025-11-04 Amirabbas Afzali, Amirreza Velae, Iman Ahmadi, Mohammad Aliannejadi

Embedding Words and Senses Together via Joint Knowledge-Enhanced Training

Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be…

Computation and Language · Computer Science 2017-06-22 Massimiliano Mancini, Jose Camacho-Collados, Ignacio Iacobacci, Roberto Navigli

LMMS Reloaded: Transformer-based Sense Embeddings for Disambiguation and Beyond

Distributional semantics based on neural approaches is a cornerstone of Natural Language Processing, with surprising connections to human meaning representation as well. Recent Transformer-based Language Models have proven capable of…

Computation and Language · Computer Science 2022-04-04 Daniel Loureiro, Alípio Mário Jorge, Jose Camacho-Collados

Talking Heads: Understanding Inter-layer Communication in Transformer Language Models

Although it is known that transformer language models (LMs) pass features from early layers to later layers, it is not well understood how this information is represented and routed by the model. We analyze a mechanism used in two LMs to…

Computation and Language · Computer Science 2025-05-12 Jack Merullo, Carsten Eickhoff, Ellie Pavlick

Character-based Neural Machine Translation

We introduce a neural machine translation model that views the input and output sentences as sequences of characters rather than words. Since word-level information provides a crucial source of bias, our input model composes representations…

Computation and Language · Computer Science 2015-11-17 Wang Ling, Isabel Trancoso, Chris Dyer, Alan W Black

Generalizing Word Embeddings using Bag of Subwords

We approach the problem of generalizing pre-trained word embeddings beyond fixed-size vocabularies without using additional contextual information. We propose a subword-level word vector generation model that views words as bags of…

Computation and Language · Computer Science 2018-09-13 Jinman Zhao, Sidharth Mudgal, Yingyu Liang

Multilinguality as Sense Adaptation

We approach multilinguality as sense adaptation: aligning latent meaning representations across languages rather than relying solely on shared parameters and scale. In this paper, we introduce SENse-based Symmetric Interlingual Alignment…

Computation and Language · Computer Science 2026-01-16 Jan Christian Blaise Cruz, David Ifeoluwa Adelani, Alham Fikri Aji

Revisiting Language Encoding in Learning Multilingual Representations

Transformer has demonstrated its great power to learn contextual word representations for multiple languages in a single model. To process multilingual sentences in the model, a learnable vector is usually assigned to each language, which…

Computation and Language · Computer Science 2021-02-17 Shengjie Luo, Kaiyuan Gao, Shuxin Zheng, Guolin Ke, Di He, Liwei Wang, Tie-Yan Liu

sense2vec - A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings

Neural word representations have proven useful in Natural Language Processing (NLP) tasks due to their ability to efficiently model complex semantic and syntactic word relationships. However, most techniques model only one representation…

Computation and Language · Computer Science 2015-11-23 Andrew Trask, Phil Michalak, John Liu

Dis-S2V: Discourse Informed Sen2Vec

Vector representation of sentences is important for many text processing tasks that involve clustering, classifying, or ranking sentences. Recently, distributed representation of sentences learned by neural models from unlabeled data has…

Computation and Language · Computer Science 2016-10-27 Tanay Kumar Saha, Shafiq Joty, Naeemul Hassan, Mohammad Al Hasan

Visual Comparison of Language Model Adaptation

Neural language models are widely used; however, their model parameters often need to be adapted to the specific domains and tasks of an application, which is time- and resource-consuming. Thus, adapters have recently been introduced as a…

Artificial Intelligence · Computer Science 2022-08-18 Rita Sevastjanova, Eren Cakmak, Shauli Ravfogel, Ryan Cotterell, Mennatallah El-Assady

A Bilingual Generative Transformer for Semantic Sentence Embedding

Semantic sentence embedding models encode natural language sentences into vectors, such that closeness in embedding space indicates closeness in the semantics between the sentences. Bilingual data offers a useful signal for learning such…

Computation and Language · Computer Science 2020-11-20 John Wieting, Graham Neubig, Taylor Berg-Kirkpatrick

Contextualized word senses: from attention to compositionality

The neural architectures of language models are becoming increasingly complex, especially that of Transformers, based on the attention mechanism. Although their application to numerous natural language processing tasks has proven to be very…

Computation and Language · Computer Science 2023-12-04 Pablo Gamallo

Memorizing Transformers

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus…

Machine Learning · Computer Science 2022-03-18 Yuhuai Wu, Markus N. Rabe, DeLesley Hutchins, Christian Szegedy

Contextually Propagated Term Weights for Document Representation

Word embeddings predict a word from its neighbours by learning small, dense embedding vectors. In practice, this prediction corresponds to a semantic score given to the predicted word (or term weight). We present a novel model that, given a…

Information Retrieval · Computer Science 2019-06-04 Casper Hansen, Christian Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

A Transformer with Stack Attention

Natural languages are believed to be (mildly) context-sensitive. Despite underpinning remarkably capable large language models, transformers are unable to model many context-free language tasks. In an attempt to address this limitation in…

Computation and Language · Computer Science 2024-05-15 Jiaoda Li, Jennifer C. White, Mrinmaya Sachan, Ryan Cotterell

Distilling Semantic Concept Embeddings from Contrastively Fine-Tuned Language Models

Learning vectors that capture the meaning of concepts remains a fundamental challenge. Somewhat surprisingly, perhaps, pre-trained language models have thus far only enabled modest improvements to the quality of such concept embeddings.…

Computation and Language · Computer Science 2023-05-18 Na Li, Hanane Kteich, Zied Bouraoui, Steven Schockaert