Related papers: Elementwise Language Representation

Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations

Understanding the locus of semantic representation in large language models (LLMs) is crucial for interpretability and architectural innovation. The dominant paradigm posits that trainable input embeddings serve as foundational "meaning…

Computation and Language · Computer Science 2025-10-16 A. Bochkov

Deriving Contextualised Semantic Features from BERT (and Other Transformer Model) Embeddings

Models based on the transformer architecture, such as BERT, have marked a crucial step forward in the field of Natural Language Processing. Importantly, they allow the creation of word embeddings that capture important semantic information…

Computation and Language · Computer Science 2021-01-01 Jacob Turton , David Vinson , Robert Elliott Smith

Hash Embeddings for Efficient Word Representations

We present hash embeddings, an efficient method for representing words in a continuous vector form. A hash embedding may be seen as an interpolation between a standard word embedding and a word embedding created using a random hash function…

Computation and Language · Computer Science 2017-09-13 Dan Svenstrup , Jonas Meinertz Hansen , Ole Winther

Semantic Embeddings of Chemical Elements for Enhanced Materials Inference and Discovery

We present a framework for generating universal semantic embeddings of chemical elements to advance materials inference and discovery. This framework leverages ElementBERT, a domain-specific BERT-based natural language processing model…

Computation and Language · Computer Science 2026-04-30 Yunze Jia , Yuehui Xian , Yangyang Xu , Pengfei Dang , Xiangdong Ding , Jun Sun , Yumei Zhou , Dezhen Xue

Compressing and Interpreting Word Embeddings with Latent Space Regularization and Interactive Semantics Probing

Word embedding, a high-dimensional (HD) numerical representation of words generated by machine learning models, has been used for different natural language processing tasks, e.g., translation between two languages. Recently, there has been…

Human-Computer Interaction · Computer Science 2024-03-26 Haoyu Li , Junpeng Wang , Yan Zheng , Liang Wang , Wei Zhang , Han-Wei Shen

Static Word Embeddings for Sentence Semantic Representation

We propose new static word embeddings optimised for sentence semantic representation. We first extract word embeddings from a pre-trained Sentence Transformer, and improve them with sentence-level principal component analysis, followed by…

Computation and Language · Computer Science 2025-10-01 Takashi Wada , Yuki Hirakawa , Ryotaro Shimizu , Takahiro Kawashima , Yuki Saito

Extracting Sentence Embeddings from Pretrained Transformer Models

Pre-trained transformer models shine in many natural language processing tasks and therefore are expected to bear the representation of the input sentence or text meaning. These sentence-level embeddings are also important in…

Computation and Language · Computer Science 2025-02-21 Lukas Stankevičius , Mantas Lukoševičius

Effective Subword Segmentation for Text Comprehension

Representation learning is the foundation of machine reading comprehension and inference. In state-of-the-art models, character-level representations have been broadly adopted to alleviate the problem of effectively representing rare or…

Computation and Language · Computer Science 2019-06-12 Zhuosheng Zhang , Hai Zhao , Kangwei Ling , Jiangtong Li , Zuchao Li , Shexia He , Guohong Fu

ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models

Word embeddings are fundamental to natural language processing, yet traditional approaches represent each word with a single vector, creating representational bottlenecks for polysemous words and limiting semantic expressiveness. While…

Computation and Language · Computer Science 2026-04-30 Orhan Demirci , Sezer Aptourachman , Aydın Kaya

Bit Cipher -- A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models

While Large Language Models (LLMs) become ever more dominant, classic pre-trained word embeddings sustain their relevance through computational efficiency and nuanced linguistic interpretation. Drawing from recent studies demonstrating that…

Computation and Language · Computer Science 2023-11-21 Haoran Zhao , Jake Ryland Williams

Interpretable Neural Embeddings with Sparse Self-Representation

Interpretability benefits the theoretical understanding of representations. Existing word embeddings are generally dense representations. Hence, the meaning of latent dimensions is difficult to interpret. This makes word embeddings like a…

Computation and Language · Computer Science 2023-06-27 Minxue Xia , Hao Zhu

Are the Best Multilingual Document Embeddings simply Based on Sentence Embeddings?

Dense vector representations for textual data are crucial in modern NLP. Word embeddings and sentence embeddings estimated from raw texts are key in achieving state-of-the-art results in various tasks requiring semantic understanding.…

Computation and Language · Computer Science 2023-07-06 Sonal Sannigrahi , Josef van Genabith , Cristina Espana-Bonet

Value-Aware Numerical Representations for Transformer Language Models

Transformer-based language models often achieve strong results on mathematical reasoning benchmarks while remaining fragile on basic numerical understanding and arithmetic operations. A central limitation is that numbers are processed as…

Computation and Language · Computer Science 2026-01-15 Andreea Dutulescu , Stefan Ruseti , Mihai Dascalu

SEE: Sememe Entanglement Encoding for Transformer-bases Models Compression

Transformer-based large language models exhibit groundbreaking capabilities, but their storage and computational costs are prohibitively high, limiting their application in resource-constrained scenarios. An effective approach is to…

Machine Learning · Computer Science 2024-12-18 Jing Zhang , Shuzhen Sun , Peng Zhang , Guangxing Cao , Hui Gao , Xindian Ma , Nan Xu , Yuexian Hou

A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings

This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings. Our method integrates multiple word embeddings created from complementary techniques, textual sources, knowledge bases and languages. Existing…

Computation and Language · Computer Science 2021-09-09 Iker García-Ferrero , Rodrigo Agerri , German Rigau

Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization

Word embedding methods revolve around learning continuous distributed vector representations of words with neural networks, which can capture semantic and/or syntactic cues, and in turn be used to induce similarity measures among words,…

Computation and Language · Computer Science 2016-07-25 Kuan-Yu Chen , Shih-Hung Liu , Berlin Chen , Hsin-Min Wang , Hsin-Hsi Chen

Learning Compressed Sentence Representations for On-Device Text Processing

Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued,…

Computation and Language · Computer Science 2019-06-21 Dinghan Shen , Pengyu Cheng , Dhanasekar Sundararaman , Xinyuan Zhang , Qian Yang , Meng Tang , Asli Celikyilmaz , Lawrence Carin

All Word Embeddings from One Embedding

In neural network-based models for natural language processing (NLP), the largest part of the parameters often consists of word embeddings. Conventional models prepare a large embedding matrix whose size depends on the vocabulary size.…

Computation and Language · Computer Science 2020-10-26 Sho Takase , Sosuke Kobayashi

Language Features Matter: Effective Language Representations for Vision-Language Tasks

Shouldn't language and vision features be treated equally in vision-language (VL) tasks? Many VL approaches treat the language component as an afterthought, using simple language models that are either built upon fixed word embeddings…

Computer Vision and Pattern Recognition · Computer Science 2019-08-20 Andrea Burns , Reuben Tan , Kate Saenko , Stan Sclaroff , Bryan A. Plummer

SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models

Sentence embedding is an important research topic in natural language processing (NLP) since it can transfer knowledge to downstream tasks. Meanwhile, a contextualized word representation, called BERT, achieves the state-of-the-art…

Computation and Language · Computer Science 2020-06-02 Bin Wang , C. -C. Jay Kuo