Related papers: Test-Time Safety Alignment

Test-Time Detoxification without Training or Learning Anything

Large language models can produce toxic or inappropriate text even for benign inputs, creating risks when deployed at scale. Detoxification is therefore important for safety and user trust, particularly when we want to reduce harmful…

Computation and Language · Computer Science · 2026-02-04 · Baturay Saglam, Dionysis Kalogerias

An Analysis of Semantically-Aligned Speech-Text Embeddings

Embeddings play an important role in end-to-end solutions for multi-modal language processing problems. Although there has been some effort to understand the properties of single-modality embedding spaces, particularly that of text, their…

Computation and Language · Computer Science · 2023-01-20 · Muhammad Huzaifah, Ivan Kukanov

Vulnerability Mitigation for Safety-Aligned Language Models via Debiasing

Safety alignment is an essential research topic for real-world AI applications. Despite the multifaceted nature of safety and trustworthiness in AI, current safety alignment methods often focus on a comprehensive notion of safety. By…

Artificial Intelligence · Computer Science · 2025-02-05 · Thien Q. Tran, Akifumi Wachi, Rei Sato, Takumi Tanabe, Youhei Akimoto

Information Leakage in Embedding Models

Embeddings are functions that map raw input data to low-dimensional vector representations, while preserving important semantic information about the inputs. Pre-training embeddings on a large amount of unlabeled data and fine-tuning them…

Machine Learning · Computer Science · 2020-08-21 · Congzheng Song, Ananth Raghunathan

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space

Current research in adversarial robustness of LLMs focuses on discrete input manipulations in the natural language space, which can be directly transferred to closed-source models. However, this approach neglects the steady progression of…

Machine Learning · Computer Science · 2025-04-17 · Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann

Consistent Alignment of Word Embedding Models

Word embedding models offer continuous vector representations that can capture rich contextual semantics based on their word co-occurrence patterns. While these word vectors can provide very effective features used in many NLP tasks such as…

Computation and Language · Computer Science · 2017-02-27 · Cem Safak Sahin, Rajmonda S. Caceres, Brandon Oselio, William M. Campbell

Empirical Evaluation of Embedding Models in the Context of Text Classification in Document Review in Construction Delay Disputes

Text embeddings are numerical representations of text data, where words, phrases, or entire documents are converted into vectors of real numbers. These embeddings capture semantic meanings and relationships between text elements in a…

Information Retrieval · Computer Science · 2025-01-20 · Fusheng Wei, Robert Neary, Han Qin, Qiang Mao, Jianping Zhang

An Error-Oriented Approach to Word Embedding Pre-Training

We propose a novel word embedding pre-training approach that exploits writing errors in learners' scripts. We compare our method to previous models that tune the embeddings based on script scores and the discrimination between correct and…

Computation and Language · Computer Science · 2019-07-05 · Youmna Farag, Marek Rei, Ted Briscoe

Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction

Training multimodal generative models on large, uncurated datasets can result in users being exposed to harmful, unsafe and controversial or culturally-inappropriate outputs. While model editing has been proposed to remove or filter…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-06 · Jordan Vice, Naveed Akhtar, Mubarak Shah, Richard Hartley, Ajmal Mian

Leveraging Semantic Embeddings for Safety-Critical Applications

Semantic Embeddings are a popular way to represent knowledge in the field of zero-shot learning. We observe their interpretability and discuss their potential utility in a safety-critical context. Concretely, we propose to use them to add…

Machine Learning · Statistics · 2019-05-21 · Thomas Brunner, Frederik Diehl, Michael Truong Le, Alois Knoll

The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context

Alignment tuning has enabled large language models to excel in reasoning, instruction-following, and minimizing harmful generations. However, despite their widespread deployment, these models exhibit a monolingual bias, raising concerns…

Computation and Language · Computer Science · 2025-04-04 · Nikhil Verma, Manasa Bharadwaj

Semantic Probabilistic Control of Language Models

Semantic control entails steering LM generations towards satisfying subtle non-lexical constraints, e.g., toxicity, sentiment, or politeness, attributes that can be captured by a sequence-level verifier. It can thus be viewed as sampling…

Machine Learning · Computer Science · 2025-05-06 · Kareem Ahmed, Catarina G Belem, Padhraic Smyth, Sameer Singh

Modular Energy Steering for Safe Text-to-Image Generation with Foundation Models

Controlling the behavior of text-to-image generative models is critical for safe and practical deployment. Existing safety approaches typically rely on model fine-tuning or curated datasets, which can degrade generation quality or limit…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-03 · Yaoteng Tan, Zikui Cai, M. Salman Asif

Joint Embedding of Words and Labels for Text Classification

Word embeddings are effective intermediate representations for capturing semantic regularities between words, when learning the representations of text sequences. We propose to view text classification as a label-word joint embedding…

Computation and Language · Computer Science · 2018-05-14 · Guoyin Wang, Chunyuan Li, Wenlin Wang, Yizhe Zhang, Dinghan Shen, Xinyuan Zhang, Ricardo Henao, Lawrence Carin

A Primer on Word Embeddings: AI Techniques for Text Analysis in Social Work

Word embeddings represent a transformative technology for analyzing text data in social work research, offering sophisticated tools for understanding case notes, policy documents, research literature, and other text-based materials. This…

Computation and Language · Computer Science · 2024-11-12 · Brian E. Perron, Kelley A. Rivenburgh, Bryan G. Victor, Zia Qi, Hui Luan

Word Embeddings Are Steers for Language Models

Language models (LMs) automatically learn word embeddings during pre-training on language corpora. Although word embeddings are usually interpreted as feature vectors for individual words, their roles in language model generation remain…

Computation and Language · Computer Science · 2024-06-07 · Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, Heng Ji

Linearly Controlled Language Generation with Performative Guarantees

The increasing prevalence of Large Language Models (LMs) in critical applications highlights the need for controlled language generation strategies that are not only computationally efficient but that also enjoy performance guarantees. To…

Computation and Language · Computer Science · 2026-03-16 · Emily Cheng, Carmen Amo Alonso

Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging

Text embeddings are vital for tasks such as text retrieval and semantic textual similarity (STS). Recently, the advent of pretrained language models, along with unified benchmarks like the Massive Text Embedding Benchmark (MTEB), has…

Computation and Language · Computer Science · 2024-10-22 · Mingxin Li, Zhijie Nie, Yanzhao Zhang, Dingkun Long, Richong Zhang, Pengjun Xie

Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment

Network embeddings, which learn low-dimensional representations for each vertex in a large-scale network, have received considerable attention in recent years. For a wide range of applications, vertices in a network are typically…

Computation and Language · Computer Science · 2018-08-30 · Dinghan Shen, Xinyuan Zhang, Ricardo Henao, Lawrence Carin

Static Word Embeddings for Sentence Semantic Representation

We propose new static word embeddings optimised for sentence semantic representation. We first extract word embeddings from a pre-trained Sentence Transformer, and improve them with sentence-level principal component analysis, followed by…

Computation and Language · Computer Science · 2025-10-01 · Takashi Wada, Yuki Hirakawa, Ryotaro Shimizu, Takahiro Kawashima, Yuki Saito