Related papers: Characterizing Linear Alignment Across Language Mo…

Linear Representation Transferability Hypothesis: Leveraging Small Models to Steer Large Models

It has been hypothesized that neural networks with similar architectures trained on similar data learn shared representations relevant to the learning task. We build on this idea by extending the conceptual framework where representations…

Machine Learning · Computer Science 2025-06-06 Femi Bello, Anubrata Das, Fanzhi Zeng, Fangcong Yin, Liu Leqi

Large Language Models Encode Semantics and Alignment in Linearly Separable Representations

Understanding the latent space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. Yet it remains unclear to what extent LLMs linearly organize representations related to semantic…

Computation and Language · Computer Science 2026-01-22 Baturay Saglam, Paul Kassianik, Blaine Nelson, Sajana Weerawardhena, Yaron Singer, Amin Karbasi

Layer by Layer: Uncovering Hidden Representations in Language Models

From extracting features to generating text, the outputs of large language models (LLMs) typically rely on the final layers, following the conventional wisdom that earlier layers capture only low-level cues. However, our analysis shows that…

Machine Learning · Computer Science 2025-06-17 Oscar Skean, Md Rifat Arefin, Dan Zhao, Niket Patel, Jalal Naghiyev, Yann LeCun, Ravid Shwartz-Ziv

Linear Relational Decoding of Morphology in Language Models

A two-part affine approximation has been found to be a good approximation for transformer computations over certain subject object relations. Adapting the Bigger Analogy Test Set, we show that the linear transformation Ws, where s is a…

Computation and Language · Computer Science 2025-07-22 Eric Xia, Jugal Kalita

Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation

Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like…

Computation and Language · Computer Science 2021-12-28 Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre

Emergence of Linear Truth Encodings in Language Models

Recent probing studies reveal that large language models exhibit linear subspaces that separate true from false statements, yet the mechanism behind their emergence is unclear. We introduce a transparent, one-layer transformer toy model…

Computation and Language · Computer Science 2025-10-20 Shauli Ravfogel, Gilad Yehudai, Tal Linzen, Joan Bruna, Alberto Bietti

Learning Obfuscations Of LLM Embedding Sequences: Stained Glass Transform

The high cost of ownership of AI compute infrastructure and challenges of robust serving of large language models (LLMs) has led to a surge in managed Model-as-a-service deployments. Even when enterprises choose on-premises deployments, the…

Machine Learning · Computer Science 2025-06-12 Jay Roberts, Kyle Mylonakis, Sidhartha Roy, Kaan Kale

Selective Pre-training for Private Fine-tuning

Text prediction models, when used in applications like email clients or word processors, must protect user data privacy and adhere to model size constraints. These constraints are crucial to meet memory and inference time requirements, as…

Machine Learning · Computer Science 2024-07-03 Da Yu, Sivakanth Gopi, Janardhan Kulkarni, Zinan Lin, Saurabh Naik, Tomasz Lukasz Religa, Jian Yin, Huishuai Zhang

Recovering from Privacy-Preserving Masking with Large Language Models

Model adaptation is crucial to handle the discrepancy between proxy training data and actual users data received. To effectively perform adaptation, textual data of users is typically stored on servers or their local devices, where…

Computation and Language · Computer Science 2023-12-15 Arpita Vats, Zhe Liu, Peng Su, Debjyoti Paul, Yingyi Ma, Yutong Pang, Zeeshan Ahmed, Ozlem Kalinli

Understanding Privacy Risks of Embeddings Induced by Large Language Models

Large language models (LLMs) show early signs of artificial general intelligence but struggle with hallucinations. One promising solution to mitigate these hallucinations is to store external knowledge as embeddings, aiding LLMs in…

Computation and Language · Computer Science 2024-04-26 Zhihao Zhu, Ninglu Shao, Defu Lian, Chenwang Wu, Zheng Liu, Yi Yang, Enhong Chen

On the Calibration of Large Language Models and Alignment

As large language models attract increasing attention and find widespread application, concurrent challenges of reliability also arise at the same time. Confidence calibration, an effective analysis method for gauging the reliability of…

Computation and Language · Computer Science 2023-11-23 Chiwei Zhu, Benfeng Xu, Quan Wang, Yongdong Zhang, Zhendong Mao

Emerging Cross-lingual Structure in Pretrained Language Models

We study the problem of multilingual masked language modeling, i.e. the training of a single model on concatenated text from multiple languages, and present a detailed study of several factors that influence why these models are so…

Computation and Language · Computer Science 2020-05-11 Shijie Wu, Alexis Conneau, Haoran Li, Luke Zettlemoyer, Veselin Stoyanov

Privacy Preserving In-Context-Learning Framework for Large Language Models

Large language models (LLMs) have significantly transformed natural language understanding and generation, but they raise privacy concerns due to potential exposure of sensitive information. Studies have highlighted the risk of information…

Machine Learning · Computer Science 2025-11-20 Bishnu Bhusal, Manoj Acharya, Ramneet Kaur, Colin Samplawski, Anirban Roy, Adam D. Cobb, Rohit Chadha, Susmit Jha

Language-Specific Latent Process Hinders Cross-Lingual Performance

Large language models (LLMs) are demonstrably capable of cross-lingual transfer, but can produce inconsistent output when prompted with the same queries written in different languages. To understand how language models are able to…

Computation and Language · Computer Science 2025-09-29 Zheng Wei Lim, Alham Fikri Aji, Trevor Cohn

Privately Aligning Language Models with Reinforcement Learning

Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction-following models such as ChatGPT. In this work, we…

Machine Learning · Computer Science 2024-05-06 Fan Wu, Huseyin A. Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim

Privacy-Preserving Large Language Models: Mechanisms, Applications, and Future Directions

The rapid advancement of large language models (LLMs) has revolutionized natural language processing, enabling applications in diverse domains such as healthcare, finance and education. However, the growing reliance on extensive data for…

Cryptography and Security · Computer Science 2024-12-10 Guoshenghui Zhao, Eric Song

Differentially Private Distributed Learning for Language Modeling Tasks

One of the big challenges in machine learning applications is that training data can be different from the real-world data faced by the algorithm. In language modeling, users' language (e.g. in private messaging) could change in a year and…

Computation and Language · Computer Science 2018-03-07 Vadim Popov, Mikhail Kudinov, Irina Piontkovskaya, Petr Vytovtov, Alex Nevidomsky

How Private is Your Attention? Bridging Privacy with In-Context Learning

In-context learning (ICL), the ability of transformer-based models to perform new tasks from examples provided at inference time, has emerged as a hallmark of modern language models. While recent works have investigated the mechanisms…

Machine Learning · Statistics 2025-04-23 Soham Bonnerjee, Zhen Wei Yeon, Anna Asch, Sagnik Nandy, Promit Ghosal

Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions

Large Language Models (LLMs) represent a significant advancement in artificial intelligence, finding applications across various domains. However, their reliance on massive internet-sourced datasets for training brings notable privacy…

Cryptography and Security · Computer Science 2025-02-11 Michele Miranda, Elena Sofia Ruzzetti, Andrea Santilli, Fabio Massimo Zanzotto, Sébastien Bratières, Emanuele Rodolà

CAPE: Context-Aware Private Embeddings for Private Language Learning

Deep learning-based language models have achieved state-of-the-art results in a number of applications including sentiment analysis, topic labelling, intent classification and others. Obtaining text representations or embeddings using these…

Computation and Language · Computer Science 2021-08-30 Richard Plant, Dimitra Gkatzia, Valerio Giuffrida