Related papers: Characterizing Linear Alignment Across Language Mo…

200 papers

It has been hypothesized that neural networks with similar architectures trained on similar data learn shared representations relevant to the learning task. We build on this idea by extending the conceptual framework where representations…

Machine Learning · Computer Science 2025-06-06 Femi Bello, Anubrata Das, Fanzhi Zeng, Fangcong Yin, Liu Leqi

Understanding the latent space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. Yet it remains unclear to what extent LLMs linearly organize representations related to semantic…

Computation and Language · Computer Science 2026-01-22 Baturay Saglam, Paul Kassianik, Blaine Nelson, Sajana Weerawardhena, Yaron Singer, Amin Karbasi

From extracting features to generating text, the outputs of large language models (LLMs) typically rely on the final layers, following the conventional wisdom that earlier layers capture only low-level cues. However, our analysis shows that…

Machine Learning · Computer Science 2025-06-17 Oscar Skean, Md Rifat Arefin, Dan Zhao, Niket Patel, Jalal Naghiyev, Yann LeCun, Ravid Shwartz-Ziv

A two-part affine approximation has been found to be a good approximation for transformer computations over certain subject object relations. Adapting the Bigger Analogy Test Set, we show that the linear transformation Ws, where s is a…

Computation and Language · Computer Science 2025-07-22 Eric Xia, Jugal Kalita

Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like…

Computation and Language · Computer Science 2021-12-28 Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre

Recent probing studies reveal that large language models exhibit linear subspaces that separate true from false statements, yet the mechanism behind their emergence is unclear. We introduce a transparent, one-layer transformer toy model…

Computation and Language · Computer Science 2025-10-20 Shauli Ravfogel, Gilad Yehudai, Tal Linzen, Joan Bruna, Alberto Bietti

The high cost of ownership of AI compute infrastructure and challenges of robust serving of large language models (LLMs) has led to a surge in managed Model-as-a-service deployments. Even when enterprises choose on-premises deployments, the…

Machine Learning · Computer Science 2025-06-12 Jay Roberts, Kyle Mylonakis, Sidhartha Roy, Kaan Kale

Text prediction models, when used in applications like email clients or word processors, must protect user data privacy and adhere to model size constraints. These constraints are crucial to meet memory and inference time requirements, as…

Machine Learning · Computer Science 2024-07-03 Da Yu, Sivakanth Gopi, Janardhan Kulkarni, Zinan Lin, Saurabh Naik, Tomasz Lukasz Religa, Jian Yin, Huishuai Zhang

Model adaptation is crucial to handle the discrepancy between proxy training data and actual users data received. To effectively perform adaptation, textual data of users is typically stored on servers or their local devices, where…

Computation and Language · Computer Science 2023-12-15 Arpita Vats, Zhe Liu, Peng Su, Debjyoti Paul, Yingyi Ma, Yutong Pang, Zeeshan Ahmed, Ozlem Kalinli

Large language models (LLMs) show early signs of artificial general intelligence but struggle with hallucinations. One promising solution to mitigate these hallucinations is to store external knowledge as embeddings, aiding LLMs in…

Computation and Language · Computer Science 2024-04-26 Zhihao Zhu, Ninglu Shao, Defu Lian, Chenwang Wu, Zheng Liu, Yi Yang, Enhong Chen

As large language models attract increasing attention and find widespread application, concurrent challenges of reliability also arise at the same time. Confidence calibration, an effective analysis method for gauging the reliability of…

Computation and Language · Computer Science 2023-11-23 Chiwei Zhu, Benfeng Xu, Quan Wang, Yongdong Zhang, Zhendong Mao

We study the problem of multilingual masked language modeling, i.e. the training of a single model on concatenated text from multiple languages, and present a detailed study of several factors that influence why these models are so…

Computation and Language · Computer Science 2020-05-11 Shijie Wu, Alexis Conneau, Haoran Li, Luke Zettlemoyer, Veselin Stoyanov

Large language models (LLMs) have significantly transformed natural language understanding and generation, but they raise privacy concerns due to potential exposure of sensitive information. Studies have highlighted the risk of information…

Machine Learning · Computer Science 2025-11-20 Bishnu Bhusal, Manoj Acharya, Ramneet Kaur, Colin Samplawski, Anirban Roy, Adam D. Cobb, Rohit Chadha, Susmit Jha

Large language models (LLMs) are demonstrably capable of cross-lingual transfer, but can produce inconsistent output when prompted with the same queries written in different languages. To understand how language models are able to…

Computation and Language · Computer Science 2025-09-29 Zheng Wei Lim, Alham Fikri Aji, Trevor Cohn

Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction following-models such as ChatGPT. In this work, we…

Machine Learning · Computer Science 2024-05-06 Fan Wu, Huseyin A. Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim

The rapid advancement of large language models (LLMs) has revolutionized natural language processing, enabling applications in diverse domains such as healthcare, finance and education. However, the growing reliance on extensive data for…

Cryptography and Security · Computer Science 2024-12-10 Guoshenghui Zhao, Eric Song

One of the big challenges in machine learning applications is that training data can be different from the real-world data faced by the algorithm. In language modeling, users' language (e.g. in private messaging) could change in a year and…

Computation and Language · Computer Science 2018-03-07 Vadim Popov, Mikhail Kudinov, Irina Piontkovskaya, Petr Vytovtov, Alex Nevidomsky

In-context learning (ICL)-the ability of transformer-based models to perform new tasks from examples provided at inference time-has emerged as a hallmark of modern language models. While recent works have investigated the mechanisms…

Machine Learning · Statistics 2025-04-23 Soham Bonnerjee, Zhen Wei, Yeon, Anna Asch, Sagnik Nandy, Promit Ghosal

Large Language Models (LLMs) represent a significant advancement in artificial intelligence, finding applications across various domains. However, their reliance on massive internet-sourced datasets for training brings notable privacy…

Cryptography and Security · Computer Science 2025-02-11 Michele Miranda, Elena Sofia Ruzzetti, Andrea Santilli, Fabio Massimo Zanzotto, Sébastien Bratières, Emanuele Rodolà

Deep learning-based language models have achieved state-of-the-art results in a number of applications including sentiment analysis, topic labelling, intent classification and others. Obtaining text representations or embeddings using these…

Computation and Language · Computer Science 2021-08-30 Richard Plant, Dimitra Gkatzia, Valerio Giuffrida