Related papers: Composable Interventions for Language Models

Towards Unifying Interpretability and Control: Evaluation via Intervention

With the growing complexity and capability of large language models, a need to understand model reasoning has emerged, often motivated by an underlying goal of controlling and aligning models. While numerous interpretability and steering…

Machine Learning · Computer Science 2025-02-12 Usha Bhalla, Suraj Srinivas, Asma Ghandeharioun, Himabindu Lakkaraju

Facilitating Self-Guided Mental Health Interventions Through Human-Language Model Interaction: A Case Study of Cognitive Restructuring

Self-guided mental health interventions, such as "do-it-yourself" tools to learn and practice coping strategies, show great promise to improve access to mental health care. However, these interventions are often cognitively demanding and…

Human-Computer Interaction · Computer Science 2024-04-12 Ashish Sharma, Kevin Rushton, Inna Wanyin Lin, Theresa Nguyen, Tim Althoff

Structured Learning of Compositional Sequential Interventions

We consider sequential treatment regimes where each unit is exposed to combinations of interventions over time. When interventions are described by qualitative labels, such as "close schools for a month due to a pandemic" or "promote this…

Machine Learning · Statistics 2024-10-31 Jialin Yu, Andreas Koukorinis, Nicolò Colombo, Yuchen Zhu, Ricardo Silva

A Comprehensive Survey of Compression Algorithms for Language Models

How can we compress language models without sacrificing accuracy? The number of compression algorithms for language models is rapidly growing to benefit from remarkable advances of recent language models without side effects due to the…

Computation and Language · Computer Science 2024-01-30 Seungcheol Park, Jaehyeon Choi, Sojin Lee, U Kang

Can Language Models Compose Skills In-Context?

Composing basic skills from simple tasks to accomplish composite tasks is crucial for modern intelligent systems. We investigate the in-context composition ability of language models to perform composite tasks that combine basic skills…

Machine Learning · Computer Science 2025-10-28 Zidong Liu, Zhuoyan Xu, Zhenmei Shi, Yingyu Liang

Should We Really Edit Language Models? On the Evaluation of Edited Language Models

Model editing has become an increasingly popular alternative for efficiently updating knowledge within language models. Current methods mainly focus on reliability, generalization, and locality, with many methods excelling across these…

Artificial Intelligence · Computer Science 2024-10-25 Qi Li, Xiang Liu, Zhenheng Tang, Peijie Dong, Zeyu Li, Xinglin Pan, Xiaowen Chu

We're Calling an Intervention: Exploring Fundamental Hurdles in Adapting Language Models to Nonstandard Text

We present a suite of experiments that allow us to understand the underlying challenges of language model adaptation to nonstandard text. We do so by designing interventions that approximate core features of user-generated text and their…

Computation and Language · Computer Science 2025-03-25 Aarohi Srivastava, David Chiang

On the Compression of Language Models for Code: An Empirical Study on CodeBERT

Language models have proven successful across a wide range of software engineering tasks, but their significant computational costs often hinder their practical adoption. To address this challenge, researchers have begun applying various…

Software Engineering · Computer Science 2024-12-19 Giordano d'Aloisio, Luca Traini, Federica Sarro, Antinisca Di Marco

Near-Optimal Multi-Perturbation Experimental Design for Causal Structure Learning

Causal structure learning is a key problem in many domains. Causal structures can be learnt by performing experiments on the system of interest. We address the largely unexplored problem of designing a batch of experiments that each…

Machine Learning · Computer Science 2021-11-25 Scott Sussex, Andreas Krause, Caroline Uhler

Compositional Demographic Word Embeddings

Word embeddings are usually derived from corpora containing text from many individuals, thus leading to general purpose representations rather than individually personalized representations. While personalized embeddings can be useful to…

Computation and Language · Computer Science 2020-11-22 Charles Welch, Jonathan K. Kummerfeld, Verónica Pérez-Rosas, Rada Mihalcea

Conceptual Contrastive Edits in Textual and Vision-Language Retrieval

As deep learning models grow in complexity, achieving model-agnostic interpretability becomes increasingly vital. In this work, we employ post-hoc conceptual contrastive edits to expose noteworthy patterns and biases imprinted in…

Computation and Language · Computer Science 2025-03-05 Maria Lymperaiou, Giorgos Stamou

Cross-lingual Models of Word Embeddings: An Empirical Comparison

Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches of…

Computation and Language · Computer Science 2016-06-09 Shyam Upadhyay, Manaal Faruqui, Chris Dyer, Dan Roth

Manipulating Transformer-Based Models: Controllability, Steerability, and Robust Interventions

Transformer-based language models excel in NLP tasks, but fine-grained control remains challenging. This paper explores methods for manipulating transformer models through principled interventions at three levels: prompts, activations, and…

Computation and Language · Computer Science 2025-09-08 Faruk Alpay, Taylan Alpay

Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models

Aligned representations across languages is a desired property in multilingual large language models (mLLMs), as alignment can improve performance in cross-lingual tasks. Typically alignment requires fine-tuning a model, which is…

Computation and Language · Computer Science 2025-07-22 Anirudh Sundar, Sinead Williamson, Katherine Metcalf, Barry-John Theobald, Skyler Seto, Masha Fedzechkina

How Does Controllability Emerge In Language Models During Pretraining?

Language models can be steered by modifying their internal representations to control concepts such as emotion, style, or truthfulness in generation. However, the conditions for an effective intervention remain unclear and are often…

Machine Learning · Computer Science 2025-08-05 Jianshu She, Xinyue Li, Eric Xing, Zhengzhong Liu, Qirong Ho

Linguistically-Informed Multilingual Instruction Tuning: Is There an Optimal Set of Languages to Tune?

Multilingual language models often perform unevenly across different languages due to limited generalization capabilities for some languages. This issue is significant because of the growing interest in making universal language models that…

Computation and Language · Computer Science 2024-10-11 Gürkan Soykan, Gözde Gül Şahin

A Unified Neural Coherence Model

Recently, neural approaches to coherence modeling have achieved state-of-the-art results in several evaluation tasks. However, we show that most of these models often fail on harder tasks with more realistic application scenarios. In…

Computation and Language · Computer Science 2019-09-04 Han Cheol Moon, Tasnim Mohiuddin, Shafiq Joty, Xu Chi

Towards Consistent Language Models Using Declarative Constraints

Large language models have shown unprecedented abilities in generating linguistically coherent and syntactically correct natural language output. However, they often return incorrect and inconsistent answers to input questions. Due to the…

Databases · Computer Science 2023-12-27 Jasmin Mousavi, Arash Termehchy

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods

Mechanistic interpretability seeks to understand the internal mechanisms of machine learning models, where localization -- identifying the important model components -- is a key step. Activation patching, also known as causal tracing or…

Machine Learning · Computer Science 2024-01-18 Fred Zhang, Neel Nanda

Compositionality decomposed: how do neural networks generalise?

Despite a multitude of empirical studies, little consensus exists on whether neural networks are able to generalise compositionally, a controversy that, in part, stems from a lack of agreement about what it means for a neural model to be…

Computation and Language · Computer Science 2020-02-25 Dieuwke Hupkes, Verna Dankers, Mathijs Mul, Elia Bruni