How Language Models Process Negation

Zhejian Zhou; Tianyi Zhou; Robin Jia; Jonathan May

Computation and Language 2026-05-06 v1

Abstract

We study how Large Language Models (LLMs) process negation mechanistically. First, we establish that even though open-weight models often give wrong answers to questions involving negation, they do possess internal components that process negation correctly. Their poor accuracy is due to late-layer attention behavior that promotes simple shortcuts; ablating those attention modules greatly improves accuracy on negation-related questions. Second, we uncover how models process negation. We consider two hypotheses: models could use attention heads that attend to the phrase being negated and suppress related concepts, or they could directly construct a representation of the entire negative phrase (e.g., representing "not gas" as a vector that promotes liquids and solids). We apply a range of observational and causal interpretability techniques to Mistral-7B and Llama-3.1-8B and show that models implement both mechanisms, with the "constructive" mechanism being more prominent. Taken together, our results deepen the understanding of LLMs' internals, highlighting construction-dominant computations and the coexistence of competing mechanisms within LLMs.
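
The two hypotheses in the abstract can be contrasted with a toy sketch. This is not the authors' method or any model's actual computation: the three-word vocabulary, the scores, and the function names `suppressive`, `constructive`, and `top_words` are all hypothetical, chosen only to show how two different routes can arrive at the same prediction for "not gas".

```python
# Toy illustration only: vocabulary, scores, and function names are hypothetical.
VOCAB = ["gas", "liquid", "solid"]


def suppressive(base_logits, negated, penalty=5.0):
    """Hypothesis 1: start from the scores for the un-negated phrase
    and push down the concept being negated."""
    return {w: (s - penalty if w == negated else s) for w, s in base_logits.items()}


def constructive(negated, boost=2.0):
    """Hypothesis 2: build scores for the whole phrase 'not X' directly,
    promoting every alternative to X."""
    return {w: (0.0 if w == negated else boost) for w in VOCAB}


def top_words(logits):
    """Return the highest-scoring words, sorted alphabetically."""
    best = max(logits.values())
    return sorted(w for w, s in logits.items() if s == best)


# Made-up scores that a context mentioning "gas" might produce.
base = {"gas": 3.0, "liquid": 1.0, "solid": 1.0}

print(top_words(suppressive(base, "gas")))
print(top_words(constructive("gas")))
```

In this toy both routes rank "liquid" and "solid" above "gas", which is why observing outputs alone would not separate the two hypotheses; distinguishing them requires examining internal components, as the abstract describes.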

Keywords

language modeling, large language model, transformer

Cite

@article{arxiv.2605.03052,
  title  = {How Language Models Process Negation},
  author = {Zhejian Zhou and Tianyi Zhou and Robin Jia and Jonathan May},
  journal= {arXiv preprint arXiv:2605.03052},
  year   = {2026}
}

Comments

ICML 2026 preprint; camera-ready version in progress
