Related papers: Understanding Language Model Circuits through Know…

Knowledge Circuits in Pretrained Transformers

The remarkable capabilities of modern large language models are rooted in their vast repositories of knowledge encoded within their parameters, enabling them to perceive the world and engage in reasoning. The inner workings of how these…

Computation and Language · Computer Science · 2025-01-06 · Yunzhi Yao, Ningyu Zhang, Zekun Xi, Mengru Wang, Ziwen Xu, Shumin Deng, Huajun Chen

Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models

A fundamental question in interpretability research is to what extent neural networks, particularly language models, implement reusable functions through subnetworks that can be composed to perform more complex tasks. Recent advances in…

Machine Learning · Computer Science · 2025-06-24 · Philipp Mondorf, Sondre Wold, Barbara Plank

Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning

Transformer-based language models have achieved significant success; however, their internal mechanisms remain largely opaque due to the complexity of non-linear interactions and high-dimensional operations. While previous studies have…

Artificial Intelligence · Computer Science · 2025-02-17 · Lin Zhang, Lijie Hu, Di Wang

Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models

While transformer models exhibit strong capabilities on linguistic tasks, their complex architectures make them difficult to interpret. Recent work has aimed to reverse engineer transformer models into human-readable representations called…

Computation and Language · Computer Science · 2024-10-08 · Michael Lan, Philip Torr, Fazl Barez

How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training

Despite exceptional capabilities in knowledge-intensive tasks, Large Language Models (LLMs) face a critical gap in understanding how they internalize new knowledge, particularly how to structurally embed acquired knowledge in their neural…

Machine Learning · Computer Science · 2025-06-03 · Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen

Discursive Circuits: How Do Language Models Understand Discourse Relations?

Which components in transformer language models are responsible for discourse understanding? We hypothesize that sparse computational graphs, termed as discursive circuits, control how models process discourse relations. Unlike simpler…

Computation and Language · Computer Science · 2025-10-14 · Yisong Miao, Min-Yen Kan

Diagnosing Model Editing via Knowledge Spectrum

Model editing, the process of efficiently modifying factual knowledge in pre-trained language models, is critical for maintaining their accuracy and relevance. However, existing editing methods often introduce unintended side effects,…

Computation and Language · Computer Science · 2025-09-23 · Tsung-Hsuan Pan, Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen

Should We Really Edit Language Models? On the Evaluation of Edited Language Models

Model editing has become an increasingly popular alternative for efficiently updating knowledge within language models. Current methods mainly focus on reliability, generalization, and locality, with many methods excelling across these…

Artificial Intelligence · Computer Science · 2024-10-25 · Qi Li, Xiang Liu, Zhenheng Tang, Peijie Dong, Zeyu Li, Xinglin Pan, Xiaowen Chu

Mechanistic Circuit-Based Knowledge Editing in Large Language Models

Deploying Large Language Models (LLMs) in real-world dynamic environments raises the challenge of updating their pre-trained knowledge. While existing knowledge editing methods can reliably patch isolated facts, they frequently suffer from…

Computation and Language · Computer Science · 2026-04-08 · Tianyi Zhao, Yinhan He, Wendy Zheng, Chen Chen

Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference

Recent studies on reasoning in language models (LMs) have sparked a debate on whether they can learn systematic inferential principles or merely exploit superficial patterns in the training data. To understand and uncover the mechanisms…

Computation and Language · Computer Science · 2025-06-24 · Geonhee Kim, Marco Valentino, André Freitas

Inspecting the concept knowledge graph encoded by modern language models

The field of natural language understanding has experienced exponential progress in the last few years, with impressive results in several tasks. This success has motivated researchers to study the underlying knowledge encoded by these…

Artificial Intelligence · Computer Science · 2021-06-03 · Carlos Aspillaga, Marcelo Mendoza, Alvaro Soto

Learning to Edit Knowledge via Instruction-based Chain-of-Thought Prompting

Large language models (LLMs) can effectively handle outdated information through knowledge editing. However, current approaches face two key limitations: (I) Poor generalization: Most approaches rigidly inject new knowledge without ensuring…

Computation and Language · Computer Science · 2026-04-08 · Jinhu Fu, Yan Bai, Longzhu He, Yihang Lou, Yanxiao Zhao, Li Sun, Sen Su

The Mechanics of Conceptual Interpretation in GPT Models: Interpretative Insights

Locating and editing knowledge in large language models (LLMs) is crucial for enhancing their accuracy, safety, and inference rationale. We introduce "concept editing", an innovative variation of knowledge editing that uncovers…

Computation and Language · Computer Science · 2024-08-23 · Nura Aljaafari, Danilo S. Carvalho, André Freitas

Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing

Knowledge editing, which aims to update the knowledge encoded in language models, can be deceptive. Despite the fact that many existing knowledge editing algorithms achieve near-perfect performance on conventional metrics, the models edited…

Computation and Language · Computer Science · 2025-05-20 · Jiakuan Xie, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

Linguistic Interpretability of Transformer-based Language Models: a systematic review

Language models based on the Transformer architecture achieve excellent results in many language-related tasks, such as text classification or sentiment analysis. However, despite the architecture of these models being well-defined, little…

Computation and Language · Computer Science · 2025-04-14 · Miguel López-Otal, Jorge Gracia, Jordi Bernad, Carlos Bobed, Lucía Pitarch-Ballesteros, Emma Anglés-Herrero

Towards a Principled Evaluation of Knowledge Editors

Model editing has been gaining increasing attention over the past few years. For Knowledge Editing in particular, more challenging evaluation datasets have recently been released. These datasets use different methodologies to score the…

Computation and Language · Computer Science · 2025-07-09 · Sebastian Pohl, Max Ploner, Alan Akbik

CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners

Knowledge Editing (KE) enables the modification of outdated or incorrect information in large language models (LLMs). While existing KE methods can update isolated facts, they often fail to generalize these updates to multi-hop reasoning…

Computation and Language · Computer Science · 2025-11-21 · Yunzhi Yao, Jizhan Fang, Jia-Chen Gu, Ningyu Zhang, Shumin Deng, Huajun Chen, Nanyun Peng

A Dual-Axis Taxonomy of Knowledge Editing for LLMs: From Mechanisms to Functions

Large language models (LLMs) acquire vast knowledge from large text corpora, but this information can become outdated or inaccurate. Since retraining is computationally expensive, knowledge editing offers an efficient alternative --…

Artificial Intelligence · Computer Science · 2025-08-13 · Amir Mohammad Salehoof, Ali Ramezani, Yadollah Yaghoobzadeh, Majid Nili Ahmadabadi

Uncovering Intermediate Variables in Transformers using Circuit Probing

Neural network models have achieved high performance on a wide variety of complex tasks, but the algorithms that they implement are notoriously difficult to interpret. It is often necessary to hypothesize intermediate variables involved in…

Computation and Language · Computer Science · 2025-02-13 · Michael A. Lepori, Thomas Serre, Ellie Pavlick

A Comprehensive Study of Knowledge Editing for Large Language Models

Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training,…

Computation and Language · Computer Science · 2024-11-19 · Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, Huajun Chen