Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models

Computation and Language 2025-10-10 v3 Machine Learning

Abstract

In this paper, we investigate knowledge forgetting in large language models with a focus on its generalisation: a model should forget not only the specific training samples but also the implicit knowledge related to them. To this end, we first identify a broader unlearning scope that covers both the target data and logically associated samples, namely rephrased, subject-replaced, relation-reversed, and one-hop reasoned data. We then conduct a rigorous evaluation of 15 state-of-the-art methods across three datasets, which reveals that unlearned models still recall paraphrased answers and retain target facts in their intermediate layers. This motivates a preliminary step toward more generalised implicit knowledge forgetting: we propose PerMU, a novel probability perturbation-based unlearning paradigm. PerMU simulates adversarial unlearning samples to eliminate fact-related tokens from the logit distribution, collectively reducing the probabilities of all answer-associated tokens. Experiments are conducted on a diverse range of datasets, including TOFU, Harry Potter, ZsRE, WMDP, and MUSE, using models ranging from 1.3B to 13B parameters. The results show that PerMU delivers up to a 50.40% improvement in unlearning vanilla target data, together with a 40.73% improvement in forgetting implicit knowledge. Our code is available at https://github.com/MaybeLizzy/PERMU.
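
To make the idea of suppressing answer-associated tokens at the logit level more concrete, the following is a minimal illustrative sketch and not the authors' implementation (the released repository is the reference for PerMU itself). It assumes we already have two logit vectors for the same next-token position: one from the original prompt and one from a perturbed prompt in which the fact-bearing input has been corrupted. The function name `suppress_fact_tokens`, the `strength` parameter, the four-token vocabulary, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def suppress_fact_tokens(clean_logits, perturbed_logits, strength=1.0):
    """Lower the logits of tokens that the clean prompt supports more strongly
    than the perturbed prompt does (hypothetical helper).

    The positive part of (clean - perturbed) is treated as a rough signal for
    how much a token depends on the fact being forgotten. Tokens with no such
    signal keep their original logit.
    """
    adjusted = []
    for clean, perturbed in zip(clean_logits, perturbed_logits):
        fact_signal = max(clean - perturbed, 0.0)
        adjusted.append(clean - strength * fact_signal)
    return adjusted


# Toy vocabulary and logits (made-up numbers, not model outputs).
vocab = ["Paris", "paris", "London", "the"]
clean = [5.0, 3.0, 1.0, 0.5]        # original prompt: answer tokens dominate
perturbed = [1.0, 0.5, 1.0, 0.5]    # corrupted prompt: answer tokens lose support

adjusted = suppress_fact_tokens(clean, perturbed, strength=1.5)
before = softmax(clean)
after = softmax(adjusted)

for token, p_before, p_after in zip(vocab, before, after):
    print(f"{token:>7}: {p_before:.3f} -> {p_after:.3f}")
```

In this toy setting both surface forms of the answer ("Paris" and "paris") lose probability at once, which mirrors the abstract's goal of reducing all answer-associated tokens collectively rather than only the exact target string. How PerMU constructs its perturbed samples and turns them into a training objective is not specified in the abstract and is not modelled here.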

Cite

@article{arxiv.2502.19982,
  title  = {Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models},
  author = {Huazheng Wang and Yongcheng Jing and Haifeng Sun and Yingjie Wang and Jingyu Wang and Jianxin Liao and Dacheng Tao},
  journal= {arXiv preprint arXiv:2502.19982},
  year   = {2025}
}