We analyze the Knowledge Neurons framework for attributing factual and relational knowledge to particular neurons in a transformer network. We use a 12-layer multi-lingual BERT model for our experiments. Our study reveals several interesting phenomena. We observe that factual knowledge can mostly be attributed to the middle and higher layers of the network (≥6). Further analysis reveals that the middle layers (6–9) are mostly responsible for relational information, which is then refined into actual factual knowledge, or the "correct answer", in the last few layers (10–12). Our experiments also show that the model handles prompts that express the same fact in different languages similarly, providing further evidence for the effectiveness of multi-lingual pre-training. Applying the attribution scheme to grammatical knowledge, we find that grammatical knowledge is far more dispersed among the neurons than factual knowledge.
@article{arxiv.2205.01366,
title = {Finding patterns in Knowledge Attribution for Transformers},
author = {Jeevesh Juneja and Ritu Agarwal},
journal = {arXiv preprint arXiv:2205.01366},
year = {2022}
}