Eigenpruning: an Interpretability-Inspired PEFT Method

Machine Learning 2024-06-21 v5 Artificial Intelligence

Abstract

We introduce eigenpruning, a method that removes singular values from weight matrices in an LLM to improve its performance on a particular task. The method is inspired by interpretability methods designed to automatically find subnetworks of a model that solve a specific task. In our tests, the pruned model outperforms the original model by a large margin, while requiring only minimal computation to prune the weight matrices. On a small synthetic integer multiplication task, the Phi-2 model improves its test-set accuracy from 13.75% to 97.50%. Interestingly, these results seem to indicate the existence of a computation path that can solve the task very effectively but was not being used by the original model. Finally, we publicly release our implementation.
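
The core operation named in the abstract, removing singular values from a weight matrix, can be illustrated with a minimal NumPy sketch. This is not the released implementation: the function name `prune_singular_values` and the `drop_indices` argument are hypothetical, and the abstract does not say how the singular values to remove are chosen, so the sketch simply takes the indices as given.

```python
import numpy as np


def prune_singular_values(weight, drop_indices):
    """Return a copy of `weight` with the selected singular values set to zero.

    `drop_indices` is a hypothetical input: the abstract does not describe
    the selection criterion, so the caller supplies the indices here.
    """
    # Thin SVD: weight = u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    s_pruned = s.copy()
    s_pruned[list(drop_indices)] = 0.0
    # Scale the columns of u by the pruned singular values and recombine.
    return (u * s_pruned) @ vt


# Toy stand-in for a weight matrix; a real LLM weight matrix is far larger.
rng = np.random.default_rng(0)
w = rng.normal(size=(6, 4))
w_pruned = prune_singular_values(w, drop_indices=[3])
print(np.linalg.matrix_rank(w), np.linalg.matrix_rank(w_pruned))
```

The pruned matrix keeps the shape of the original, so in principle it could replace the original weights without changing the model architecture; only its rank is reduced.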

Cite

@article{arxiv.2404.03147,
  title  = {Eigenpruning: an Interpretability-Inspired PEFT Method},
  author = {Tomás Vergara-Browne and Álvaro Soto and Akiko Aizawa},
  journal= {arXiv preprint arXiv:2404.03147},
  year   = {2024}
}

Comments

Extended abstract accepted to LatinX at NAACL 2024
