Conditional Autoregressors are Interpretable Classifiers

Machine Learning 2022-04-01 v1

Abstract

We explore the use of class-conditional autoregressive (CA) models to perform image classification on MNIST-10. Autoregressive models assign a probability to an entire input by combining the probabilities of its individual features; hence a classification decision made by a CA can be readily decomposed into contributions from each input feature. That is to say, CAs are inherently locally interpretable. Our experiments show that a naively trained CA achieves much worse accuracy than a standard classifier; however, this is due to over-fitting rather than a lack of expressive power. Using knowledge distillation from a standard classifier, a student CA can be trained to match the performance of the teacher while remaining interpretable.
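
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the paper's architecture: it assumes binary features, a masked logistic-regression conditional for each feature, and hypothetical parameter names (`W`, `b`, `log_prior`) chosen only for illustration. Each class c defines log p(x | c) as a sum of per-feature terms log p(x_i | x_<i, c), so the class score splits exactly into one contribution per feature.

```python
import numpy as np


def per_feature_log_probs(x, W, b):
    """Return an array of shape (C, D) holding log p(x_i | x_<i, c).

    x: (D,) binary feature vector (assumption: features are 0/1).
    W: (C, D, D) per-class weights; only the strictly lower triangle is used.
    b: (C, D) per-class biases.
    """
    D = x.shape[0]
    # Autoregressive mask: feature i may depend only on features j < i.
    mask = np.tril(np.ones((D, D)), k=-1)
    logits = (W * mask) @ x + b  # shape (C, D)
    # Bernoulli log-likelihood written with numerically stable log-sigmoids.
    log_p_one = -np.logaddexp(0.0, -logits)   # log sigmoid(logit)
    log_p_zero = -np.logaddexp(0.0, logits)   # log (1 - sigmoid(logit))
    return x * log_p_one + (1.0 - x) * log_p_zero


def classify(x, W, b, log_prior):
    """Pick argmax_c [log p(c) + sum_i log p(x_i | x_<i, c)].

    Also returns the (C, D) per-feature contributions behind the decision.
    """
    contrib = per_feature_log_probs(x, W, b)
    scores = contrib.sum(axis=1) + log_prior
    return int(np.argmax(scores)), contrib


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, D = 3, 6  # toy sizes, far smaller than an MNIST image
    W = rng.normal(size=(C, D, D))
    b = rng.normal(size=(C, D))
    log_prior = np.log(np.full(C, 1.0 / C))
    x = rng.integers(0, 2, size=D).astype(float)

    pred, contrib = classify(x, W, b, log_prior)
    runner_up = (pred + 1) % C
    # Per-feature evidence for the predicted class over another class.
    print("predicted class:", pred)
    print("feature-wise evidence:", contrib[pred] - contrib[runner_up])
```

The difference `contrib[a] - contrib[b]` gives, feature by feature, how much each input favours class `a` over class `b`; summing it (plus the difference in log priors) recovers the gap between the two class scores.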

Cite

@article{arxiv.2203.17002,
  title  = {Conditional Autoregressors are Interpretable Classifiers},
  author = {Nathan Elazar},
  journal= {arXiv preprint arXiv:2203.17002},
  year   = {2022}
}

Comments

4 pages, 2 figures
