Patch DCT vs LeNet

Computer Vision and Pattern Recognition 2022-11-07 v1

Abstract

This paper compares the performance of a neural network (NN) that takes the DCT (Discrete Cosine Transform) of an image patch as input with that of LeNet for classifying MNIST handwritten digits. The basis functions underlying the DCT bear a passing resemblance to some of the learned basis functions of the Visual Transformer but are an order of magnitude faster to apply.
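
To make the idea of a patch DCT concrete, the following is a minimal, dependency-free sketch and not the paper's implementation. It assumes an orthonormal DCT-II, non-overlapping 4x4 patches, and a synthetic 28x28 image standing in for an MNIST digit. The patch size, the tiling scheme, and the helper names (`dct_matrix`, `dct2`, `patch_dct_features`) are illustrative assumptions. The resulting feature vector is the kind of fixed, non-learned input that a small classifier network could consume.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(patch, c):
    """2-D DCT of a square patch, computed as C @ patch @ C^T."""
    return matmul(matmul(c, patch), transpose(c))


def patch_dct_features(image, patch=4):
    """Tile the image into non-overlapping patches (an assumption of this
    sketch), take the 2-D DCT of each tile, and flatten all coefficients."""
    c = dct_matrix(patch)
    height, width = len(image), len(image[0])
    features = []
    for r in range(0, height - patch + 1, patch):
        for col in range(0, width - patch + 1, patch):
            tile = [row[col:col + patch] for row in image[r:r + patch]]
            coeffs = dct2(tile, c)
            features.extend(v for row in coeffs for v in row)
    return features


# Synthetic 28x28 checkerboard standing in for an MNIST digit.
image = [[float((r + c) % 2) for c in range(28)] for r in range(28)]
features = patch_dct_features(image, patch=4)
print(len(features))  # 784: 49 patches x 16 coefficients each
```

Because the basis is fixed, the transform needs no training; only the classifier that reads `features` would have learned weights.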

Cite

@article{arxiv.2211.02392,
  title  = {Patch DCT vs LeNet},
  author = {David Sinclair},
  journal= {arXiv preprint arXiv:2211.02392},
  year   = {2022}
}

Comments

3 pages, 5 figures, appendix with the PyTorch code definition. The paper argues that fixed basis functions are close to as good as convolutional networks and that learning custom basis functions on large datasets is just wasting electricity.