PiPViT: Patch-based Visual Interpretable Prototypes for Retinal Image Analysis

Computer Vision and Pattern Recognition 2025-06-16 v2 Artificial Intelligence

Abstract

Background and Objective: Prototype-based methods improve interpretability by learning fine-grained part-prototypes; however, their visualization in the input pixel space is not always consistent with human-understandable biomarkers. In addition, well-known prototype-based approaches typically learn extremely granular prototypes that are less interpretable in medical imaging, where both the presence and the extent of biomarkers and lesions are critical. Methods: To address these challenges, we propose PiPViT (Patch-based Visual Interpretable Prototypes), an inherently interpretable prototypical model for image recognition. Leveraging a vision transformer (ViT), PiPViT captures long-range dependencies among patches to learn robust, human-interpretable prototypes that approximate lesion extent using only image-level labels. Additionally, PiPViT benefits from contrastive learning and multi-resolution input processing, which enable effective localization of biomarkers across scales. Results: We evaluated PiPViT on retinal OCT image classification across four datasets, where it achieved competitive quantitative performance compared to state-of-the-art methods while delivering more meaningful explanations. Moreover, quantitative evaluation on a hold-out test set confirms that the learned prototypes are semantically and clinically relevant. We believe PiPViT can transparently explain its decisions and assist clinicians in understanding diagnostic outcomes. GitHub page: https://github.com/marziehoghbaie/PiPViT

Cite

@article{arxiv.2506.10669,
  title  = {PiPViT: Patch-based Visual Interpretable Prototypes for Retinal Image Analysis},
  author = {Marzieh Oghbaie and Teresa Araújo and Hrvoje Bogunović},
  journal= {arXiv preprint arXiv:2506.10669},
  year   = {2025}
}