Classification of Retinal OCT Images on an Imbalanced Dataset Using a Swin Transformer
Paweł Borkowski, Marian Wysocki, Andrzej Grzybowski, Anna Wiśniewska-Borkowska

Small and severely imbalanced optical coherence tomography (OCT) datasets pose a major challenge for deep learning algorithms, yet they reflect the reality of smaller ophthalmology centers, where rare retinal pathologies such as retinal artery occlusion (RAO) and vitreomacular interface disease (VID) occur only sporadically. Experiments were performed on the imbalanced OCTDL dataset (seven classes, 2064 images) and on a custom imbalanced subset of OCT-C8 (eight classes) constructed to match the OCTDL class distribution. An extended ablation of eight loss functions was carried out under 10-fold stratified cross-validation (CV). As the loss for the final Swin Transformer Base model, we adopt dual-weighted PolyLoss (DW-PolyLoss), a lightweight modification of PolyLoss in which inverse-frequency class weights are applied symmetrically to both the cross-entropy term and the polynomial correction term. Under 10-fold stratified CV, the mean accuracy on OCTDL is 95.63%. A logit-averaging ensemble of the 10 fold models reaches 96.71% accuracy on OCTDL and 96.21% on the OCT-C8 subset. Score-CAM analysis suggests that the model attends to clinically interpretable retinal structures, supporting potential use as a screening-triage tool subject to prospective clinical validation.
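
To make the two central ideas concrete, the following is a minimal sketch of a dual-weighted PolyLoss and of logit-averaging across fold models. It is not the authors' implementation. It assumes the Poly-1 form of PolyLoss (cross-entropy plus epsilon times (1 - p_t)), inverse-frequency weights normalized as N / (C * n_c), a default epsilon of 1.0, and a per-sample loss without batch reduction; none of these details are stated in the abstract. All function names are hypothetical.

```python
import math


def inverse_frequency_weights(class_counts):
    """Per-class weights proportional to 1 / n_c.

    Assumed normalization: w_c = N / (C * n_c), so a perfectly
    balanced dataset yields a weight of 1.0 for every class.
    """
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * n) for n in class_counts]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def dw_poly1_loss(logits, target, class_weights, epsilon=1.0):
    """Dual-weighted Poly-1 loss for a single sample (illustrative).

    The class weight of the true class multiplies BOTH the
    cross-entropy term and the polynomial correction term.
    """
    p_t = softmax(logits)[target]      # predicted probability of the true class
    w = class_weights[target]          # inverse-frequency weight of the true class
    ce_term = -math.log(p_t)           # standard cross-entropy
    poly_term = epsilon * (1.0 - p_t)  # first-order polynomial correction
    return w * ce_term + w * poly_term


def ensemble_predict(fold_logits):
    """Average the logits of all fold models, then take the argmax."""
    num_classes = len(fold_logits[0])
    mean_logits = [
        sum(f[c] for f in fold_logits) / len(fold_logits)
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=mean_logits.__getitem__)


if __name__ == "__main__":
    # Toy two-class example with a 90:10 imbalance (made-up counts).
    weights = inverse_frequency_weights([90, 10])
    print(weights)
    print(dw_poly1_loss([0.0, 0.0], target=1, class_weights=weights))
    # Three hypothetical fold models voting on one image.
    print(ensemble_predict([[2.0, 1.0], [0.0, 0.5], [0.0, 0.9]]))
```

In this sketch, a misclassified sample from a rare class is penalized more strongly in both terms than one from a frequent class. Averaging logits rather than hard votes lets a confident fold model outweigh several uncertain ones.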