DOI: 10.17798/bitlisfen.1779957 ISSN: 2147-3129

ViTEmIS: Vision Transformer Based Email Image Spam Detector

Sultan Zavrak
Email spam remains a significant cybersecurity threat, and image-based spam undermines traditional text-based spam detection methods. As attackers adapt their techniques, new detection approaches are needed. This work presents ViTEmIS, an approach based on fine-tuning ViT-Base for detecting image-based email spam, together with a systematic analysis of how duplicate images affect performance reporting across benchmark datasets. The proposed model is evaluated in terms of accuracy, precision, recall, and F1, and is compared with methods proposed in previous studies. Across four benchmark datasets, ViTEmIS achieved 94.00–99.42% accuracy with duplicates included and 90.85–99.42% after duplicate removal, with corresponding F1 scores of 96.47–99.46% and 93.70–99.44%, respectively. The contributions are: (i) an evaluation of a ViT-Base fine-tuning approach for image-based email spam detection on multiple benchmarks, (ii) a systematic analysis of how duplicate images affect reported performance, and (iii) a comparative discussion with prior work that states explicit modality caveats.
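
To make the duplicate-removal comparison concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes that "duplicate" means byte-identical image files, detected with a SHA-256 digest, which the abstract does not specify; the function names `dedupe_by_hash` and `binary_metrics` and the convention that label 1 means spam are hypothetical choices made for illustration.

```python
import hashlib


def dedupe_by_hash(items):
    """Keep the first occurrence of each distinct image payload.

    `items` is an iterable of (name, image_bytes, label) tuples.
    Assumption: duplicates are byte-identical files; near-duplicates
    (resized or re-encoded images) are NOT detected by this sketch.
    """
    seen = set()
    kept = []
    for name, data, label in items:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append((name, data, label))
    return kept


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels in {0, 1} (1 = spam)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Under these assumptions, one would apply `dedupe_by_hash` to the whole dataset before creating the train/test split, so that no image appears on both sides, and then report `binary_metrics` once on the original data and once on the deduplicated data.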
