DOI: 10.3390/su18126348 ISSN: 2071-1050

Attention-Enhanced and Multi-Scale Network for Image Tamper Detection and Localization

Yuqin Zhang, Kan Ren

The rapid proliferation of image editing tools poses serious challenges to information sustainability and social trust, as malicious digital forgeries can easily contaminate public discourse, news reporting, and legal forensics. Advanced editing techniques make tampering increasingly difficult to recognize with the naked eye, which calls for highly accurate methods for detecting and localizing tampered regions. This paper proposes an end-to-end network model named AEM-Net. AEM-Net combines RGB and SRM features and uses multi-scale feature extraction and fusion to enhance the model’s sensitivity to image details and potentially tampered regions. It consists of two parts: the HRNet-based Multi-Scale Feature Extraction Module and the Context-Aggregated Pyramid Localization Module (CAPLM). The multi-scale feature extraction module uses the Attentional Perceptual Feature Fusion Module to focus adaptively on anomalous regions. The CAPLM, in turn, uses the Expanded Convolutional Feedback Enhancement Module to exploit contextual feature information and achieve pixel-level localization of tampered regions. Experimental results on public benchmark datasets show that AEM-Net outperforms existing state-of-the-art methods. In particular, AEM-Net achieves AUC/F1 scores of 95.36%/67.19% on CasiaV1, 93.25%/79.75% on Coverage, and 87.36%/66.24% on NIST16, while requiring only 0.09 s to process a single image, demonstrating both high localization accuracy and computational efficiency.
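
The F1 scores quoted in the abstract measure pixel-level localization quality. As a minimal sketch of how such a score is commonly computed for one image, the following assumes a predicted tamper-probability map and a binary ground-truth mask of the same shape. The function name `pixel_f1`, the 0.5 threshold, and the handling of empty masks are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def pixel_f1(pred_prob, gt_mask, threshold=0.5):
    """Pixel-level F1 between a probability map and a binary mask.

    Hypothetical helper: the threshold and the empty-mask convention
    are assumptions, not the authors' evaluation protocol.
    """
    pred = np.asarray(pred_prob) >= threshold   # binarize the prediction
    gt = np.asarray(gt_mask).astype(bool)       # ground-truth tampered pixels

    tp = np.logical_and(pred, gt).sum()         # tampered, predicted tampered
    fp = np.logical_and(pred, ~gt).sum()        # authentic, predicted tampered
    fn = np.logical_and(~pred, gt).sum()        # tampered, missed

    denom = 2 * tp + fp + fn
    # Assumed convention: return 0.0 when neither map contains tampered pixels.
    return float(2 * tp / denom) if denom > 0 else 0.0

# Toy 2x2 example: one true positive, one false positive, one false negative.
pred = np.array([[0.9, 0.2], [0.8, 0.1]])
gt = np.array([[1, 0], [0, 1]])
score = pixel_f1(pred, gt)
```

Dataset-level figures are typically obtained by averaging per-image scores, although the exact protocol (fixed versus best threshold, averaging scheme) varies between papers and is not stated in the abstract.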
