DOI: 10.1002/ima.70406 ISSN: 0899-9457

MSCRTU: A Convolutional Retentive Transformer With Multiscale Fusion for Breast Ultrasound Image Classification

Kaicheng Lin, Bo Xu, Ying Wu

ABSTRACT

Accurate classification of breast ultrasound images (BUSI) is crucial for the early diagnosis of breast cancer. Current deep learning models still have limitations in this task, primarily because convolutional neural networks (CNNs) have insufficient global modeling capability, while vision transformers (ViTs) lack local detail and spatial inductive bias. Additionally, effectively fusing multiscale features to handle lesions of varying sizes remains a major challenge in this field. To address these issues, this paper proposes a novel multiscale convolution retentive transformer (MS‐CRTU) network. The network first uses a CNN module to extract key local texture features from ultrasound images. Subsequently, to jointly model local and global information from shallow to deep levels, the feature maps are fed into the core processing stage, which is composed of our designed convolution‐retentive transformer block (CRT block). This block integrates information through convolutional operations and a novel Manhattan self‐attention (MaSA) mechanism. Finally, to dynamically aggregate the most informative features, a selective fusion module (MSF) integrates the multiscale features from all stages for the final classification task. On the public BUSI dataset and our B‐UCLM dataset, the MS‐CRTU model achieves accuracies of 95.38% and 93.85%, respectively, outperforming baseline models in F1 score and area under the curve (AUC). This study confirms that the proposed MS‐CRTU enhances the accuracy and robustness of BUSI classification, offering a new approach for intelligent diagnosis based on breast ultrasound images.
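The abstract does not give the MaSA formula, so the following is only a minimal sketch of the general idea of distance-decayed self-attention. It assumes a common formulation in which softmax attention weights are multiplied elementwise by a decay factor `gamma` raised to the Manhattan distance between two token positions on the feature-map grid. The function names, the value of `gamma`, and the single-head, pure-Python layout are illustrative assumptions, not the authors' implementation.

```python
import math


def manhattan_decay_mask(height, width, gamma=0.9):
    """Build an (H*W) x (H*W) matrix whose entry (n, m) is
    gamma ** (|row_n - row_m| + |col_n - col_m|).
    Assumption: this decay form is illustrative; the abstract does not specify it."""
    coords = [(r, c) for r in range(height) for c in range(width)]
    return [
        [gamma ** (abs(r1 - r2) + abs(c1 - c2)) for (r2, c2) in coords]
        for (r1, c1) in coords
    ]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def decayed_attention(q, k, v, mask):
    """Single-head attention where softmax weights are scaled by the decay mask.
    q, k, v are lists of token vectors in row-major grid order."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, q_i in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(q_i, k_j)) for k_j in k]
        # Nearby tokens keep most of their weight; distant tokens are damped.
        weights = [w * d for w, d in zip(softmax(scores), mask[i])]
        out.append(
            [sum(w * v_j[t] for w, v_j in zip(weights, v)) for t in range(len(v[0]))]
        )
    return out


# Usage on a hypothetical 2x2 feature map with 2-dimensional tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
mask = manhattan_decay_mask(2, 2, gamma=0.5)
attended = decayed_attention(tokens, tokens, tokens, mask)
```

In this sketch the mask acts as an explicit spatial prior: each token attends most strongly to its grid neighbours, which is one way a transformer block could recover the locality bias that the abstract says plain ViTs lack.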
