An Efficient Cross-Modal Interaction and Dynamic Fusion Network for Multimodal Breast Ultrasound Diagnosis
Xiangqiong Wu, Yin Lan, Lina Han, Peng Wang
Background: Multimodal breast ultrasound, including B-mode imaging, color Doppler flow imaging, and elastography, provides complementary information for lesion characterization. However, effectively integrating heterogeneous modalities remains challenging because existing methods suffer from inconsistent feature distributions, limited cross-modal interaction, high computational cost, and sensitivity to noise and missing data. Methods: We present an efficient Cross-Modal Interaction and Dynamic Fusion Network (CIDFNet) for multimodal breast ultrasound analysis. The framework integrates a multi-scale feature enhancement module to improve modality-specific representations, a cross-modal interaction module to enable early-stage feature exchange across modalities, and a dynamic fusion strategy that adaptively combines modality information based on feature reliability estimation. In addition, an invertible neural network is incorporated to reconstruct missing modality features during training. Results: Experiments on an internal dataset of 248 patients with 1532 images show that CIDFNet achieves an AUC of 85.69%, accuracy of 75.51%, recall of 50.00%, F1-score of 62.50%, and precision of 83.33%, while requiring 49.51 M parameters and 79.79 G FLOPs. Under a simplified Gaussian noise perturbation setting, performance degrades. Conclusions: CIDFNet provides a framework for multimodal breast ultrasound analysis that reflects a trade-off between performance and computational efficiency.
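The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below is only an arithmetic consistency check on the figures quoted in the abstract; the function and variable names are illustrative and are not taken from the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
reported_precision = 0.8333
reported_recall = 0.5000
reported_f1 = 0.6250

computed_f1 = f1_score(reported_precision, reported_recall)

# Difference between the recomputed and the reported F1-score.
f1_gap = abs(computed_f1 - reported_f1)
print(f"computed F1 = {computed_f1:.4f}, reported F1 = {reported_f1:.4f}")
```

The recomputed value agrees with the reported 62.50% to within rounding, so the three figures are mutually consistent. The high precision paired with a recall of 50.00% indicates that the reported operating point misses half of the positive cases.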