HAFM-Net: Hierarchical Alignment Fusion and Mapping for UAV-Based Misaligned RGB-T Salient Object Detection
Zhijie Zhang, Kaihong Chen, Chen Yang, Shanwen Zhang, Zhen Wang

In unmanned aerial vehicle (UAV) scenarios, RGB-T salient object detection faces several challenges, including cross-modal spatial misalignment, redundant multi-scale features, and weak responses from small objects in cluttered backgrounds, which together degrade fusion effectiveness and localization stability in complex environments. To address these issues, we propose the Hierarchical Alignment Fusion and Mapping Network (HAFM-Net), a misalignment-robust fusion framework for unaligned RGB-T salient object detection. The proposed method does not rely on explicit pixel-level preregistration. Instead, it replaces registration-first preprocessing with implicit feature-domain alignment and misalignment-robust fusion, enabling saliency prediction from unregistered RGB-T inputs. Specifically, we design a hierarchical adjacent-scale interaction mechanism to enhance multi-scale contextual modeling while suppressing cross-scale redundancy. We further develop a Misalignment-Robust Correlation Fusion module to explore cross-modal correlations and enable robust feature interaction under positional variations. In addition, we introduce a semantic–spatial complementary enhancement to promote collaboration between high-level semantic cues and low-level spatial details, thereby improving the representation and boundary localization of small salient objects. Experimental results on the UAV RGB-T 2400 dataset and an additional weakly aligned benchmark demonstrate that HAFM-Net achieves competitive performance and exhibits strong robustness in challenging scenarios such as blur, illumination variation, small objects, and fog.