Multiscale Dynamic Attention and Hierarchical Spatial Aggregation for Few-Shot Object Detection
Yining An, Chunlin Song

Few-shot object detection (FSOD) remains a critical challenge in computer vision, where limited training data significantly hinders model performance. Existing methods suffer from poor robustness and accuracy, primarily due to scale sparsity and inadequate feature extraction. In this paper, we propose MDA-HAPP, a novel framework built on a transfer learning architecture and a two-stage object detection approach, specifically designed to address these issues. The key innovations of MDA-HAPP are (1) MultiScale-DynaAttention, a novel attention module that enhances feature extraction by integrating multiscale convolutions into channel attention and applying a dynamic pooling ratio to spatial attention, with residual connections to improve robustness; and (2) hierarchical adaptive-pyramid pooling, which builds on the spatial pyramid pooling (SPP) structure, extracts multiscale features from intermediate layers, and dynamically adjusts its pooling strategies. These features are then fed into a dual-branch detection head to produce the final detection results.

Experimental results on the PASCAL VOC and COCO datasets show that MDA-HAPP achieves significant improvements across different K-shot settings. Specifically, the model achieves a gain of up to 9.8% in AP75 on PASCAL VOC for K = 10 and an improvement of up to 3.7% on COCO for K = 30. These results confirm its superior performance in FSOD and highlight its potential for real-world applications.
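
For readers unfamiliar with the spatial pyramid pooling (SPP) structure that the second component builds on, the following is a minimal sketch of plain SPP on a single-channel feature map. It is not the hierarchical adaptive-pyramid pooling proposed here: the pyramid levels are fixed rather than dynamically adjusted, max pooling is assumed, and the function names `max_pool_bins` and `spatial_pyramid_pool` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def max_pool_bins(fmap: List[List[float]], bins: int) -> List[float]:
    """Max-pool a 2D feature map into a bins x bins grid (row-major order).

    Bin edges use floor for the start and ceiling for the end, so every
    bin is non-empty even when the map is smaller than the grid.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0 = (i * h) // bins                      # floor(i * h / bins)
        r1 = ((i + 1) * h + bins - 1) // bins     # ceil((i + 1) * h / bins)
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            pooled.append(
                max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
            )
    return pooled


def spatial_pyramid_pool(
    fmap: List[List[float]], levels: Sequence[int] = (1, 2, 4)
) -> List[float]:
    """Concatenate pooled grids from each pyramid level into one vector.

    The output length is sum(n * n for n in levels), independent of the
    input height and width (assumed fixed levels, not the paper's
    dynamically adjusted strategy).
    """
    features: List[float] = []
    for n in levels:
        features.extend(max_pool_bins(fmap, n))
    return features


if __name__ == "__main__":
    small = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    wide = [[float(r + c) for c in range(5)] for r in range(3)]
    # Both inputs map to a vector of length 1 + 4 + 16 = 21.
    print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(wide)))
```

The point of the pyramid is that inputs of different spatial sizes yield feature vectors of the same length, while each level captures context at a different scale.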