A State Space Model-Driven Feature Disentanglement Network for Real-Time Detection of Morphologically Complex Insect Pests in Agricultural Fields
Jiaren Sun, Yating Jiang, Shuai Teng, Zongchao Liu, Nuo Chen

Accurate detection of field insect pests remains a significant challenge for precision agriculture for three reasons: the target organisms have elongated and variable morphologies, they frequently resemble complex background textures, and species in natural datasets follow a long-tail distribution. Deep convolutional neural networks (CNNs) have advanced the field, but they are often constrained by a limited effective receptive field and by the entanglement of semantic and spatial features. These limitations can raise false-positive rates and cause missed detections of low-contrast or rare targets. To address them, this paper introduces a detection framework that combines state space modeling with multi-stream feature disentanglement. First, a visual state space module serves as the backbone feature extractor; it provides a global receptive field with computational complexity linear in input size, improving the perception of long-range morphological structures. Second, a Topological Feature Disentanglement Pyramid Network is proposed; it explicitly separates feature representations into semantic and spatial streams and recombines them through graph convolutional interactions, suppressing background interference and improving localization precision. Third, a meta-auxiliary detection head, active only during training (and therefore adding no inference cost), amplifies the supervision signal for hard, low-contrast samples through adversarial gradient modulation. Finally, an augmentation pipeline based on implicit neural radiance fields generates physically consistent synthetic views of underrepresented pest classes, mitigating the effects of the long-tail data distribution.
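The abstract does not detail the visual state space module, so the following is only a minimal NumPy sketch, assuming a diagonal state space model with zero-order-hold discretisation and fixed parameters, of why a scan-based backbone can have a global receptive field at linear cost. The names `ssm_scan` and `bidirectional_2d_scan` and the two-direction raster scan are hypothetical illustrations, not the paper's implementation; visual state space backbones such as VMamba use input-dependent (selective) parameters and a four-direction cross-scan instead.

```python
# Illustrative sketch with hypothetical names; not the paper's implementation.
import numpy as np


def ssm_scan(x, A, B, C, delta):
    """Causal scan of a diagonal, time-invariant state space model.

    x:     (L, D) token sequence (L tokens, D channels)
    A:     (D, N) negative diagonal state matrix, one row per channel
    B, C:  (D, N) input and output projections
    delta: (D,)   positive discretisation step per channel
    Returns y of shape (L, D). Work is O(L * D * N): linear in L.
    """
    L, D = x.shape
    # Zero-order-hold discretisation; for a diagonal A it is elementwise.
    A_bar = np.exp(delta[:, None] * A)      # (D, N)
    B_bar = (A_bar - 1.0) / A * B           # (D, N)
    h = np.zeros_like(A_bar)                # hidden state carried along the scan
    y = np.empty_like(x, dtype=float)
    for t in range(L):
        h = A_bar * h + B_bar * x[t][:, None]   # h_t = A_bar h_{t-1} + B_bar x_t
        y[t] = (C * h).sum(axis=1)              # y_t = C h_t, per channel
    return y


def bidirectional_2d_scan(feat, A, B, C, delta):
    """Raster-flatten an (H, W, D) map, scan forwards and backwards, and sum.

    The forward scan lets each position see every earlier position and the
    backward scan every later one, so each output depends on the whole map.
    """
    H, W, D = feat.shape
    seq = feat.reshape(H * W, D)
    fwd = ssm_scan(seq, A, B, C, delta)
    bwd = ssm_scan(seq[::-1], A, B, C, delta)[::-1]
    return (fwd + bwd).reshape(H, W, D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, N = 3, 4
    A = -rng.uniform(0.1, 0.5, size=(D, N))
    B = rng.standard_normal((D, N))
    C = rng.standard_normal((D, N))
    delta = np.full(D, 0.1)

    feat = rng.standard_normal((8, 8, D))
    out = bidirectional_2d_scan(feat, A, B, C, delta)

    # Perturb the top-left token; the bottom-right output still changes.
    feat2 = feat.copy()
    feat2[0, 0] += 1.0
    out2 = bidirectional_2d_scan(feat2, A, B, C, delta)
    print("max change at bottom-right:", np.abs(out2[-1, -1] - out[-1, -1]).max())
```

Each step touches only the running state `h`, so the cost grows linearly with the number of tokens H·W, whereas full self-attention over the same tokens grows quadratically. Selective variants replace the fixed `delta`, `B` and `C` with input-dependent projections without changing this linear scan structure.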
Experiments on the public BAU-Insectv2 benchmark show that the proposed method achieves a mean average precision at an IoU threshold of 0.5 (mAP@0.5) of 81.8%, a 4.4-percentage-point improvement over a comparable baseline, while keeping a compact parameter count of 2.33 M and running at 178.6 FPS. The framework is particularly effective at detecting elongated, minute, and rare pests, suggesting a promising approach for real-time, field-based pest surveillance in precision agriculture.
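As a quick consistency check, the reported figures imply the following baseline accuracy, relative gain, and per-image latency (a sketch assuming FPS denotes single-image throughput; hardware and batch size are not given here):

$$\text{mAP@0.5}_{\text{baseline}} = 81.8\% - 4.4\ \text{pp} = 77.4\%, \qquad \frac{4.4}{77.4} \approx 5.7\%\ \text{relative gain}, \qquad t_{\text{frame}} = \frac{1}{178.6\ \text{s}^{-1}} \approx 5.60\ \text{ms}.$$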