DOI: 10.3390/agriculture15020138 ISSN: 2077-0472

A Patch-Level Data Synthesis Pipeline Enhances Species-Level Crop and Weed Segmentation in Natural Agricultural Scenes

Tang Li, James Burridge, Pieter M. Blok, Wei Guo

Species-level crop and weed semantic segmentation in agricultural field images enables plant identification and enhanced precision weed management. However, the scarcity of labeled data poses significant challenges for model development. Here, we report a patch-level synthetic data generation pipeline that improves semantic segmentation performance in natural agricultural scenes by creating realistic training samples, achieved by pasting patches of segmented plants onto soil backgrounds. This pipeline effectively preserves foreground context and ensures diverse and accurate samples, thereby enhancing model generalization. The semantic segmentation performance of the baseline model was higher when trained solely on data synthesized by our proposed method than when trained solely on real data, with the mean intersection over union (mIoU) increasing by approximately 1.1% (from 0.626 to 0.633). Building on this, we created hybrid datasets by combining synthetic and real data and investigated the impact of synthetic data volume. By increasing the number of synthetic images in these hybrid datasets from 1× to 20×, we observed a substantial performance improvement, with mIoU increasing by 15% at 15×. However, the gains diminish beyond this point, with the optimal balance between accuracy and efficiency achieved at 10×. These findings highlight synthetic data as a scalable and effective augmentation strategy for addressing the challenges of limited labeled data in agriculture.
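The core idea of pasting segmented plant patches onto soil backgrounds, and of scoring the result with mIoU, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`paste_patch`, `synthesize`, `mean_iou`), the uniform random placement, the label convention (0 for soil, positive integers for species), and the absence of scaling, rotation, or blending are all assumptions made for illustration.

```python
import numpy as np


def paste_patch(image, label, patch_rgb, patch_mask, class_id, top, left):
    """Paste one plant patch into `image` and write its class into `label` (in place).

    patch_rgb:  (h, w, 3) uint8 crop containing the plant
    patch_mask: (h, w) bool array, True where the plant is (hypothetical format)
    """
    h, w = patch_mask.shape
    # Basic slices are views, so assigning through them edits the originals.
    region_img = image[top:top + h, left:left + w]
    region_lbl = label[top:top + h, left:left + w]
    region_img[patch_mask] = patch_rgb[patch_mask]
    region_lbl[patch_mask] = class_id


def synthesize(background, patches, rng, n_patches=5):
    """Build one synthetic (image, label) pair from a soil background.

    patches: list of (patch_rgb, patch_mask, class_id) tuples.
    Placement is uniform at random here; later patches may occlude earlier ones.
    """
    image = background.copy()
    label = np.zeros(background.shape[:2], dtype=np.uint8)  # 0 = soil (assumed)
    for _ in range(n_patches):
        patch_rgb, patch_mask, class_id = patches[rng.integers(len(patches))]
        h, w = patch_mask.shape
        top = rng.integers(0, image.shape[0] - h + 1)
        left = rng.integers(0, image.shape[1] - w + 1)
        paste_patch(image, label, patch_rgb, patch_mask, class_id, top, left)
    return image, label


def mean_iou(pred, target, num_classes):
    """Mean intersection over union, skipping classes absent from both maps."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```

Because the label map is written at the same time as the pixels, each synthetic image comes with a pixel-accurate annotation at no extra labeling cost. As a check on the reported figure, the relative gain from 0.626 to 0.633 is (0.633 - 0.626) / 0.626, or about 1.1%.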
