Input Modality Ablation for Sustainable Landslide Hazard Management Using U-Net: Fused DEM–Optical vs. Spectral vs. Terrain Representations in a Small-Sample Pilot Study
Walter Chen, Fuan Tsai

Rapid and accurate landslide mapping is essential for disaster risk reduction and sustainable land management in landslide-prone mountainous regions. This study presents a U-Net semantic segmentation framework for pixel-wise landslide classification in the Laonung Creek Watershed of southern Taiwan using 96 annotated tiles derived from a very high-resolution DEM and SPOT-6 multispectral imagery. An input modality ablation experiment compares four configurations: a fused DEM–optical composite matching the visual input used by the annotators (annotation-coherent input), SPOT-6 natural color imagery, a DEM-derived terrain stack, and a six-channel multi-source stack combining all SPOT-6 bands with slope and curvature. All configurations use an identical EfficientNet-B0 U-Net architecture under a spatially blocked train/validation/test design with a fixed held-out test set of 29 tiles. The multi-source stack achieves the highest test Average Precision (AP) of 0.556 (95% CI: 0.463–0.643), whereas the annotation-coherent fused composite achieves AP = 0.511 (95% CI: 0.404–0.601); overlapping confidence intervals indicate that neither modality is definitively superior at this test-set size. The terrain-only configuration (AP = 0.152) confirms that optical information is essential for reliable delineation. A key methodological finding is that differential encoder–decoder learning rates caused rapid decoder overfitting; matched rates of 10⁻⁵ substantially stabilized training and are recommended as a conservative default for small-sample segmentation with pretrained encoders. At matched pixel positions, the best DL model achieves AP comparable to a companion Random Forest (DL: 0.847, RF: 0.824), while producing spatially coherent probability maps that support scalable landslide inventory compilation for sustainable hazard management.
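To illustrate how an AP value and a 95% confidence interval of the kind reported above might be obtained from a small held-out tile set, the following is a minimal sketch in pure Python. The abstract does not state how the intervals were computed; the tile-level percentile bootstrap used here is an assumption, as are the function names `average_precision` and `bootstrap_ap_ci` and the toy data. Resampling whole tiles rather than individual pixels is one plausible way to respect the spatial dependence of pixels within a tile.

```python
import math
import random

def average_precision(labels, scores):
    """Step-wise AP: mean precision at the rank of each positive pixel.

    Assumes binary labels (0/1). Ties in score are broken by input order,
    which can differ slightly from threshold-grouped implementations.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return float("nan")  # AP is undefined without positives
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive
    return total / n_pos

def bootstrap_ap_ci(tiles, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for pooled-pixel AP, resampling whole tiles.

    tiles: list of (labels, scores) pairs, one pair per test tile.
    This is an assumed procedure, not necessarily the one used in the study.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Draw as many tiles as in the test set, with replacement
        sample = [tiles[rng.randrange(len(tiles))] for _ in tiles]
        labels = [y for t in sample for y in t[0]]
        scores = [s for t in sample for s in t[1]]
        ap = average_precision(labels, scores)
        if not math.isnan(ap):
            stats.append(ap)
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi

# Toy example: three hypothetical tiles of four pixels each
tiles = [
    ([1, 0, 0, 1], [0.9, 0.2, 0.4, 0.7]),
    ([0, 1, 0, 0], [0.1, 0.8, 0.3, 0.6]),
    ([1, 1, 0, 0], [0.6, 0.3, 0.5, 0.2]),
]
lo, hi = bootstrap_ap_ci(tiles, n_boot=200, seed=1)
print(f"95% CI for AP: [{lo:.3f}, {hi:.3f}]")
```

With only a few dozen test tiles, intervals from such a procedure tend to be wide, which is consistent with the overlapping intervals reported for the multi-source and fused configurations.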