Multivariate Spatial Characterization and Probabilistic Source Risk Assessment of Soil Heavy Metal Pollution in the Yellow River Basin
Dil Khurram, Tianlie Luo, Jie Tang, Ram Proshad, Sami Ullah, Tianyu He, Nadeem Iqbal, Xin Gao, Mingtan Zhu, Gratien Nsabimana
Soil heavy metal pollution poses a threat to agricultural sustainability, food safety, and human health. The ecologically fragile Yellow River Basin is a critical hub for agriculture, energy, and mining; however, soil heavy metal studies remain fragmented, and basin-wide syntheses are limited almost entirely to agricultural soils. This study presents a basin-wide analysis of As, Cd, Cr, Cu, Ni, Pb, and Zn in topsoil, based on 2498 sampling locations compiled from 347 publications, using an integrated framework of receptor modeling, multivariate spatial statistics, self-organizing maps, and probabilistic human health and ecological risk assessment. Four pollution sources, namely agricultural–industrial, emissions, mining–smelting, and geogenic/lithogenic, were resolved. Agriculture–industry and emissions posed considerable ecological risks (mean PER = 367.9 and 353.4), with Cd and Pb accounting for 95.7% of the risk. The non-carcinogenic hazard was negligible for adults, but 8.6% of sites exceeded the safe threshold for children, and the carcinogenic risk surpassed 10⁻⁶ for all groups, with 2.6–9.6% of sites exceeding 10⁻⁴. Spatially, the strongest multimetal contamination corridors are the Baiyin–Lanzhou corridor (upper–middle reaches) for Cu-Pb-Zn (mining–smelting) and the Xi’an–Weinan belt (middle reaches) for Cd-Pb (agricultural–industrial and emissions). Multivariate clustering was more extensive (56.1% of sites) than single-metal clustering (13.1–26.2%), confirming coherent source-linked zones. Ecological risks were driven by Cd and Pb, whereas human health risks were driven by As, Cr, and Ni. This divergence and the strong spatial organization of the risk clusters highlight the need for source-specific, spatially targeted mitigation, which requires monitoring across all land use types.
The compiled dataset, although extensive, is constrained by heterogeneity in sampling periods and analytical methods and by sparse coverage in some grassland, desert, and plateau regions.