DOI: 10.1029/2025gl121380 ISSN: 0094-8276

Toward Using Equation Discovery to Generate Parameterizations of Biogeochemical Processes

Chengwang Wang, B. B. Cael, Alessandro Tagliabue

Abstract

Equation discovery methods, such as symbolic regression, show great promise to generate parameterizations of biogeochemical processes in an objective data‐driven manner, yet remain untested in ocean biogeochemistry. Here, we apply symbolic regression to a state‐of‐the‐art ocean biogeochemical model, using it as a surrogate data set to rediscover an empirical equation used to calculate colloidal iron in the model. We introduce a robustness metric combining R² (global pattern reproduction) and EMD‐SHAP (similarity of functional behaviors) for discovered equations. While symbolic regression did not rediscover the original equation because of its empirical complexity, it generated simpler equations with similar performance and functional behaviors, indicating symbolic regression's potential as an emulator bridging between models. Subsampling experiments show that robust equations require full‐depth and multi‐basin sampling, underscoring sampling priorities on colloidal iron. This framework can be broadly applicable to other poorly constrained biogeochemical processes.
