Toxicokinetic-Informed Evidential Learning for Applicability-Domain-Aware QSAR/QSPR Prediction of Environmental Contaminant Toxicity
Xiankun Huang, Junkai Zheng, Zhihong Zheng, Wenhao Xu
Quantitative structure–activity relationship and quantitative structure–property relationship (QSAR/QSPR)-based molecular toxicity prediction provides an in silico strategy for prioritizing environmental contaminants when longer-duration bioassay data are sparse. However, many Simplified Molecular-Input Line-Entry System (SMILES)-based machine learning models treat exposure duration as an unconstrained numerical covariate and provide limited information on whether predictions are supported by the observed temporal domain. Here, we evaluated an applicability-domain-aware chemoinformatics framework that combines transformer-derived molecular representations with toxicokinetic-informed temporal encoding and evidential uncertainty estimation. The approach replaces conventional log10-transformed time encoding with a bounded first-order toxicokinetic saturation feature and combines this representation with Deep Evidential Regression to support a joint chemical–temporal view of the QSAR/QSPR applicability domain. Using experimentally derived U.S. EPA Ecotoxicology Knowledgebase (ECOTOX) fish EC50 mortality records, models were trained on 48,728 acute-duration observations and evaluated retrospectively on 2,090 temporally separated longer-duration observations. The combined toxicokinetic and evidential model reduced temporal extrapolation error relative to conventional time encoding while maintaining comparable within-domain validation performance. The learned population-level timescale converged to 221 ± 3 h, consistent with accumulation timescales extending beyond standard acute fish test durations. Epistemic uncertainty was positively associated with absolute prediction error across all 10 folds, suggesting that the uncertainty estimates retained sample-level information relevant to applicability-domain-aware molecular toxicity screening.
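The two ingredients named above, a bounded time feature and an evidential output head, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the saturation feature is assumed to take the standard one-compartment first-order form 1 − exp(−t/τ), with τ the population-level timescale (about 221 h in the reported fits), and the evidential head is assumed to follow the Normal-Inverse-Gamma (NIG) formulation of Deep Evidential Regression, in which the network emits four values (γ, ν, α, β) per sample. All function names are hypothetical.

```python
import math
import numpy as np

def log10_time_encoding(t_hours):
    """Conventional encoding: unbounded, keeps growing with exposure time."""
    return np.log10(np.asarray(t_hours, dtype=float))

def tk_saturation_encoding(t_hours, tau_hours):
    """Assumed first-order toxicokinetic saturation feature, bounded in [0, 1).

    Fraction of steady-state body burden reached after t hours for
    one-compartment first-order uptake with timescale tau (= 1 / k).
    During training, tau would be a learned parameter (for example stored
    as log_tau and exponentiated so it stays positive); that is an assumption.
    """
    t = np.asarray(t_hours, dtype=float)
    return 1.0 - np.exp(-t / tau_hours)

def nig_outputs(raw):
    """Map unconstrained head outputs of shape (n, 4) to NIG parameters.

    Softplus keeps nu > 0 and beta > 0; adding 1 keeps alpha > 1.
    """
    softplus = lambda x: np.logaddexp(0.0, x)
    gamma = raw[:, 0]
    nu = softplus(raw[:, 1])
    alpha = softplus(raw[:, 2]) + 1.0
    beta = softplus(raw[:, 3])
    return gamma, nu, alpha, beta

def nig_uncertainties(nu, alpha, beta):
    """Deep Evidential Regression uncertainty decomposition."""
    aleatoric = beta / (alpha - 1.0)         # E[sigma^2]: data noise
    epistemic = beta / (nu * (alpha - 1.0))  # Var[mu]: model uncertainty
    return aleatoric, epistemic

_lgamma = np.vectorize(math.lgamma, otypes=[float])

def nig_nll(y, gamma, nu, alpha, beta):
    """Per-sample negative log-likelihood of the NIG evidential model
    (the marginal over (mu, sigma^2) is a Student-t with 2*alpha d.o.f.)."""
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * np.log(np.pi / nu)
            - alpha * np.log(omega)
            + (alpha + 0.5) * np.log(nu * (y - gamma) ** 2 + omega)
            + _lgamma(alpha) - _lgamma(alpha + 0.5))

if __name__ == "__main__":
    hours = [24, 96, 720]
    print("log10:", log10_time_encoding(hours))
    print("TK   :", tk_saturation_encoding(hours, tau_hours=221.0))
```

With this form, the toxicokinetic feature stays below 1 however long the exposure, whereas the log10 encoding keeps growing, so long-duration test points are pushed far outside the values seen in acute training data. On the uncertainty side, the epistemic term β/(ν(α−1)) shrinks as the evidence parameter ν grows, which is why it is read as a model-uncertainty (applicability-domain) signal rather than as measurement noise.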
Cross-species analyses further showed that model behavior depended on the temporal coverage of the training data, with greater convergence when the available assays spanned a larger fraction of the learned timescale. These results suggest that toxicokinetic-informed temporal encoding can improve uncertainty-aware QSAR/QSPR modeling of environmental contaminant toxicity and support prioritization of compounds for further testing, while complementing rather than replacing chronic bioassays.
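One way to make "fraction of the learned timescale" concrete is to evaluate the saturation curve at the longest training duration available for a species. The sketch below assumes that metric, 1 − exp(−t_max/τ), and the reported τ = 221 h; the function name and the example durations are hypothetical and are not taken from ECOTOX.

```python
import math

def timescale_coverage(max_duration_hours, tau_hours=221.0):
    """Assumed coverage metric: share of the first-order saturation curve
    reached by the longest training assay, 1 - exp(-t_max / tau).

    Values near 0 mean the data only sample the early part of the curve;
    values near 1 mean the plateau is observed.
    """
    return 1.0 - math.exp(-max_duration_hours / tau_hours)

# Hypothetical maximum training durations in hours.
for t_max in (48, 96, 240, 720):
    print(f"t_max = {t_max:>4} h -> coverage = {timescale_coverage(t_max):.2f}")
```

Under these assumptions, a standard 96 h acute fish test reaches only about 35% of the saturation curve when τ = 221 h, which illustrates why training data restricted to acute durations constrain the long-time behavior of the model weakly.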