DOI: 10.3390/buildings16132489 ISSN: 2075-5309

Concrete Compressive Strength Prediction, External Benchmark Validation, and Scenario-Based Candidate Mixture Screening Using TabPFN and NSGA-II

Wei Chen, Yinggang Liu, Liukui Zhu, Yinbo Zhang, Weifei Zhao, Xiaofang Zhao, Baoyu Dong

Public concrete datasets often contain duplicate records, coupled variables, and cross-source distribution shifts, which may lead to overly optimistic model evaluation. Based on a deduplicated UCI high-performance concrete dataset (1005 samples), this study develops a leakage-controlled data-driven workflow with applicability-domain assessment. TabPFN, SHAP, and NSGA-II are used for compressive strength prediction, model-response attribution, and scenario-based candidate mix screening, respectively. Model evaluation follows a unified data split, inner training-set cross-validation, and an independent test-set protocol. In addition, 502 non-overlapping records from the Mendeley PCC dataset are used as an external benchmark to examine cross-source transferability and sensitivity to distribution shift. The results show that TabPFN achieves the highest R² and the lowest RMSE, MAE, and MAPE on the internal UCI test set, with values of 0.953, 3.744 MPa, 2.265 MPa, and 7.580%, respectively; however, its advantage over strong baselines such as CatBoost is limited. On the external Mendeley PCC dataset, TabPFN remains competitive, with R², RMSE, and MAE values of 0.490, 15.175 MPa, and 11.457 MPa, respectively, but its performance is close to that of random forest, XGBoost, and CatBoost. The 5NN applicability-domain stratification shows that external samples located within the 95% 5NN applicability domain achieve improved performance (R² = 0.634 and RMSE = 12.367 MPa), suggesting that external prediction errors are associated with the distance from the source-domain distribution. SHAP results indicate that cement, ground granulated blast-furnace slag, curing age, and water are the main attribution variables in the model output; their response directions should be interpreted as statistical attributions rather than material causal mechanisms. The Pareto candidate mixes generated by NSGA-II satisfy basic engineering constraints. Nevertheless, because the external benchmark reveals sensitivity to cross-source distribution shift, the resulting mix proportions should be treated as pre-experimental screening candidates rather than engineering-validated low-GWP concrete mix proportions.
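
The 5NN applicability-domain check mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact definition, so the following is an assumption: each sample is scored by its mean Euclidean distance to its five nearest training samples, and the domain boundary is the 95th percentile of those scores over the training set. The function names, the synthetic two-feature data, and the use of already-standardized features are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def knn_mean_distance(point, reference, k=5, skip_self=False):
    """Mean Euclidean distance from `point` to its k nearest rows in `reference`."""
    dists = sorted(math.dist(point, row) for row in reference)
    if skip_self:
        # `point` is itself a row of `reference`: drop its zero self-distance.
        dists = dists[1:]
    return sum(dists[:k]) / k


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def fit_domain_threshold(train, k=5, q=95.0):
    """Assumed boundary: q-th percentile of the training samples' own kNN scores."""
    scores = [knn_mean_distance(row, train, k, skip_self=True) for row in train]
    return percentile(scores, q)


def in_domain(samples, train, threshold, k=5):
    """Flag each external sample as inside (True) or outside (False) the domain."""
    return [knn_mean_distance(s, train, k) <= threshold for s in samples]


# Synthetic stand-in for standardized mix features (not the UCI data).
random.seed(0)
train = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
external = [[0.1, -0.2], [8.0, 8.0]]  # one near the training cloud, one far away

threshold = fit_domain_threshold(train)
flags = in_domain(external, train, threshold)
print(threshold, flags)
```

In a stratified evaluation of this kind, error metrics would be computed separately for the samples flagged inside and outside the domain. Features should be put on a common scale first, since raw Euclidean distances are otherwise dominated by the variables with the largest numeric range.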
