Real-time triage of satellite conjunction messages: An explainable machine learning framework
Luyi Yuan, Guoyi Wang

As satellite density in low Earth orbit increases, conjunction risk has become a primary threat to space safety. However, traditional physics-based models incur computational overheads that limit real-time automated triage. This study presents an intelligent anomaly detection framework built on 3,170 independent historical conjunction messages from the Space-Track database, which were rigorously deduplicated from raw alert streams to ensure a cross-sectional analysis free from temporal leakage. Collision risk assessment is treated as a high-stakes imbalanced classification problem. Importantly, the model is designed to approximate and predict the high-fidelity physics-based risk assessments (Probability of Collision and Minimum Range), rather than actual physical collisions. Ten robust predictors were identified through a multi-stage feature selection process (univariate, LASSO, and Boruta), and class imbalance was addressed using the synthetic minority over-sampling technique (SMOTE). To validate stability and mitigate single-split bias, a 5 × 10-fold repeated stratified cross-validation was conducted. Furthermore, comparative calibration analysis confirmed that the applied over-sampling mitigated class imbalance without distorting the probabilistic output. Among the seven machine learning algorithms evaluated, the Extreme Gradient Boosting (XGBoost) model performed best, achieving an AUC of 0.998 and a Brier score of 0.007 in validation. Decision curve analysis further confirmed its engineering net benefit. Additionally, SHAP interpretability analysis identified the inclination of the first satellite and the orbital period of the second satellite as the most critical risk drivers. This robust, interpretable model provides a millisecond-level automated classification system, laying the technological foundation for future autonomous space traffic management.
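The abstract reports a Brier score and a decision-curve net benefit without defining them. The following is a minimal sketch of both quantities using their standard textbook definitions, not the authors' implementation. The function names (`brier_score`, `net_benefit`, `net_benefit_treat_all`) and the toy labels and probabilities are illustrative assumptions and are not drawn from the study's data.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 label.

    Lower is better; 0.0 means perfectly confident, perfectly correct predictions.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of flagging every case with probability >= threshold.

    Standard definition: TP/N - (FP/N) * threshold / (1 - threshold).
    The threshold encodes how many false alarms one true detection is worth.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy for a decision curve: flag every message as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy, made-up example: 10 conjunction messages, 2 labelled high risk (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.01, 0.02, 0.05, 0.10, 0.03, 0.20, 0.60, 0.04, 0.90, 0.70]

if __name__ == "__main__":
    print("Brier score:", brier_score(labels, probs))
    print("Net benefit (model, t=0.5):", net_benefit(labels, probs, 0.5))
    print("Net benefit (flag all, t=0.5):", net_benefit_treat_all(labels, 0.5))
```

A model shows net benefit at a given threshold when its value exceeds both the flag-everything reference and zero (the flag-nothing reference). A decision curve repeats this comparison across a range of thresholds.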