Leakage-Guarded Next-Window Superchat Prediction from VTuber Live Chat Dynamics
Hwan Soo Yu, Jae-Uk Kim, Soo Young Cho

Predicting near-future monetization in virtual livestreaming remains methodologically challenging because paid-support events are sparse, temporally dependent, and vulnerable to leakage under inappropriate evaluation designs. This study develops a leakage-guarded, window-based machine-learning framework for predicting next-window Superchat occurrence from VTuber live-chat dynamics. Public VTuber live-chat and Superchat logs were reconstructed into non-overlapping five-minute windows, and features were organized into audience activity, member composition, message intensity, donation-state information, and short-horizon dynamic groups. To reduce optimistic bias, the primary evaluation used video-level grouped splitting and compared a strict setting that excluded direct current-window donation-state variables with an extended donation-state-aware setting. HistGradientBoosting achieved the strongest performance. In the strict setting, it reached PR-AUC = 0.899, ROC-AUC = 0.920, F1 = 0.822, and Brier score = 0.171, while the extended setting produced only modest additional gains. Additional zero-chat sensitivity, repeated grouped split, channel-level robustness, graph-proxy baseline, feature-ablation, and calibration analyses supported the stability and interpretability of the framework. The results suggest that next-window Superchat occurrence can be predicted from participation breadth, chat activity, message intensity, and temporally shifted behavioral dynamics under leakage-aware evaluation.