Graph-Conditioned Stochastic Modeling of Twitter Information Cascades with Dual-Head Transformers for Early Virality Prediction
Bowen Dong, Xinyu Zhang, Chaoya Yan, Weiyan Zhu, Lingmin Hou, Yifan Feng
Information cascades in online social networks arise from stochastic interactions among user behavior, temporal activation, and graph-structured exposure. Early prediction of cascade outcomes remains difficult because only a short diffusion prefix is observable, while future propagation depends on sparse user-level transitions across a heterogeneous social network. This study develops a graph-conditioned stochastic modeling framework for early Twitter cascade prediction. Retweet cascades are formulated as history-dependent stochastic processes over a finite user vocabulary, and a causal dual-head Transformer is used to infer cascade virality and logarithmic final size from short observed prefixes. To incorporate social-network structure, user embeddings pretrained from the follow graph are introduced as external structural priors. A controlled ablation design separates the effects of random embeddings, graph-pretrained embeddings, frozen structural priors, and handcrafted feature fusion. Experiments on Higgs Twitter retweet cascades show that direct full-vocabulary next-user prediction is statistically fragile under sparse short-prefix observations, motivating macro-level cascade outcome prediction. Among the evaluated configurations, the frozen graph-pretrained Transformer achieves the strongest overall balance, reaching an AUC of 0.819, a Brier score of 0.151, and an RMSE of 0.192. The causal Transformer without a graph prior already surpasses logistic regression and approaches Random Forest; however, gains over competitive baselines are modest and statistically significant only in selected pairwise comparisons. Calibration analysis, bootstrap confidence intervals, and paired statistical tests confirm that graph-derived user priors provide more reliable improvements than sequence modeling alone under short-prefix sparse observations.
These findings indicate that graph-conditioned structural priors offer a promising complement to causal sequence modeling for early Twitter cascade prediction.
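
For readers less familiar with the evaluation quantities named above (AUC, Brier score, RMSE, bootstrap confidence intervals), the following is a minimal pure-Python sketch of their standard definitions. It is not the authors' evaluation code. The function names, the toy labels and predictions, and the choice of a percentile bootstrap are assumptions made for illustration only; the abstract does not specify which bootstrap variant or resample count was used.

```python
import math
import random


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, e.g. between true and predicted log final sizes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Raises ValueError if only one class
    is present, because AUC is undefined in that case.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(metric, y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric (assumed variant, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        try:
            stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
        except ValueError:
            continue  # skip resamples where the metric is undefined
    if not stats:
        raise ValueError("metric was undefined on every bootstrap resample")
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Toy data, purely illustrative (not taken from the Higgs Twitter experiments).
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]                     # 1 = viral cascade
toy_probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]     # virality head output
toy_log_size = [1.0, 1.5, 3.0, 4.2, 1.2, 3.8, 2.0, 4.5]   # true log final size
toy_log_pred = [1.2, 1.4, 2.6, 4.0, 1.5, 3.5, 2.4, 4.4]   # size head output

toy_auc = auc(toy_labels, toy_probs)
toy_brier = brier_score(toy_labels, toy_probs)
toy_rmse = rmse(toy_log_size, toy_log_pred)
toy_auc_ci = bootstrap_ci(auc, toy_labels, toy_probs, n_boot=200)
```

In this sketch the classification metrics (AUC, Brier score) are applied to the virality output and RMSE to the log-size output, mirroring the two prediction targets described in the abstract. A lower Brier score indicates better-calibrated and sharper probabilities, which is why it is commonly reported alongside AUC.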