DOI: 10.1161/str.55.suppl_1.wp266 ISSN: 0039-2499

Abstract WP266: Prediction of Stroke Incidence Using Machine Learning: The Suita Study

Thien Vu, Mai Inoue, Masaki Yamamoto, Attayeb Mohsen, Agustin Martin-Morales, Takao Inoue, Rsch Dawadi, Yoshihiro Kokubo, Michihiro Araki

Background: This population-based study investigated whether machine learning algorithms can predict stroke incidence and identify important risk factors, and evaluated the accuracy of these algorithms in constructing a stroke prediction model.

Methods: Participants from the Suita study were included, and baseline measurements were used to predict stroke outcomes over a 15-year follow-up period. In total, 7,389 participants and 51 variables were investigated, including demographics, medical history, medical imaging, laboratory data, and lifestyle habits. First, unsupervised K-prototypes clustering was used to group participants based on their stroke risk. Subsequently, five supervised models (logistic regression, random forest, support vector machine, extreme gradient boosting, and light gradient boosting machine) were applied to predict stroke outcomes. The SHapley Additive exPlanations (SHAP) method was used to identify the most important variables.
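The clustering step handles numeric and categorical variables together, so a brief illustration of the K-prototypes dissimilarity may help. The following is a minimal sketch, not the study's implementation: the function names, the `gamma` weight, the two prototypes, and all values are hypothetical and chosen only for illustration. It assumes numeric features are already standardized.

```python
from typing import Sequence, Tuple

Prototype = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(num_a: Sequence[float], cat_a: Sequence[str],
                        num_b: Sequence[float], cat_b: Sequence[str],
                        gamma: float = 1.0) -> float:
    """K-prototypes style dissimilarity between two records.

    Numeric part: squared Euclidean distance.
    Categorical part: count of mismatching categories, weighted by gamma.
    """
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


def assign_cluster(num: Sequence[float], cat: Sequence[str],
                   prototypes: Sequence[Prototype],
                   gamma: float = 1.0) -> int:
    """Return the index of the nearest prototype (the assignment step)."""
    distances = [mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
                 for p_num, p_cat in prototypes]
    return distances.index(min(distances))


# Hypothetical prototypes: (standardized age, standardized systolic BP), (hypertension,)
prototypes = [
    ((1.2, 1.0), ("yes",)),    # illustrative "higher-risk" prototype
    ((-0.8, -0.7), ("no",)),   # illustrative "lower-risk" prototype
]

# Hypothetical participant, not taken from the Suita data
participant_num, participant_cat = (1.0, 0.5), ("yes",)
cluster = assign_cluster(participant_num, participant_cat, prototypes)
print(cluster)
```

A full K-prototypes run alternates this assignment step with updating each prototype (means for numeric features, modes for categorical ones) until assignments stop changing. The weight `gamma` controls how much categorical mismatches count relative to numeric distance.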

Results: Unsupervised clustering identified three risk clusters with significantly different stroke incidence (9.1%, 6.6%, and 3.2%), which were categorized as high-, medium-, and low-risk groups, respectively. Among the supervised models, the random forest algorithm demonstrated the best performance. The ten most important variables for predicting stroke incidence were identified using SHAP, with age being the most influential. Other significant risk markers included systolic blood pressure, hypertension, estimated glomerular filtration rate, metabolic syndrome, and blood sugar level. Additionally, elbow joint thickness and levels of fructosamine, hemoglobin, and calcium were found to be potential predictors of stroke risk. Notably, the variables identified by SHAP were consistent with those obtained from the unsupervised clustering approach in the high-risk group.
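To make the idea of ranking variables by Shapley-based importance concrete, here is a minimal sketch that computes exact Shapley values by enumerating feature subsets. It is an illustration only: the toy risk function, its coefficients, the feature values, and the all-zero baseline are hypothetical and are not results or models from this study. Practical SHAP implementations typically use faster model-specific or sampling-based approximations rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(predict: Callable[[List[float]], float],
                   instance: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one instance.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only feasible for a few features.
    """
    n = len(instance)

    def value(coalition: set) -> float:
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk_score(x: List[float]) -> float:
    """Hypothetical risk score with made-up coefficients (illustration only)."""
    age, sbp, egfr = x
    return 0.5 * age + 0.3 * sbp - 0.2 * egfr + 0.1 * age * sbp


names = ["age", "sbp", "egfr"]
instance = [1.0, 1.0, -1.0]   # hypothetical standardized values
baseline = [0.0, 0.0, 0.0]    # hypothetical reference participant

phi = shapley_values(toy_risk_score, instance, baseline)
# Rank features by absolute contribution, largest first
ranking = sorted(zip(names, phi), key=lambda pair: abs(pair[1]), reverse=True)
for name, contribution in ranking:
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that lets per-variable attributions be compared on a common scale.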

Conclusion: Machine learning algorithms provide accurate predictions of stroke incidence and offer valuable insights into subclinical markers without the need for prior assumptions of causality. This study presents a data-driven machine-learning framework for stroke risk prediction and biomarker identification.
