DOI: 10.1161/circ.148.suppl_1.15738 ISSN: 0009-7322

Abstract 15738: Synthetic Data Generation for the Get With the Guidelines® Registry

Lanjing Wang, Holly Picotte, Chandler D Beon, Jennifer L Hall, Juan Zhao, Xue Feng
  • Physiology (medical)
  • Cardiology and Cardiovascular Medicine

Introduction: The American Heart Association (AHA) Get With The Guidelines® (GWTG) registry is a vital resource for cardiovascular/stroke research. Encompassing over 13 million records, the GWTG registry offers valuable insights into cardiovascular/stroke care and outcomes across more than 2,600 participating hospitals. However, the governance policies of the GWTG registry data limit use to academic researchers with approved manuscripts and funded awards.

Goal: The goal of this project was to generate a synthetic dataset from the AHA GWTG registry data that increases accessibility while preserving data privacy. The synthetic dataset lets researchers explore the data format and variables and learn how to analyze the data on the AHA’s Precision Medicine Platform, a cloud-computing environment designed for researchers.

Methods: We used data from the GWTG Stroke registry from 2005 to 2021. We randomly sampled 1,000 records from the registry and applied a privacy-preserving data transformation process, which involved shifting datetime variables and replacing identifiable sensitive field values with randomly generated eight-digit numbers. We evaluated the synthetic data generation process by comparing the distributions of patient demographics (age, race, gender) and time metric measures between the original and synthetic data.
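The sampling and transformation steps described in Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how datetimes were shifted, so the sketch assumes one random day offset per record (which keeps intervals within a record unchanged), and the function name, field names, and `max_shift_days` parameter are hypothetical.

```python
import random
from datetime import datetime, timedelta


def deidentify(records, datetime_fields, id_fields, sample_size,
               seed=0, max_shift_days=365):
    """Sample records, shift their datetimes, and replace identifier values.

    Assumptions (not stated in the abstract): a single random offset is
    applied to every datetime field of a record, and each distinct
    identifier value maps to one random eight-digit number.
    """
    rng = random.Random(seed)
    sampled = rng.sample(records, min(sample_size, len(records)))

    id_map = {}   # (field, original value) -> replacement number
    used = set()  # replacement numbers already handed out
    result = []
    for rec in sampled:
        new = dict(rec)

        # One shift per record, so differences between timestamps survive.
        shift = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
        for field in datetime_fields:
            if new.get(field) is not None:
                new[field] = new[field] + shift

        for field in id_fields:
            if new.get(field) is None:
                continue
            key = (field, rec[field])
            if key not in id_map:
                candidate = rng.randint(10_000_000, 99_999_999)
                while candidate in used:  # avoid reusing a replacement
                    candidate = rng.randint(10_000_000, 99_999_999)
                used.add(candidate)
                id_map[key] = candidate
            new[field] = id_map[key]

        result.append(new)
    return result


if __name__ == "__main__":
    # Toy records with hypothetical field names; not registry data.
    toy = [
        {"patient_id": "A1", "arrival": datetime(2020, 1, 1, 10, 0),
         "needle": datetime(2020, 1, 1, 10, 45)},
        {"patient_id": "B2", "arrival": datetime(2019, 6, 5, 8, 30),
         "needle": datetime(2019, 6, 5, 9, 20)},
    ]
    for row in deidentify(toy, ["arrival", "needle"], ["patient_id"], 2):
        print(row["patient_id"], row["needle"] - row["arrival"])
```

Because the same offset is added to every datetime in a record, a derived interval such as `needle - arrival` is identical before and after the transformation, which is the property the evaluation checks.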

Results: The original data from the GWTG Stroke registry includes 7.8 million individuals and 1,233 columns, with a mean (SD) age of 69.56 (14.78) years (White 73.64%, Black 16.47%, Male 48.84%, and Female 51.1%). The synthetic data exhibits distributions similar to those of the original data (mean [SD] age, 69.61 [14.66] years; White 75.3%, Black 14.9%, Male 48.4%, and Female 51.6%). Moreover, all time differences are maintained, such as door-to-needle time (the time to intravenous thrombolytic therapy).
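A comparison of this kind (mean and SD of age, percentage per demographic category) could be computed along the following lines. This is only a sketch: the helper names and field names are hypothetical, the abstract does not say whether the sample or population SD was reported (the sample SD is assumed here), and the values in the demo are toy data, not registry results.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(records, age_field, categorical_fields):
    """Return mean/SD of age and percentage breakdowns per category."""
    ages = [r[age_field] for r in records if r.get(age_field) is not None]
    summary = {
        "n": len(records),
        "age_mean": mean(ages),
        "age_sd": stdev(ages),  # sample SD; an assumption, see lead-in
    }
    for field in categorical_fields:
        counts = Counter(r[field] for r in records
                         if r.get(field) is not None)
        total = sum(counts.values())
        summary[field] = {k: 100.0 * v / total for k, v in counts.items()}
    return summary


def max_percentage_gap(original, synthetic, field):
    """Largest absolute difference in category percentages for one field."""
    keys = set(original[field]) | set(synthetic[field])
    return max(abs(original[field].get(k, 0.0) - synthetic[field].get(k, 0.0))
               for k in keys)


if __name__ == "__main__":
    # Toy records only; field names are hypothetical.
    original = [{"age": 60, "sex": "Male"}, {"age": 70, "sex": "Female"},
                {"age": 80, "sex": "Female"}, {"age": 70, "sex": "Male"}]
    synthetic = [{"age": 62, "sex": "Male"}, {"age": 78, "sex": "Female"}]
    a = summarize(original, "age", ["sex"])
    b = summarize(synthetic, "age", ["sex"])
    print(a["age_mean"], b["age_mean"], max_percentage_gap(a, b, "sex"))
```

Summaries computed on the full registry and on the 1,000-record sample can then be placed side by side, as in the figures reported in Results.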

Conclusion: This project demonstrates successful development and evaluation of synthetic data generation methods for the AHA GWTG registry. The synthetic data is accessible to researchers on the AHA’s Precision Medicine Platform, enabling broader access and analysis while preserving privacy to advance cardiovascular and stroke care research.
