DOI: 10.2337/db23-204-lb ISSN: 0012-1797

204-LB: Application of a Data Quality Framework to Diabetic Retinopathy in the All of Us Dataset

JOHN GIANNINI, YECHIAM OSTCHEGA, MICHAEL VOLYNSKI, LINA SULIEMAN, ELIF DEDE YILDIRIM, EVAN OCHSENFABER, LEW BERMAN, ANDREA RAMIREZ
  • Endocrinology, Diabetes and Metabolism
  • Internal Medicine

Diabetic retinopathy (DR) is the leading cause of preventable vision impairment and blindness among adults with diabetes. The All of Us Research Program aims to collect biomedical data (electronic health records (EHR), surveys, biospecimens, physiological measures, and data from digital health technologies such as wearable devices) from one million or more participants, with the goal of improving targeted management of diseases such as diabetes and DR. Because problems with data entry and record harmonization may affect the usability of these data, this study examines the quality and validity of All of Us data to assess their fitness for research use. The primary objective was to examine the quality of EHR data on the diagnosis, treatment, and procedures associated with DR. Adults were included if their EHR contained one of five ICD-10-CM codes associated with DR due to type 2 diabetes. Demographics, measurements, eye care procedures, and medications were analyzed using a framework of five data quality dimensions (DQDs): completeness, concordance, conformance, plausibility, and temporality. After the Spring 2023 update, the database contains over 413,000 consented participants sharing EHRs, including nearly 49,000 with a type 2 diabetes diagnosis (excluding type 1) and 3,600 with a related DR diagnosis (prevalence of 0.87% among participants sharing EHRs). The DR cohort was 50% female, compared with 60% of all participants. Among DR cohort EHRs, over 99% contained physical measurements (height, weight, blood pressure), 79% contained codes for blood glucose-lowering drugs (excluding insulin), and 47% contained a relevant eye care procedure. In addition, 73% contained a plausible diagnosis sequence (type 2 diabetes recorded before DR), with a median elapsed time of 2.1 years between the two diagnoses. This exploration of EHR data using DQDs affirms the validity and trustworthiness of All of Us program data, which form a valuable dataset for future DR research.
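The abstract does not describe how the DQD checks were implemented. As a minimal sketch only, the Python below shows one way two of the checks could be expressed: completeness (the share of records containing a given element) and temporal plausibility (type 2 diabetes dated before DR, with the median gap in years). It also reproduces the prevalence arithmetic. The `ParticipantEHR` class, its field names, and the toy records are hypothetical simplifications for illustration; they are not the All of Us data model, and the toy values are invented rather than study data.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import List, Optional, Tuple


@dataclass
class ParticipantEHR:
    # Hypothetical, simplified per-participant summary; NOT the All of Us data model.
    person_id: int
    t2d_first_date: Optional[date]  # earliest type 2 diabetes diagnosis date, if any
    dr_first_date: Optional[date]   # earliest DR diagnosis date, if any
    has_physical_measurements: bool
    has_glucose_lowering_drug: bool  # non-insulin glucose-lowering drug code present
    has_eye_procedure: bool


def completeness(records: List[ParticipantEHR], attr: str) -> float:
    """Completeness: share of records in which boolean field `attr` is True."""
    return sum(bool(getattr(r, attr)) for r in records) / len(records)


def temporal_plausibility(records: List[ParticipantEHR]) -> Tuple[float, Optional[float]]:
    """Share of records with T2D dated strictly before DR, plus the median
    T2D-to-DR gap in years among those plausible records."""
    gaps = [
        (r.dr_first_date - r.t2d_first_date).days / 365.25
        for r in records
        if r.t2d_first_date is not None
        and r.dr_first_date is not None
        and r.t2d_first_date < r.dr_first_date
    ]
    return len(gaps) / len(records), (median(gaps) if gaps else None)


# Toy records invented for illustration only.
toy = [
    ParticipantEHR(1, date(2015, 1, 1), date(2017, 1, 1), True, True, True),
    ParticipantEHR(2, date(2018, 6, 1), date(2018, 3, 1), True, True, False),  # DR dated before T2D
    ParticipantEHR(3, None, date(2019, 1, 1), True, False, False),              # T2D date missing
    ParticipantEHR(4, date(2010, 1, 1), date(2014, 1, 1), True, True, True),
]

share, median_years = temporal_plausibility(toy)
print(completeness(toy, "has_eye_procedure"))  # 0.5
print(share, round(median_years, 1))           # 0.5 3.0

# Prevalence arithmetic from the abstract: ~3,600 DR cases among ~413,000 EHR participants.
print(f"{100 * 3_600 / 413_000:.2f}%")          # 0.87%
```

In this sketch, records with a missing diagnosis date count as not plausible rather than being excluded from the denominator. That is one possible design choice; the abstract does not say how the study handled such records.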

Disclosure

J. Giannini: None. Y. Ostchega: None. M. Volynski: None. L. Sulieman: None. E. Dede Yildirim: None. E. Ochsenfaber: None. L. Berman: None.

Funding

National Institutes of Health
