DOI: 10.1158/1538-7755.disp23-c071 ISSN: 1538-7755

Abstract C071: From promise to progress: Utilizing the All of Us dataset to advance cancer disparities research

Karriem S. Watson, Martin Mendoza, Grant Jones, Sydney McMaster, Anna Mitiku
  • Oncology
  • Epidemiology


The All of Us Research Program within the National Institutes of Health is a national effort to build the world’s largest and most diverse health information database of its kind, which researchers can use to study how different genetic, lifestyle, and environmental factors affect health and disease. All of Us is built in partnership with participants spanning different ages, races, ethnicities, and regions of the country, including 45% who self-identify as racial and ethnic minorities and 75% from communities underrepresented in biomedical research (UBR). In addition to contributing to health research, participants are active partners in the program through its governance and through ongoing engagement within the Engagement Ecosystem, a network of national partners with deep connections to communities. These partners work collaboratively to raise awareness of the program among racial/ethnic minority populations and historically underrepresented communities, such as LGBTQIA+ and disability communities, rural residents, and older adults, in an effort to complement and enhance existing outreach and engagement efforts. As of April 2023, the program has data from more than 413,450 participants, including the largest set of whole genome sequences (WGS) widely available for research: over 312,900 genotyping arrays, 245,350 WGS, and 1,000 long-read sequences are available to registered researchers. To ensure value is returned, participants may consent to receive a personalized DNA report with information on genetic ancestry, traits, and pharmacogenetics, as well as a hereditary disease risk report on the ACMG 59 “medically actionable” genes, 20 of which are related to some type of cancer. The most common conditions represented in the data include cancer, cardiovascular disease, hypertension, mental health conditions, and diabetes.
Electronic health record (EHR) data are also available from more than 287,000 participants, including demographics, health care visits, diagnoses, and medications; over 50,700 of these participants have been diagnosed with at least one type of cancer. As many cancer disparities are driven by the interaction of biological and social factors, All of Us provides an opportunity for researchers to explore data on lifestyle factors, social determinants of health, 3-digit ZIP codes, neighborhood, social life, and exposures, including potential indicators of allostatic load and inflammatory markers associated with stress and discrimination, as well as factors related to tobacco and alcohol use. A broad search of cancer diagnoses in participant EHRs identifies over 200 different kinds of cancer, with 73% represented in the UBR race/ethnicity category. Session attendees will gain a better understanding of the breadth and depth of data available for study within the Researcher Workbench, learn how these data can support cancer intervention research, and recognize the utility of accessing the world’s largest, most diverse dataset and its role in addressing health disparities.

Citation Format: Karriem S. Watson, Martin Mendoza, Grant Jones, Sydney McMaster, Anna Mitiku. From promise to progress: Utilizing the All of Us dataset to advance cancer disparities research [abstract]. In: Proceedings of the 16th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2023 Sep 29-Oct 2;Orlando, FL. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2023;32(12 Suppl):Abstract nr C071.
