Abstract C004: Assessing local ancestry inference using different sequencing assays and depth of coverage
Tomoki Motegi, Joshua D. Campbell
Abstract
Introduction: Several studies have reported differences in mutational frequencies between ancestral populations. For example, PTEN alterations and TMPRSS2-ERG fusions are less common in men with African ancestry. However, these studies have relied primarily on measurements of "global ancestry" and have not been able to ascertain which genomic regions contribute to the observed differences in mutational frequencies. Recent advances in computational algorithms now enable the estimation of "local ancestry," whereby individual chromosomal segments are assigned to a particular ancestral background. These tools were developed primarily for deep whole-genome sequencing (WGS) and have not been optimized for other assays, including ultra-low-pass WGS (ULP-WGS), whole-exome sequencing (WES), and SNP genotyping arrays. The goal of our study is to establish best practices for local ancestry inference across a diverse set of sequencing and genotyping assays.

Methods: We obtained WGS data from four ancestries through the 1000 Genomes Project (n=1,067) for local ancestry inference. ULP-WGS and WES data (with on- and off-target coverage) were simulated from the full WGS data. Pseudo-array data were emulated by restricting to SNP sites on the Affymetrix Genome-Wide Human SNP Array 6.0. Genotypes were imputed with Eagle-Beagle (EB) or GLIMPSE2 (GL2), and local ancestry inference was performed with G-nomix using window sizes ranging from 0.2 to 16.0 centimorgans (cM). Accuracy of the local ancestry calls for each assay was assessed using a training set (n=1,007) and a validation set (n=62).

Results: Local ancestry estimation in ULP-WGS data achieved an accuracy exceeding 98% with GL2, even with the smallest window size of 0.2 cM and at the lowest sequencing depth of 0.1x. To achieve a similar accuracy of 98% in ULP-WGS with EB, a window size of 1.5 cM was needed at sequencing depths from 2x to 0.5x, and a larger window size of 4 cM was required at 0.25x. Notably, this accuracy was not attained at a depth of 0.1x. For WES with on-target sites and for pseudo-array data, we used EB for imputation because, compared with GL2, it was more computationally efficient and more accurate at larger window sizes. The accuracy of WES with on-target sites exceeded 90% with window sizes of 4.0 cM or larger, but no window size achieved 98% accuracy. Including off-target regions in WES and using GL2 increased the accuracy to 98% at a window size of 1.5 cM. Pseudo-array data with EB achieved an accuracy of 98% at 0.5 cM, outperforming WES.

Conclusions: Our results show that local ancestry can be accurately estimated across different sequencing and genotyping assays when the window size and imputation tool are appropriately selected. This work will facilitate the use of local ancestry inference in studies that use different genotyping assays and will ultimately help researchers understand the influence of local ancestral haplotypes on the molecular features of cancer.
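The abstract reports accuracy for each combination of assay, imputation tool, and window size, but does not define the metric. As a minimal sketch, assuming accuracy is the fraction of SNPs whose window-level ancestry call matches a reference label, the Python below shows how a window size in cM could map SNPs to windows and how such an accuracy could be computed. The helpers `window_index` and `local_ancestry_accuracy`, the fixed-width windows starting at 0 cM, and the ancestry labels are hypothetical illustrations, not G-nomix's API or the authors' evaluation code.

```python
from typing import Sequence


def window_index(positions_cm: Sequence[float], window_cm: float) -> list[int]:
    """Assign each SNP to a fixed-width window along the genetic map.

    Hypothetical helper: windows are assumed to start at 0 cM and to be
    `window_cm` wide, so a SNP at 3.1 cM with 1.5 cM windows falls in window 2.
    G-nomix's own windowing may differ.
    """
    if window_cm <= 0:
        raise ValueError("window_cm must be positive")
    return [int(pos // window_cm) for pos in positions_cm]


def local_ancestry_accuracy(
    true_labels: Sequence[str],
    window_calls: dict[int, str],
    positions_cm: Sequence[float],
    window_cm: float,
) -> float:
    """Fraction of SNPs whose window-level call matches the reference label.

    Assumed metric: every SNP inherits the ancestry call of its window, and a
    SNP whose window has no call counts as a mismatch.
    """
    if len(true_labels) != len(positions_cm):
        raise ValueError("true_labels and positions_cm must have equal length")
    if not true_labels:
        raise ValueError("at least one SNP is required")
    windows = window_index(positions_cm, window_cm)
    matches = sum(
        1 for truth, w in zip(true_labels, windows) if window_calls.get(w) == truth
    )
    return matches / len(true_labels)


if __name__ == "__main__":
    # Toy haplotype with an ancestry switch between 0.9 cM and 1.6 cM.
    positions = [0.1, 0.4, 0.9, 1.6, 2.2, 3.1]
    truth = ["AFR", "AFR", "AFR", "EUR", "EUR", "EUR"]
    fine = local_ancestry_accuracy(
        truth, {0: "AFR", 1: "EUR", 2: "EUR", 3: "EUR"}, positions, 1.0
    )
    coarse = local_ancestry_accuracy(truth, {0: "AFR", 1: "EUR"}, positions, 2.0)
    print(fine, coarse)
```

In this toy example the 1.0 cM windows reproduce every label (accuracy 1.0), while the 2.0 cM windows place the SNP at 1.6 cM in a window called AFR, giving an accuracy of 5/6. Narrower windows resolve ancestry switch points more finely but each contains fewer SNPs, which is one reason the appropriate window size can depend on assay density and genotype quality.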
Citation Format: Tomoki Motegi, Joshua D. Campbell. Assessing local ancestry inference using different sequencing assays and depth of coverage [abstract]. In: Proceedings of the 16th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2023 Sep 29-Oct 2;Orlando, FL. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2023;32(12 Suppl):Abstract nr C004.