Creating a pipeline to extract genetic variants associated with atrial fibrillation to use in large, multi-ethnic management registries
E Dave, E M Burke, C L Brown, M C Burke
Abstract
Introduction
Atrial fibrillation (AF) is prevalent. Genome-wide association studies (GWAS) have identified hundreds of single nucleotide polymorphisms (SNPs) associated with AF susceptibility. However, GWAS-identified SNPs have not been integrated as clinical risk factors or matched to interventions and outcomes. Studying genetics in large registries offers the potential to personalize treatment (e.g., ablation, anti-arrhythmic drugs, rhythm control) and to evaluate responses to interventions and outcomes (e.g., death, stroke, recurrent AF).
Purpose
This study aimed to validate a pipeline for extracting AF-associated SNP genotypes from whole-genome sequencing (WGS) data using a known database to create a framework for future integration with a large AF registry.
Methods
The pipeline was tested using publicly available WGS data from the 1000 Genomes Project (n = 2,504). AF-associated SNP positions and related genes were retrieved from the 2025 Nature Genetics GWAS and verified against the hg19 reference genome using Ensembl BioMart. The SNP list included loci near genes commonly associated with AF pathogenesis, such as PITX2, ZFHX3, KCNN3, and CAV1. Individual-level WGS data in VCF files were processed in R using the VariantAnnotation, GenomicRanges, and dplyr packages and matched against the AF-associated SNP list to extract genotype calls in numeric ("0/1") and base-pair ("AG", "TT") formats. A mock genotype matrix was constructed to validate the pipeline’s ability to detect risk alleles from any WGS dataset.
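The extraction step can be illustrated with a minimal sketch. This is not the authors' R implementation (which used VariantAnnotation and GenomicRanges); it is a plain-Python stand-in that only shows the logic of filtering VCF records to a target position list and converting numeric genotype calls to base-pair calls. The function names, the coordinates, the variant IDs, and the sample names are all hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: plain-Python VCF filtering, not the authors' R code.
# All positions, IDs, and sample names below are invented for the example.

def to_bases(numeric_gt, alleles):
    """Convert a numeric call such as '0/1' or '0|1' to bases such as 'AG'.

    Returns None when any allele is missing ('.').
    """
    parts = numeric_gt.replace("|", "/").split("/")
    if any(p == "." for p in parts):
        return None
    return "".join(alleles[int(p)] for p in parts)


def parse_vcf_genotypes(vcf_lines, target_positions):
    """Return {(chrom, pos): {sample: (numeric_gt, base_gt)}} for target sites."""
    samples = []
    result = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        chrom, pos = fields[0], int(fields[1])
        if (chrom, pos) not in target_positions:
            continue  # keep only sites on the target SNP list
        alleles = [fields[3]] + fields[4].split(",")  # REF first, then ALT(s)
        gt_index = fields[8].split(":").index("GT")
        calls = {}
        for sample, entry in zip(samples, fields[9:]):
            numeric = entry.split(":")[gt_index]
            calls[sample] = (numeric, to_bases(numeric, alleles))
        result[(chrom, pos)] = calls
    return result


# Hypothetical two-sample VCF with two sites, one of which is on the target list.
MOCK_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "4\t1000\trsA\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1",
    "4\t2000\trsB\tC\tT\t.\tPASS\t.\tGT\t0|0\t0|1",
]
TARGETS = {("4", 1000)}

genotypes = parse_vcf_genotypes(MOCK_VCF, TARGETS)
```

In this mock input, sample S1 at the target site carries one reference and one alternate allele, so its numeric call maps to the base-pair call "AG". A real run would read compressed, indexed VCF files by genomic range rather than scanning lines.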
Results
The pipeline successfully generated a genotype matrix for more than 2,500 individuals that showed the presence or absence of AF-associated SNPs in each individual. A representative subset is shown in Table 1. Table 2 demonstrates potential integration of the pipeline with interventions and outcomes from a large, multi-ethnic AF registry.
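One way to picture the genotype matrix and its potential link to registry data is sketched below. This is an assumption-laden illustration, not the structure of Table 1 or Table 2: the SNP IDs, risk alleles, sample IDs, and registry fields (`intervention`, `recurrent_af`) are hypothetical, and counting risk alleles by character match only works for single-base alleles.

```python
# Illustrative sketch only: hypothetical SNP IDs, risk alleles, and registry fields.

def risk_allele_count(base_genotype, risk_allele):
    """Count copies (0, 1, or 2) of a single-base risk allele in a call like 'AG'."""
    if base_genotype is None:
        return None  # missing genotype stays missing
    return base_genotype.count(risk_allele)


def build_risk_matrix(base_calls, risk_alleles):
    """Turn {sample: {snp: 'AG'}} into {sample: {snp: risk allele count}}."""
    return {
        sample: {
            snp: risk_allele_count(calls.get(snp), allele)
            for snp, allele in risk_alleles.items()
        }
        for sample, calls in base_calls.items()
    }


def join_with_registry(risk_matrix, registry):
    """Merge genotype rows with registry rows on a shared sample identifier."""
    rows = []
    for sample in sorted(risk_matrix):
        if sample in registry:  # inner join: keep samples present in both
            rows.append({"sample": sample, **risk_matrix[sample], **registry[sample]})
    return rows


# Mock inputs (all values invented).
BASE_CALLS = {
    "S1": {"rsA": "AG", "rsB": "CC"},
    "S2": {"rsA": "GG", "rsB": "CT"},
}
RISK_ALLELES = {"rsA": "G", "rsB": "T"}
REGISTRY = {
    "S1": {"intervention": "ablation", "recurrent_af": False},
    "S2": {"intervention": "anti-arrhythmic drug", "recurrent_af": True},
}

risk_matrix = build_risk_matrix(BASE_CALLS, RISK_ALLELES)
merged = join_with_registry(risk_matrix, REGISTRY)
```

Each merged row then holds one individual's risk allele counts alongside the recorded intervention and outcome, which is the shape needed for downstream association analyses.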
Conclusion
The SNP detection pipeline was validated using a publicly available cohort, establishing a reliable framework for future application to large AF registries. Individual genotypes at GWAS-identified loci, matched to prospective long-term outcomes and clinical interventions, may inform clinical decision making.