DOI: 10.1002/alz.072779 ISSN: 1552-5260

Bioinformatics pipeline to guide post‐GWAS studies in Alzheimer’s: a new catalogue of disease candidate short structural variants

Michael W Lutz, Ornit Chiba‐Falek
  • Psychiatry and Mental health
  • Cellular and Molecular Neuroscience
  • Geriatrics and Gerontology
  • Neurology (clinical)
  • Developmental Neuroscience
  • Health Policy
  • Epidemiology



Short structural variants (SSVs), including indels, are common in the human genome and affect disease risk, yet their role in late‐onset Alzheimer’s disease (LOAD) has been understudied. We previously developed a bioinformatics pipeline that characterizes and prioritizes candidate regulatory SNPs in enhancers located in LOAD‐GWAS regions. Here we developed a bioinformatics pipeline for SSVs within LOAD‐GWAS regions that prioritizes regulatory SSVs by the strength of their predicted effect on transcription factor (TF) binding sites. The impact of the genes and TFs prioritized by the pipeline was assessed in two LOAD single‐cell RNA‐seq datasets.
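To make the idea of ranking indels by their predicted effect on a TF binding site concrete, the following is a minimal sketch and not the authors' method or the motifbreakR algorithm. It scores a reference and an alternate sequence against a position weight matrix and reports the change in the best log-odds match. The motif `TOY_PWM`, the uniform background, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 4-position motif given as per-position base probabilities.
# These are toy values, not a motif taken from MotifDb.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def window_score(pwm, window):
    """Log2-odds score of one motif-width window."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, window))


def best_score(pwm, seq):
    """Best log2-odds score over all motif-width windows of seq."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(window_score(pwm, seq[i:i + width])
               for i in range(len(seq) - width + 1))


def apply_indel(seq, pos, ref, alt):
    """Replace ref with alt at 0-based position pos."""
    if seq[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the sequence")
    return seq[:pos] + alt + seq[pos + len(ref):]


def disruption(pwm, seq, pos, ref, alt):
    """Alternate minus reference best score; negative means a weaker match."""
    return best_score(pwm, apply_indel(seq, pos, ref, alt)) - best_score(pwm, seq)
```

In a sketch like this, indels with the most negative `disruption` value across a motif collection would be ranked first; a real analysis would also need to handle both strands and score significance.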


The pipeline used publicly available functional genomics data sources, primarily candidate cis‐regulatory elements (cCREs) from ENCODE and single‐cell RNA‐seq data from human prefrontal cortex samples of LOAD patients, obtained from UCI MIND’s Alzheimer’s Disease Research Center (ADRC) and ROSMAP. For TF binding analysis we used motifs from MotifDb together with bioinformatics software including motifbreakR.
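The cataloguing step, finding indels that fall inside cCREs, amounts to an interval overlap. The sketch below illustrates it under stated assumptions: coordinates are 0-based and half-open (BED-style), and the `Interval` and `Indel` types and the function name are hypothetical, not part of ENCODE tooling or the authors' pipeline. It uses a simple per-chromosome scan, which is adequate for small inputs; an interval tree or sorted sweep would be the usual choice at genome scale.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


class Indel(NamedTuple):
    chrom: str
    pos: int    # 0-based position of the first reference base
    ref: str
    alt: str


def indels_in_elements(indels, elements):
    """Return (indel, element name) pairs where the reference span overlaps."""
    by_chrom = defaultdict(list)
    for el in elements:
        by_chrom[el.chrom].append(el)

    hits = []
    for v in indels:
        v_end = v.pos + len(v.ref)  # half-open end of the reference allele
        for el in by_chrom.get(v.chrom, []):
            # Two half-open intervals overlap when each starts before the other ends.
            if v.pos < el.end and el.start < v_end:
                hits.append((v, el.name))
    return hits
```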


We catalogued 1230 proximal CTCF‐bound candidate cis‐regulatory elements in LOAD‐GWAS regions, of which 912 showed epigenetic evidence in relevant brain tissue. We catalogued 426 indels in these cCREs. These indels were predicted to disrupt binding sites for 391 TFs, 362 of which had snRNA‐seq data from LOAD samples. Of note, TF motifs within the APOE‐TOMM40, SPI1 and MS4A2 regions were significantly disrupted by the candidate regulatory indels. The affected TFs include RUNX3, SPI1 and SMAD3. These findings for the APOE‐TOMM40, SPI1 and MS4A2 regions are consistent with our prior results for SNPs.


This study provides an analytical framework to catalogue noncoding indels in cis‐regulatory elements located in LOAD‐GWAS loci and to characterize their likelihood of perturbing TF binding. The approach integrates multiple data types to prioritize genes and variants for validation experiments using disease models and gene editing technologies.
