DOI: 10.1002/alz.079369 ISSN: 1552-5260

Deep learning‐based SWAT‐TAB approach for Identifying Genetic Variants using Whole Genome Sequencing

Taeho Jo, Kwangsik Nho, Andrew J. Saykin
  • Psychiatry and Mental health
  • Cellular and Molecular Neuroscience
  • Geriatrics and Gerontology
  • Neurology (clinical)
  • Developmental Neuroscience
  • Health Policy
  • Epidemiology

Abstract

Background

Deep learning has been employed in various genetic studies, but applying it to large‐scale datasets, such as whole genome sequence (WGS) data, can be challenging due to the high‐dimension low‐sample‐size (HDLSS) problem. In such studies, the number of features (genetic variants) may be orders of magnitude larger than the number of samples, which can limit the applicability of deep learning.

Methods

In this study, we aimed to identify genetic variants associated with Alzheimer’s disease (AD) using SWAT‐TAB, a modified version of the SWAT‐CNN¹ approach that addresses the HDLSS problem. The main modification to the original SWAT‐CNN approach was the incorporation of the TabNet² algorithm in the second step (Fig. 1A). This algorithm determines associations between features and selects the most relevant ones for a given label using a concept called sequential attention, which calculates the importance of each feature at each step and creates a trainable mask based on this information. We applied this modified approach to ADSP WGS data for Chromosome 19, using 7,416 samples (4,266 AD and 3,150 cognitively normal) and 204,486 SNP features.
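The sequential attention idea described above can be illustrated with a small NumPy sketch. This is not the authors' implementation and not the TabNet library API: the function names, the linear attention maps in `weights`, the relaxation value `gamma`, and the random toy data are all assumptions made for illustration. In TabNet the attention input comes from a learned feature transformer and the maps are trained by gradient descent; here fixed random matrices stand in for them so that only the masking mechanics are shown.

```python
import numpy as np


def sparsemax(z):
    """Row-wise sparsemax: project each row onto the probability simplex.

    Unlike softmax, the result can contain exact zeros, which is what
    lets a mask switch features off completely.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z, axis=1)[:, ::-1]           # descending per row
    k = np.arange(1, z.shape[1] + 1)
    z_cumsum = np.cumsum(z_sorted, axis=1)
    support = 1 + k * z_sorted > z_cumsum            # true for the first k_z entries
    k_z = support.sum(axis=1)                        # support size per row (>= 1)
    tau = (z_cumsum[np.arange(z.shape[0]), k_z - 1] - 1) / k_z
    return np.maximum(z - tau[:, None], 0.0)


def sequential_attention_masks(x, weights, gamma=1.3):
    """Compute one feature mask per decision step.

    x       : (n_samples, n_features) input matrix
    weights : list of (n_features, n_features) matrices, one per step
              (hypothetical stand-ins for trainable attention layers)
    gamma   : relaxation factor; values near 1 discourage reusing a
              feature that earlier steps already attended to
    """
    prior = np.ones_like(x, dtype=float)             # all features available at first
    masks = []
    for w in weights:
        logits = x @ w                               # simplified attention scores
        mask = sparsemax(prior * logits)             # sparse, rows sum to 1
        masks.append(mask)
        prior = prior * (gamma - mask)               # down-weight used features
    return masks


def feature_importance(masks):
    """Average mask weight per feature across steps and samples (simplified)."""
    return np.mean(np.stack(masks), axis=(0, 1))


# Toy example: 8 samples, 6 SNP-like features, 3 decision steps.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 6))
weights = [rng.normal(size=(6, 6)) for _ in range(3)]
masks = sequential_attention_masks(x, weights)
importance = feature_importance(masks)
print(importance.round(3))
```

Each mask row sums to one and is typically sparse, so averaging the masks gives a rough per-feature importance score. TabNet itself weights each step by its contribution to the decision before aggregating, which this sketch omits.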

Results

rs429358, rs7256200, rs1969899716, and rs10414043 were the most important features on Chromosome 19, reflecting known AD genes such as APOE and APOC1, as well as several novel genetic features (Fig. 1B). The SWAT‐TAB approach demonstrated improved execution speed, with an 11.58% reduction in processing time per SNP. SWAT‐TAB also demonstrated improved reproducibility and ease of implementation, as it required only one layer for execution, whereas the original approach required as many layers as there were features in the window.

Conclusion

Our deep learning‐based SWAT‐TAB approach improved execution speed, reproducibility, and ease of implementation compared to the original SWAT‐CNN approach. This method appears promising for identifying genetic variants associated with AD and constructing accurate classification models. Ongoing studies will determine the potential utility of this approach in larger analyses of WGS data beyond the initial results obtained for Chromosome 19.
