DOI: 10.26508/lsa.202403039 ISSN: 2575-1077

Efficient identification of de novo mutations in family trios: a consensus-based informatic approach

Mariya Shadrina, Özem Kalay, Sinem Demirkaya-Budak, Charles A LeDuc, Wendy K Chung, Deniz Turgut, Gungor Budak, Elif Arslan, Vladimir Semenyuk, Brandi Davis-Dusenbery, Christine E Seidman, H Joseph Yost, Amit Jain, Bruce D Gelb

Accurate identification of de novo variants (DNVs) remains challenging despite advances in sequencing technologies and often requires ad hoc filters and manual inspection. Here, we explored a purely informatic, consensus-based approach for identifying DNVs in proband–parent trios using short-read genome sequencing data. We evaluated variant calls generated by three sequence analysis pipelines (GATK HaplotypeCaller, DeepTrio, and Velsera GRAF) and tested the assumption that requiring consensus among them can serve as an effective filter for high-quality DNVs. Comparison with a highly accurate DNV set, previously validated by manual inspection and Sanger sequencing, showed that consensus filtering followed by a force-calling procedure effectively removed false-positive calls, achieving 98.0–99.4% precision. Sensitivity of the workflow, measured against the previously established DNVs, reached 99.4%. Validation in the Genome in a Bottle HG002-3-4 trio confirmed the workflow's robustness, with precision reaching 99.2% and sensitivity up to 96.6%. We believe that this consensus approach can be widely implemented as an automated bioinformatics workflow suitable for large-scale analyses without manual intervention, especially when very high precision is valued over sensitivity.
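The core consensus idea, together with the precision and sensitivity metrics quoted above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each caller's candidate DNVs have already been normalized into hypothetical `(chrom, pos, ref, alt)` records, treats the number of agreeing callers (`min_callers`) as a free parameter rather than the paper's exact rule, and omits the force-calling step entirely. The names `Variant`, `consensus_dnvs`, and `precision_sensitivity` and all example variants are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical normalized small-variant key (assumes left-aligned, biallelic records)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus_dnvs(call_sets: Iterable[Set[Variant]], min_callers: int) -> Set[Variant]:
    """Keep candidate DNVs reported by at least `min_callers` of the input call sets."""
    counts: Dict[Variant, int] = {}
    for calls in call_sets:
        for v in set(calls):  # each caller contributes at most one vote per variant
            counts[v] = counts.get(v, 0) + 1
    return {v for v, n in counts.items() if n >= min_callers}


def precision_sensitivity(predicted: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); sensitivity = TP / (TP + FN)."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = tp / len(truth) if truth else 0.0
    return precision, sensitivity


# Toy example with made-up variants (not data from the study).
a = Variant("chr1", 1000, "A", "G")
b = Variant("chr2", 2000, "C", "T")
c = Variant("chr3", 3000, "G", "A")
d = Variant("chr4", 4000, "T", "C")
e = Variant("chr5", 5000, "A", "T")
f = Variant("chr6", 6000, "G", "C")

gatk_calls = {a, b, c}
deeptrio_calls = {a, b, d}
graf_calls = {a, b, e}
truth_set = {a, b, f}

consensus = consensus_dnvs([gatk_calls, deeptrio_calls, graf_calls], min_callers=3)
precision, sensitivity = precision_sensitivity(consensus, truth_set)
print(sorted(consensus), precision, round(sensitivity, 3))
```

In this toy model, requiring all three callers discards the caller-specific calls `c`, `d`, and `e` (precision 1.0) but cannot recover `f`, which no caller reported (sensitivity 2/3). Raising `min_callers` generally trades sensitivity for precision, which is the trade-off the abstract highlights.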
