DOI: 10.1111/1755-0998.13859 ISSN:

Scaling‐up RADseq methods for large datasets of non‐invasive samples: Lessons for library construction and data preprocessing

Larissa S. Arantes, Jilda A. Caccavo, James K. Sullivan, Sarah Sparmann, Susan Mbedi, Oliver P. Höner, Camila J. Mazzoni
  • Genetics
  • Ecology, Evolution, Behavior and Systematics
  • Biotechnology


Genetic non‐invasive sampling (gNIS) is a critical tool for population genetics studies, supporting conservation efforts while imposing minimal impacts on wildlife. However, gNIS often presents variable levels of DNA degradation and non‐endogenous contamination, which can incur considerable processing costs. Furthermore, the use of restriction‐site‐associated DNA sequencing methods (RADseq) for assessing thousands of genetic markers introduces the challenge of obtaining large sets of shared loci with similar coverage across multiple individuals. Here, we present an approach to handling large‐scale gNIS‐based datasets using data from the spotted hyena population inhabiting the Ngorongoro Crater in Tanzania. We generated 3RADseq data for more than a thousand individuals, mostly from faecal mucus samples collected non‐invasively and varying in DNA degradation and contamination level. Using small‐scale sequencing, we screened samples for endogenous DNA content, removed highly contaminated samples, confirmed that fragment lengths overlapped between libraries, and balanced individual representation in a sequencing pool. We evaluated the impact of (1) DNA degradation and contamination of non‐invasive samples, (2) PCR duplicates and (3) different SNP filters on genotype accuracy based on Mendelian error estimated for parent–offspring trio datasets. Our results showed that when balanced for sequencing depth, contaminated samples presented similar genotype error rates to those of non‐contaminated samples. We also showed that PCR duplicates and different SNP filters impact genotype accuracy. In summary, we showed the potential of using gNIS for large‐scale genetic monitoring based on SNPs and demonstrated how to improve control over library preparation by using a weighted re‐pooling strategy that considers the endogenous DNA content.
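
The abstract uses Mendelian error in parent–offspring trios as its measure of genotype accuracy. As a rough illustration of that idea only, the sketch below counts Mendelian inconsistencies at biallelic SNPs. It is not the authors' pipeline: the function names, the 0/1/2 alternate‐allele encoding and the use of None for missing genotypes are all assumptions made for this example.

```python
from itertools import product

# Assumed encoding: a genotype is the number of alternate alleles (0, 1 or 2)
# carried at a biallelic SNP; None marks a missing call.


def is_mendelian_consistent(father, mother, child):
    """Return True if the child genotype can arise from the two parents."""
    def gametes(genotype):
        # Alleles (0 = reference, 1 = alternate) a parent can transmit.
        return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

    possible_children = {
        a + b for a, b in product(gametes(father), gametes(mother))
    }
    return child in possible_children


def mendelian_error_rate(trios):
    """Fraction of fully genotyped trio sites that violate Mendelian inheritance.

    `trios` is an iterable of (father, mother, child) tuples, each element a
    list of genotypes over the same ordered set of SNPs.
    """
    errors = 0
    compared = 0
    for father, mother, child in trios:
        for f, m, c in zip(father, mother, child):
            if None in (f, m, c):
                continue  # skip sites with any missing genotype
            compared += 1
            if not is_mendelian_consistent(f, m, c):
                errors += 1
    return errors / compared if compared else float("nan")


# Hypothetical trio with five SNPs, made up for illustration.
example_trio = (
    [0, 1, 2, 0, None],  # father
    [0, 1, 2, 2, 1],     # mother
    [1, 2, 1, 1, 0],     # child
)
rate = mendelian_error_rate([example_trio])
```

In this made‐up trio, two of the four fully genotyped sites are inconsistent with the parents' genotypes, so the rate is 0.5. A comparison like the one described in the abstract would compute this rate on the same trios under different conditions, for example before and after removing PCR duplicates or under different SNP filters.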
