DOI: 10.1093/bioinformatics/btag354 ISSN: 1367-4811

KCFtools: Rapid alignment-free method for introgression screening and GWAS using k-mer profiles

Sivasubramani Selvanayagam, Jesus Quiroz-Chavez, Ricardo H Ramirez-Gonzalez, Cristobal Uauy, Sandra Smit, M Eric Schranz

Abstract

Motivation

In the era of multiple genome references, researchers often align sequencing reads against distinct assemblies or even multiple references simultaneously. This enables applications such as the detection of introgressed segments or highly variable genomic regions, which are especially prevalent in large-genome crop species such as lettuce or wheat. However, these applications come at the cost of increased computational burden, inconsistencies in mapping methods, and reduced reproducibility across studies. To address these limitations, we developed KCFtools, a Java-based toolkit that identifies the presence and absence of k-mers in non-overlapping genomic or transcriptomic windows by comparing query and reference genomes. This alignment-free approach enables the efficient computation of an identity score for each window, thereby facilitating robust detection of introgressed or variable regions across genomes.
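The windowed identity score described above can be illustrated with a minimal sketch. It is not the KCFtools implementation: the class and method names (`WindowIdentitySketch`, `kmers`, `windowIdentity`, `scan`) are hypothetical, the score is assumed to be the percentage of a window's k-mers that also occur in the query, and practical details such as canonical (reverse-complement) k-mers, k-mers spanning window boundaries, and compact k-mer databases are left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; names and scoring formula are assumptions,
// not the KCFtools implementation.
class WindowIdentitySketch {

    // Collect every substring of length k (k-mer) found in a sequence.
    static Set<String> kmers(String seq, int k) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + k <= seq.length(); i++) {
            out.add(seq.substring(i, i + k));
        }
        return out;
    }

    // Percentage of k-mer positions in one reference window whose k-mer
    // is also present in the query k-mer set (assumed definition of identity).
    static double windowIdentity(String window, Set<String> queryKmers, int k) {
        int total = window.length() - k + 1;
        if (total <= 0) {
            return 0.0; // window shorter than k: no k-mers to compare
        }
        int observed = 0;
        for (int i = 0; i + k <= window.length(); i++) {
            if (queryKmers.contains(window.substring(i, i + k))) {
                observed++;
            }
        }
        return 100.0 * observed / total;
    }

    // Split the reference into non-overlapping windows and score each one.
    static List<Double> scan(String reference, String query, int k, int windowSize) {
        Set<String> queryKmers = kmers(query, k);
        List<Double> scores = new ArrayList<>();
        for (int start = 0; start < reference.length(); start += windowSize) {
            int end = Math.min(start + windowSize, reference.length());
            scores.add(windowIdentity(reference.substring(start, end), queryKmers, k));
        }
        return scores;
    }

    public static void main(String[] args) {
        // Toy sequences: the first window is shared with the query, the second is not.
        String reference = "ACGTACGTTTTTGGGG";
        String query = "ACGTACGTCCCCCCCC";
        System.out.println(scan(reference, query, 4, 8));
    }
}
```

In this toy run, the first window shares all of its 4-mers with the query and the second shares none, so the scores separate a conserved window from a divergent one. Runs of low-scoring windows are the kind of signal that could flag a candidate introgressed or variable region.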

Results

We systematically evaluated the performance and accuracy of the k-mer-based method implemented in KCFtools, benchmarking it against conventional Single Nucleotide Variation (SNV)-based introgression detection pipelines. Our results demonstrate that KCFtools effectively captures introgressed segments and structurally diverse regions, even in species with fragmented or highly divergent reference genomes. In addition, we extended KCFtools to generate genotype matrices from k-mer variation tables. These matrices are compatible with Genome-Wide Association Studies (GWAS) software and allow the identification of loci associated with phenotypic traits. We showcase the utility of this approach by detecting known and novel associations for downy mildew resistance in lettuce, underscoring the pipeline’s potential for high-resolution, reference-agnostic population genetic analysis.
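One way a table of per-window scores could be turned into a genotype matrix is sketched below. The abstract does not specify the encoding, so everything here is an assumption for illustration: the class name `GenotypeMatrixSketch`, the binary coding (0 for reference-like, 1 for divergent), and the single identity threshold are all hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only; the encoding and threshold are assumptions,
// not the genotype format produced by KCFtools.
class GenotypeMatrixSketch {

    // Convert identity scores (rows = windows, columns = samples) into
    // binary genotype codes: 0 if the score reaches the threshold
    // (reference-like), 1 otherwise (divergent).
    static int[][] toGenotypes(double[][] scores, double threshold) {
        int[][] genotypes = new int[scores.length][];
        for (int w = 0; w < scores.length; w++) {
            genotypes[w] = new int[scores[w].length];
            for (int s = 0; s < scores[w].length; s++) {
                genotypes[w][s] = scores[w][s] >= threshold ? 0 : 1;
            }
        }
        return genotypes;
    }

    public static void main(String[] args) {
        // Two windows scored in two samples (made-up values).
        double[][] scores = {
            {100.0, 42.5},
            {10.0, 95.0}
        };
        System.out.println(Arrays.deepToString(toGenotypes(scores, 90.0)));
    }
}
```

A matrix of this shape, with one row per window and one column per sample, is the general kind of input that association software tests against a phenotype. The actual coding and file format would depend on the GWAS tool used.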

Availability

https://github.com/sivasubramanics/kcftools
