Validation of the International Weed Genomics Consortium genome annotation pipeline through reannotation of the model species Arabidopsis thaliana
Luan Cutti, Daniel Fernando da Silva Filho, Geisson Edwin Guadir Lara, Jessica Matheson, Nicholas A. Johnson, Jacob Montgomery, Nathan Hall, Brent Murphy, Todd A. Gaines, Eric L. Patterson
Abstract
The International Weed Genomics Consortium (IWGC) has sequenced and annotated the genomes of over 30 weed species, generating genomic resources to understand their biology, evolution, and adaptation. The objective of this study was to evaluate the semi‐automated, isoform sequencing (Iso‐seq)‐based, IWGC genome annotation pipeline by reannotating the genome of the model species Arabidopsis thaliana with various amounts and types of extrinsic data and to measure the impact that varying inputs had on the annotation completeness and quality. Annotations were run comparing the effects of (1) the quantity and source of Iso‐seq reads, (2) annotated proteins from botanically closely related or distantly related species, and (3) the number of proteins provided to the annotation program “MAKER‐P.” Reannotations were compared to each other and to the published annotation of the A. thaliana genome. The IWGC annotation pipeline annotated almost all the genes without manual curation when informed with an Iso‐seq dataset and proteins of related species. In general, the pipeline produced more accurate, annotated genes with more input proteins, especially from closely related species, in the gene model prediction step. Furthermore, the combination of proteins from several closely related species increased the number of annotated genes. The number or source of Iso‐seq reads did not have a significant effect if many proteins from closely related species were utilized. The annotation pipeline annotated nearly 90% of genes from additional crop species genomes. The IWGC genome annotation pipeline is robust in reannotating the A. thaliana genome and therefore is most likely performing well in the several non‐model weed species it has been used on so far.