Digital Morphometric Phenotyping of Genipa americana Seeds: Post-Germinative Root-Development Classification and Seed-Lot Morphometric Differentiation
Akinmola Solomon Morebise, Hans Richie Tchouckoua Nana, Samuel Ribeiro de Azevedo, Claudemir Mota da Cruz, Rafael Marani Barbosa

This study evaluated whether image-derived morphometric descriptors can support two complementary tasks in Genipa americana L.: germination-performance classification and seed-lot-level phenotypic differentiation. Seed images were acquired from eight collection localities in southern Bahia State, Brazil, comprising 16 seed lots and approximately 16,000 initially imaged seeds. Primary and derived morphometric descriptors were extracted from digital images and used as predictors in Random Forest (RF) and Linear Support Vector Machine (SVM) classifiers. After data curation, the final analytical dataset comprised 15,513 seeds. RF achieved a higher cross-validated F1-score than Linear SVM (0.780±0.007 versus 0.621±0.053), but independent test-set performance was moderate and comparable between models: RF and Linear SVM achieved accuracies of 0.74 and 0.75, respectively, with PR-AUC values of 0.649 and 0.671. PCA indicated that the first two components explained 98.9% of total morphometric variance, and MANOVA confirmed a significant seed-lot effect (Wilks’ λ=0.84, p<0.001). Critically, under Leave-One-Locality-Out cross-validation, mean macro-F1 fell to 0.46±0.04 and balanced accuracy to 0.50±0.01, indistinguishable from chance, indicating that random-split performance largely reflects locality-specific patterns rather than a transferable germination signal. Taken together, external morphometric descriptors provide a moderate within-distribution predictive signal under random seed-level partitioning, but this signal does not transfer to unseen localities.
Digital morphometric phenotyping should therefore be regarded primarily as a low-cost component of native seed-lot characterization and locally calibrated preliminary screening, especially when combined with complementary physiological, internal-imaging, environmental, or genetic information.
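
To make the Leave-One-Locality-Out evaluation concrete, the following is a minimal sketch of how each locality can be held out in turn and scored with macro-F1 and balanced accuracy. It is not the authors' pipeline: the function names, the toy locality labels ("A", "B", "C"), the toy germination labels, and the majority-class baseline are all illustrative assumptions, chosen only to show why a balanced accuracy near 0.50 corresponds to chance for a two-class problem.

```python
from collections import Counter, defaultdict


def leave_one_locality_out(localities):
    """Yield (held_out, train_idx, test_idx), holding each locality out once."""
    by_loc = defaultdict(list)
    for i, loc in enumerate(localities):
        by_loc[loc].append(i)
    for held_out in sorted(by_loc):
        test_idx = by_loc[held_out]
        train_idx = sorted(
            i for loc, idxs in by_loc.items() if loc != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        n_c = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / n_c)
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1; a class with no TP, FP, or FN scores 0."""
    f1s = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical toy data: 3 localities, 4 seeds each (1 = germinated, 0 = not).
localities = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
labels = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

scores = []
for held_out, train_idx, test_idx in leave_one_locality_out(localities):
    # Assumed baseline: predict the majority class of the training localities.
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    y_true = [labels[i] for i in test_idx]
    y_pred = [majority] * len(test_idx)
    scores.append(
        (held_out, macro_f1(y_true, y_pred), balanced_accuracy(y_true, y_pred))
    )

for held_out, f1, bal_acc in scores:
    print(f"held out {held_out}: macro-F1={f1:.3f}, balanced accuracy={bal_acc:.3f}")
```

Because the baseline predicts a single class, it recovers one class fully and the other not at all, so its balanced accuracy is 0.5 on every fold regardless of class proportions. A classifier evaluated on unseen localities that scores close to this value is therefore adding little beyond a constant guess, which is the sense in which the reported 0.50±0.01 is read as chance level.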