DOI: 10.1126/sciadv.aee1088 ISSN: 2375-2548

Image feature embedding with a deep learning framework improves genome-wide association studies on dog endophenotypes

Guang-Xiao E, Guo-Dong Wang

Domestic dogs exhibit substantial morphological diversity, making quantitative characterization of their phenotypes challenging. Traditional phenotyping methods often rely on manual measurements, which are limited in their ability to capture complex visual traits. Deep learning provides an opportunity to automatically extract informative and biologically meaningful features from images. In this study, we constructed a dataset of 13,254 dog images across multiple breeds and used ResNet and ViT models to automatically extract 256-dimensional image embeddings. After dimensionality reduction using UMAP (uniform manifold approximation and projection), we performed a GWAS (genome-wide association study) on the extracted features and breed-level genotype data. We identified 15 genes previously reported to be associated with dog traits such as hair length and body size, as well as previously unknown candidate genes related to body development and hair growth, including EIF2S2, TRHR, and TCF25, which harbor variants with potential functional relevance. This approach is validated by known genetic associations and can reveal previously unidentified genotype-phenotype links. Building on these capabilities, this approach provides a scalable framework for phenotype extraction that enables population genetic studies in domestic dogs and can facilitate breeding in other economically important species.
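The breed-level association step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the ResNet/ViT embedding and UMAP steps are not reproduced, and a plain per-SNP least-squares regression of a breed-level phenotype on breed-level allele frequency stands in for a real GWAS model, with no correction for population structure or relatedness. All breed names, SNP identifiers, embedding values, and allele frequencies below are made-up toy data, and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def breed_mean_embeddings(image_embeddings):
    """Average per-image embedding vectors into one vector per breed."""
    sums = {}
    counts = defaultdict(int)
    for breed, vec in image_embeddings:
        if breed not in sums:
            sums[breed] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[breed][i] += value
        counts[breed] += 1
    return {b: [s / counts[b] for s in sums[b]] for b in sums}


def snp_association(phenotype, allele_freq):
    """Return (slope, Pearson r) of phenotype regressed on allele frequency.

    Both arguments map breed -> value; only breeds present in both are used.
    A SNP or phenotype with no variance across breeds yields (0.0, 0.0).
    """
    breeds = sorted(set(phenotype) & set(allele_freq))
    x = [allele_freq[b] for b in breeds]
    y = [phenotype[b] for b in breeds]
    n = len(breeds)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Toy per-image embeddings (2 dimensions instead of 256), labelled by breed.
images = [
    ("breed_a", [1.0, 0.0]),
    ("breed_a", [3.0, 0.0]),
    ("breed_b", [4.0, 1.0]),
    ("breed_c", [6.0, 0.5]),
    ("breed_d", [8.0, 1.0]),
]

# Toy breed-level allele frequencies for two hypothetical SNPs.
genotypes = {
    "snp_1": {"breed_a": 0.1, "breed_b": 0.2, "breed_c": 0.3, "breed_d": 0.4},
    "snp_2": {"breed_a": 0.5, "breed_b": 0.5, "breed_c": 0.5, "breed_d": 0.5},
}

means = breed_mean_embeddings(images)

# Use the first embedding dimension as the quantitative phenotype.
phenotype = {breed: vec[0] for breed, vec in means.items()}

results = {snp: snp_association(phenotype, freqs)
           for snp, freqs in genotypes.items()}

for snp, (slope, r) in results.items():
    print(f"{snp}: slope={slope:.3f}, r={r:.3f}")
```

In this sketch each embedding dimension would be tested as a separate phenotype; a real analysis would use a dedicated association model that accounts for relatedness among breeds and for multiple testing across SNPs.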
