P30 Using large language models for the annotation of skin single-cell RNA sequencing datasets

DOI: 10.1093/bjd/ljag151.069 ISSN: 0007-0963

Tanzil Rujeedawa, Joseph Inns, Richard Gallon, Neil Rajan

Abstract

Introduction and aims

Large language models (LLMs) have recently been shown to accurately annotate cell types in single-cell RNA sequencing (scRNAseq) datasets using differentially expressed genes (DEGs). We evaluated the performance of LLMs in annotating cell clusters and subclusters from a skin scRNAseq dataset and compared their performance with that of conventional annotation tools.

Methods

An scRNAseq dataset comprising 750 498 cells was generated from 24 skin samples from patients with germline FLCN pathogenic variants. The top 100 DEGs from each cluster were provided to four LLMs (GPT-o3, GPT 4.5, Claude Sonnet 4 and Gemini 2.5 Pro), with or without uniform manifold approximation and projection (UMAP) diagrams. Conventional annotation was performed using published literature on cell-type annotation together with CellMarker.
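
As an illustration of the DEG-only input described above, the following is a minimal sketch of how per-cluster gene lists might be assembled into a text prompt. The function name `build_annotation_prompt`, the prompt wording and the example genes are assumptions for illustration only; the abstract does not state the prompts used, and no LLM is called here.

```python
from typing import Dict, List


def build_annotation_prompt(
    cluster_degs: Dict[str, List[str]],
    tissue: str = "human skin",
    top_n: int = 100,
) -> str:
    """Build a plain-text prompt from ranked DEG lists (hypothetical wording).

    cluster_degs maps a cluster ID to its DEGs, ordered from most to least
    differentially expressed. Only the first top_n genes per cluster are kept.
    """
    lines = [
        f"Annotate the cell type of each cluster from a {tissue} scRNAseq dataset.",
        "Each line gives a cluster ID followed by its top differentially expressed genes.",
    ]
    for cluster_id, genes in cluster_degs.items():
        lines.append(f"{cluster_id}: {', '.join(genes[:top_n])}")
    return "\n".join(lines)


# Illustrative placeholder genes, not taken from the study's dataset.
example_degs = {
    "0": ["KRT14", "KRT5", "KRT15"],
    "1": ["COL1A1", "COL1A2", "DCN"],
}
prompt = build_annotation_prompt(example_degs, top_n=2)
print(prompt)
```

With `top_n=100`, the same function would pass the top 100 DEGs per cluster, matching the number stated in the Methods.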

Results

Across 30 clusters from this dataset, mean concordance with conventional annotation across the four LLMs was 64% using DEGs alone and 86% using both DEGs and UMAPs. Using DEGs alone, concordance was 90% with GPT-o3, 77% with GPT 4.5, 77% with Claude Sonnet 4 and 13% with Gemini 2.5 Pro. Using both DEGs and UMAPs, concordance was 93% with GPT-o3, 80% with GPT 4.5, 90% with Claude Sonnet 4 and 80% with Gemini 2.5 Pro. For fibroblast subclusters (n = 16), GPT-o3 matched 31% of conventional annotations, increasing to 63% when fibroblast references were supplied. Myeloid (n = 18) and keratinocyte (n = 20) subclusters showed 44% and 45% concordance, respectively, which improved to 56% and 70% with relevant references.
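
The concordance figures can be read as the percentage of clusters whose LLM label agrees with the conventional label. The sketch below assumes agreement means a case-insensitive exact match of label strings, which is a simplification; the abstract does not state how matches were judged. The function name `concordance` and the toy labels are hypothetical. The sketch also recomputes the two averages from the per-model percentages reported above.

```python
from typing import Dict


def concordance(llm_labels: Dict[str, str], reference_labels: Dict[str, str]) -> float:
    """Percentage of shared clusters where the two labels match.

    Matching here is a case-insensitive exact string comparison, an
    assumption made for this sketch only.
    """
    shared = [c for c in reference_labels if c in llm_labels]
    if not shared:
        raise ValueError("no clusters in common")
    matches = sum(
        llm_labels[c].strip().lower() == reference_labels[c].strip().lower()
        for c in shared
    )
    return 100.0 * matches / len(shared)


# Toy labels for illustration; not data from the study.
reference = {"0": "Keratinocyte", "1": "Fibroblast", "2": "T cell", "3": "Endothelial cell"}
llm = {"0": "keratinocyte", "1": "Fibroblast", "2": "NK cell", "3": "Endothelial cell"}
toy_concordance = concordance(llm, reference)  # 3 of 4 clusters match

# Per-model concordance (%) as reported in the Results.
degs_only = {"GPT-o3": 90, "GPT 4.5": 77, "Claude Sonnet 4": 77, "Gemini 2.5 Pro": 13}
degs_umap = {"GPT-o3": 93, "GPT 4.5": 80, "Claude Sonnet 4": 90, "Gemini 2.5 Pro": 80}

mean_degs_only = sum(degs_only.values()) / len(degs_only)  # 64.25, reported as 64%
mean_degs_umap = sum(degs_umap.values()) / len(degs_umap)  # 85.75, reported as 86%
print(toy_concordance, mean_degs_only, mean_degs_umap)
```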

Conclusions

LLMs can achieve high concordance with conventional scRNAseq annotation tools at the cluster level and moderate concordance at the subcluster level. An advantage of LLM annotation is that it does not require pretrained, tissue-specific annotation models and is hence useful for less well-studied tissues. While not currently a substitute for expert annotation, LLMs may serve as valuable adjuncts and warrant ongoing evaluation.
