Machine Learning Identification of Proteomic Signatures in MSC-Derived EVs Using PRIDE Data
Yermekbayeva Kalzhan
ABSTRACT
Background:
Mesenchymal stem cell-derived extracellular vesicles (MSC-EVs) are promising cell-free therapeutic candidates in regenerative medicine because their activity is mediated by bioactive cargo, including proteins. Compared with transcriptomics alone, proteomics provides a closer representation of functional molecular effectors. This study aimed to identify candidate proteomic signatures of MSC-EVs using public PRIDE data and an exploratory machine learning (ML) framework.
Methods:
Label-free quantitative proteomics data were obtained from the PRIDE dataset PXD020948, which profiles extracellular vesicles (EVs) derived from adipose tissue, bone marrow, and umbilical cord mesenchymal stem cells. Protein intensity data were extracted from the proteinGroups file, cleaned by removing contaminants and reverse identifications, and processed using missing-value handling, log2 transformation, and
Results:
After preprocessing, 1014 proteins across 9 samples were retained for analysis. Differential expression analysis identified 101 proteins meeting the predefined exploratory thresholds between adipose- and bone marrow-derived EVs (adjusted
Conclusions:
This study provides a proteomics-based, ML-supported workflow for identifying candidate MSC-EV protein signatures from public PRIDE data. The identified proteins are hypothesis-generating candidates rather than validated clinical biomarkers. The optional GEO/multiomics step supported COL6A1 as a limited cross-layer convergence signal, but this result should not be interpreted as confirmatory evidence because only one overlapping marker was detected and no joint cross-platform model was fitted.
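
The cleaning steps named in the Methods (removing contaminants and reverse identifications, handling missing values, and log2 transformation) can be illustrated with a minimal Python sketch. It is not the study's actual code. It assumes a MaxQuant-style tab-delimited proteinGroups table in which flagged rows carry "+" in the "Reverse" and "Potential contaminant" columns, intensities sit in columns prefixed with "LFQ intensity ", and a zero intensity means the protein was not quantified. The function name `preprocess`, the `min_valid` cutoff, and the demo rows are hypothetical and chosen only for illustration.

```python
import csv
import io
import math


def preprocess(rows, intensity_prefix="LFQ intensity ", min_valid=2):
    """Drop flagged protein groups and log2-transform intensities.

    rows: iterable of dicts, e.g. csv.DictReader(f, delimiter="\t").
    Returns {protein_ids: {sample: log2 intensity, or None if missing}}.
    """
    cleaned = {}
    for row in rows:
        # Assumption: decoys and contaminants are marked with "+".
        if row.get("Reverse") == "+" or row.get("Potential contaminant") == "+":
            continue
        values = {}
        for column, raw in row.items():
            if not column or not column.startswith(intensity_prefix):
                continue
            sample = column[len(intensity_prefix):]
            intensity = float(raw) if raw else 0.0
            # Assumption: zero means "not quantified", so keep it as missing.
            values[sample] = math.log2(intensity) if intensity > 0 else None
        # Hypothetical rule: keep proteins quantified in >= min_valid samples.
        n_valid = sum(v is not None for v in values.values())
        if n_valid >= min_valid:
            cleaned[row["Protein IDs"]] = values
    return cleaned


# Made-up rows standing in for a real proteinGroups file.
demo = (
    "Protein IDs\tReverse\tPotential contaminant\t"
    "LFQ intensity AT1\tLFQ intensity BM1\n"
    "PROT_A\t\t\t1024\t4096\n"
    "REV__PROT_B\t+\t\t2048\t2048\n"
    "CON__PROT_C\t\t+\t512\t512\n"
    "PROT_D\t\t\t0\t256\n"
)
result = preprocess(csv.DictReader(io.StringIO(demo), delimiter="\t"))
print(result)  # {'PROT_A': {'AT1': 10.0, 'BM1': 12.0}}
```

In this sketch missing values are kept as `None` rather than imputed, so any later imputation or filtering choice stays explicit. The `min_valid` cutoff is a placeholder and would need to match whatever rule the analysis actually applied.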