A Two-Layer Structural Key Framework for Linking Compound Identifiers and MS/MS Evidence in Spectral Database Curation
Kaiwen Deng, Ran Liu, Ruiping He, Li Chen

Background: MS/MS spectral databases provide reference spectra for compound identification in metabolomics studies. Their utility depends on clear links among compound identifiers, chemical structures, and MS/MS evidence, yet these links are often complicated by database-specific identifiers, heterogeneous structural representations, and stereochemical specifications.

Methods: Here, we present a two-layer structural key framework for linking compound identifiers and MS/MS evidence through standardized structures. Reported SMILES were standardized and converted into InChIKey-derived stereo keys and connectivity keys using a Python-based RDKit workflow.

Results: As illustrated using stereoisomeric cases such as L- and D-proline, the stereo key layer preserves compound identifiers and metadata at the stereo level, whereas the connectivity key layer groups comparable MS/MS evidence at the molecular connectivity level. In a database-scale application, 217,920 HMDB compound entries were organized into 216,783 stereo keys and 196,512 connectivity keys, and 144,591 spectra from the spectrum-centered MoNA database were incorporated into the HMDB-centered framework, increasing MS/MS evidence coverage, particularly at the molecular connectivity level.

Conclusions: Together, this framework links compound identifiers, standardized structures, and MS/MS evidence at the stereo and connectivity levels, providing a bidirectionally traceable system for spectral database curation without forcing connectivity-level MS/MS evidence into stereo-specific compound identities.
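
The two-layer idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the stereo key is the full standard InChIKey and that the connectivity key is its first 14-character block; the abstract states only that both keys are InChIKey-derived. The SMILES standardization and InChIKey generation steps (done with RDKit in the source) are assumed to have happened upstream and are not reproduced here. The function names `split_inchikey` and `build_index`, the compound identifiers `CPD_L_PRO` and `CPD_D_PRO`, and the spectrum identifiers `SPEC_001` and `SPEC_002` are hypothetical placeholders, not HMDB or MoNA accessions.

```python
from collections import defaultdict


def split_inchikey(inchikey):
    """Return (stereo_key, connectivity_key) for a standard InChIKey.

    Assumption: stereo key = full InChIKey; connectivity key = first block.
    """
    parts = inchikey.strip().upper().split("-")
    # A standard InChIKey has three blocks of 14, 10, and 1 characters.
    if [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return "-".join(parts), parts[0]


def build_index(compounds, spectra):
    """Link compound IDs at the stereo level and spectra at the connectivity level."""
    stereo_to_compounds = defaultdict(list)
    connectivity_to_stereo = defaultdict(set)
    connectivity_to_spectra = defaultdict(list)

    for compound_id, inchikey in compounds:
        stereo_key, connectivity_key = split_inchikey(inchikey)
        stereo_to_compounds[stereo_key].append(compound_id)
        connectivity_to_stereo[connectivity_key].add(stereo_key)

    for spectrum_id, inchikey in spectra:
        # Spectra are grouped by connectivity only, so a spectrum reported
        # without stereochemistry is not forced onto one stereoisomer.
        _, connectivity_key = split_inchikey(inchikey)
        connectivity_to_spectra[connectivity_key].append(spectrum_id)

    return {
        "stereo_to_compounds": dict(stereo_to_compounds),
        "connectivity_to_stereo": dict(connectivity_to_stereo),
        "connectivity_to_spectra": dict(connectivity_to_spectra),
    }


# Hypothetical compound entries for L- and D-proline (placeholder IDs).
compounds = [
    ("CPD_L_PRO", "ONIBWKKTOPOVIA-BYPYZUCNSA-N"),  # L-proline
    ("CPD_D_PRO", "ONIBWKKTOPOVIA-SCSAIBSYSA-N"),  # D-proline
]

# Hypothetical spectra: one annotated as L-proline, one without stereochemistry.
spectra = [
    ("SPEC_001", "ONIBWKKTOPOVIA-BYPYZUCNSA-N"),
    ("SPEC_002", "ONIBWKKTOPOVIA-UHFFFAOYSA-N"),
]

index = build_index(compounds, spectra)
```

In this sketch, the two proline entries keep separate stereo keys and therefore separate identifiers, while both spectra attach to the shared connectivity key `ONIBWKKTOPOVIA`. Looking up a connectivity key returns its stereo keys and spectra, and each stereo key returns its compound identifiers, which is one simple way to obtain the bidirectional traceability described above.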