π-HelixNovo2: Making Accurate Online De Novo Peptide Sequencing Available to All
Tingpeng Yang, Tianze Ling, Boyan Sun, Zhendong Liang, Cheng Lai, Jiangli Hu, Zexuan Yi, Yonghong He, Leyuan Li, Yue Yu, Cheng Chang, Yu Wang
Abstract
De novo peptide sequencing, the mainstream technique for identifying novel peptides, has improved markedly in recent years thanks to deep learning. However, existing models make only limited improvements to the encoding of mass spectra and the decoding of amino acids, which constrains their overall performance. Moreover, these models provide no filtering step for their de novo predictions and are often difficult to use without programming expertise. Here, we propose π-HelixNovo2, a de novo peptide sequencing model that integrates the complementary spectrum and bidirectional decoding within a Transformer framework. We further propose a peptide filtering strategy that identifies the correct peptide-spectrum matches among the results of π-HelixNovo2. Our experiments demonstrate that π-HelixNovo2 outperforms state-of-the-art models and performs reliably in identifying antibody peptides, multi-enzyme cleavage peptides, and non-enzymatic peptides, and in analyzing the gut metaproteome. Finally, we trained π-HelixNovo2 on the large-scale MassIVE-KB dataset and present an open, user-friendly online computational platform that makes π-HelixNovo2 freely available to all (https://openi.pcl.ac.cn/OpenI/pi-HelixNovo-NPU).
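To illustrate the idea behind a complementary spectrum, the following minimal sketch mirrors each experimental peak about the precursor mass. It relies only on the standard relation that, for singly charged b and y fragments of the same cleavage site, m(b) + m(y) equals the neutral peptide mass plus two proton masses. The function name `complementary_spectrum`, its parameters, and the choice to carry intensities over unchanged are assumptions made for illustration; the construction used inside π-HelixNovo2 may differ.

```python
PROTON = 1.007276  # proton mass in Da


def complementary_spectrum(peaks, precursor_mz, charge):
    """Mirror (m/z, intensity) peaks about the precursor mass.

    Hypothetical helper: assumes singly charged fragment peaks and
    keeps each peak's intensity unchanged.
    """
    # Neutral peptide mass recovered from the precursor m/z and charge.
    neutral_mass = (precursor_mz - PROTON) * charge
    # For singly charged b/y pairs: m(b) + m(y) = neutral_mass + 2 * PROTON.
    pair_sum = neutral_mass + 2 * PROTON
    # Keep only peaks whose mirrored m/z is physically meaningful.
    mirrored = [(pair_sum - mz, inten) for mz, inten in peaks if 0 < mz < pair_sum]
    return sorted(mirrored)


if __name__ == "__main__":
    toy_peaks = [(100.0, 1.0), (300.0, 0.5)]  # made-up values, not real data
    for mz, inten in complementary_spectrum(toy_peaks, precursor_mz=250.0, charge=2):
        print(f"{mz:.4f}\t{inten}")
```

In such a scheme, a model could encode the mirrored peaks alongside the experimental ones, so that an observed b ion also contributes evidence at the position of its possibly missing y-ion partner.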