DOI: 10.1177/15578666261457452 ISSN: 1066-5277
Investigations on Multiple Protein Scaffold Filling
Ismoiljon Muzaffarov, Xiaowen Liu, Letu Qingge, Lusheng Wang, Binhai Zhu
In this article, we initiate the study on some problems related to multiple protein scaffold filling, with or without references. The objective is to maximize the sum of Blosum62 scores of the filled sequences when no reference is given, or to maximize the Blosum62 score between the filled sequence and a reference. We present the following results: (1) given
n
scaffolds generated from the top-down tandem mass spectra, finding
k
scaffolds whose corresponding contents can be used to fill into a target sequence (or a sequence whose Blosum62 score with a reference is maximized) takes
Ω
(
n
k
−
ε
)
time, unless the Strong Exponential Time Hypothesis (SETH) fails. (2) Given two or more protein scaffolds and the corresponding multisets of amino acids to be filled accordingly, the corresponding optimization problem can be solved in polynomial time with dynamic programming. (3) Due to the high (and impractical) running times of these algorithms, we implement several heuristic algorithms for the special cases when three scaffolds are given. The corresponding empirical results are quite promising, from the testing of 12 datasets spanning five different kinds of proteins: antibody, myoglobin, mitochondrial respiratory-chain, calmodulin, and thioredoxin.