DOI: 10.1073/pnas.2535076123 ISSN: 0027-8424

The emergence of novel versus known three-dimensional structures from random sequences

Rose Yang, Hyunjun Yang, Anton Davydenko, Zack Mawaldi, Rian Kormos, Dru Myerscough, Yibing Wu, William F. DeGrado

It has been hypothesized that while random sequences of the length typical of globular proteins are unlikely to fold, repeated random sequences are more likely to adopt stably folded structures, with implications for molecular evolution. We used structure prediction methods to determine the foldability of approximately 120-residue sequences composed of 5- to 60-residue random repeats. With repeats of fewer than 30 residues, sequences that fold with high confidence were frequently discovered (1 to 12%). For repeats of fewer than 60 residues, we frequently observed β-solenoids similar to those seen in natural proteins. We observed solenoids stabilized by apolar packing as well as ones stabilized by polar interactions with Ca²⁺ in the core of the structure, as in natural Repeats in ToXin (RTX) domains. Helical bundles were observed with high frequency when insertions or deletions were included between blocks of repeating sequences. We also observed a new supersecondary structure consisting of a tightly wound α-helical screw and experimentally confirmed its stability and structure by circular dichroism (CD) spectroscopy and X-ray crystallography. Thus, structure predictors can discover structures that lie well outside the distribution of the data on which they were trained. At repeat lengths beyond 40 residues, very few sequences were predicted to fold. The small number of structures we observed at these lengths was representative of well-established major classes of tertiary structure; greater sampling would be needed to discover novel structures from a random distribution. These studies illuminate dark matter regions of protein structure space and support previous predictions that proteins evolved through the assortment of shorter peptide sequences.
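
As an illustration of the kind of input described above, the following is a minimal sketch of how an approximately 120-residue sequence might be assembled from a single random repeat unit. It is not the authors' procedure: the function name `repeated_random_sequence`, uniform sampling over the 20 canonical amino acids, and truncation of the final repeat to reach the target length are all assumptions made for illustration. The structure prediction step is not shown.

```python
import random

# The 20 canonical amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def repeated_random_sequence(repeat_len, total_len=120, rng=None):
    """Tile one random repeat unit until the sequence reaches total_len.

    Assumptions (not taken from the abstract): residues are drawn
    uniformly at random, and the last copy of the repeat is truncated
    when repeat_len does not divide total_len evenly.
    """
    if repeat_len < 1 or total_len < 1:
        raise ValueError("repeat_len and total_len must be positive")
    rng = rng or random.Random()
    # Draw the repeat unit once; every copy is identical.
    unit = "".join(rng.choice(AMINO_ACIDS) for _ in range(repeat_len))
    # Ceiling division: enough copies to cover total_len.
    copies = -(-total_len // repeat_len)
    return (unit * copies)[:total_len]


if __name__ == "__main__":
    # One sequence per repeat length across the 5- to 60-residue range.
    for n in (5, 10, 20, 30, 40, 60):
        print(n, repeated_random_sequence(n, rng=random.Random(n)))
```

Each sequence produced this way could then be passed to a structure predictor of choice, with the fraction of confidently folded predictions tallied per repeat length.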
