Editor: 星野哀 | 2019-07-16
distinct feature of the present T-MP is that the prediction results can be analyzed and interpreted in physical terms to shed light on the molecular mechanism of protein folding energy changes upon mutation. Additionally, the mathematical model for different types of mutations can be adaptively optimized according to the performance analysis of ESPH features. We demonstrate that the performance of the proposed T-MP matches or exceeds that of other existing methods.

II Methods

II.A Persistent homology characterization of proteins

Unlike physics-based models, which describe protein folding in terms of covalent bonds, hydrogen bonds, electrostatic and van der Waals interactions, the natural language of persistent homology is topological invariants, i.e., the intrinsic features of the underlying topological space. More specifically, independent components, rings and cavities are topological invariants in a given data set, and their numbers are called Betti-0, Betti-1 and Betti-2, respectively, as shown in the top row of Fig. 1. Loosely speaking, simplicial complexes are generated from discrete data points according to a specific rule such as the Vietoris-Rips complex, Čech complex, or alpha complex. Specifically, a 0-simplex is a vertex, a 1-simplex is an edge, a 2-simplex is a triangle, and a 3-simplex is a tetrahedron, see the middle row of Fig. 1. Algebraic groups built on these simplicial complexes are used in simplicial homology to compute Betti numbers of various dimensions in practice. Furthermore, persistent homology creates a series of homologies through a filtration process, in which the connectivity of a given data set is systematically reset according to a scale parameter, such as an ever-increasing radius of every atom in a protein, see the bottom row of Fig. 1.
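As a rough illustration of how Betti numbers depend on the scale parameter, the following Python sketch builds a Vietoris-Rips complex on a handful of points and computes Betti-0 and Betti-1 from the ranks of the boundary maps. It is a minimal sketch under stated assumptions, not the method used in this work: the function names (`rips_simplices`, `rank_gf2`, `boundary_rank`, `betti_numbers`) and the toy point set `square` are hypothetical, coefficients are taken in Z/2, and the brute-force enumeration is only practical for very small point sets.

```python
from itertools import combinations
from math import dist


def rips_simplices(points, scale, max_dim=2):
    """Vietoris-Rips rule: a simplex is included when all pairwise
    distances among its vertices are <= scale."""
    n = len(points)
    close = {(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= scale}
    simplices = {0: [(i,) for i in range(n)]}
    for k in range(1, max_dim + 1):
        simplices[k] = [s for s in combinations(range(n), k + 1)
                        if all(pair in close for pair in combinations(s, 2))]
    return simplices


def rank_gf2(rows):
    """Rank over Z/2 of a matrix whose rows are integer bitmasks."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # eliminate the leading bit
    return len(pivots)


def boundary_rank(simplices, k):
    """Rank of the boundary map from k-simplices to (k-1)-simplices."""
    if k == 0 or not simplices.get(k):
        return 0
    index = {s: i for i, s in enumerate(simplices[k - 1])}
    rows = []
    for s in simplices[k]:
        mask = 0
        for face in combinations(s, k):  # the (k-1)-dimensional faces of s
            mask |= 1 << index[face]
        rows.append(mask)
    return rank_gf2(rows)


def betti_numbers(points, scale):
    """Betti-0 and Betti-1 of the Rips complex at a single scale.
    (Betti-2 would additionally require the 3-simplices.)"""
    sx = rips_simplices(points, scale, max_dim=2)
    r1 = boundary_rank(sx, 1)
    r2 = boundary_rank(sx, 2)
    b0 = len(sx[0]) - r1           # independent components
    b1 = len(sx[1]) - r1 - r2      # rings
    return b0, b1


# Toy data (hypothetical): four points at the corners of a unit square.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

for scale in (0.5, 1.0, 1.5):
    print(scale, betti_numbers(square, scale))
```

For this toy square, the four isolated points (Betti-0 = 4) merge into a single ring once the scale reaches the side length (Betti-0 = 1, Betti-1 = 1), and the ring is filled in when the diagonals are connected (Betti-1 = 0). Tracking the scales at which such features appear and disappear is what a filtration does; dedicated persistent homology libraries handle this far more efficiently than the per-scale recomputation shown here.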
As a result, the birth, death, and persistence of topological invariants over the filtration give rise to the barcode representation of a given data set.29 When persistent homology is used to analyze three-dimensional (3D) protein structures, one-dimensional (1D) persistent homology barcodes are obtained as topological fingerprints (TFs).23-25,28

As an illustration, we consider the persistent homology analysis of a wild type protein (PDB:1ey0) and its mutant. The mutation (G88W), which replaces Gly with Trp at residue 88, is shown in Fig. 2a and b. In this case, a small residue (Gly) is replaced by a large one (Trp). We carry out persistent homology analysis of a set of heavy atoms within 6 Å of the mutation site. Persistent homology barcodes of the wild type and the mutant are given in Fig. 2c and d, respectively, where the three panels from top to bottom are for Betti-0, Betti-1, and Betti-2. Since the sets of atoms included for the wild type and the mutant are the same except at the mutation site, the obvious difference between the persistent homology barcodes is induced by the mutation. The increase in residue size results in a tighter pattern of Betti-0 bars, with fewer relatively long bars, and more Betti-1 and Betti-2 bars are observed at shorter distance scales.

Figure 2: An illustration of persistent homology barcode changes from wild type to mutant proteins. a The wild type protein (PDB:1ey0) with residue 88 as Gly. b The mutant with residue 88 as Trp. c Wild type protein barcodes for heavy atoms within 6 Å of the mutation site. Three panels from top to bottom are Betti-0, Betti-1, and Betti-2 barcodes, respectively. The horizontal axis is the filtration radius (Å). d Mutant protein barcodes obtained similarly to those for the wild type.

Nonetheless, the above topological representation of proteins does not contain sufficient biological information, such a........