10 Å) are deeply buried in the cores of globular domains. Their end positions could represent the folding nucleus, while the TEF would correspond to the final fold for these portions. Indeed, we have previously demonstrated that the TEF ends correspond statistically to hydrophobic residues that are highly conserved in multiple alignments of proteins of common function [17]. These particular positions have been called topohydrophobic, and they are clearly related to amino acids belonging to the folding nucleus [18]. They are derived from multiple alignments of distantly related sequences, typically with less than 30% identity. This requirement limits the prediction process, since most of the available algorithms for multiple alignment of highly divergent sequences produce controversial results [19]. We have shown that MIR and topohydrophobic positions match in two thirds of the cases, which confirms a reasonable recall of the MIR prediction algorithm. In other words, one has a means to predict, from the sequence alone, positions (MIRs) that include the folding nucleus.

In this paper we present a stability-based analysis conducted to better characterize MIRs. The expected outcome was to improve the precision of the MIR method by refining the algorithm with constraints related to the predicted stability changes induced by point mutations. We assume that the folding nucleus is the deep core of the structure and thus should be very sensitive to point mutations. By analogy, if the keystone of a vault is replaced by one of a different shape, the vault will collapse almost every time.
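The recall and precision mentioned above can be made concrete with a small calculation. The following is only an illustrative sketch, not the procedure used in this work: the function name `recall_precision`, the `tolerance` parameter, and the residue numbers are all hypothetical, and exact position matching (tolerance 0) is an assumption.

```python
def recall_precision(predicted, reference, tolerance=0):
    """Return (recall, precision) for predicted vs. reference positions.

    A reference position counts as recovered if some predicted position
    lies within `tolerance` residues of it; a predicted position counts
    as correct if some reference position lies within `tolerance`
    residues of it. Matching rule and tolerance are assumptions.
    """
    predicted = set(predicted)
    reference = set(reference)

    def near(pos, others):
        return any(abs(pos - other) <= tolerance for other in others)

    recovered = sum(1 for r in reference if near(r, predicted))
    correct = sum(1 for p in predicted if near(p, reference))
    recall = recovered / len(reference) if reference else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return recall, precision


# Hypothetical residue numbers, invented for illustration only.
mir_positions = [5, 12, 20, 33, 41, 58]
topohydrophobic_positions = [5, 20, 47]

recall, precision = recall_precision(mir_positions, topohydrophobic_positions)
print(f"recall = {recall:.2f}, precision = {precision:.2f}")
```

In this toy case two of the three reference positions are recovered (recall of two thirds), but only two of the six predicted positions are correct, which shows how a reasonable recall can coexist with a precision that leaves room for improvement.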
2 Material and Methods

2.1 MIR Prediction Algorithm

A Monte Carlo algorithm is used to simulate the early steps of protein folding on a (2,1,0) lattice. An amino acid is randomly selected and displaced to a new available position on the lattice. The energies of the initial and final conformations are computed from the Miyazawa and Jernigan potential of mean force [20], and the Metropolis criterion is then applied [21,10].

56 M. Lonquety, Z. Lacroix, and J. Chomilier

The starting point is the protein in a random coil conformation, and the simulation is typically run for 10^6 Monte Carlo steps. This simulation is repeated 100 times with different initial conformations. The number of first neighbors is recorded after each series of 10 Monte Carlo steps and, at the end of the process, an average Number of Contact Neighbors (NCN) is calculated for each amino acid of the sequence. Amino acids surrounded by many others contribute to the compactness of the protein and are therefore called Most Interacting Residues (MIR). In contrast, those with few neighbors are called Less Interacting Residues (LIR).

2.2 TEF Assignment

Along the backbone of a protein, some pairs of amino acids can be very close in space in several places, with a typical distance between their alpha carbons below
10 Å. The histogram of the sequence separation between these contacting amino acids is not smooth and presents a maximum around 25 amino acids [15]. These sequence fragments were initially called closed loops [14]. Later on, it was shown that the ends of these closed loops are mainly occupied by hydrophobic amino acids. A thorough analysis demonstrated that these hydrophobic amino acids are highly conserved among structures of the same family, even when the sequences are distantly related: these positions were called topohydrophobic [22]. The concept of TEF emerged from the junction between closed loops and the topohydrophobic positions mainly located at their ends.

2.3 Free Energy Calculation

The Gibbs free energy change due to mutation is a good approximation for characterizing the stability of a given structure. It consists of a series of energetic terms that attempt to capture all the properties and forces that drive the conformation of a protein. In our study we focus on the difference between these energies for the wild type structure, ΔGwild, and for the mutant structure, ΔGmutant. Because the various stability prediction methods in the literature use different nomenclatures, ΔΔG is defined as follows:

ΔΔG = ΔGmutant − ΔGwild . (1)

The unit is kcal/mol. ΔΔG describes whether it costs more energy to have the mutated amino acid or the wild type one. For example, if ΔΔG <