Contents

1 Prediction Workflow
2 Classification Performance
3 Blind Tests
4 Agents and SSF Performance
5 Disulfide Bond Prediction
6 Mutation Scan
7 Misclassified ProTherm Entries
8 Statistical Scoring Function Parameters
1 Prediction Workflow

Figure S1: Prediction workflow.
Depending on the data set, between 2% and 8% of all agent predictions are excluded in the final outlier removal step.
2 Classification Performance

Table S1 summarizes the performance on binary classification. Note that MAESTRO was not specifically trained for binary classification, in contrast to the other tools listed in Table S1. Nevertheless, MAESTRO performs similarly to the main competitor methods. A prediction is considered a true positive or true negative, respectively, if the sign of the predicted ΔΔG (or of the score in the case of MAESTRO-Score) matches the sign of the experimentally determined ΔΔG. The results are based on the n-fold cross-validation experiments (SP1 with 5-fold, SP3 with 20-fold, SP4 with 10-fold) presented in the main results.

Data set  Method           Acc.  Recall [+]a  Prec. [+]a  Recall [-]b  Prec. [-]b  MCC   AUC
SP1       MAESTRO-Score    0.65  0.71         0.36        0.63         0.88        0.29  0.73
          MAESTRO          0.82  0.59         0.61        0.89         0.88        0.48  0.84
SP4       MAESTRO-Score    0.63  0.66         0.30        0.62         0.88        0.22  0.68
          MAESTRO          0.83  0.41         0.59        0.93         0.87        0.40  0.80
SP3       AUTOMUTE (RF)c   0.86  0.70         0.81        0.93         0.88        0.66  0.91
          I-Mutant 2.0c    0.80  0.56         0.73        0.91         0.83        0.51  -
          mCSMc            0.86  0.67         0.82        0.94         0.87        0.65  0.90
          MAESTRO-Score    0.65  0.69         0.45        0.63         0.82        0.29  0.72
          MAESTRO          0.84  0.74         0.74        0.89         0.89        0.63  0.90

Table S1: Binary classification results for the SP1, SP4 and SP3 data sets. a Results for mutations that stabilize the structures. b Results for mutations with a destabilizing effect. c Data taken from Pires et al. (supplementary material) [5].
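The sign-based evaluation described above can be sketched as follows. This is a minimal illustration, not MAESTRO's actual evaluation code; it assumes that a positive ΔΔG denotes a stabilizing mutation (the [+] class of Table S1) and that the sign convention matches the data set in use. The function name and dictionary keys are hypothetical.

```python
import math

def sign_classification_metrics(ddg_pred, ddg_exp):
    """Score sign-based stability classification: a prediction counts as a
    true positive/negative when the sign of the predicted ddG matches the
    sign of the experimental ddG (assumption: positive = stabilizing)."""
    tp = fp = tn = fn = 0
    for p, e in zip(ddg_pred, ddg_exp):
        if e > 0:                 # experimentally stabilizing mutation
            if p > 0: tp += 1
            else:     fn += 1
        else:                     # experimentally destabilizing mutation
            if p <= 0: tn += 1
            else:      fp += 1
    acc = (tp + tn) / (tp + tn + fp + fn)
    recall_pos = tp / (tp + fn) if tp + fn else 0.0
    prec_pos   = tp / (tp + fp) if tp + fp else 0.0
    recall_neg = tn / (tn + fp) if tn + fp else 0.0
    prec_neg   = tn / (tn + fn) if tn + fn else 0.0
    # Matthews correlation coefficient from the confusion counts
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": acc, "recall+": recall_pos, "prec+": prec_pos,
            "recall-": recall_neg, "prec-": prec_neg, "mcc": mcc}
```

For AUC one would additionally rank predictions by the raw ΔΔG (or score) values, e.g. with scikit-learn's `roc_auc_score`, rather than by sign alone.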
3 Blind Tests

All data sets used in this work contain multiple mutations for certain proteins, or even for certain mutation sites. In the experiments reported above, the correlations possibly introduced by these related mutations may have led to some overfitting at the structure or position level. We therefore performed blind tests to investigate the generalization capabilities of MAESTRO.

In the first experiment, we investigated the effect of excluding certain mutation sites. We performed n-fold cross-validation experiments in which all mutations of a given mutation site are placed exclusively in either the training set or the test set. These n-fold cross validations were performed on the SP1 and the SP3 data set. Further, we report the performance on a low-redundancy subset derived from the SP1 data set, provided by Pires et al. [5]. The set includes 351 mutants. For this experiment, MAESTRO was trained on the remaining 2297 mutations of SP1. Regarding the results for this subset, Pires et al. remarked that 'It is important to point out that this data set may not be completely blind for PoPMuSiC, since the chosen mutations could have been considered while training its artificial neural network.'
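The site-exclusive fold construction described above can be sketched as follows. This is a minimal illustration under the stated assumption that each mutation carries a site identifier (e.g. structure plus residue position); the function name and identifiers are hypothetical, and MAESTRO's actual fold-assignment procedure is not specified here.

```python
import random

def site_blind_folds(mutation_sites, n_folds=5, seed=0):
    """Assign each mutation to a cross-validation fold such that all
    mutations sharing a mutation site land in the same fold, so no site
    appears in both training and test data. Returns a list of fold
    indices parallel to `mutation_sites`."""
    sites = sorted(set(mutation_sites))   # unique sites, deterministic order
    rng = random.Random(seed)
    rng.shuffle(sites)                    # randomize before round-robin split
    fold_of_site = {s: i % n_folds for i, s in enumerate(sites)}
    return [fold_of_site[s] for s in mutation_sites]
```

An established alternative is scikit-learn's `GroupKFold`, with the mutation site passed as the `groups` argument.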
Table S2 shows that the prediction performance on the SP1 and SP4 data sets decreases only marginally in comparison to the 5-fold cross-validation experiment (ρ = 0.68) and the 10-fold cross-validation experiment (ρ = 0.68), respectively, presented in the main results. In the case of the blind test on the subset of 351 mutants, the performance is similar to the results on the SP2 data set (ρ = 0.70). The relatively large difference in performance on the SP3 data set, in comparison to the 20-fold cross-validation experiment (ρ = 0.84), can be explained by the high number of mutations per site in this set.1

1 Average/median mutations per mutation site: SP1 . . . 1.85/1.00;