Table 6 Significant predictors in full models and their statistical details

From: Statistical modeling to quantify the uncertainty of FoldX-predicted protein folding and binding stability

| Energy type | Predictor | Coefficient estimate | p value | Effect on interval width |
|---|---|---|---|---|
| Fold | Van der Waals | 0.631 | 5.67e−14 | 1.241 |
| | Van der Waals clash | 0.431 | < 2e−16 | 0.705 |
| | Entropy, side chain | 0.531 | 3.16e−7 | 0.775 |
| | SD of total energy | 0.569 | 8.55e−6 | 0.519 |
| | Mutation involving proline | 0.724 | 2.08e−4 | 0.761 |
| | Secondary structure: B | Reference | N/A | N/A |
| | Secondary structure: E | 0.239 | 0.375 | N/A |
| | Secondary structure: G | 0.069 | 0.847 | N/A |
| | Secondary structure: H | −0.181 | 0.499 | N/A |
| | Secondary structure: None | 0.026 | 0.924 | N/A |
| | Secondary structure: S | 0.144 | 0.633 | N/A |
| | Secondary structure: T | −0.148 | 0.604 | N/A |
| Bind | Van der Waals clash | 0.418 | 3.28e−5 | 2.531 |
| | SD of backbone van der Waals clash | 6.69 | 3.63e−4 | 0.864 |
| | SD of van der Waals clash | −1.06 | 5.69e−4 | −2.518 |
| | SD of entropy, side chain | −1.21 | 9.08e−4 | −0.748 |
| | SD of total energy | 1.32 | 4.85e−6 | 3.791 |
| | Secondary structure: B | Reference | N/A | N/A |
| | Secondary structure: E | 0.240 | 0.655 | N/A |
| | Secondary structure: G | −0.617 | 0.434 | N/A |
| | Secondary structure: H | 0.116 | 0.830 | N/A |
| | Secondary structure: None | −0.236 | 0.659 | N/A |
| | Secondary structure: S | 0.336 | 0.547 | N/A |
| | Secondary structure: T | 0.250 | 0.650 | N/A |
| | RSA | −1.45 | 3.51e−7 | −0.977 |

  1. The coefficients and p values are from the model fitted to all datapoints. The effect of each predictor on interval width (right column) is a rescaled version of the coefficient: the difference in interval width between two hypothetical mutations, one representing the upper 10% quantile of the predictor and one the lower 10% quantile, with all other predictors held constant. More precisely, for each predictor, we generated two contrasting mutations whose predictor values equal the mean of the predictor within its upper and lower 10% quantiles, set all other predictors to their dataset-wide averages, used the fitted model's predict() function to predict the upper bound on the error for each mutation, and took the difference. For the proline predictor, the contrasting mutations were one with and one without proline (with all other predictors at their averages).
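
The contrast described in the footnote can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`decile_means`, `effect_on_width`), the toy data, the toy coefficients, and the linear form of `predict` are all hypothetical and stand in for the fitted model and its predict() function.

```python
from statistics import mean


def decile_means(values):
    """Return (mean of the lowest 10%, mean of the highest 10%) of values."""
    ordered = sorted(values)
    k = max(1, len(ordered) // 10)  # number of points in each tail
    return mean(ordered[:k]), mean(ordered[-k:])


def effect_on_width(predict, data, name):
    """Predicted bound for a 'high' mutation minus that for a 'low' mutation.

    All predictors other than `name` are held at their dataset-wide averages.
    """
    baseline = {col: mean(vals) for col, vals in data.items()}
    low, high = decile_means(data[name])
    return predict({**baseline, name: high}) - predict({**baseline, name: low})


# Hypothetical toy dataset: 20 mutations, two predictors (not the paper's data).
data = {
    "vdw_clash": [float(i) for i in range(20)],
    "rsa": [0.05 * i for i in range(20)],
}

# Hypothetical stand-in for the fitted model: a plain linear predictor.
coef = {"vdw_clash": 0.5, "rsa": -1.0}
intercept = 1.0


def predict(row):
    """Toy prediction of the upper bound on the error for one mutation."""
    return intercept + sum(coef[k] * row[k] for k in coef)


effect_vdw = effect_on_width(predict, data, "vdw_clash")
effect_rsa = effect_on_width(predict, data, "rsa")
print(effect_vdw, effect_rsa)
```

In this toy linear case the effect reduces to the coefficient multiplied by the gap between the two tail means, which is why the footnote can describe it as a rescaled coefficient. For a binary predictor such as the proline indicator, the two contrasting values would simply be 1 and 0 rather than tail means.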