Statistical modeling to quantify the uncertainty of FoldX-predicted protein folding and binding stability

BMC Bioinformatics

Table 6 Significant predictors in full models and their statistical details

Energy type	Predictor	Coefficient estimate	p value	Effect on interval width
Fold	Van der Waals	0.631	5.67e−14	1.241
	Van der Waals clash	0.431	< 2e−16	0.705
	Entropy, side chain	0.531	3.16e−7	0.775
	SD of total energy	0.569	8.55e−6	0.519
	Mutation involving proline	0.724	2.08e−4	0.761
	Secondary structure—B	Reference	N/A	N/A
	Secondary structure—E	0.239	0.375	N/A
	Secondary structure—G	0.069	0.847	N/A
	Secondary structure—H	−0.181	0.499	N/A
	Secondary structure—None	0.026	0.924	N/A
	Secondary structure—S	0.144	0.633	N/A
	Secondary structure—T	−0.148	0.604	N/A
Bind	Van der Waals clash	0.418	3.28e−5	2.531
	SD of backbone van der Waals clash	6.69	3.63e−4	0.864
	SD of van der Waals clash	−1.06	5.69e−4	−2.518
	SD of entropy, side chain	−1.21	9.08e−4	−0.748
	SD of total energy	1.32	4.85e−6	3.791
	Secondary structure—B	Reference	N/A	N/A
	Secondary structure—E	0.240	0.655	N/A
	Secondary structure—G	−0.617	0.434	N/A
	Secondary structure—H	0.116	0.830	N/A
	Secondary structure—None	−0.236	0.659	N/A
	Secondary structure—S	0.336	0.547	N/A
	Secondary structure—T	0.250	0.650	N/A
	RSA	−1.45	3.51e−7	−0.977

The coefficients and p values are from the model fitted to all data points. The effect of each predictor on interval width (rightmost column) is a rescaled version of the coefficient: the difference in interval width between two hypothetical mutations, one representing the upper 10% quantile of the predictor and one the lower 10% quantile, with all other predictors held constant. More precisely, for each predictor we generated two contrasting mutations whose predictor values equal the mean of the predictor within its upper and lower 10% quantiles, set all other predictors to their dataset-wide averages, used the model's predict() function to predict the upper bound on the error for each mutation, and took the difference. For the proline predictor, the contrasting mutations were one with and one without proline (with all other predictors at their averages).
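The rescaling described in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`tail_means`, `effect_on_width`), the toy data, and the linear stand-in for the fitted model's predict() are all assumptions made for illustration.

```python
import numpy as np

def tail_means(x, frac=0.10):
    """Mean of the lowest and of the highest `frac` share of the values in x."""
    x = np.sort(np.asarray(x, dtype=float))
    k = max(1, int(round(frac * len(x))))
    return x[:k].mean(), x[-k:].mean()

def effect_on_width(predict, X, col, frac=0.10, contrast=None):
    """Predicted upper bound for a 'high' mutation minus that for a 'low' one.

    Both hypothetical mutations sit at the dataset-wide average of every
    predictor except `col`. By default `col` is set to its lower and upper
    tail means; `contrast=(low, high)` overrides this, e.g. (0, 1) for a
    binary predictor such as a proline indicator.
    """
    X = np.asarray(X, dtype=float)
    base = X.mean(axis=0)
    lo_val, hi_val = contrast if contrast is not None else tail_means(X[:, col], frac)
    lo, hi = base.copy(), base.copy()
    lo[col], hi[col] = lo_val, hi_val
    pred = predict(np.vstack([lo, hi]))
    return float(pred[1] - pred[0])

# Toy predictors and a hypothetical linear stand-in for the model's predict().
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
coef = np.array([0.5, -1.0, 0.0])

def predict(A):
    return 1.0 + A @ coef

effects = [effect_on_width(predict, X, j) for j in range(X.shape[1])]
```

For a purely linear stand-in such as this one, the effect reduces to the coefficient multiplied by the gap between the two tail means, so the effect depends on the spread of the predictor as well as on the size of its coefficient.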

ISSN: 1471-2105