Table 5. Accuracy of machine learning predictions (see note a).

From: Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable

 

| Classification task | Measure | J48 | Naïve Bayes | SMO |
|---|---|---|---|---|
| a) Sequences folding to the top 10% of designable structures vs. sequences folding to the bottom 10% of designable structures for both shapes | Correct | 69.5% | 65.0% | 65.6% |
| | AUC | 0.73 | 0.69 | 0.67 |
| | Sens | 0.67 | 0.66 | 0.71 |
| | Spec | 0.71 | 0.65 | 0.64 |
| b) Sequences folding to the top 10% of designable structures of hexagonal shape vs. sequences folding to the bottom 10% of designable structures in the triangular shape | Correct | 98.1% | 84.9% | 87.0% |
| | AUC | 0.99 | 0.92 | 0.87 |
| | Sens | 0.98 | 0.82 | 0.84 |
| | Spec | 0.98 | 0.90 | 0.92 |
| c) Sequences folding to the top 10% of designable structures of triangular shape vs. sequences folding to the bottom 10% of designable structures in the hexagonal shape | Correct | 98.0% | 65.8% | 64.3% |
| | AUC | 0.99 | 0.70 | 0.63 |
| | Sens | 0.98 | 0.64 | 0.75 |
| | Spec | 0.98 | 0.72 | 0.66 |

a. Results for classifying: a) sequences folding to highly-designable conformations for the hexagonal and triangular shapes against sequences folding to the least designable conformations for these two shapes; b) sequences folding to the most designable conformations of the hexagonal shape against sequences folding to the least designable conformations of the triangular shape; and c) sequences folding to the most designable conformations of the triangular shape against sequences folding to the least designable conformations of the hexagonal shape. Prediction accuracy, area under the curve (AUC), sensitivity (Sens) and specificity (Spec) are given for each method.
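The four measures reported in the table can be illustrated with a minimal Python sketch. It uses the standard definitions of accuracy, sensitivity and specificity from confusion-matrix counts, and a pairwise-ranking estimate of the area under the ROC curve. It is not the authors' evaluation code. The names `ConfusionCounts` and `auc_from_scores`, the choice of the highly-designable class as the positive class, and all counts and scores below are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a two-class problem (hypothetical container).

    Assumption: "positive" means highly-designable, "negative" means
    poorly-designable. The source does not state which class is positive.
    """
    tp: int  # positives predicted as positive
    fn: int  # positives predicted as negative
    tn: int  # negatives predicted as negative
    fp: int  # negatives predicted as positive


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all sequences classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of positive sequences that were recognised as positive."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of negative sequences that were recognised as negative."""
    return c.tn / (c.tn + c.fp)


def auc_from_scores(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Area under the ROC curve as the probability that a randomly chosen
    positive scores higher than a randomly chosen negative (ties count 0.5)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical counts and classifier scores, for illustration only.
counts = ConfusionCounts(tp=80, fn=20, tn=90, fp=10)
pos_scores = [0.9, 0.8, 0.4]
neg_scores = [0.7, 0.3, 0.2]

if __name__ == "__main__":
    print(f"Correct: {100 * accuracy(counts):.1f}%")
    print(f"Sens: {sensitivity(counts):.2f}")
    print(f"Spec: {specificity(counts):.2f}")
    print(f"AUC {auc_from_scores(pos_scores, neg_scores):.2f}")
```

The pairwise AUC estimate is quadratic in the number of sequences, which is acceptable for a small illustration. A rank-based formulation would be the more usual choice for large test sets.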