
Table 2 Comparison of the redundancy reduction methods for training datasets.

From: Prediction of RNA-binding amino acids from protein and RNA sequences

| Dataset construction | Sensitivity (%) | Specificity (%) | Accuracy (%) | NP (%) | Fm (%) | CC |
|---|---|---|---|---|---|---|
| PRI727 dataset | | | | | | |
| S-method (100%) | 84.1 | 75.8 | 76.3 | 80.0 | 79.7 | 0.32 |
| S-method (80%) | 84.9 | 74.3 | 74.9 | 79.6 | 79.2 | 0.31 |
| S-method (60%) | 85.4 | 72.7 | 73.5 | 79.1 | 78.6 | 0.30 |
| F-method | 87.2 | 81.7 | 82.1 | 84.5 | 84.4 | 0.40 |
| PRI267 dataset | | | | | | |
| S-method (100%) | 46.4 | 86.8 | 85.9 | 66.6 | 60.5 | 0.14 |
| S-method (80%) | 48.4 | 85.7 | 84.9 | 67.2 | 62.2 | 0.14 |
| S-method (60%) | 49.6 | 84.5 | 83.8 | 67.0 | 62.5 | 0.13 |
| F-method | 60.7 | 91.0 | 90.3 | 75.8 | 72.8 | 0.24 |

  1. S-method is the sequence similarity-based redundancy reduction using the CD-HIT program; the number in parentheses is the sequence identity threshold of the CD-HIT clusters. F-method is the feature vector-based redundancy reduction. The SVM model was trained and tested using 9 features and a window size of 15. NP: net prediction. Fm: F-measure. CC: correlation coefficient.
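
The footnote names NP and Fm but does not give their formulas. The tabulated values are consistent with NP being the arithmetic mean, and Fm the harmonic mean, of sensitivity and specificity. This is an inference from the numbers in the table, not a definition stated here; if it holds, Fm differs from the more common precision/recall F-measure. The sketch below recomputes both columns under that assumption; the function and variable names are illustrative only.

```python
# Rows copied from Table 2:
# (dataset, method, sensitivity, specificity, NP, Fm), all in percent.
ROWS = [
    ("PRI727", "S-method (100%)", 84.1, 75.8, 80.0, 79.7),
    ("PRI727", "S-method (80%)", 84.9, 74.3, 79.6, 79.2),
    ("PRI727", "S-method (60%)", 85.4, 72.7, 79.1, 78.6),
    ("PRI727", "F-method", 87.2, 81.7, 84.5, 84.4),
    ("PRI267", "S-method (100%)", 46.4, 86.8, 66.6, 60.5),
    ("PRI267", "S-method (80%)", 48.4, 85.7, 67.2, 62.2),
    ("PRI267", "S-method (60%)", 49.6, 84.5, 67.0, 62.5),
    ("PRI267", "F-method", 60.7, 91.0, 75.8, 72.8),
]


def net_prediction(sens, spec):
    """ASSUMPTION: NP is the arithmetic mean of sensitivity and specificity."""
    return (sens + spec) / 2


def f_measure(sens, spec):
    """ASSUMPTION: Fm is the harmonic mean of sensitivity and specificity."""
    return 2 * sens * spec / (sens + spec)


def max_deviation(rows):
    """Largest absolute gap between recomputed and tabulated NP and Fm."""
    np_gap = max(abs(net_prediction(s, p) - np_tab)
                 for _, _, s, p, np_tab, _ in rows)
    fm_gap = max(abs(f_measure(s, p) - fm_tab)
                 for _, _, s, p, _, fm_tab in rows)
    return np_gap, fm_gap


if __name__ == "__main__":
    for dataset, method, sens, spec, np_tab, fm_tab in ROWS:
        print(f"{dataset} {method}: "
              f"NP {net_prediction(sens, spec):.2f} (table {np_tab}), "
              f"Fm {f_measure(sens, spec):.2f} (table {fm_tab})")
```

Under this reading, every recomputed value falls within half a percentage point of the tabulated one. The inputs are already rounded to one decimal place, so small gaps are expected. The PRI267 S-method (80%) row shows the largest gap in both columns.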