
Table 1 Feature vectors generated by applying the feature vector-based redundancy reduction method to the PRI3149 dataset.

From: Prediction of RNA-binding amino acids from protein and RNA sequences

| Window size | #Positive feature vectors | #Negative feature vectors | #Total vectors | #Common vectors |
|---|---|---|---|---|
| with 9 features (L, C, N, H, A, M, pKa, IP, and R) | | | | |
| 1 | 21,282 | 198,578 | 219,860 | 2,811 |
| 3 | 21,282 | 198,585 | 219,867 | 2,811 |
| 5 | 21,283 | 198,590 | 219,873 | 2,811 |
| 7 | 21,283 | 198,596 | 219,879 | 2,811 |
| 9 | 21,284 | 198,601 | 219,885 | 2,811 |
| 11 | 21,284 | 198,606 | 219,890 | 2,811 |
| 13 | 21,284 | 198,611 | 219,895 | 2,811 |
| 15 | 21,284 | 198,616 | 219,900 | 2,811 |
| with 6 features (N, H, A, M, pKa, and IP) | | | | |
| 1 | 6,286 | 74,829 | 81,115 | 3,641 |
| 3 | 6,618 | 81,390 | 88,008 | 3,164 |
| 5 | 6,658 | 81,729 | 88,387 | 3,164 |
| 7 | 6,681 | 81,891 | 88,572 | 3,168 |
| 9 | 6,702 | 82,010 | 88,712 | 3,170 |
| 11 | 6,710 | 82,129 | 88,839 | 3,168 |
| 13 | 6,720 | 82,242 | 88,962 | 3,169 |
| 15 | 6,733 | 82,349 | 89,082 | 3,173 |

  1. The number of non-redundant feature vectors generated from the PRI3149 dataset by the feature vector-based redundancy reduction method with various window sizes. Common vectors are feature vectors that have the same vector elements but belong to different classes. 9 features: protein sequence length (L), amino acid composition (C), normalized position (N), hydropathy (H), accessible surface area (A), molecular mass (M), and side chain pKa of an amino acid; IP of an amino acid triplet (IP); and the sum of the normalized positions of each nucleotide type (R). 6 features: N, H, A, M, pKa, and IP of an amino acid.
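
The counts in the table can be illustrated with a minimal sketch of how duplicate feature vectors might be collapsed and how common vectors might be identified. This is not the authors' implementation. The function name `reduce_redundancy`, the input layout (pairs of a feature tuple and a class label, with 1 for binding and 0 for non-binding), and the choice to count a common vector once in each class are all assumptions made for illustration.

```python
from collections import defaultdict


def reduce_redundancy(labelled_vectors):
    """Collapse duplicate feature vectors and find common vectors.

    labelled_vectors: iterable of (features, label) pairs, where features is
    a sequence of numbers and label is 1 (positive) or 0 (negative).
    Hypothetical layout, assumed for this sketch only.

    Returns (positives, negatives, common) as lists of feature tuples.
    """
    # Map each distinct feature vector to the set of classes it appears with.
    classes_by_vector = defaultdict(set)
    for features, label in labelled_vectors:
        classes_by_vector[tuple(features)].add(label)

    # Each distinct vector is kept once per class it occurs in (assumption).
    positives = [v for v, c in classes_by_vector.items() if 1 in c]
    negatives = [v for v, c in classes_by_vector.items() if 0 in c]
    # Common vectors: identical elements, but seen with both classes.
    common = [v for v, c in classes_by_vector.items() if c == {0, 1}]
    return positives, negatives, common


# Toy data, not taken from PRI3149.
toy = [
    ((0.1, 0.5), 1),
    ((0.1, 0.5), 1),  # duplicate positive, collapsed
    ((0.1, 0.5), 0),  # same elements, other class: a common vector
    ((0.3, 0.2), 0),
    ((0.9, 0.4), 1),
]
positives, negatives, common = reduce_redundancy(toy)
print(len(positives), len(negatives), len(positives) + len(negatives), len(common))
```

Under these assumptions the total is the sum of the positive and negative counts, which matches the arithmetic of every row in the table. Whether the original method counts common vectors in both classes is not stated here.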