Table 1. Random undersampling was used for training, so the number of negative training instances equals the number of positive training instances.

From: Stepwise approach for combining many sources of evidence for site-recognition in genomic sequences

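| Dataset | Site type | Training data: Positives/Negatives | Testing data: Positives | Testing data: Negatives |
|---|---|---|---|---|
| Chr. 1 | TIS | 17,638 | 2,156 | 8,074,590 |
| Chr. 1 | STOP | 17,404 | 2,154 | 23,573,031 |
| Chr. 3 | TIS | 18,631 | 1,163 | 7,291,951 |
| Chr. 3 | STOP | 18,444 | 1,114 | 21,522,500 |
| Chr. 13 | TIS | 19,454 | 340 | 3,664,164 |
| Chr. 13 | STOP | 19,225 | 333 | 10,878,302 |
| Chr. 19 | TIS | 18,383 | 1,411 | 1,698,891 |
| Chr. 19 | STOP | 18,136 | 1,422 | 4,665,804 |
| Chr. 21 | TIS | 19,561 | 233 | 1,303,634 |
| Chr. 21 | STOP | 19,558 | 237 | 3,726,959 |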
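
The caption states that random undersampling was used to balance the training sets. As a rough illustration of the general technique, not the authors' actual code, the sketch below keeps every positive instance and draws an equally sized random subset of the negatives. The function name `random_undersample`, the `seed` parameter, and the toy data are assumptions made for this example only.

```python
import random


def random_undersample(positives, negatives, seed=0):
    """Keep all positives and draw an equally sized random subset of negatives.

    Hypothetical helper for illustration; the source does not describe
    its implementation.
    """
    if len(negatives) < len(positives):
        raise ValueError("need at least as many negatives as positives")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    # Sample without replacement so no negative instance is duplicated.
    sampled_negatives = rng.sample(negatives, k=len(positives))
    return positives, sampled_negatives


# Toy data standing in for candidate sites (not taken from the table).
positives = [f"pos_{i}" for i in range(5)]
negatives = [f"neg_{i}" for i in range(1000)]

train_pos, train_neg = random_undersample(positives, negatives)
print(len(train_pos), len(train_neg))  # 5 5
```

Only the training set is balanced this way. The testing columns of the table keep the natural class ratio, which is why the negative counts there are several orders of magnitude larger than the positive counts.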