Efficient use of unlabeled data for protein sequence classification: a comparative study

BMC Bioinformatics

Table 5 Multi-class remote fold recognition using the mismatch(5,2) kernel

Method	Error (%)	Top-5 Error (%)	Balanced Error (%)	Top-5 Balanced Error (%)	F1 (%)	Top-5 F1 (%)
Without clustering
full seq.	50.16	21.82	67.17	32.55	37.43	71.40
region	42.83	13.68	61.43	22.63	40.36	79.19
no tails (full seq.)	50.16	21.82	71.81	32.59	30.17	69.12
max. length (full seq.)	52.44	24.43	77.31	39.17	23.98	65.22
With clustering
full seq.	50.33	19.71	70.04	27.21	32.10	75.03
region	40.88	13.68	57.86	22.82	47.54	79.03
no tails (full seq.)	48.37	20.68	69.83	32.27	31.48	70.03
max. length (full seq.)	52.44	23.29	77.05	36.52	26.84	68.02
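
As a reading aid for the error columns above, the following is a minimal sketch of how top-k error and balanced error might be computed from ranked multi-class predictions. It assumes that top-k error counts an example as wrong when its true class is not among the k highest-ranked predictions, and that balanced error is the unweighted mean of per-class error rates; these definitions are assumptions here, not taken from the table itself. The function names and the toy labels are hypothetical, and the F1 columns are not covered.

```python
from collections import defaultdict


def top_k_error(y_true, ranked_preds, k=1):
    """Percentage of examples whose true label is absent from the k best-ranked predictions."""
    misses = sum(1 for t, ranked in zip(y_true, ranked_preds) if t not in ranked[:k])
    return 100.0 * misses / len(y_true)


def balanced_top_k_error(y_true, ranked_preds, k=1):
    """Unweighted mean of per-class top-k error rates (assumed definition of balanced error)."""
    totals = defaultdict(int)   # number of examples per true class
    misses = defaultdict(int)   # number of top-k misses per true class
    for t, ranked in zip(y_true, ranked_preds):
        totals[t] += 1
        if t not in ranked[:k]:
            misses[t] += 1
    per_class = [misses[c] / totals[c] for c in totals]
    return 100.0 * sum(per_class) / len(per_class)


# Hypothetical toy data: three examples of class "a", one of class "b".
# Each inner list ranks the candidate classes from most to least confident.
y_true = ["a", "a", "a", "b"]
ranked_preds = [
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["b", "a", "c"],
    ["c", "a", "b"],
]

top1 = top_k_error(y_true, ranked_preds, k=1)
top2 = top_k_error(y_true, ranked_preds, k=2)
bal1 = balanced_top_k_error(y_true, ranked_preds, k=1)
bal2 = balanced_top_k_error(y_true, ranked_preds, k=2)
print(top1, top2, round(bal1, 2), bal2)
```

In the toy data the rare class "b" is always missed, so the balanced figures come out higher than the plain ones. The table shows the same pattern, with balanced error exceeding plain error in every row.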

ISSN: 1471-2105