Table 1 Benchmark datasets list

From: A benchmark study of sequence alignment methods for protein clustering

Reference Name   Dataset ID [a]   Number of sequences [b]   Number of classes [c]   Average length
Reference1       RV11             236                       38                      301.178
Reference1       RV12             382                       44                      392.6885
Reference2       RV20             1706                      41                      384.3581
Reference3       RV30             1723                      30                      387.9745
Reference4       RV40             1113                      49                      480.0952
Reference5       RV50             443                       16                      516.6546
Reference9       RV911            423                       29                      701.5792
Reference9       RV912            228                       28                      454.0351

[a] Dataset IDs are abbreviations of the dataset names and are used to refer to the corresponding datasets in this paper.
[b] Number of sequences is the number of sequences in the raw dataset that have exactly one class label.
[c] Number of classes is the number of pre-defined protein clusters in each benchmark dataset.