Table 8 Comparison of performance and applicability of different NER systems on BioCreative 2 test set

From: Incorporating rich background knowledge for gene named entity classification and recognition

| System or authors               | Precision | Recall | F-score | # of features | Tagging complexity                     | Availability |
|---------------------------------|-----------|--------|---------|---------------|----------------------------------------|--------------|
| CRF 1 (ABNER+)                  | 87.30     | 80.68  | 83.86   | 171,251       | LM                                     | N            |
| CRF 2 (ABNER++)                 | 87.39     | 81.96  | 84.59   | 355,461       | LM                                     | N            |
| Dictionary                      | 90.37     | 82.40  | 86.20   | 0             | Trie                                   | Y            |
| Dictionary + CRF 2              | 90.52     | 87.63  | 89.05   | 355,609       | LM                                     | Y            |
| BANNER [26, 27]                 | 88.66     | 84.32  | 86.43   | 500,876       | LM + POS tagger                        | Y            |
| Ando [5] (1st in BioCreative 2) | 88.48     | 85.97  | 87.21   | --            | 2 × LM + POS tagger + syntactic parser | N            |
| Hsu et al. [10]                 | 88.95     | 87.65  | 88.30   | 8 × 5,059,368 | 8 × LM + POS tagger                    | N            |

  1. In the 6th column, 'LM' and 'Trie' refer to the time complexities of a linear model and a Trie-based dictionary match, respectively. The 'Dictionary' method requires no features once the dictionary is constructed. The number of features for Ando's system is not reported in the paper [5]. Since the systems in the last two rows use classifier combination, their tagging complexities and feature counts are multiplied by the number of sub-models.
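To illustrate why the 'Trie' tagging complexity needs no features, the sketch below shows a minimal trie-based dictionary match over token sequences. This is an assumed illustration, not the authors' implementation: after the trie is built, tagging is a greedy longest-match walk whose cost per position is linear in the length of the matched span.

```python
def build_trie(entries):
    """Build a nested-dict trie over whitespace-tokenized dictionary entries."""
    root = {}
    for entry in entries:
        node = root
        for token in entry.split():
            node = node.setdefault(token, {})
        node["$end"] = True  # marks the end of a complete dictionary entry
    return root

def longest_match(trie, tokens, start):
    """Return the end index of the longest entry starting at `start`, or None."""
    node, end = trie, None
    for i in range(start, len(tokens)):
        if tokens[i] not in node:
            break
        node = node[tokens[i]]
        if "$end" in node:
            end = i + 1
    return end

def tag(trie, tokens):
    """Greedy left-to-right longest-match tagging; returns (start, end) spans."""
    spans, i = [], 0
    while i < len(tokens):
        end = longest_match(trie, tokens, i)
        if end is not None:
            spans.append((i, end))
            i = end
        else:
            i += 1
    return spans

# Hypothetical gene names for demonstration only.
trie = build_trie(["BRCA1", "p53", "tumor necrosis factor"])
print(tag(trie, "mutations in BRCA1 and tumor necrosis factor".split()))
# → [(2, 3), (4, 7)]
```

Greedy longest match is the usual choice for dictionary taggers because it resolves overlapping entries deterministically without any learned model.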