Table 2 Statistics of the benchmark data sets for the GE and CO tasks.

From: The Genia Event and Protein Coreference tasks of the BioNLP Shared Task 2011

 

                                  Training            Tuning              Test
Item                         Abs.     Full     Abs.     Full     Abs.     Full
Articles                      800        5      150        5      260        4
Words                      176146    29583    33827    30305    57256    21791
Proteins                     9300     2325     2080     2610     3589     1712
Coreferences                 2247        -      463        -      714        -
   Relative pronouns         1193        -      254        -      349        -
   Pronouns                   738        -      149        -      269        -
   Definite NPs               296        -       58        -       91        -
   Appositions                  9        -        1        -        3        -
   Others                      11        -        1        -        2        -
Events                       8615     1695     1795     1455     3193     1294
   Gene_expression           1738      527      356      393      722      280
   Transcription              576       91       82       76      137       37
   Protein_catabolism         110        0       21        2       14        1
   Phosphorylation            169       23       47       64      139       50
      (with Site)            (67)      (0)     (27)     (12)     (81)     (15)
   Localization               265       16       53       14      174       17
      (with Loc)            (116)     (12)     (32)     (10)    (111)      (2)
   Binding                    887      101      249      126      349      153
      (with Site)           (138)     (34)     (50)    (114)     (24)     (79)
   Regulation                 961      152      173      123      292       96
      (with Site)            (57)      (8)     (39)     (17)     (11)      (3)
   Positive_regulation       2847      538      618      382      987      466
      (with Site)           (175)      (7)     (75)     (47)     (37)      (7)
   Negative_regulation       1062      247      196      275      379      194
      (with Site)            (27)      (9)      (6)     (18)     (10)      (7)

  1. The event and coreference annotations are used for the GE and CO tasks, respectively.
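
As a sanity check on the figures in Table 2, the per-type rows can be summed column by column and compared with the Coreferences and Events totals. The sketch below is illustrative only: the variable and function names (`EVENTS_BY_TYPE`, `column_sums`, and so on) are hypothetical and not part of the shared task resources, and the numbers are copied by hand from the table. The parenthesised "(with Site)" and "(with Loc)" rows are left out because they are subsets of their parent rows, not additional events.

```python
# Counts copied from Table 2. Names here are illustrative assumptions,
# not identifiers from the BioNLP Shared Task data or tools.
# Event column order: Training Abs., Training Full, Tuning Abs.,
# Tuning Full, Test Abs., Test Full.
EVENT_TOTALS = [8615, 1695, 1795, 1455, 3193, 1294]

EVENTS_BY_TYPE = {
    "Gene_expression":     [1738, 527, 356, 393, 722, 280],
    "Transcription":       [576, 91, 82, 76, 137, 37],
    "Protein_catabolism":  [110, 0, 21, 2, 14, 1],
    "Phosphorylation":     [169, 23, 47, 64, 139, 50],
    "Localization":        [265, 16, 53, 14, 174, 17],
    "Binding":             [887, 101, 249, 126, 349, 153],
    "Regulation":          [961, 152, 173, 123, 292, 96],
    "Positive_regulation": [2847, 538, 618, 382, 987, 466],
    "Negative_regulation": [1062, 247, 196, 275, 379, 194],
}

# Coreference counts exist for abstracts only.
# Column order: Training Abs., Tuning Abs., Test Abs.
COREF_TOTALS = [2247, 463, 714]

COREFS_BY_TYPE = {
    "Relative pronouns": [1193, 254, 349],
    "Pronouns":          [738, 149, 269],
    "Definite NPs":      [296, 58, 91],
    "Appositions":       [9, 1, 3],
    "Others":            [11, 1, 2],
}


def column_sums(rows):
    """Sum the per-type rows column by column."""
    return [sum(column) for column in zip(*rows.values())]


def totals_match(rows, totals):
    """Return True if the per-type rows add up to the stated totals."""
    return column_sums(rows) == totals


if __name__ == "__main__":
    print("Events consistent:", totals_match(EVENTS_BY_TYPE, EVENT_TOTALS))
    print("Coreferences consistent:", totals_match(COREFS_BY_TYPE, COREF_TOTALS))
```

A check of this kind is mainly useful for catching transcription errors when the table is copied into scripts or reports.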