Table 1 Biomedical NER datasets used in the experiments

From: Multitask learning for biomedical named entity recognition with cross-sharing structure

| Dataset | Size | Entity types & counts |
|---|---|---|
| BC2GM | 20,131 sentences | Gene (24,583) |
| Ex-PTM | 3,653 sentences | Protein (4,698) |
| NCBI-disease | 7,287 sentences | Disease (6,881) |
| Linnaeus | 23,155 sentences | Species (4,263) |
| JNLPBA | 24,806 sentences | Cell (12,969), Gene (10,589), Protein (35,336) |
| BC5CDR | 13,938 sentences | Chemical (15,935), Disease (12,852) |
| BioNLP09 | 11,356 sentences | Protein (14,963) |
| BioNLP11ID | 5,178 sentences | Chemical (973), Protein (6,551), Species (3,471) |
| BioNLP13PC | 5,051 sentences | Cell (1,013), Chemical (3,989), Gene (10,891) |