From: Multitask learning for biomedical named entity recognition with cross-sharing structure
Dataset | Size | Entity types & counts |
---|---|---|
BC2GM | 20,131 sentences | Gene (24,583) |
Ex-PTM | 3,653 sentences | Protein (4,698) |
NCBI-disease | 7,287 sentences | Disease (6,881) |
Linnaeus | 23,155 sentences | Species (4,263) |
JNLPBA | 24,806 sentences | Cell (12,969), Gene (10,589), Protein (35,336) |
BC5CDR | 13,938 sentences | Chemical (15,935), Disease (12,852) |
BioNLP09 | 11,356 sentences | Protein (14,963) |
BioNLP11ID | 5,178 sentences | Chemical (973), Protein (6,551), Species (3,471) |
BioNLP13PC | 5,051 sentences | Cell (1,013), Chemical (3,989), Gene (10,891) |