Table 22 Details of the contextualized word models

From: Analyzing transfer learning impact in biomedical cross-lingual named entity recognition and normalization

| Detail | SBC-BERT | Bert-base-multilingual-cased | BETO cased | BioBERT-Large |
|---|---|---|---|---|
| Language | Spanish | 104 languages | Spanish | English |
| Domain | Biomedical | General | General | Biomedical |
| Type | Contextual Word | Contextual Word | Contextual Word | Contextual Word |
| Corpus size | 6 billion | 3.3 billion | 3 billion | 21.3 billion |
| Vocab size | 200k | 120k | 31k | 59k |
| Hidden size | 768 | 768 | 1024 | 768 |
| Algorithm | BERT train | BERT train | BERT train | BERT train |
| Property | Own | Pre-trained | Pre-trained | Pre-trained |