Table 21 Embedding models details

From: Analyzing transfer learning impact in biomedical cross-lingual named entity recognition and normalization

| Embedding model   | Language        | Domain                | Type    | Corpus size | Vocab size | Array size | Algorithm              | Property    |
|-------------------|-----------------|-----------------------|---------|-------------|------------|------------|------------------------|-------------|
| W2V-SBWC          | Spanish         | General               | Word    | 1.5 billion | 68k        | 300        | Word2Vec Skip-gram BOW | Pre-trained |
| FastText-SBWC     | Spanish         | General               | Word    | 1.5 billion | 81.2k      | 300        | FastText Skip-gram BOW | Pre-trained |
| FastText-SBC      | Spanish         | Specific (Biomedical) | Word    | 600 billion | 91.7k      | 300        | FastText Skip-gram BOW | Own         |
| Scielo+Wiki cased | Spanish         | Specific (Biomedical) | Word    |             | 50k        | 300        | FastText Skip-gram BOW | Pre-trained |
| SNOMED-SBC        | Spanish         | Specific (Biomedical) | Concept | 600 billion | 88.1k      | 300        | FastText Skip-gram BOW | Own         |
| Pubmed and PMC    | English         | Specific (Biomedical) | Word    | 2 billion   | 400k       | 300        | Word2Vec Skip-gram BOW | Pre-trained |
| FastText-2M       | English         | General               | Word    | 600 billion | 2 million  | 300        | FastText Skip-gram BOW | Pre-trained |
| Sense2vec Reddit  | English/Spanish | General               | Sense   | 2 billion   | 120k       | 128        | Sense2Vec              | Pre-trained |