Table 1 Four medical informatics datasets used in experiments

From: Mapping biological entities using the longest approximately common prefix method

| # | Dataset | # of concepts | # of terms | Size in kilobytes |
|---|---|---:|---:|---:|
| D1 | The UMLS most frequent concepts from multiple sources | 100 | 4,979 | 369 |
| D2 | The SNOMED CT most frequent concepts | 155 | 5,000 | 281 |
| D3 | The UMLS concepts with longest terms (“longest concepts”) | 3,337 | 5,000 | 1,693 |
| D4 | The SNOMED CT longest concepts | 1,805 | 5,000 | 903 |