
Table 3 Feature sets for learning.

From: Mining clinical relationships from patient narratives

| Feature set | Size | Description |
| --- | --- | --- |
| tokN | 8N | Surface string and POS of tokens surrounding the arguments, windowed -N to +N, N = 6 by default |
| gentokN | 8N | Root and generalised POS of tokens surrounding the argument entities, windowed -N to +N, N = 6 by default |
| atype | 1 | Concatenated semantic type of arguments, in arg1-arg2 order |
| dir | 1 | Direction: linear text order of the arguments (is arg1 before arg2, or vice versa?) |
| dist | 2 | Distance: absolute number of sentence and paragraph boundaries between arguments |
| str | 14 | Surface string features based on Zhou et al. [29], see text for full description |
| pos | 14 | POS features, as above |
| root | 14 | Root features, as above |
| genpos | 14 | Generalised POS features, as above |
| inter | 11 | Intervening mentions: numbers and types of intervening entity mentions between arguments |
| event | 5 | Events: are any of the arguments, or intervening entities, events? |
| allgen | 96 | All above features in root and generalised POS forms, i.e. gentok6+atype+dir+dist+root+genpos+inter+event |
| notok | 48 | All above except tokN features, others in string and POS forms, i.e. atype+dir+dist+str+pos+inter+event |
| dep | 16 | Features based on a syntactic dependency path |
| syndist | 2 | The distance between the two arguments, along a token path and along a syntactic dependency path |

Feature sets used for learning relationships. The table is split into non-syntactic features, combined non-syntactic features, and syntactic features. The size of a set is the number of features in that set.
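
The sizes of the two combined sets can be checked against the sizes of their components. The following is a minimal sketch of that arithmetic, assuming the default window N = 6 for the windowed token sets. The names `SIZES`, `ALLGEN`, `NOTOK` and `combined_size` are illustrative only and do not come from the paper.

```python
# Arithmetic check of the combined feature-set sizes in Table 3.
# All identifiers here are hypothetical helpers, not the authors' code.

N = 6  # default window size stated in the table

# Size of each basic feature set, copied from the table.
SIZES = {
    "tokN": 8 * N,
    "gentokN": 8 * N,
    "atype": 1,
    "dir": 1,
    "dist": 2,
    "str": 14,
    "pos": 14,
    "root": 14,
    "genpos": 14,
    "inter": 11,
    "event": 5,
}

# Components of the combined sets, as listed in their descriptions.
ALLGEN = ["gentokN", "atype", "dir", "dist", "root", "genpos", "inter", "event"]
NOTOK = ["atype", "dir", "dist", "str", "pos", "inter", "event"]


def combined_size(components):
    """Sum the sizes of the named component feature sets."""
    return sum(SIZES[name] for name in components)


print("allgen:", combined_size(ALLGEN))
print("notok:", combined_size(NOTOK))
```

With N = 6 the sums come to 96 for allgen and 48 for notok, matching the sizes given in the table.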