Mining clinical relationships from patient narratives

BMC Bioinformatics

Table 3 Feature sets for learning.

Feature set	Size	Description
tokN	8N	Surface string and POS of tokens surrounding the arguments, windowed -N to +N, N = 6 by default
gentokN	8N	Root and generalised POS of tokens surrounding the argument entities, windowed -N to +N, N = 6 by default
atype	1	Concatenated semantic type of arguments, in arg1-arg2 order
dir	1	Direction: linear text order of the arguments (is arg1 before arg2, or vice versa?)
dist	2	Distance: absolute number of sentence and paragraph boundaries between arguments
str	14	Surface string features based on Zhou et al. [29], see text for full description
pos	14	POS features, as above
root	14	Root features, as above
genpos	14	Generalised POS features, as above
inter	11	Intervening mentions: numbers and types of intervening entity mentions between arguments
event	5	Events: are any of the arguments, or intervening entities, events?
allgen	96	All above features in root and generalised POS forms, i.e. gentok6+atype+dir+dist+root+genpos+inter+event
notok	48	All above except tokN features, others in string and POS forms, i.e. atype+dir+dist+str+pos+inter+event
dep	16	Features based on a syntactic dependency path.
syndist	2	The distance between the two arguments, along a token path and along a syntactic dependency path.

Feature sets used for learning relationships. The table is split into non-syntactic features, combined non-syntactic features, and syntactic features. The size of a set is the number of features in that set.
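
The sizes of the two combined sets follow from the sizes of their component sets, and the arithmetic can be checked with a short sketch. The dictionary and function names below (SIZES, COMBINED, combined_size) are hypothetical and are not taken from the authors' implementation. Only the sizes and compositions come from Table 3. The comment on how 8N decomposes is one reading that is consistent with the table, not something the table itself states.

```python
# Default window size from Table 3.
N = 6

# Sizes of the individual feature sets, copied from Table 3.
# Assumed reading of 8N: 2 arguments x 2N window positions x 2 attributes
# per token (string and POS, or root and generalised POS).
SIZES = {
    "tokN": 8 * N,
    "gentokN": 8 * N,
    "atype": 1,
    "dir": 1,
    "dist": 2,
    "str": 14,
    "pos": 14,
    "root": 14,
    "genpos": 14,
    "inter": 11,
    "event": 5,
}

# Composition of the combined sets, as given in their descriptions.
COMBINED = {
    "allgen": ["gentokN", "atype", "dir", "dist", "root", "genpos", "inter", "event"],
    "notok": ["atype", "dir", "dist", "str", "pos", "inter", "event"],
}


def combined_size(name):
    """Sum the sizes of the component sets of a combined feature set."""
    return sum(SIZES[part] for part in COMBINED[name])


for name in COMBINED:
    print(name, combined_size(name))
```

With N = 6 the sums come to 96 for allgen and 48 for notok, matching the sizes listed in the table.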

ISSN: 1471-2105