From: Exploring the boundaries: gene and protein identification in biomedical text
Feature type | Feature |
---|---|
Word Features | w_i |
 | w_{i-1} |
 | w_{i+1} |
 | Last "real" word |
 | Next "real" word |
 | Disjunction of 4 previous words |
 | Disjunction of 4 next words |
Bigrams | w_i + w_{i-1} |
 | w_i + w_{i+1} |
TnT POS | POS_i |
 | POS_{i-1} |
 | POS_{i+1} |
Character Substrings | Up to a length of 6 |
Abbreviations | abbr_i |
 | abbr_{i-1} + abbr_i |
 | abbr_i + abbr_{i+1} |
 | abbr_{i-1} + abbr_i + abbr_{i+1} |
Word Shape | shape_i |
 | shape_{i-1} |
 | shape_{i+1} |
 | shape_{i-1} + shape_i |
 | shape_i + shape_{i+1} |
 | shape_{i-1} + shape_i + shape_{i+1} |
Previous NE | NE_{i-1} |
 | NE_{i-2} + NE_{i-1} |
Previous NE + Word | NE_{i-1} + w_i |
Previous NE + POS | NE_{i-1} + POS_{i-1} + POS_i |
 | NE_{i-2} + NE_{i-1} + POS_{i-2} + POS_{i-1} + POS_i |
Previous NE + Shape | NE_{i-1} + shape_i |
 | NE_{i-1} + shape_{i+1} |
 | NE_{i-1} + shape_{i-1} + shape_i |
 | NE_{i-2} + NE_{i-1} + shape_{i-2} + shape_{i-1} + shape_i |
Paren-Matching | A feature that signals when one parenthesis in a pair has been assigned a different tag than the other within a window of 4 words |
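
To make a few of the templates above concrete, the following is a minimal sketch of how the word, bigram, disjunction, character-substring, and word-shape features might be generated for the token at position i. It is an illustration under stated assumptions, not the authors' implementation: the table does not define the shape mapping, the padding symbol, or the feature-name strings, so `word_shape`, `char_substrings`, `token_features`, and `PAD` are all hypothetical. The disjunction features are interpreted here as position-independent indicators for each word in the 4-word window. The POS, abbreviation, previous-NE, and paren-matching templates are omitted because they depend on an external tagger or on previously assigned labels.

```python
PAD = "<pad>"  # hypothetical boundary symbol; not specified in the table


def word_shape(token):
    """Map a token to a coarse shape (an assumed mapping, for illustration).

    Uppercase -> 'X', lowercase -> 'x', digit -> 'd', anything else is kept.
    Runs of the same shape symbol are collapsed to a single symbol.
    """
    shape = []
    for ch in token:
        if ch.isupper():
            sym = "X"
        elif ch.islower():
            sym = "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = ch
        if not shape or shape[-1] != sym:
            shape.append(sym)
    return "".join(shape)


def char_substrings(token, max_len=6):
    """Return every substring of the token with length 1..max_len."""
    return {
        token[start:start + n]
        for n in range(1, max_len + 1)
        for start in range(len(token) - n + 1)
    }


def at(tokens, i):
    """Token at position i, or PAD when i falls outside the sentence."""
    return tokens[i] if 0 <= i < len(tokens) else PAD


def token_features(tokens, i):
    """Build a set of string-valued features for the token at position i."""
    w, prev_w, next_w = at(tokens, i), at(tokens, i - 1), at(tokens, i + 1)
    feats = {
        f"w_i={w}",
        f"w_i-1={prev_w}",
        f"w_i+1={next_w}",
        f"w_i+w_i-1={w}|{prev_w}",  # bigram with the previous word
        f"w_i+w_i+1={w}|{next_w}",  # bigram with the next word
        f"shape_i={word_shape(w)}",
        f"shape_i-1+shape_i={word_shape(prev_w)}|{word_shape(w)}",
    }
    # Disjunction features: one indicator per word in the window,
    # ignoring its exact offset; positions outside the sentence are skipped.
    for k in range(1, 5):
        if i - k >= 0:
            feats.add(f"prev4={tokens[i - k]}")
        if i + k < len(tokens):
            feats.add(f"next4={tokens[i + k]}")
    feats |= {f"sub={s}" for s in char_substrings(w)}
    return feats
```

For example, with `tokens = ["the", "IL-2", "receptor"]`, calling `token_features(tokens, 1)` yields a set that contains `w_i=IL-2`, `prev4=the`, `next4=receptor`, and `shape_i=X-d`. Representing each feature as a string keeps the sketch independent of any particular classifier; a real system would typically map these strings to integer indices before training.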