Table 1 Lexical features obtained directly from sentences

From: Protein-protein interaction extraction with feature selection by evaluating contribution levels of groups consisting of related features

| Features | Definitions/Remarks | Values | Examples |
|---|---|---|---|
| Keyword | Words indicating a relationship between two proteins. | One of 180 word stems obtained by stemming 642 words, such as ‘bind’, ‘link’, ‘stimulate’, ‘interact’, ‘induce’, ‘regulate’, ‘mediate’, and ‘inhibit’, that often occur in sentences containing PPIs. | In sentence IEPA.d0.s0 (Fig. 1), the keyword feature’s value is ‘stimulate’. |
| Negative word | Check whether a negative word such as ‘not’, ‘incapable’, or ‘unable’ appears between the keyword and one of the two protein names, or between the two protein names. | ‘true’ or ‘false’ | In sentence HPRD50.d21.s1 of the HPRD50 corpus, “In contrast to OX1R, the potency of direct activation of CB1 was not affected by co-expression with OX1R,” the feature value is ‘true’. |
| Conjunctive word | Check whether one of the following words indicating a conjunctive relation appears: ‘although’, ‘though’, ‘because’, ‘as’, ‘therefore’, ‘hence’, ‘since’, ‘so’, ‘where’, ‘when’, ‘what’, ‘why’, ‘how’, ‘wherein’, ‘whereas’, and ‘whereby’. | ‘true’ or ‘false’ | In sentence HPRD50.d21.s1 above, the feature value is ‘false’. |
| ‘Which’ | Check whether ‘which’ appears. Although ‘which’ also indicates a conjunctive relation, it appears more often than the conjunctive words listed above, so it is treated as a separate feature. | ‘true’ or ‘false’ | In sentence LLL.d13.s0 of the LLL corpus, “Production of sigmaK about 1h earlier than normal does affect Spo0A, which when phosphorylated is an activator of sigE transcription,” the feature value is ‘true’. |
| ‘But’ | Check whether ‘but’ appears. Although ‘but’ expresses conjunctive relations about as frequently as ‘which’, it also implies negation of the context. | ‘true’ or ‘false’ | In sentence AIMed.d55.s485 of the AIMed corpus, “LEC also induced calcium mobilization, but marginal chemotaxis via CCR5,” the feature value is ‘true’. |
| Words indicating condition or presumption | Check whether ‘if’ or ‘whether’ appears between the keyword and one of the two protein names, or between the two protein names. | ‘true’ or ‘false’ | In sentence IEPA.d0.s0 (Fig. 1), the feature value is ‘false’. |
| Preposition of keyword | The preposition following the keyword, provided that its distance from the keyword is within 3. If there are several prepositions, the one closest to the keyword is used. | One of the prepositions | In sentence AIMed.d55.s487 of the AIMed corpus, “The binding of LEC to CCR8 was much less significant,” the feature value is ‘of’. |
| Second keyword | If one of seven words (‘bind’, ‘interact’, ‘stimulate’, ‘associate’, ‘regulate’, ‘induce’, and ‘known’) is not chosen as the keyword, check whether that word appears between the two protein names. These seven words can be considered especially significant in PPI classification compared with other keywords. This feature prevents them from being overlooked as keywords. | ‘true’ or ‘false’ for each of these seven words (if one of these words appears in the sentence and is not chosen as the keyword, its feature value is ‘true’). | In sentence IEPA.d0.s0 (Fig. 1), ‘stimulate’ was already chosen as the keyword and the sentence does not contain the other six words, so the second-keyword feature value is ‘false’ for ‘stimulate’ and for each of the other six words. |

  1. Lexical features extracted from sentences
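
To make the definitions in Table 1 concrete, the following is a minimal Python sketch of how the boolean features and the preposition-of-keyword feature might be computed. It is not the authors' implementation. Several details are assumptions: the tokenizer, the `PREPOSITIONS` list, the reading of "distance within 3" as three tokens after the keyword, and all function and key names. The keyword and protein positions are passed in as token indices, since the table does not describe how they are located. The keyword and second-keyword features are left out because they depend on a stemmer and on the 180-stem keyword list.

```python
import re

NEGATIVE_WORDS = {"not", "incapable", "unable"}  # the three examples named in Table 1
CONJUNCTIVE_WORDS = {
    "although", "though", "because", "as", "therefore", "hence", "since", "so",
    "where", "when", "what", "why", "how", "wherein", "whereas", "whereby",
}
CONDITION_WORDS = {"if", "whether"}
# Assumption: an illustrative preposition list; the table does not enumerate one.
PREPOSITIONS = {"of", "to", "with", "by", "in", "on", "for", "from", "between"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def span_between(tokens, i, j):
    """Return the tokens strictly between positions i and j, in either order."""
    lo, hi = sorted((i, j))
    return tokens[lo + 1:hi]


def preposition_after(tokens, keyword_idx, max_distance=3):
    """Return the closest preposition within max_distance tokens after the keyword."""
    window = tokens[keyword_idx + 1:keyword_idx + 1 + max_distance]
    for token in window:
        if token in PREPOSITIONS:
            return token
    return None


def lexical_features(tokens, keyword_idx, p1_idx, p2_idx):
    """Compute a subset of the Table 1 features for one protein pair."""
    # Tokens between keyword and either protein, or between the two proteins.
    local = set(
        span_between(tokens, keyword_idx, p1_idx)
        + span_between(tokens, keyword_idx, p2_idx)
        + span_between(tokens, p1_idx, p2_idx)
    )
    whole = set(tokens)
    return {
        "negative_word": bool(local & NEGATIVE_WORDS),
        "conjunctive_word": bool(whole & CONJUNCTIVE_WORDS),
        "which": "which" in whole,
        "but": "but" in whole,
        "condition_word": bool(local & CONDITION_WORDS),
        "preposition_of_keyword": preposition_after(tokens, keyword_idx),
    }


# Usage on sentence AIMed.d55.s487 from Table 1.
# Assumed annotation: keyword 'binding' at index 1, proteins LEC (3) and CCR8 (5).
tokens = tokenize("The binding of LEC to CCR8 was much less significant,")
features = lexical_features(tokens, keyword_idx=1, p1_idx=3, p2_idx=5)
print(features["preposition_of_keyword"])  # of
```

Passing positions as indices keeps the sketch independent of any particular named-entity or keyword detector. The positional features (negative word, condition word) look only at the spans between the keyword and the proteins, while the conjunctive, ‘which’, and ‘but’ features scan the whole sentence, following the wording of the definitions in the table.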