Table 2. Top features selected via information gain. Column one is the feature, column two is the feature's average information gain (IG) over 10-fold cross-validation, and column three is the feature's document frequency (DF) computed on the whole dataset. Only features with an IG greater than 0.01 are shown.

From: Automating document classification for the Immune Epitope Database

| Feature | IG | DF |
| --- | --- | --- |
| Epitope | 0.0441 | 5707 |
| Peptide | 0.0402 | 6408 |
| Amino | 0.0369 | 6461 |
| Sequence | 0.0308 | 6849 |
| Acid | 0.0289 | 6633 |
| ~range<50~ | 0.0247 | 2915 |
| Synthetic | 0.0242 | 1878 |
| ~mhc_allele~ | 0.0228 | 2745 |
| overlapping | 0.0174 | 781 |
| Recognized | 0.0159 | 3097 |
| immunodominant | 0.0153 | 1483 |
| Mapping | 0.0146 | 1433 |
| Residues | 0.0144 | 2108 |
| Molecular | 0.0126 | 7405 |
| ~peptide~ | 0.0118 | 610 |
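For readers unfamiliar with the two statistics in this table, the sketch below shows one common way to compute them. It assumes the textbook definition of information gain for a binary "term present / term absent" feature (class entropy minus the entropy conditioned on term presence) and treats each document as a set of tokens. The source does not show the authors' exact computation, so the function names and the toy data here are illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def document_frequency(docs, term):
    """Number of documents whose token set contains the term."""
    return sum(1 for tokens in docs if term in tokens)


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'."""
    with_term = [y for tokens, y in zip(docs, labels) if term in tokens]
    without_term = [y for tokens, y in zip(docs, labels) if term not in tokens]
    total = len(labels)
    conditional = ((len(with_term) / total) * entropy(with_term)
                   + (len(without_term) / total) * entropy(without_term))
    return entropy(labels) - conditional


# Hypothetical toy corpus: 1 = relevant, 0 = not relevant.
docs = [{"epitope", "peptide"}, {"epitope"}, {"genome"}, {"genome", "peptide"}]
labels = [1, 1, 0, 0]

ig_epitope = information_gain(docs, labels, "epitope")  # term separates the classes
ig_peptide = information_gain(docs, labels, "peptide")  # term carries no class signal
df_epitope = document_frequency(docs, "epitope")
```

To mirror the caption, one would compute `information_gain` on each of the 10 cross-validation folds and average the results, while `document_frequency` would be computed once over the full dataset.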