Figure 3 | BMC Bioinformatics

Figure 3

From: Automating document classification for the Immune Epitope Database


Effects of feature selection on Naïve Bayes classifier performance. The performance of the Naïve Bayes classifier (measured as AUC) is plotted against the number of features used in training. Feature selection based on IG (information gain) and on DF (document frequency) has a similar effect on classifier performance. Reducing the number of features to the top 20,000 by either measure leads to a small increase in performance. Using even fewer features decreases performance, but notably, the top 100 features in terms of information gain are sufficient to reach an AUC of 0.82.
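To make the two selection criteria concrete, here is a minimal sketch of how DF and IG scores could be computed and used to keep the top-k terms. It assumes each document is represented as a set of terms (binary term presence) and that IG is the standard mutual information between term presence and the class label; the caption does not state the exact tokenization or IG formula the authors used. The function names and the toy documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h

def document_frequency(docs):
    """DF: number of documents that contain each term."""
    df = Counter()
    for terms in docs:
        df.update(set(terms))
    return df

def information_gain(docs, labels):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], binary term presence."""
    classes = sorted(set(labels))
    class_totals = Counter(labels)
    n = len(docs)
    h_c = entropy([class_totals[c] for c in classes])
    # Per-class counts of documents that contain each term.
    with_term = {}
    for terms, y in zip(docs, labels):
        for t in set(terms):
            with_term.setdefault(t, Counter())[y] += 1
    ig = {}
    for t, present in with_term.items():
        n_t = sum(present.values())
        absent = [class_totals[c] - present[c] for c in classes]
        h_cond = (n_t / n) * entropy([present[c] for c in classes]) \
            + ((n - n_t) / n) * entropy(absent)
        ig[t] = h_c - h_cond
    return ig

def top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]

# Hypothetical toy corpus: label 1 = relevant, 0 = not relevant.
docs = [{"epitope", "t-cell", "binding"}, {"epitope", "mhc"},
        {"protein", "structure"}, {"structure", "binding"}]
labels = [1, 1, 0, 0]

print(top_k(information_gain(docs, labels), 2))  # ['epitope', 'structure']
print(top_k(document_frequency(docs), 3))        # ['binding', 'epitope', 'structure']
```

The example shows why the two criteria can diverge: "binding" is frequent (high DF) but appears equally in both classes, so its IG is zero, whereas IG favors terms that separate the classes. The selected vocabulary would then be used to restrict each document's features before training the classifier.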
