
Table 2 Average cross-validated prediction accuracy.

From: A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

  

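| Data set | Subset | No selection: PLS | No selection: PC | No selection: RF | Univariate selection: PLS | Univariate selection: PC | Univariate selection: RF | Multivariate selection (Gini importance): PLS | Multivariate selection (Gini importance): PC | Multivariate selection (Gini importance): RF | Multivariate selection (PLS/PC): PLS | Multivariate selection (PLS/PC): PC | Multivariate selection (PLS/PC): RF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MIR BSE | orig | 66.8 | 62.9 | 74.9 | 80.7 | 80.7 | 76.7 | 84.1 | 83.2 | 77.4 | 68 | 63.5 | 75.5 |
| | | - | - | - | *** | *** | * | *** | *** | ** | ** | | |
| | binned | 72.7 | 73.4 | 75.3 | 80.4 | 80.7 | 76.6 | 86.8 | 85.8 | 77.3 | 85 | 82.1 | 75.6 |
| | | - | - | - | *** | *** | ** | *** | *** | ** | *** | *** | |
| MIR wine | French | 69.5 | 69.3 | 79.3 | 83.7 | 83.5 | 82.2 | 82.4 | 81 | 81.2 | 66.9 | 70.0 | 79.8 |
| | | - | - | - | *** | ** | | *** | ** | * | | | |
| | grape | 77 | 71.4 | 90.2 | 98.1 | 98.7 | 90.3 | 98.4 | 98.4 | 94.2 | 91.7 | 88.5 | 90.4 |
| | | - | - | - | *** | *** | | *** | *** | ** | *** | *** | |
| NMR tumor | all | 88.8 | 89 | 89 | 89.3 | 89.3 | 90.5 | 90.0 | 89.6 | 89.6 | 89.3 | 89.2 | 89.1 |
| | | - | - | - | * | | *** | ** | | * | | | |
| | center | 71.6 | 72.3 | 73.1 | 73.9 | 72.7 | 73.9 | 72.6 | 72.0 | 74.3 | 71.8 | 72.7 | 73.3 |
| | | - | - | - | ** | | | * | | | | | |
| NMR candida | 1 | 94.9 | 94.6 | 90.3 | 95.1 | 94.9 | 90.6 | 95.6 | 95.3 | 90.3 | 95.3 | 95.2 | 90.7 |
| | | - | - | - | | | | | | | | | |
| | 2 | 95.6 | 95.2 | 93.2 | 95.8 | 95.7 | 93.7 | 95.6 | 95.5 | 93.5 | 96.0 | 95.9 | 94.1 |
| | | - | - | - | | | | | | | * | | |
| | 3 | 93.7 | 93.8 | 89.7 | 93.7 | 93.8 | 89.9 | 94.2 | 93.8 | 89.9 | 94.0 | 94.0 | 90.2 |
| | | - | - | - | | | | * | | * | * | | |
| | 4 | 86.9 | 87.3 | 83.9 | 87.8 | 87.3 | 84.0 | 88.2 | 87.6 | 84.3 | 87.7 | 87.6 | 84.1 |
| | | - | - | - | | | | * | | | | | |
| | 5 | 92.7 | 92.6 | 89.2 | 92.7 | 92.6 | 89.9 | 92.5 | 92.5 | 90.3 | 92.8 | 92.6 | 90.0 |
| | | - | - | - | | | | | | | | | |
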
  1. The best classification results on each data set are underlined. Approaches which do not differ significantly from the optimal result (at a 0.05 significance level) are set in bold type (see methods section). Significant differences in the performance of a method as compared to the same classifier without feature selection are marked with asterisks (* p-value < 0.05, ** p-value < 0.01, *** p-value < 0.001). The MIR data in this table benefit significantly from feature selection, whereas the NMR data do so only to a minor extent. Overall, feature selection by means of Gini importance in conjunction with a PLS classifier was successful in all cases and superior to the "native" classifier of Gini importance, the random forest, in all but one case.
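
The asterisk legend in the note above can be written as a small lookup. The following is a minimal sketch, not code from the article: the function name `significance_stars` is hypothetical, and returning an empty string for non-significant differences is an assumption made here to mirror the empty cells in the table.

```python
def significance_stars(p_value: float) -> str:
    """Map a p-value to the asterisk notation of the table legend.

    Thresholds follow the legend: * p < 0.05, ** p < 0.01, *** p < 0.001.
    An empty string (an assumption of this sketch) stands for an empty
    table cell, i.e. no significant difference at the 0.05 level.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


if __name__ == "__main__":
    # Illustrative p-values only; these are not results from the article.
    for p in (0.0004, 0.004, 0.04, 0.4):
        print(p, repr(significance_stars(p)))
```

The thresholds are strict inequalities, so a p-value of exactly 0.05 receives no asterisk.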