A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

Table 4 Computing times (in minutes).

		no selection			univariate selection			multivariate selection (Gini importance)			multivariate selection (PLS/PC)
		PLS	PC	RF	PLS	PC	RF	PLS	PC	RF	PLS	PC	RF
MIR BSE	orig	5.7	11.1	9.9	46.4	53.9	46.8	88.8	97.0	91.5	87.9	92.4	88.0
	binned	2.8	3.2	3.1	13.6	14.7	15.9	26.1	27.1	29.0	28.7	29.6	31.5
MIR wine	French	8.8	7.8	2.4	26.6	21.8	7.7	47.0	45.9	33.5	17.2	14.7	7.4
	grape	12.1	10.3	2.5	28.9	22.3	8.0	54.0	47.6	33.5	15.8	13.1	6.5
NMR tumor	all	0.3	0.4	0.4	1.4	1.2	2.1	2.9	2.7	3.6	3.6	3.4	4.3
	center	0.2	0.2	0.2	1.1	0.8	1.1	2.2	1.9	2.1	2.1	1.8	2.0
NMR candida	1	4.6	8.8	7.7	22.4	41.2	37.1	43.5	62.5	61.1	59.8	78.4	75.4
	2	3.7	4.8	3.8	18.0	22.0	19.4	34.5	38.5	37.3	36.3	40.3	37.9
	3	3.7	4.7	3.7	17.4	20.1	17.9	33.4	36.0	34.7	34.6	37.8	35.1
	4	3.9	5.1	4.8	18.7	23.4	24.3	36.0	40.5	60.5	41.6	46.2	47.0
	5	3.5	3.9	2.6	31.9	32.4	27.0	62.6	63.0	60.0	58.3	43.4	38.5

The table reports the runtime of the different feature selection and classification approaches on the different data sets, measured on a 2 GHz personal computer with 2 GB of memory. Values are given in minutes for a ten-fold cross-validation, using the parameterisations of the results shown in Tables 2 and 3. Across classifiers, univariate feature selection typically takes about five times as long as classification of the same data set without feature selection. For most data sets, both multivariate feature selection approaches require a similar amount of time for a given classifier, and their computing time is mostly no more than about twice that of the recursive feature elimination based on a univariate feature importance measure.
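The ratios quoted in the caption can be checked directly against the table. The Python sketch below is an illustrative addition, not part of the original study: it re-enters the Table 4 values and computes, for every data set and classifier, the runtime ratio between two selection strategies. The strategy keys (`none`, `univariate`, `gini`, `pls_pc`) and the helper names `times` and `ratios` are hypothetical labels chosen here for readability.

```python
import statistics

# Computing times from Table 4 (minutes, ten-fold cross-validation).
# Each row: no selection, univariate, multivariate (Gini), multivariate (PLS/PC);
# within each strategy the classifier order is PLS, PC, RF.
TABLE4 = {
    ("MIR BSE", "orig"):     [5.7, 11.1, 9.9, 46.4, 53.9, 46.8, 88.8, 97.0, 91.5, 87.9, 92.4, 88.0],
    ("MIR BSE", "binned"):   [2.8, 3.2, 3.1, 13.6, 14.7, 15.9, 26.1, 27.1, 29.0, 28.7, 29.6, 31.5],
    ("MIR wine", "French"):  [8.8, 7.8, 2.4, 26.6, 21.8, 7.7, 47.0, 45.9, 33.5, 17.2, 14.7, 7.4],
    ("MIR wine", "grape"):   [12.1, 10.3, 2.5, 28.9, 22.3, 8.0, 54.0, 47.6, 33.5, 15.8, 13.1, 6.5],
    ("NMR tumor", "all"):    [0.3, 0.4, 0.4, 1.4, 1.2, 2.1, 2.9, 2.7, 3.6, 3.6, 3.4, 4.3],
    ("NMR tumor", "center"): [0.2, 0.2, 0.2, 1.1, 0.8, 1.1, 2.2, 1.9, 2.1, 2.1, 1.8, 2.0],
    ("NMR candida", "1"):    [4.6, 8.8, 7.7, 22.4, 41.2, 37.1, 43.5, 62.5, 61.1, 59.8, 78.4, 75.4],
    ("NMR candida", "2"):    [3.7, 4.8, 3.8, 18.0, 22.0, 19.4, 34.5, 38.5, 37.3, 36.3, 40.3, 37.9],
    ("NMR candida", "3"):    [3.7, 4.7, 3.7, 17.4, 20.1, 17.9, 33.4, 36.0, 34.7, 34.6, 37.8, 35.1],
    ("NMR candida", "4"):    [3.9, 5.1, 4.8, 18.7, 23.4, 24.3, 36.0, 40.5, 60.5, 41.6, 46.2, 47.0],
    ("NMR candida", "5"):    [3.5, 3.9, 2.6, 31.9, 32.4, 27.0, 62.6, 63.0, 60.0, 58.3, 43.4, 38.5],
}

CLASSIFIERS = ("PLS", "PC", "RF")
# Hypothetical short names for the four column groups of Table 4.
STRATEGIES = ("none", "univariate", "gini", "pls_pc")


def times(row):
    """Split a 12-value table row into {strategy: {classifier: minutes}}."""
    return {
        s: dict(zip(CLASSIFIERS, row[3 * i : 3 * i + 3]))
        for i, s in enumerate(STRATEGIES)
    }


def ratios(numerator, denominator):
    """Runtime ratio numerator/denominator for every (data set, variant, classifier)."""
    out = {}
    for key, row in TABLE4.items():
        t = times(row)
        for c in CLASSIFIERS:
            out[key + (c,)] = t[numerator][c] / t[denominator][c]
    return out


if __name__ == "__main__":
    for num, den in (("univariate", "none"), ("gini", "univariate"), ("pls_pc", "univariate")):
        r = ratios(num, den)
        vals = list(r.values())
        print(f"{num} / {den}: median {statistics.median(vals):.2f}, "
              f"range {min(vals):.2f} (at {min(r, key=r.get)}) "
              f"to {max(vals):.2f} (at {max(r, key=r.get)})")
```

Over the 33 data set and classifier combinations, the median univariate-to-no-selection ratio is about 4.8, consistent with the "about five times" in the caption, while the Gini-to-univariate ratio ranges from about 1.5 to 4.4; the largest ratios come from the random forest runs on the MIR wine data.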

ISSN: 1471-2105