Skip to main content
Figure 1 | BMC Bioinformatics

Figure 1

From: Classification of microarrays; synergistic effects between normalization, gene selection and machine learning

Figure 1

Method overview. The figure illustrates the analysis pipeline used to induce and validate models. First data was normalized (or raw data are used: No 0). Then a 5-fold cross validation (CV) was conducted to divide data into training and test sets. The training sets were used to train the models (red box), while the test sets were used to validate their classification power. In order to induce a classification model, some parameters had to be tuned. For example, the different Support vector machines (SVMs) employed different kernels with one or more parameters. The parameter sigma in the radial basis kernel was tuned by conducting a grid search and choosing the values with the lowest error rate during a 10-fold CV. The selected parameter value was finally used to induce a model from the training set in the outer CV and to classify the observations in the corresponding test set. The outer 5-fold CV was performed 10 times resulting in 50 test sets from which we evaluated 50 different models trained on 50 different trainings sets. As a measure of classification performance we used the average fraction of misclassified observations (i.e. error rate) in these 50 test sets.

Back to article page