Fig. 3 | BMC Bioinformatics

From: Feature-specific quantile normalization and feature-specific mean–variance normalization deliver robust bi-directional classification and feature selection performance between microarray and RNAseq data

Model performance in PAM50 and CMS classification without feature selection. Left block (dark blue): supervised classification using RNAseq data as the training distribution. Right block (gold): supervised classification using microarray data as the training distribution. Colour legends are provided for each block. All results are stratified by classification model (glmnet and SVM). The y-axis label "Full" denotes models trained on all 12,638 genes (breast) or 13,362 genes (colon). a. Balanced accuracy (y-axis) on unseen out-of-fold test data for each normalization method (x-axis), breast PAM50 classifier trained on RNAseq data. b. Balanced accuracy (y-axis) on unseen out-of-fold test data for each normalization method (x-axis), breast PAM50 classifier trained on microarray data. c. Balanced accuracy (y-axis) on unseen out-of-fold test data for each normalization method (x-axis), colon CMS classifier trained on RNAseq data. d. Balanced accuracy (y-axis) on unseen out-of-fold test data for each normalization method (x-axis), colon CMS classifier trained on microarray data. 95% confidence intervals were calculated from 1,000 bootstrap resamples drawn with replacement. e. Mean absolute scaled error (y-axis) of breast gene expression data cross-platform normalized from the microarray to the RNAseq distribution, for each normalization method (x-axis). f. Mean absolute scaled error (y-axis) of breast gene expression data cross-platform normalized from the RNAseq to the microarray distribution, for each normalization method (x-axis). g. Mean absolute scaled error (y-axis) of breast gene expression data cross-platform normalized from the microarray to the RNAseq distribution, for each feature selection method (x-axis), shown separately for FSQN and FSMVN. h. Mean absolute scaled error (y-axis) of breast gene expression data cross-platform normalized from the RNAseq to the microarray distribution, for each feature selection method (x-axis), shown separately for FSQN and FSMVN. Significance from a Kruskal–Wallis test followed by Dunn's post-hoc test is annotated in the plot (****p < 0.0001, ***p < 0.001, **p < 0.01, *p < 0.05, ns = not significant).
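The 95% confidence intervals in panels a–d come from resampling the out-of-fold test predictions. The caption does not state which bootstrap interval was used, so the sketch below assumes a simple percentile bootstrap over test cases, with balanced accuracy taken as the mean of per-class recall. The function names `balanced_accuracy` and `bootstrap_ci` and the example labels are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Recall for class c = fraction of true-c samples predicted as c.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap CI for balanced accuracy.

    Assumption: test cases are resampled with replacement n_boot times and
    the (alpha/2, 1 - alpha/2) quantiles of the resampled statistic are used.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices drawn with replacement
        stats[b] = balanced_accuracy(y_true[idx], y_pred[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return balanced_accuracy(y_true, y_pred), float(lo), float(hi)


if __name__ == "__main__":
    # Hypothetical PAM50-style labels, for illustration only.
    truth = ["LumA", "LumA", "LumB", "LumB", "Basal", "Her2", "Normal", "Basal"]
    preds = ["LumA", "LumB", "LumB", "LumB", "Basal", "Her2", "LumA", "Basal"]
    point, lo, hi = bootstrap_ci(truth, preds, n_boot=1000)
    print(f"balanced accuracy = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Balanced accuracy is used rather than plain accuracy because PAM50 and CMS subtypes are imbalanced; averaging per-class recall keeps a majority subtype from dominating the score.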

Back to article page