Skip to main content
Figure 2 | BMC Bioinformatics

Figure 2

From: Merged consensus clustering to assess and improve class discovery with microarray data

Figure 2

Consensus clustering with simulated gene expression profile data. (A) Simulated gene expression sets were generated randomly from normal distributions centred around four characteristic profiles (1,0,1,1), (0,1,1,0), (1,1,0,0), (0,1,0,0) which were then spiked with expression profiles centred on (1,1,1,1), (0,0,1,1), (1,1,0,0) and (0.5,0.5,0,0). (B) Unsupervised clustering with agnes and pam was used to partition the expression data into four clusters. Only profiles 1 and 2 were successfully identified by agnes, profiles 3 and 4 were consolidated and the (0,0,1,1) spike data segregated into a new profile (top row). Conversely, pam identified four expression profiles and segregated spike data into the closest matching profiles (bottom row). (C) Clustering with agnes and pam was repeated using clusterCons and the membership robustness calculated for each profile ('consensus' panels). For agnes, the spike data in cluster 1 are revealed as outliers (open triangles) and the robustness for cluster 3 is noticeably lower than clusters 1-4 reflecting its heterogeneity. Clusters produced by pam showed high robustness with only the spike data in cluster 3, observable as outliers (open triangles). Merge consensus matrices were generated from these two consensus clustering results and cast onto the agnes and pam clustering structures ('merge' panels) producing a more balanced view of membership (and hence cluster) robustness. For agnes, as expected, profiles 1, 2 and 4 remain largely unaffected, but profile 3 is heavily penalised as it is inconsistent between clustering algorithms. All pam profiles retain their high membership robustness, but now spike data are revealed for all profiles as outliers. (D) The optimal cluster number was estimated by finding the largest change in area under the cumulative density curve (AUC) for the consensus matrix of each clustering experiment by cluster number. Using this approach, the merge consensus matrix correctly predicted an optimal cluster number of 4, whereas agnes predicted 5.

Back to article page