
Table 1 Cross-validation results on overcoming the bias caused by differences in cluster size. When distribution models are used as the reference distribution, the gap statistic can still give the correct result even under a 10-fold difference.

From: Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens

| Difference in sample number between ℙ1 and ℙ2 | Average cluster number estimation accuracy, % (uniform reference distribution) | Average cluster number estimation accuracy, % (GMM as reference distribution for ℙ1, uniform reference for ℙ2) |
|---|---|---|
| Equal | 100 | 100 |
| 2-fold | 88.5 | 98.1 |
| 3-fold | 81.8 | 93.3 |
| 5-fold | 69.2 | 91.0 |
| 7-fold | <20 | 89.5 |
| 9-fold | <20 | 88.9 |
| 10-fold | <15 | 87.4 |
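
To make the role of the reference distribution concrete, the following is a minimal sketch of a gap statistic estimate with a pluggable reference sampler. It is not the authors' implementation. It assumes 1-D data, a plain Lloyd's k-means, and the common rule of picking the smallest k whose gap is within one standard error of the next. All function names (kmeans_wss, uniform_reference, gaussian_reference, estimate_k) are hypothetical, and gaussian_reference fits a single Gaussian only as a simple stand-in for a GMM.

```python
import math
import random
import statistics

def kmeans_wss(xs, k, rng, iters=100):
    """Lloyd's k-means on 1-D data; returns the within-cluster sum of squares."""
    centers = rng.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            groups[nearest].append(x)
        # An empty cluster keeps its previous center.
        updated = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return sum(min((x - c) ** 2 for c in centers) for x in xs)

def log_dispersion(xs, k, rng, restarts=5):
    """Log of the best dispersion over a few random restarts."""
    best = min(kmeans_wss(xs, k, rng) for _ in range(restarts))
    return math.log(max(best, 1e-12))  # guard against log(0)

def uniform_reference(xs):
    """Reference sampler: uniform over the observed range."""
    lo, hi = min(xs), max(xs)
    return lambda n, rng: [rng.uniform(lo, hi) for _ in range(n)]

def gaussian_reference(xs):
    """Assumption: a single fitted Gaussian standing in for a GMM reference."""
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    return lambda n, rng: [rng.gauss(mu, sigma) for _ in range(n)]

def estimate_k(xs, make_reference, k_max=4, n_ref=10, seed=0):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); falls back to k_max."""
    rng = random.Random(seed)
    draw = make_reference(xs)
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        ref_logs = [log_dispersion(draw(len(xs), rng), k, rng) for _ in range(n_ref)]
        gaps.append(statistics.fmean(ref_logs) - log_dispersion(xs, k, rng))
        errs.append(statistics.pstdev(ref_logs) * math.sqrt(1 + 1 / n_ref))
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k
    return k_max

# Toy data: two well-separated groups of equal size.
data_rng = random.Random(1)
data = [data_rng.gauss(0, 0.1) for _ in range(30)] + [data_rng.gauss(10, 0.1) for _ in range(30)]
k_hat = estimate_k(data, uniform_reference)
```

Passing gaussian_reference instead of uniform_reference changes only the null model against which the observed dispersion is compared. The table compares the same two kinds of reference (uniform versus model-based), although this toy sketch makes no attempt to reproduce the reported accuracies.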