Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens

Table 1 Cross-validation results on overcoming the bias caused by differences in cluster size. When distribution models are used as the reference distribution, gap statistics can still give the correct result even under a 10-fold difference.

Difference in sample number between ℙ₁ and ℙ₂	Average cluster number estimation accuracy (%), uniform reference distribution	Average cluster number estimation accuracy (%), GMM as reference distribution for ℙ₁ and uniform reference for ℙ₂
Equal	100	100
2-fold	88.5	98.1
3-fold	81.8	93.3
5-fold	69.2	91.0
7-fold	<20	89.5
9-fold	<20	88.9
10-fold	<15	87.4
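
For orientation, the two accuracy columns differ only in how the reference datasets for the gap statistic are generated. The sketch below shows the selection rule of the standard gap statistic (Tibshirani et al.), which may differ from the improved variant evaluated in the table. It assumes the caller has already clustered the data and B reference datasets at each candidate k and supplies the log within-cluster dispersions. The function names and the numbers in the usage lines are hypothetical and purely illustrative; they are not results from the paper.

```python
import math
from statistics import mean, pstdev


def gap_with_error(log_w_data, log_w_refs):
    """Return (gap, s) for one candidate cluster number k.

    log_w_data: log within-cluster dispersion of the observed data at k.
    log_w_refs: log dispersions of B reference datasets clustered at the
                same k. The references may come from a uniform box or from
                a fitted distribution model such as a GMM (assumption: the
                caller chooses and generates them).
    """
    b = len(log_w_refs)
    gap = mean(log_w_refs) - log_w_data
    # Simulation error term of the standard gap statistic: sd * sqrt(1 + 1/B).
    s = pstdev(log_w_refs) * math.sqrt(1.0 + 1.0 / b)
    return gap, s


def choose_k(log_w_by_k, ref_log_w_by_k):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); else the largest k tried."""
    ks = sorted(log_w_by_k)
    stats = {k: gap_with_error(log_w_by_k[k], ref_log_w_by_k[k]) for k in ks}
    for k, k_next in zip(ks, ks[1:]):
        gap_next, s_next = stats[k_next]
        if stats[k][0] >= gap_next - s_next:
            return k
    return ks[-1]


# Made-up dispersions for k = 1, 2, 3 (illustration only).
example_log_w = {1: 5.0, 2: 3.0, 3: 2.8}
example_refs = {
    1: [5.1, 5.2, 5.0],
    2: [4.5, 4.6, 4.4],
    3: [4.2, 4.3, 4.1],
}
estimated_k = choose_k(example_log_w, example_refs)
```

Keeping the reference sampler outside these functions means that swapping a uniform reference for a model-based one changes only how `ref_log_w_by_k` is produced, not the selection rule.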

ISSN: 1471-2105