Many accurate small-discriminatory feature subsets exist in microarray transcript data: biomarker discovery

BMC Bioinformatics

Table 5 Random data simulations of real data sets. This table compares the results obtained from the real data (Real column) with two different types of random data. The Random column contains the largest number of pairs found in 10 simulation runs using a random data matrix (drawn from a uniform distribution) in which the number of genes and the class sizes are the same as those indicated for the real data. The Label Shuffled column contains the largest number of pairs found in 30 simulation runs in which the class labels were randomly shuffled. In the Samples column, the number in parentheses is the number of positive samples. In the Real, Random, and Label Shuffled columns, the number after the slash is the number of single genes found. Label shuffling leads to more pairs found "by chance" only for the smaller data sets, which have large numbers of pairs expected "by chance".

Data set	Samples (positive)	Genes	Real (pairs/single genes)	Random (pairs/single genes)	Label Shuffled (pairs/single genes)
GIST	19(6)	1987	137981/74	2706/0	4622/2
BreastBRCA(brca1 vs brca2)	15(7)	3226	143574/18	20563/2	53900/11
BreastBRCA(brca1 & brca2 vs Sporadic)	22(7)	3226	2114/0	1286/1	0/0
Cutaneous	38(7)	3613	596/0	62/0	24/0
LungStanford	52(13)	918	486/2	0/0	0/0
LungBeer	96(10)	4966	22102/5	0/0	0/0
Prostate	34(9)	3958	249662/52	57/0	13/0
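
The two null simulations described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the statistic counted here (single genes whose values perfectly separate the two classes with one threshold) is an assumed stand-in, since the pair-finding criterion is not given in this excerpt. The overall shape follows the caption: build uniform random matrices with the same dimensions and class sizes, or shuffle the class labels of a fixed matrix, and report the largest count over the runs.

```python
import random


def count_separating_genes(matrix, labels):
    """Assumed stand-in statistic: count genes (rows) whose values perfectly
    separate positive from negative samples with a single threshold."""
    count = 0
    for row in matrix:
        pos = [v for v, is_pos in zip(row, labels) if is_pos]
        neg = [v for v, is_pos in zip(row, labels) if not is_pos]
        if min(pos) > max(neg) or max(pos) < min(neg):
            count += 1
    return count


def max_over_random_matrices(n_genes, n_samples, n_pos, n_runs, rng):
    """Largest count over n_runs uniform random matrices that share the
    gene count and class sizes of a real data set."""
    labels = [True] * n_pos + [False] * (n_samples - n_pos)
    best = 0
    for _ in range(n_runs):
        matrix = [[rng.random() for _ in range(n_samples)]
                  for _ in range(n_genes)]
        best = max(best, count_separating_genes(matrix, labels))
    return best


def max_over_label_shuffles(matrix, labels, n_runs, rng):
    """Largest count over n_runs random shuffles of the class labels,
    keeping the expression matrix fixed."""
    best = 0
    for _ in range(n_runs):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        best = max(best, count_separating_genes(matrix, shuffled))
    return best


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy run is repeatable
    # Toy dimensions only: 200 genes, 19 samples, 6 of them positive.
    print(max_over_random_matrices(200, 19, 6, 10, rng))
```

Taking the maximum over runs, rather than the mean, mirrors the "largest number found" wording of the caption and gives a conservative picture of what can arise by chance.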

ISSN: 1471-2105