
Table 5 Random data simulations of real data sets. This table compares the results from the real data (Real column) with two types of random data. The Random column contains the largest number of pairs found experimentally in 10 simulation runs using a random data matrix (drawn from a uniform distribution) in which the number of genes and the class sizes are the same as those indicated for the real data. The Label Shuffled column contains the largest number of pairs found experimentally in 30 simulation runs in which the class labels were randomly shuffled. In the Samples column, the number in parentheses is the number of positive samples. In the Real, Random, and Label Shuffled columns, the number after the slash is the number of single genes found. Label shuffling yields more pairs found "by chance" than the uniform random data only for the smaller data sets. The small data sets have large numbers of pairs expected "by chance".

From: Many accurate small-discriminatory feature subsets exist in microarray transcript data: biomarker discovery

| Data set | Samples | Genes | Real | Random | Label Shuffled |
|---|---|---|---|---|---|
| GIST | 19(6) | 1987 | 137981/74 | 2706/0 | 4622/2 |
| BreastBRCA(brca1 vs brca2) | 15(7) | 3226 | 143574/18 | 20563/2 | 53900/11 |
| BreastBRCA(brca1 & brca2 vs Sporadic) | 22(7) | 3226 | 2114/0 | 1286/1 | 0/0 |
| Cutaneous | 38(7) | 3613 | 596/0 | 62/0 | 24/0 |
| LungStanford | 52(13) | 918 | 486/2 | 0/0 | 0/0 |
| LungBeer | 96(10) | 4966 | 22102/5 | 0/0 | 0/0 |
| Prostate | 34(9) | 3958 | 249662/52 | 57/0 | 13/0 |
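
The two null simulations described in the caption (a uniform random matrix with matching dimensions, and shuffled class labels) can be illustrated with a minimal Python sketch. The scoring rule here is an assumption made only for illustration: `count_separating_genes` counts single genes for which one threshold perfectly separates the two classes, and it is a stand-in for, not a reproduction of, the pair-finding procedure behind the table. All function names are hypothetical. Only the run counts (10 and 30), the uniform distribution, and the "largest count over runs" summary come from the caption.

```python
import random


def count_separating_genes(matrix, labels):
    """Count genes (rows) whose values perfectly separate the two classes.

    ASSUMPTION: this single-gene threshold rule is a hypothetical stand-in
    for the paper's scoring procedure. Both classes must be non-empty.
    """
    count = 0
    for row in matrix:
        pos = [v for v, is_pos in zip(row, labels) if is_pos]
        neg = [v for v, is_pos in zip(row, labels) if not is_pos]
        if max(pos) < min(neg) or max(neg) < min(pos):
            count += 1
    return count


def random_matrix(n_genes, n_samples, rng):
    """Genes x samples matrix drawn from a uniform distribution on [0, 1)."""
    return [[rng.random() for _ in range(n_samples)] for _ in range(n_genes)]


def shuffled_labels(labels, rng):
    """Return a shuffled copy of the labels; class sizes are preserved."""
    out = list(labels)
    rng.shuffle(out)
    return out


def random_data_null(n_genes, n_samples, n_pos, n_runs=10, seed=0):
    """Largest count over n_runs uniform random matrices of matching shape."""
    rng = random.Random(seed)
    labels = [True] * n_pos + [False] * (n_samples - n_pos)
    return max(
        count_separating_genes(random_matrix(n_genes, n_samples, rng), labels)
        for _ in range(n_runs)
    )


def label_shuffle_null(matrix, labels, n_runs=30, seed=0):
    """Largest count over n_runs random shufflings of the class labels."""
    rng = random.Random(seed)
    return max(
        count_separating_genes(matrix, shuffled_labels(labels, rng))
        for _ in range(n_runs)
    )
```

For example, `random_data_null(1987, 19, 6)` runs the uniform-data simulation with the GIST dimensions from the table. Because the scoring rule is a simplified assumption, the counts it produces are not expected to match the tabulated values.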