
Table 3 Misclassification rate of RPMM cluster analysis to find 2 groups using different variable filtering methods (top 1000 features)

From: Non-specific filtering of beta-distributed data

 

|  | Data set 1 | Data set 2 | Data set 3 | Data set 4 | Data set 5 | Data set 6 | Data set 7 |
|---|---|---|---|---|---|---|---|
| Tissue type | Colon cancer | Glioblastoma | Glioblastoma | Kidney | Kidney | Breast | Breast |
| Platform | HM27 | HM27 | HM450 | HM27 | HM450 | HM27 | HM450 |
| # of samples | 20 non-CIMP vs. 6 CIMP | 74 non-CIMP vs. 12 CIMP | 93 non-CIMP vs. 6 CIMP | 50 KIRC vs. 45 non-cancer | 283 KIRC vs. 160 non-cancer | 37 Breast cancer vs. 20 non-cancer | 56 Breast cancer vs. 17 non-cancer |
| No filter | 0.31 | 0.22 | NA | 0 | NA | 0.12 | NA |
| Filter top 1000 by: |  |  |  |  |  |  |  |
| Random * | 0.34 | 0.27 | 0.40 | 0.004 | 0.005 | 0.12 | 0.20 |
| SD-b | 0.19 | 0.07 | 0.49 | 0 | 0.02 | 0 | 0.12 |
| SD-m | 0.12 | 0.07 | 0.42 | 0.02 | 0.03 | 0.12 | 0.08 |
| MAD | 0.38 | 0.35 | 0.49 | 0 | 0.005 | 0 | 0.14 |
| DIP | 0.23 | 0.36 | 0.45 | 0 | 0.005 | 0 | 0.14 |
| Precision | 0.08 | 0 | 0.10 | 0.03 | 0.01 | 0.11 | 0.22 |
| BQ-GOF | 0.19 | 0 | 0.07 | 0 | 0.01 | 0.25 | 0.23 |
| TM-GOF | 0.08 | 0.02 | 0.06 | 0.36 | 0.47 | 0.44 | 0.49 |
| TQ-GOF | 0.08 | 0.03 | 0.06 | 0.35 | 0.47 | 0.44 | 0.48 |
| BR | 0.12 | 0.02 | 0.11 | 0.02 | 0.02 | 0.23 | 0.19 |
| AR | 0.08 | 0.06 | 0.11 | 0.02 | 0.02 | 0.25 | 0.19 |
| WAR | 0.12 | 0.07 | 0.45 | 0.02 | 0.01 | 0.11 | 0.10 |
| SD-b + TM-GOF** | 0.08 | 0.07 | 0.20 | 0.05 | 0.01 | 0.26 | 0.36 |

  1. NA = not applicable; too many features for RPMM to run.
  2. *Average from 10 analyses of randomly sampled feature sets.
  3. **Combine top 500 SD-b + top 500 TM-GOF features.
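
As a rough illustration of the quantities reported in this table, the sketch below computes a two-group misclassification rate and builds a combined feature set from two filter scores. It is not the authors' code. It assumes that labels are coded 0/1, that the rate is taken over the better of the two possible cluster-to-group assignments (cluster names being arbitrary), and that features selected by both filters are counted once. The function names `misclassification_rate`, `top_k`, and `combine_top` are hypothetical.

```python
from typing import List, Sequence


def misclassification_rate(true_labels: Sequence[int],
                           cluster_labels: Sequence[int]) -> float:
    """Two-group misclassification rate for 0/1 labels.

    Assumption: cluster names are arbitrary, so the rate is the smaller
    of the error under the given naming and under the swapped naming.
    """
    n = len(true_labels)
    if n == 0 or n != len(cluster_labels):
        raise ValueError("label sequences must be non-empty and equal in length")
    mismatches = sum(t != c for t, c in zip(true_labels, cluster_labels))
    # With binary labels, swapping cluster names turns every match
    # into a mismatch and vice versa.
    return min(mismatches, n - mismatches) / n


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


def combine_top(scores_a: Sequence[float],
                scores_b: Sequence[float],
                k_each: int) -> List[int]:
    """Top k_each features by each score, merged in order.

    Assumption: a feature picked by both filters is kept only once,
    so the result may hold fewer than 2 * k_each indices.
    """
    combined: List[int] = []
    seen = set()
    for idx in top_k(scores_a, k_each) + top_k(scores_b, k_each):
        if idx not in seen:
            seen.add(idx)
            combined.append(idx)
    return combined


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1]
    clusters = [1, 1, 0, 0, 0]  # cluster names happen to be swapped
    print(misclassification_rate(truth, clusters))
    print(combine_top([0.1, 0.9, 0.5], [0.8, 0.2, 0.7], k_each=2))
```

For a combination like the one in the last table row, `combine_top` would be called with `k_each=500` on per-feature scores from the two filters. How those scores are computed is outside the scope of this sketch.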