Fig. 1 | BMC Bioinformatics

Fig. 1

From: Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery

Schematic representation of the Know-GRRF method. (a) The data structure. The feature matrix X contains the observed values of P predictors for N samples. The prior matrix A contains functional measures of each predictor from M domains. These functional measures are combined in a linear model to derive a score representing the biological relevance of each predictor. The vector Y contains the observed response values of the samples. (b) The feature selection component. Non-leaf nodes are marked with their splitting features and colored by the corresponding biological relevance. Know-GRRF starts with an empty feature set F. In tree 1, three features (X3, X5 and X9) are sequentially added to F based on information gains weighted by biological relevance. In tree 2, because X5 and X9 are already members of F, they are selected based on information gain only. Because X7 is not a member of F, it is selected based on information gain weighted by biological relevance. (c) The stability selection component. Know-GRRF first optimizes the tuning parameters on the complete dataset. It then uses bootstrapped samples to select features. After T iterations, features selected with a frequency above a user-defined cutoff c are aggregated and constitute the final feature set. Alternatively, Know-GRRF can use stepwise selection to derive the final feature set
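
The three panels can be illustrated with a minimal Python sketch. It is not the Know-GRRF implementation: the function names, the min-max rescaling of relevance scores to [0, 1], the coefficient vector `beta`, and the reading of the cutoff c as a proportion of the T iterations are all assumptions made for illustration. Only the overall logic (a linear combination of domain measures, gains weighted by relevance for features not yet in F, and frequency-based aggregation) follows the caption.

```python
from collections import Counter


def relevance_scores(A, beta):
    """Panel (a): combine the M domain measures of each predictor linearly.

    A is a list of P rows with M values each; beta holds M coefficients
    (hypothetical tuning parameters). Rescaling to [0, 1] is an assumption.
    """
    raw = [sum(b * a for b, a in zip(beta, row)) for row in A]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [1.0] * len(raw)  # all predictors equally relevant
    return [(r - lo) / (hi - lo) for r in raw]


def weighted_gain(gain, feature, selected, relevance):
    """Panel (b): features already in F keep their raw information gain;
    new features have their gain weighted by biological relevance."""
    if feature in selected:
        return gain
    return relevance[feature] * gain


def choose_split(gains, selected, relevance):
    """Pick the splitting feature at one node and add it to F (in place).

    gains maps a feature index to its information gain at this node.
    """
    best = max(
        gains,
        key=lambda f: weighted_gain(gains[f], f, selected, relevance),
    )
    selected.add(best)
    return best


def stable_features(runs, c):
    """Panel (c): keep features selected in more than a fraction c of runs.

    runs is a list of T feature lists, one per bootstrapped sample.
    Treating c as a proportion rather than a count is an assumption.
    """
    T = len(runs)
    counts = Counter(f for run in runs for f in set(run))
    return sorted(f for f, n in counts.items() if n / T > c)
```

Under these assumptions, a feature with a large raw gain but a low relevance score can lose a split to a more relevant feature until it has entered F, after which it competes on information gain alone.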