Fig. 2 | BMC Bioinformatics

Fig. 2

From: A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data

Heterogeneity of the mutation rate and explanatory variables. a Heterogeneity among cancer types and samples. Violin plots of the mutation probability for 14 cancer types. b Heterogeneity along the genome and the correlation with categorical explanatory variables. Relative proportion of mutations from nucleotide C or T by neighboring context A, G, C, T (2·4·4 = 32 possibilities), relative proportion of mutations in six different genomic elements, and relative proportion of mutations within and outside repeat regions or CpG islands. c Heterogeneity correlated with continuous variables. Left column: continuous variables. Middle column: the continuous annotations are discretized into bins according to quantiles for the site-specific regression models. Each bin is represented by the mean value within the bin. Grey transparent histograms: distribution of the continuous values of the annotation along the genome. Black transparent histograms: distribution of the discrete bins of the annotation (binning scheme in italics in the column “Annotation”). Black diamonds: discrete values used for the binning. Right column: predicted (lines) and observed (points) mutation rate for each cancer type and explanatory variable. The regression lines are generated under a multinomial logistic regression model using only the corresponding explanatory variable. Details about the different data types can be found in the “Somatic mutation dataset” section.
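
The quantile discretization described for panel c can be illustrated with a minimal sketch. It is not the authors' implementation: the function name quantile_bin, the choice of four bins, and the toy input are hypothetical, and the binning scheme actually used for each annotation is the one given in the figure's “Annotation” column. The sketch only shows the general idea of cutting a continuous annotation at its quantiles and replacing every value by the mean of its bin.

```python
from bisect import bisect_right
from statistics import mean, quantiles


def quantile_bin(values, n_bins=4):
    """Discretize a continuous annotation into quantile bins.

    Hypothetical helper for illustration only. Returns the bin index of
    each value, the mean of each occupied bin, and the discretized
    values (each value replaced by the mean of its bin).
    """
    # n_bins - 1 interior cut points estimated from the data itself.
    cuts = quantiles(values, n=n_bins, method="inclusive")

    # A value's bin index is the number of cut points less than or equal to it.
    idx = [bisect_right(cuts, v) for v in values]

    # Represent each occupied bin by the mean of the values it contains.
    # Heavily tied data can leave some bins empty; those are skipped.
    bin_means = {
        b: mean([v for v, i in zip(values, idx) if i == b])
        for b in sorted(set(idx))
    }

    discretized = [bin_means[i] for i in idx]
    return idx, bin_means, discretized


# Toy annotation values (made up, not taken from the paper's data).
annotation = [1, 2, 3, 4, 5, 6, 7, 8]
idx, bin_means, discretized = quantile_bin(annotation, n_bins=4)
```

Representing a bin by its mean keeps the discretized variable on the same scale as the original annotation, so that predicted and observed rates can be plotted against comparable x-values.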
