Fig. 2 | BMC Bioinformatics

Fig. 2

From: A statistical framework for detecting mislabeled and contaminated samples using shallow-depth sequence data

Defining P(G(v) | S) for k = 3. We first enumerate all possible source vectors of length k = 3 (left) and then enumerate all labeled genotype vectors consistent with each source vector (right). Each path in a given tree corresponds to one genotype vector given the source vector S. For instance, if the three samples are related by source vector (1,1,2), the genotype vector can take one of nine values. We compute the probability of each genotype vector (given S) by traversing its path and taking the product of the probabilities associated with the edges of the path. Genotype vectors not consistent with S have probability zero (we omit these paths from the figure). Edge probabilities are defined using user-supplied population allele frequencies, assuming Hardy-Weinberg equilibrium (HWE).
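
The enumeration described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes a single biallelic site with genotypes coded as 0, 1, or 2 copies of the alternate allele, a single alternate-allele frequency p, and source vectors written in a canonical labeling where the first sample always comes from source 1. The function names (source_vectors, hwe_probs, genotype_vector_probs) are hypothetical and do not appear in the article.

```python
from itertools import product


def source_vectors(k):
    """Enumerate source vectors of length k in canonical form (assumed labeling):
    each new label is at most one larger than the largest label used so far."""
    def extend(prefix):
        if len(prefix) == k:
            yield tuple(prefix)
            return
        for label in range(1, max(prefix, default=0) + 2):
            yield from extend(prefix + [label])
    yield from extend([])


def hwe_probs(p):
    """Genotype probabilities under HWE for alternate-allele frequency p.
    Keys are the number of alternate alleles (0, 1, 2)."""
    q = 1.0 - p
    return {0: q * q, 1: 2.0 * p * q, 2: p * p}


def genotype_vector_probs(S, p):
    """Return {genotype vector: P(G | S)} for every vector consistent with S.
    Samples sharing a source share a genotype; vectors that violate this are
    simply absent from the result, i.e. they have probability zero."""
    probs = hwe_probs(p)
    sources = sorted(set(S))
    table = {}
    # One genotype is drawn per distinct source (one edge per source in a path).
    for combo in product(probs, repeat=len(sources)):
        by_source = dict(zip(sources, combo))
        G = tuple(by_source[s] for s in S)
        path_prob = 1.0
        for g in combo:
            path_prob *= probs[g]  # product of edge probabilities along the path
        table[G] = path_prob
    return table


if __name__ == "__main__":
    for S in source_vectors(3):
        table = genotype_vector_probs(S, p=0.3)
        print(S, len(table), "consistent genotype vectors")
```

Under these assumptions, source vector (1,1,2) yields nine consistent genotype vectors, matching the count given in the caption, and the probabilities for any one source vector sum to one.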
