Figure 8 | BMC Bioinformatics

Figure 8

From: Improving the specificity of high-throughput ortholog prediction

Example of the generation of cut-offs for classifying ssd-orthologs and probable paralogs, based on an iterative-true-negative analysis (i.e. on the introduction of random sets of true negatives). The particular analysis illustrated here is a Ratio1 analysis for the mouse, rat, human RefSeq RBH dataset, with true negatives introduced into the mouse (ingroup1) set. In panel A, the number of putative orthologous groups in each ratio range of the true-negative-transformed data set is shown for the whole data set (light shaded bars) and for the introduced true negatives only (dark shaded bars). Note how the distribution of the whole data set differs from that of the true negatives (i.e. introduced paralogs). In panel B, the proportion of randomly introduced true negatives in each ratio range (intervals of 0.5) is used to formulate cut-offs (denoted by dashed lines) for classifying ssd-orthologs and probable paralogs. For the ssd-orthologs cut-off (left-most dashed line), no ratio range within the ssd-orthologs region may contain more than 10% true negatives. For the probable paralogs cut-off (right-most dashed line), the proportion of true negatives is at or above 50%. The middle region bounded by these two cut-off points establishes the "uncertain" orthology class ratio range. Dashed lines denoting these cut-offs are also shown in panel A for reference. This true-negative analysis and cut-off generation is also performed for Ratio2 [Additional file 1], and the combination of cut-offs for Ratio1 and Ratio2 is used to classify putative orthologous groups from another data set (such as an RBH-predicted data set) into the three classification levels of "probable ssd-ortholog", "uncertain" and "probable paralogs".
Panel C schematically shows the areas of an R1 × R2 plot that would be classified in this way, with the cut-off numbers in this particular example matching the RefSeq RBH-based mouse-rat-human analysis (see Table 2 for how these ranges are numerically determined).
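
The cut-off rule described for panel B can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `derive_cutoffs` and `classify`, the treatment of empty ratio ranges, the inclusive boundaries, and the example counts are all hypothetical. Only the 0.5 interval width and the 10% and 50% thresholds come from the caption. The sketch handles a single ratio; how the Ratio1 and Ratio2 cut-offs are combined is not stated here (see Table 2), so it is left out.

```python
from typing import List, Optional, Tuple


def derive_cutoffs(
    counts: List[Tuple[int, int]],
    bin_width: float = 0.5,
    ortholog_max: float = 0.10,
    paralog_min: float = 0.50,
) -> Tuple[Optional[float], Optional[float]]:
    """Derive (ssd-ortholog cut-off, probable-paralog cut-off) for one ratio.

    counts[i] = (all groups, introduced true negatives) in the ratio range
    [i * bin_width, (i + 1) * bin_width). Assumed layout, not from the paper.
    """
    ortholog_cutoff = None
    paralog_cutoff = None
    still_clean = True  # True while every range so far has <= 10% true negatives
    for i, (total, true_negatives) in enumerate(counts):
        if total == 0:
            continue  # assumption: an empty range carries no information
        proportion = true_negatives / total
        if still_clean and proportion <= ortholog_max:
            ortholog_cutoff = (i + 1) * bin_width  # upper edge of this range
        else:
            still_clean = False
        if paralog_cutoff is None and proportion >= paralog_min:
            paralog_cutoff = i * bin_width  # lower edge of this range
    return ortholog_cutoff, paralog_cutoff


def classify(ratio: float, ortholog_cutoff: float, paralog_cutoff: float) -> str:
    """Assign one of the three classes; boundary handling is an assumption."""
    if ratio <= ortholog_cutoff:
        return "probable ssd-ortholog"
    if ratio >= paralog_cutoff:
        return "probable paralogs"
    return "uncertain"


# Made-up counts for illustration only (not data from the figure).
example_counts = [(100, 2), (80, 4), (50, 10), (20, 8), (10, 6), (5, 5)]
ortho_cut, para_cut = derive_cutoffs(example_counts)
```

With these made-up counts the true-negative proportions per range are 2%, 5%, 20%, 40%, 60% and 100%, so the sketch places the ssd-ortholog cut-off at a ratio of 1.0 and the probable-paralog cut-off at 2.0, leaving ratios between the two as "uncertain".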
