Figure 9 | BMC Bioinformatics

Figure 9

From: On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation

Histograms of the positive and negative concordance rates for the seed sequences of 285 SMART and 2381 Pfam domain models, and plots of high-quality versus low-quality E-values for the concordance hits in the HMMER2- and HMMER3-dissected results. Figures A and B depict the histograms of the positive concordance rates for the 285 SMART and 2381 Pfam domain models, respectively. On average, the positive concordance rates are (99.17 ± 3.46)% for SMART and (99.69 ± 2.13)% for Pfam (see vertical dotted lines), suggesting that almost all the seed sequences were correctly labeled as true hits. 225 (out of 285) SMART and 2142 (out of 2381) Pfam domains have a 100% positive concordance rate, as depicted by the horizontal dotted lines. Likewise, Figures C and D show the histograms of the negative concordance rates for the same sets of domains. On average, the SMART and Pfam domains have negative concordance rates of (0.0033 ± 0.0042)% and (0.0017 ± 0.0341)%, respectively (see vertical dotted lines), implying that almost none of the seed sequences were mistaken as false hits. 283 (out of 285) SMART and 2374 (out of 2381) Pfam domains have a zero negative concordance rate, as marked by the horizontal dotted lines. Figures E and F plot the high-quality E-values against the low-quality E-values of the positive (in red) and negative (in blue) concordance hits for the HMMER2/SMART and HMMER2/Pfam dissected results, respectively. Figures G and H show the corresponding plots for the HMMER3/SMART and HMMER3/Pfam dissected results, respectively.
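
The summary figures quoted in the caption (a mean ± standard deviation and a count of domains at exactly 100% or 0%) can be reproduced from a list of per-domain rates. The following is a minimal sketch only: the function name `summarize_rates`, the example values, and the use of the population standard deviation are assumptions for illustration, since the caption does not state how the rates or the spread were computed.

```python
import math
from statistics import mean, pstdev


def summarize_rates(rates_percent, target=100.0):
    """Summarize per-domain concordance rates given in percent.

    Returns (mean, standard deviation, number of domains at `target`).
    Assumption: population standard deviation; the source does not say
    whether the sample or population form was used.
    """
    avg = mean(rates_percent)
    sd = pstdev(rates_percent)
    # Compare with a tolerance so floating-point noise does not hide exact hits.
    n_at_target = sum(
        1 for r in rates_percent if math.isclose(r, target, abs_tol=1e-9)
    )
    return avg, sd, n_at_target


# Hypothetical rates for four domain models (not data from the paper).
example_rates = [100.0, 100.0, 100.0, 95.0]
avg, sd, n_full = summarize_rates(example_rates)
print(f"({avg:.2f} ± {sd:.2f})%; {n_full} of {len(example_rates)} domains at 100%")
```

Calling the same function with `target=0.0` on negative concordance rates would give the count of domains with a zero rate.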
