
Table 1 Accuracy results for the mean 85 AA COG simulation

From: pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree

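| range   | ML μ | PP μ | ML σ | PP σ | ML FC | PP FC | ML #   | PP #   |
|---------|------|------|------|------|-------|-------|--------|--------|
| 0.0-0.1 | -    | -    | -    | -    | -     | -     | 0      | 0      |
| 0.1-0.2 | 3.57 | 3.78 | 3.09 | 3.27 | 0.07  | 0.03  | 4149   | 2312   |
| 0.2-0.3 | 2.97 | 3.19 | 3.04 | 3.06 | 0.16  | 0.11  | 15123  | 9018   |
| 0.3-0.4 | 2.39 | 2.76 | 3.00 | 3.07 | 0.26  | 0.17  | 22696  | 18373  |
| 0.4-0.5 | 2.25 | 2.29 | 3.11 | 2.98 | 0.32  | 0.24  | 20120  | 23022  |
| 0.5-0.6 | 2.14 | 2.11 | 3.09 | 3.01 | 0.36  | 0.32  | 17228  | 20090  |
| 0.6-0.7 | 1.94 | 1.95 | 3.04 | 2.99 | 0.42  | 0.38  | 14113  | 16223  |
| 0.7-0.8 | 1.86 | 1.85 | 3.05 | 3.01 | 0.47  | 0.44  | 13527  | 14879  |
| 0.8-0.9 | 1.62 | 1.65 | 2.97 | 2.97 | 0.55  | 0.52  | 14850  | 15747  |
| 0.9-1.0 | 0.32 | 0.32 | 1.54 | 1.53 | 0.92  | 0.92  | 163815 | 165957 |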

  1. Error analysis for the COG simulation with the error metric described in the text. As in Figure 6, simulated reads had a normally distributed length with a mean of 85 amino acids and a standard deviation of 20. This table pools the results and shows the mean (μ) and standard deviation (σ) of the error, the fraction placed correctly (FC), and the number of reads placed (#) for pplacer run in maximum likelihood (ML) and posterior probability (PP) modes. The "range" column gives the interval of the confidence score: the likelihood weight ratio for ML and the posterior probability for PP. For example, the "ML" columns in the row labeled 0.4-0.5 show error statistics for all of the reads in the simulation that had a likelihood weight ratio between 0.4 and 0.5: there were 20120 such reads, of which 32% were placed correctly, with a corresponding error mean and standard deviation of about 2.25 and 3.11, respectively. This table demonstrates the effectiveness of the confidence scores: as the confidence scores increase, the error decreases. We note that the ML and PP methods have very comparable performance for this length of read, and thus the quickly calculated ML weight ratio can act as a proxy for the more statistically rigorous posterior probability calculation.
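
The pooling described in this note can be illustrated with a short Python sketch. It is not pplacer's code, and it rests on several assumptions that are not stated in the table: each read is represented as a hypothetical `(confidence, error)` pair, a read counts as "placed correctly" when its error is exactly 0, the standard deviation is the population form, and a score that falls exactly on a bin boundary goes to the upper bin. The names `bin_label` and `summarize` and the toy data are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def bin_label(confidence, n_bins=10):
    """Return a row label such as '0.4-0.5' for a confidence score in [0, 1].

    Assumption: boundary values go to the upper bin, except 1.0, which is
    kept in the last bin.
    """
    idx = min(int(confidence * n_bins), n_bins - 1)
    return f"{idx / n_bins:.1f}-{(idx + 1) / n_bins:.1f}"


def summarize(placements, n_bins=10):
    """Pool (confidence, error) pairs by confidence bin.

    For each non-empty bin, report the error mean, the error standard
    deviation, the fraction placed correctly, and the read count.
    """
    errors_by_bin = defaultdict(list)
    for confidence, error in placements:
        errors_by_bin[bin_label(confidence, n_bins)].append(error)

    summary = {}
    for label, errors in errors_by_bin.items():
        summary[label] = {
            "mean": mean(errors),
            # Assumption: population standard deviation.
            "sd": pstdev(errors),
            # Assumption: an error of 0 means the read was placed correctly.
            "fraction_correct": sum(e == 0 for e in errors) / len(errors),
            "count": len(errors),
        }
    return summary


# Toy data, invented for illustration only (not taken from the simulation).
toy = [(0.45, 0), (0.41, 3), (0.49, 3), (0.95, 0), (0.99, 0), (0.92, 1)]
stats = summarize(toy)
for label in sorted(stats):
    print(label, stats[label])
```

Running `summarize` once on the ML confidence scores and once on the PP scores would give the two sets of columns side by side, under the assumptions listed above.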