
Table 4 Comparison of performance of different methods on reconstruction of three haplotypes for real kangaroo data sets from the mixture of reads [8] for (a) amplicon 1, (b) amplicon 2, and (c) amplicon 3

From: HaploJuice : accurate haplotype assembly from a pool of sequences with known relative concentrations

a. Amplicon 1 (total length of three haplotypes: 13921)

| Software | # contigs ≥ 500bp | Longest contig (bp) | N50 (bp) | Haplotypes coverage % | Error rate % |
|---|---|---|---|---|---|
| HaploJuice | 3.0 ± 0.0 | 4613 ± 2.1 | 4612 ± 2.0 | 99.4 ± 0.0 | 0.05 ± 0.07 |
| hmmfreq[8] | 3.0 ± 0.0 | 4485 ± 0.6 | 4484 ± 0.6 | 96.6 ± 0.0 | 0.26 ± 0.10 |
| shoRAH[3] | 24.0 ± 2.6 | 4592 ± 7.0 | 4592 ± 6.0 | 95.6 ± 10.4 | 1.05 ± 0.32 |
| SAVAGE[4] | 13.2 ± 2.1 | 903 ± 132.3 | 482 ± 169.6 | 47.3 ± 5.2 | 0.02 ± 0.04 |
| PredictHaplo[5] | 1.1 ± 0.3 | 4630 ± 2.0 | 462 ± 1461.3 | 36.5 ± 10.5 | 0.01 ± 0.01 |
| QuRe[6] | 4.0 ± 1.9 | 4343 ± 9.9 | 3909 ± 1373.7 | 74.9 ± 21.8 | 0.42 ± 0.32 |

b. Amplicon 2 (total length of three haplotypes: 12694)

| Software | # contigs ≥ 500bp | Longest contig (bp) | N50 (bp) | Haplotypes coverage % | Error rate % |
|---|---|---|---|---|---|
| HaploJuice | 3.0 ± 0.0 | 4120 ± 1.5 | 4120 ± 1.5 | 97.4 ± 0.0 | 0.02 ± 0.03 |
| hmmfreq[8] | 3.0 ± 0.0 | 3998 ± 4.0 | 3998 ± 4.0 | 94.5 ± 0.1 | 0.02 ± 0.01 |
| shoRAH[3] | 24.2 ± 5.7 | 4119 ± 14.5 | 4118 ± 12.1 | 90.8 ± 13.5 | 0.41 ± 0.48 |
| SAVAGE[4] | 8.8 ± 3.8 | 1806 ± 761.5 | 572 ± 81.7 | 50.2 ± 4.7 | 0.00 ± 0.00 |
| PredictHaplo[5] | 2.0 ± 0.0 | 4140 ± 2.6 | 4136 ± 0.0 | 65.2 ± 0.0 | 0.00 ± 0.00 |
| QuRe[6] | 2.4 ± 0.7 | 3746 ± 4.7 | 3373 ± 1185.0 | 38.4 ± 14.3 | 0.22 ± 0.28 |

c. Amplicon 3 (total length of three haplotypes: 15391)

| Software | # contigs ≥ 500bp | Longest contig (bp) | N50 (bp) | Haplotypes coverage % | Error rate % |
|---|---|---|---|---|---|
| HaploJuice | 3.0 ± 0.0 | 5116 ± 9.1 | 5111 ± 7.7 | 99.6 ± 0.1 | 0.01 ± 0.00 |
| hmmfreq[8] | 3.0 ± 0.0 | 5029 ± 3.1 | 5027 ± 3.6 | 98.0 ± 0.1 | 0.23 ± 0.11 |
| shoRAH[3] | 27.6 ± 3.0 | 5132 ± 7.1 | 5111 ± 7.4 | 96.3 ± 10.5 | 1.91 ± 0.44 |
| SAVAGE[4] | 11.8 ± 2.3 | 2510 ± 672 | 550 ± 40.4 | 55.6 ± 4.3 | 0.01 ± 0.01 |
| PredictHaplo[5] | 1.6 ± 0.5 | 5170 ± 3.9 | 3070 ± 2642.4 | 53.3 ± 17.2 | 0.14 ± 0.09 |
| QuRe[6] | 3.0 ± 1.1 | 4567 ± 2.1 | 4106 ± 1442.7 | 35.6 ± 12.5 | 0.25 ± 0.28 |

  1. There are 10 data sets for each amplicon, with a total read coverage of 1600x. For each data set, the sub-samples were mixed in the proportions 0.125, 0.25, and 0.625. Values are reported as average ± standard deviation. The best value in each column is highlighted among the methods whose contigs cover over 90% of the three haplotypes.
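
As an illustration of how an N50 column and the "average ± standard deviation" format can be produced, the following is a minimal Python sketch. It is not taken from the paper: the function names `n50` and `summarize`, the example contig lengths, the use of the sample standard deviation, and the choice to measure N50 against the known total haplotype length (rather than the summed contig length) are all assumptions made for this example.

```python
from statistics import mean, stdev


def n50(contig_lengths, total_length=None):
    """Length of the contig at which the cumulative length of contigs,
    taken from longest to shortest, first reaches half of total_length.

    total_length defaults to the summed contig length. Passing the known
    total haplotype length instead is an assumption of this sketch.
    """
    if total_length is None:
        total_length = sum(contig_lengths)
    target = total_length / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0  # the contigs cover less than half of total_length


def summarize(values, digits=1):
    """Format replicate values as 'average ± standard deviation'.

    Uses the sample standard deviation (an assumption of this sketch).
    """
    return f"{mean(values):.{digits}f} ± {stdev(values):.{digits}f}"


# Hypothetical contig lengths (bp) from two runs on amplicon 1,
# whose three haplotypes have a total length of 13921 bp.
runs = [[4600, 4500, 4400], [4610, 4490, 600, 550]]
n50_values = [n50(run, total_length=13921) for run in runs]
print(summarize(n50_values))
```

Under this convention, a run whose contigs cover less than half of the total haplotype length gets an N50 of 0. That pulls the average down and inflates the standard deviation for methods that occasionally return only one long contig.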