Table 3 Time and memory usage of different versions of align-families.py, using different multiple sequence alignment algorithms

From: Family reunion via error correction: an efficient analysis of duplex sequencing data

| version | aligner | time / memory | 1 CPU  | 2 CPUs | 4 CPUs | 8 CPUs | 16 CPUs | 32 CPUs |
|---------|---------|---------------|--------|--------|--------|--------|---------|---------|
| 0.4     | MAFFT   | time (seconds) | 28,638 | 15,769 | 8912   | 5173   | 3038    | 1747    |
| 2.15    | MAFFT   | time (seconds) | 28,754 | 14,282 | 7079   | 3463   | 1686    | 854     |
| 2.15    | Kalign2 | time (seconds) | 4731   | 1777   | 945    | 600    | 381     | 246     |
| 0.4     | MAFFT   | memory (MB)   | 23,704 | 12,299 | 6622   | 3755   | 2284    | 1602    |
| 2.15    | MAFFT   | memory (MB)   | 23,927 | 12,599 | 6850   | 3985   | 2541    | 1810    |
| 2.15    | Kalign2 | memory (MB)   | 24,648 | 23,220 | 12,408 | 6668   | 3781    | 2327    |

  1. At low levels of parallelization, Kalign2 made the process over 8 times faster while using less than twice the memory of MAFFT. The new algorithm sped up the tool by a factor of between 1 and 2.05. Naturally, at higher levels of parallelization, reducing the job queue bottleneck made more of a difference. Memory usage appeared unaffected, which is expected given the small size of the job queue relative to the rest of the memory footprint. To attempt to disentangle the effects of the job queueing algorithm from all the other changes between versions 0.4 and 2.15, the two versions were compared with all parameters set as similarly as possible. In both cases, the number of --processes was set to 32 and MAFFT was used as the aligner. Crucially, the --queue-size for version 2.15 was set to 32, the same as the number of --processes. This approximates the bottleneck in the pre-2.0 version of Du Novo's job queueing algorithm. Comparing the median of 3 trials of each, the wallclock time of 2.15 was 27% higher than that of 0.4. This could be due to the higher overhead of the more complicated parallelization algorithm, or to other changes between 0.4 and 2.15.
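The "between 1 and 2.05x" speedup range quoted above can be reproduced directly from the wallclock times in Table 3; a minimal sketch (the variable names are illustrative, not part of the align-families.py code):

```python
# Wallclock times (seconds) for the MAFFT aligner at 1, 2, 4, 8, 16, and 32
# CPUs, taken from the "time (seconds)" rows of Table 3.
time_v04 = [28638, 15769, 8912, 5173, 3038, 1747]
time_v215 = [28754, 14282, 7079, 3463, 1686, 854]

# Speedup of version 2.15 over 0.4 at each level of parallelization.
speedup = [old / new for old, new in zip(time_v04, time_v215)]
for cpus, s in zip([1, 2, 4, 8, 16, 32], speedup):
    print(f"{cpus:>2} CPUs: {s:.2f}x")
```

The ratios grow monotonically with the CPU count, from roughly 1.00x at 1 CPU to about 2.05x at 32 CPUs, which is consistent with the observation that the job queue bottleneck matters more at higher levels of parallelization.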