Table 1 Summary of the number of individual sequences considered at each step of the workflow

From: PVAmpliconFinder: a workflow for the identification of human papillomaviruses from high-throughput amplicon sequencing

| Sample | Total sequencing raw reads (N paired-end reads) | TrimGalore (N paired-end reads) | TrimGalore (%) | Merging (N sequences) | Merging (%) | Dereplication (N sequences) | Dereplication (%) | Chimeric identification (N sequences) | Chimeric identification (%) | Clustering (N sequences) | Clustering (%) | Papillomaviridae best hit, e-value ≤ 1e-5 (N sequences) | Papillomaviridae best hit (%) | Defined group (same best hit): putative new, > 10% dissimilarity (N sequences) | Putative new (%) | Defined group (same best hit): putative known, < 10% dissimilarity (N sequences) | Putative known (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S1 | 564,435 | 564,064 | 99.93 | 551,266 | 97.73 | 22,551 | 4.09 | 22,498 | 99.76 | 79 | 0.35 | 61 | 77.22 | 0 | 0 | 5 | 8.20 |
| S2 | 62,148 | 61,708 | 99.29 | 58,031 | 94.04 | 3281 | 5.65 | 3268 | 99.60 | 162 | 4.96 | 138 | 85.19 | 0 | 0 | 6 | 4.35 |
| S3 | 316,297 | 315,999 | 99.91 | 307,400 | 97.28 | 15,562 | 5.06 | 15,562 | 100.00 | 51 | 0.33 | 49 | 96.08 | 1 | 2.04 | 18 | 36.73 |
| S4 | 109,441 | 109,326 | 99.89 | 106,406 | 97.33 | 4842 | 4.55 | 4822 | 99.59 | 62 | 1.29 | 62 | 100.00 | 0 | 0 | 28 | 45.16 |
| S5 | 309,779 | 309,390 | 99.87 | 294,563 | 95.21 | 14,101 | 4.79 | 14,091 | 99.93 | 140 | 0.99 | 129 | 92.14 | 2 | 1.55 | 39 | 30.23 |
| S6 | 554,415 | 551,742 | 99.52 | 331,648 | 60.11 | 13,820 | 4.17 | 13,738 | 99.41 | 1162 | 8.46 | 910 | 78.31 | 0 | 0 | 16 | 1.76 |
| S7 | 470,655 | 467,944 | 99.42 | 421,764 | 90.13 | 28,729 | 6.81 | 28,659 | 99.76 | 609 | 2.12 | 513 | 84.24 | 0 | 0 | 16 | 3.12 |
| S8 | 263,707 | 263,270 | 99.83 | 244,177 | 92.75 | 13,293 | 5.44 | 13,283 | 99.92 | 194 | 1.46 | 188 | 96.91 | 0 | 0 | 8 | 4.26 |
| Total number of sequences | 2,650,877 | 2,643,443 | 99.72 | 2,315,255 | 87.58 | 116,179 | 5.02 | 115,921 | 99.78 | 2459 | 2.12 | 2050 | 83.37 | 3 | 0.45 | 136 | 16.73 |
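
The percentage columns can be read as the share of sequences retained relative to the preceding workflow step. The following is a minimal Python sketch of that arithmetic for sample S1, using the counts from Table 1. The function name `stepwise_percentages` and the list `S1_COUNTS` are illustrative names introduced here, not part of PVAmpliconFinder, and the "relative to the previous step" reading is an assumption inferred from the tabulated values.

```python
# Counts for sample S1, copied from Table 1 (one entry per workflow step).
S1_COUNTS = [
    ("Total sequencing raw reads", 564_435),
    ("TrimGalore", 564_064),
    ("Merging", 551_266),
    ("Dereplication", 22_551),
    ("Chimeric identification", 22_498),
    ("Clustering", 79),
    ("Papillomaviridae best hit", 61),
]


def stepwise_percentages(counts):
    """Return (step, percent) pairs, each percent relative to the previous step.

    Assumption: every percentage in the table is computed against the count
    of the step immediately before it, rounded to two decimals.
    """
    result = []
    for (_, previous), (step, current) in zip(counts, counts[1:]):
        result.append((step, round(100 * current / previous, 2)))
    return result


for step, pct in stepwise_percentages(S1_COUNTS):
    print(f"{step}: {pct:.2f}%")
```

Under the same assumption, the two "Defined group" columns appear to be expressed relative to the Papillomaviridae best-hit count, for example 5 / 61 ≈ 8.20% putative known sequences for S1.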