Table 1 Data sets included in database 17DataSets

From: Cluster oligonucleotide signatures for rapid identification by sequencing

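| Data set | N | S | i | n | \(\bar{L} \pm \sigma(L)\) | n∗ | c | c/n | n∗/n | \(s_0\) | \(s_s\) | \(s_n\) | \(s_c\) | \(\delta_c\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anisogramma | 15248 | 28 | 26 | 54 | 545 ± 94 | 33 | 139 | 2.6 | 61% | 1 (4%) | 24 (86%) | 27 (96%) | 28 (100%) | 4% |
| Pectobacterium | 72624 | 37 | 42 | 79 | 1671 ± 290 | 43 | 258 | 3.3 | 54% | - | 25 (68%) | 28 (76%) | 35 (95%) | 19% |
| Ceratorhiza | 24645 | 37 | 35 | 72 | 647 ± 60 | 36 | 137 | 1.9 | 50% | 7 (19%) | 24 (65%) | 25 (68%) | 34 (92%) | 24% |
| Coniella | 23078 | 48 | 46 | 94 | 481 ± 64 | 45 | 143 | 1.5 | 48% | 7 (15%) | 32 (67%) | 37 (77%) | 48 (100%) | 23% |
| Talaromyces | 54964 | 88 | 86 | 174 | 625 ± 220 | 126 | 626 | 3.6 | 72% | - | 87 (99%) | 88 (100%) | 88 (100%) | - |
| Elsinoe | 79740 | 132 | 63 | 195 | 586 ± 146 | 54 | 199 | 1.0 | 28% | 1 (1%) | 37 (28%) | 40 (30%) | 43 (33%) | 2% |
| Claviceps | 77453 | 140 | 139 | 279 | 553 ± 45 | 92 | 376 | 1.3 | 33% | 16 (11%) | 58 (41%) | 63 (45%) | 82 (59%) | 14% |
| Ceratocystis | 112291 | 193 | 179 | 372 | 582 ± 205 | 115 | 631 | 1.7 | 31% | 52 (27%) | 74 (38%) | 82 (42%) | 149 (77%) | 35% |
| Phytophthora | 201815 | 253 | 238 | 491 | 798 ± 24 | 319 | 1103 | 2.2 | 65% | - | 149 (59%) | 166 (66%) | 184 (73%) | 7% |
| Diaporthe | 213202 | 399 | 338 | 737 | 530 ± 99 | 196 | 1008 | 1.4 | 27% | 149 (37%) | 140 (35%) | 150 (38%) | 266 (67%) | 29% |
| Peronospora | 428994 | 513 | 400 | 913 | 824 ± 377 | 349 | 1984 | 2.2 | 38% | 64 (12%) | 200 (39%) | 222 (43%) | 310 (60%) | 17% |
| Alternaria | 280418 | 551 | 550 | 1101 | 509 ± 11 | 187 | 734 | 0.7 | 17% | - | 78 (14%) | 86 (16%) | 101 (18%) | 3% |
| Aspergillus | 547127 | 1032 | 1032 | 2064 | 530 ± 39 | 591 | 2331 | 1.1 | 29% | 19 (2%) | 285 (28%) | 313 (30%) | 414 (40%) | 10% |
| Colletotrichum | 691867 | 1198 | 918 | 2116 | 576 ± 297 | 477 | 2010 | 0.9 | 23% | 562 (47%) | 379 (32%) | 397 (33%) | 667 (56%) | 23% |
| Tilletia | 743335 | 1200 | 915 | 2115 | 618 ± 259 | 574 | 2666 | 1.3 | 27% | 394 (33%) | 376 (31%) | 403 (34%) | 649 (54%) | 20% |
| Penicillium | 743954 | 1438 | 1437 | 2875 | 517 ± 12 | 597 | 2675 | 0.9 | 21% | 57 (4%) | 310 (22%) | 325 (23%) | 413 (29%) | 6% |
| Fusarium | 1604775 | 2946 | 2261 | 5207 | 533 ± 133 | 1165 | 4417 | 0.8 | 22% | 1492 (51%) | 969 (33%) | 1001 (34%) | 1778 (60%) | 26% |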

N: size of data set (nucleotides); S: number of sequences (other than sequences with more than 5 ambiguous bases); i: number of internal clades in the phylogenetic tree; n: total number of phylogenetic clades (n = S + i); \(\bar{L}\): average length of sequences in the data set (rounded to closest integer); σ(L): corrected sample standard deviation for the sequence length (rounded to closest integer); n∗: number of signable clades; c: number of clusters (λ = 36) identified by aodp; c/n: ratio between clusters and phylogenetic clades; n∗/n: ratio between signable clades and phylogenetic clades; \(s_0\): number of sequences that are not included in any signable clades; \(s_s\): signable sequences (also unique signable sequence patterns); \(s_n\): unique signable clade patterns; \(s_c\): unique cluster patterns; \(\delta_c = s_c - s_n\): discrimination increase attributable to clusters (difference between unique cluster patterns and unique signable clade patterns).
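The derived columns can be recomputed from the primary counts as a consistency check on the definitions above. The following is a minimal sketch, not code from the paper: the class and field names are hypothetical, and it assumes that the percentage form of the discrimination increase is taken relative to S. The note does not state this explicitly, but the assumption reproduces the tabulated values for the three rows used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSetRow:
    """Primary counts for one row of Table 1 (names are illustrative)."""
    name: str
    S: int       # number of sequences
    i: int       # internal clades in the phylogenetic tree
    n_star: int  # signable clades
    c: int       # clusters identified by aodp
    s_n: int     # unique signable clade patterns
    s_c: int     # unique cluster patterns

    @property
    def n(self) -> int:
        # Total number of phylogenetic clades: n = S + i
        return self.S + self.i

    @property
    def clusters_per_clade(self) -> float:
        # c/n, rounded to one decimal as in the table
        return round(self.c / self.n, 1)

    @property
    def signable_pct(self) -> int:
        # n*/n as a whole-number percentage
        return round(100 * self.n_star / self.n)

    @property
    def delta_c_pct(self) -> int:
        # Assumption: (s_c - s_n) expressed as a percentage of S
        return round(100 * (self.s_c - self.s_n) / self.S)


# Counts copied from the Anisogramma, Pectobacterium and Elsinoe rows.
rows = [
    DataSetRow("Anisogramma", S=28, i=26, n_star=33, c=139, s_n=27, s_c=28),
    DataSetRow("Pectobacterium", S=37, i=42, n_star=43, c=258, s_n=28, s_c=35),
    DataSetRow("Elsinoe", S=132, i=63, n_star=54, c=199, s_n=40, s_c=43),
]

for r in rows:
    print(r.name, r.n, r.clusters_per_clade,
          f"{r.signable_pct}%", f"{r.delta_c_pct}%")
```

Computing the discrimination increase from the raw counts rather than from the already rounded percentages avoids off-by-one rounding differences, as in the Elsinoe row (33% and 30%, but a tabulated increase of 2%).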