
Table 3 Structural comparison of networks on subsets of the data (resilience to incomplete data)

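From: A nearest-neighbors network model for sequence data reveals new insight into genotype distribution of a pathogen

| Network | Data | Avg. degree | D | CC | Number Comp. | Largest Comp. | \|C1\| | \|C>2\|/\|V\| | Genus prec. | Genus recall | Species prec. | Species recall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GroEL Threshold | Full | 10.5 | 5 | 0.98 | 304 | 46 | 188 | 62.3% | 90.3% | 47.3% | 60.6% | 63.2% |
| GroEL Threshold | Sample Avg. | 6.2 | 4 | 0.97 | 221 | 29 | 150 | 56.1% | 89.7% | 46.7% | 62.8% | 64.5% |
| GroEL Threshold | Sample 1 | 6.2 | 5 | 0.97 | 221 | 30 | 151 | 53.8% | 90.8% | 47.2% | 62.2% | 65.4% |
| GroEL Threshold | Sample 2 | 6.3 | 4 | 0.97 | 229 | 29 | 158 | 52.0% | 89.3% | 45.4% | 62.1% | 62.6% |
| GroEL Threshold | Sample 3 | 6.5 | 4 | 0.98 | 223 | 27 | 152 | 53.2% | 89.2% | 45.5% | 61.0% | 63.0% |
| GroEL Threshold | Sample 4 | 6.3 | 5 | 0.96 | 212 | 32 | 141 | 56.7% | 87.7% | 46.2% | 63.4% | 64.9% |
| GroEL Threshold | Sample 5 | 5.9 | 4 | 0.98 | 218 | 25 | 146 | 58.5% | 91.6% | 49.4% | 65.3% | 66.7% |
| GroEL DiWANN | Full | 2.6 | 7 | 0.19 | 179 | 34 | 0 | 85.4% | 80.4% | 43.9% | 59.5% | 61.8% |
| GroEL DiWANN | Sample Avg. | 2.7 | 6 | 0.41 | 119 | 26 | 0 | 86.4% | 75.8% | 51.1% | 55.2% | 67.8% |
| GroEL DiWANN | Sample 1 | 2.8 | 7 | 0.52 | 100 | 33 | 0 | 86.4% | 73.6% | 52.1% | 53.4% | 68.4% |
| GroEL DiWANN | Sample 2 | 2.4 | 5 | 0.21 | 111 | 22 | 0 | 85.6% | 78.2% | 50.4% | 56.4% | 66.2% |
| GroEL DiWANN | Sample 3 | 2.8 | 6 | 0.58 | 113 | 24 | 0 | 84.0% | 73.6% | 52.7% | 54.0% | 69.2% |
| GroEL DiWANN | Sample 4 | 2.6 | 5 | 0.38 | 105 | 23 | 0 | 87.3% | 77.9% | 46.0% | 57.9% | 67.6% |
| GroEL DiWANN | Sample 5 | 2.7 | 7 | 0.36 | 105 | 29 | 0 | 88.9% | 75.8% | 54.1% | 54.5% | 68.0% |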

This table compares the structure and clustering results for the GroEL dataset between the full network and networks generated from random samples of 60% of the sequences. D denotes the diameter and CC the clustering coefficient. Also shown are the number of connected components and the size of the largest component. |C1| gives the number of nodes in clusters of size 1 (singletons), and |C>2|/|V| gives the percentage of nodes in clusters of size 3 or above. The full network is included for comparison. For the threshold-based networks, we use a threshold of 30, which gave a good trade-off between precision and recall in the community analysis. The full networks contain 812 nodes, while each reduced network contains 487 nodes.
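The two kinds of network compared in the table, and the 60% subsampling (rounding 0.6 × 812 = 487.2 gives the 487 nodes per reduced network), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the caption does not name the sequence distance, so Hamming distance on pre-aligned sequences (`hamming`) is a hypothetical stand-in; the threshold rule is assumed to be "distance at most T" (T = 30 in the table); and `nearest_neighbor_network` is a plain nearest-neighbour graph that links each sequence to every other sequence at its minimum distance, which may differ in detail from DiWANN. All function names are hypothetical.

```python
import random
from typing import Callable, Optional, Sequence

# Hypothetical adjacency-set representation: node index -> set of neighbour indices.
Graph = dict[int, set[int]]


def hamming(a: str, b: str) -> int:
    """Mismatch count between two aligned, equal-length sequences (assumed distance)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def threshold_network(seqs: Sequence[str], threshold: int,
                      dist: Callable[[str, str], int] = hamming) -> Graph:
    """Link every pair of sequences whose distance is at most `threshold`."""
    g: Graph = {i: set() for i in range(len(seqs))}
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if dist(seqs[i], seqs[j]) <= threshold:
                g[i].add(j)
                g[j].add(i)
    return g


def nearest_neighbor_network(seqs: Sequence[str],
                             dist: Callable[[str, str], int] = hamming) -> Graph:
    """Link each sequence to every other sequence at its minimum distance (ties kept)."""
    n = len(seqs)
    g: Graph = {i: set() for i in range(n)}
    for i in range(n):
        d = {j: dist(seqs[i], seqs[j]) for j in range(n) if j != i}
        if not d:
            continue  # a lone sequence has no neighbours
        best = min(d.values())
        for j, dij in d.items():
            if dij == best:
                g[i].add(j)
                g[j].add(i)
    return g


def subsample(items: Sequence, fraction: float = 0.6,
              seed: Optional[int] = None) -> list:
    """Draw a uniform random subset of round(fraction * len(items)) distinct items."""
    rng = random.Random(seed)
    return rng.sample(list(items), round(fraction * len(items)))


if __name__ == "__main__":
    seqs = ["AAAA", "AAAT", "AATT", "TTTT"]
    print(threshold_network(seqs, threshold=1))
    print(nearest_neighbor_network(seqs))
    print(len(subsample(range(812), seed=0)))  # 487
```

In this sketch every node of the nearest-neighbour graph receives at least one edge, so it never contains isolated nodes, whereas a fixed threshold can leave sequences with no neighbour at all.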
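The structural columns can be computed from an adjacency-set graph using breadth-first search alone. This is a hedged sketch with several assumptions the caption does not settle: D is taken as the longest finite shortest-path length over all components (the threshold networks are highly disconnected); CC is taken as the mean local clustering coefficient over nodes of degree at least 2 (with 188 singletons among 812 nodes, counting them as 0 could not give the reported 0.98, since at most 624/812 ≈ 0.77 would be reachable); and the clusters behind |C1| and |C>2|/|V| are passed in separately because the caption does not say how they were produced. `structure_stats` and the other names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical adjacency-set representation: node -> set of neighbours.
Graph = dict[int, set[int]]


def bfs_distances(g: Graph, source: int) -> dict[int, int]:
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(g: Graph) -> list[set[int]]:
    """Components in first-seen node order; each is the set BFS reaches."""
    seen: set[int] = set()
    comps: list[set[int]] = []
    for u in g:
        if u not in seen:
            comp = set(bfs_distances(g, u))
            seen |= comp
            comps.append(comp)
    return comps


def local_clustering(g: Graph, u: int) -> float:
    """Fraction of neighbour pairs of u that are linked; caller ensures degree >= 2."""
    k = len(g[u])
    links = sum(1 for a, b in combinations(g[u], 2) if b in g[a])
    return 2 * links / (k * (k - 1))


def structure_stats(g: Graph, clusters: list[set[int]]) -> dict[str, float]:
    n = len(g)
    comps = connected_components(g)
    eligible = [u for u in g if len(g[u]) >= 2]  # assumption: CC averaged over these
    return {
        "avg_degree": sum(len(nb) for nb in g.values()) / n,
        # Assumption: D = longest finite shortest path, taken over all components.
        "D": max(max(bfs_distances(g, u).values()) for u in g),
        "CC": (sum(local_clustering(g, u) for u in eligible) / len(eligible)
               if eligible else 0.0),
        "num_components": len(comps),
        "largest_component": max(len(c) for c in comps),
        "C1": sum(len(c) for c in clusters if len(c) == 1),          # singleton nodes
        "C>2/V": sum(len(c) for c in clusters if len(c) > 2) / n,    # share in size >= 3
    }
```

Running a BFS from every node costs O(|V|·|E|), which is negligible at the 812-node scale of this dataset.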
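The caption does not define how genus and species precision and recall were computed. One common way to score a clustering against reference labels is pair counting, sketched below as an assumption rather than the paper's method: precision is the fraction of same-cluster node pairs that share a label, and recall is the fraction of same-label pairs that land in the same cluster. `pairwise_precision_recall` and its arguments are hypothetical.

```python
from collections.abc import Hashable, Mapping
from itertools import combinations


def pairwise_precision_recall(cluster_of: Mapping[int, Hashable],
                              label_of: Mapping[int, Hashable]) -> tuple[float, float]:
    """Pair-counting precision and recall of a clustering against reference labels."""
    same_cluster = same_label = agree = 0
    for a, b in combinations(list(cluster_of), 2):
        in_cluster = cluster_of[a] == cluster_of[b]
        in_label = label_of[a] == label_of[b]
        same_cluster += in_cluster          # pairs the clustering puts together
        same_label += in_label              # pairs the reference labels put together
        agree += in_cluster and in_label    # pairs both put together
    precision = agree / same_cluster if same_cluster else 0.0
    recall = agree / same_label if same_label else 0.0
    return precision, recall


if __name__ == "__main__":
    clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}
    genus = {0: "x", 1: "x", 2: "x", 3: "y", 4: "z"}
    print(pairwise_precision_recall(clusters, genus))  # (0.75, 1.0)
```

Under this reading, high precision paired with lower recall would describe clusters that are mostly pure but split a label across several clusters.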