Table 3 Informational density of various corpora

From: Automatic reconstruction of a bacterial regulatory network using Natural Language Processing

| A) Corpus | B) # Docs | C) Size (MB) | D) Interactions in RegulonDB | E) Total interactions | F) % in RegulonDB | G) Average doc size (kB) | H) RegulonDB interactions/doc |
|---|---|---|---|---|---|---|---|
| RN | 724 | 24.9 | 1026 | 1643 | 62.4 | 65.9 | 1.41 |
| RP | 2475 | 99.0 | 1316 | 2650 | 49.6 | 40.0 | 0.53 |
| RA | 3075 | 3.3 | 322 | 414 | 77.7 | 1.07 | 0.1 |
| EA | 13334 | 14.4 | 402 | 627 | 64.1 | 1.08 | 0.03 |
| RS | 12059 | 12.3 | 400 | 718 | 55.7 | 1.02 | 0.03 |
| ST | 58312 | 10.7 | 342 | 691 | 49.5 | 0.18 | 0.005 |

  1. A comparison of how informative various corpora are with regard to transcriptional regulation in E. coli K-12, as established from the number of RegulonDB-attested interactions they contain. The table includes the total number of documents and interactions (Cols. B & E), the number and percentage of those interactions found in RegulonDB (Cols. D & F), and the average size of each document in the corpus (Col. G). Rows are ordered by the number of RegulonDB interactions per document (Col. H).
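
The two density measures in the table can be recomputed from the raw counts. The sketch below is illustrative only: it assumes that Col. F is Col. D divided by Col. E (as a percentage) and that Col. H is Col. D divided by Col. B, which is inferred from the column labels and the note above rather than stated as formulas in the source. The names `ROWS`, `derived_columns`, and `density_table` are hypothetical. Recomputed values can differ from the printed ones in the last digit, depending on whether the originals were rounded or truncated.

```python
# Counts copied from Table 3:
# (corpus, documents [Col. B], RegulonDB interactions [Col. D], total interactions [Col. E])
ROWS = [
    ("RN", 724, 1026, 1643),
    ("RP", 2475, 1316, 2650),
    ("RA", 3075, 322, 414),
    ("EA", 13334, 402, 627),
    ("RS", 12059, 400, 718),
    ("ST", 58312, 342, 691),
]


def derived_columns(docs, in_regulondb, total):
    """Return (percent in RegulonDB, RegulonDB interactions per document).

    Assumed relationships (not given explicitly in the source):
    Col. F = 100 * D / E and Col. H = D / B.
    """
    pct_in_regulondb = 100.0 * in_regulondb / total
    per_doc = in_regulondb / docs
    return pct_in_regulondb, per_doc


def density_table(rows=ROWS):
    """Map each corpus name to its two derived density measures."""
    return {name: derived_columns(docs, hits, total)
            for name, docs, hits, total in rows}


if __name__ == "__main__":
    for name, (pct, per_doc) in density_table().items():
        print(f"{name}: {pct:.1f}% in RegulonDB, {per_doc:.3f} interactions/doc")
```

Under these assumptions, RN stands out as the densest corpus per document, while RA has the highest share of extracted interactions that are attested in RegulonDB.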