Table 1 Datasets used for optimizing large-scale genomic sequencing reads compression

From: PMFFRC: a large-scale genomic short reads compression optimizer via memory modeling and redundant clustering

| Dataset name (species) | Sequencing platform (method) | Number of reads (millions) | Total size (GB) | Reads size (GB) | Length (bp) | Number of files |
| --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | NextSeq-550 (SE) | 507.40 | 111.25 | 36.78 | 75 | 24 |
| Cicer arietinum | HiSeq-2000 (PE) | 2060.96 | 680.60 | 178.54 | 90 | 60 |
| Salvelinus fontinalis | Ion-Torrent (SE) | 757.46 | 189.66 | 58.14 | 80 | 360 |
| Total | | 3325.82 | 981.52 | 273.46 | | 444 |

  1. Datasets were downloaded using sra-tools (https://github.com/ncbi/sra-tools). For the complete NCBI registration numbers, see Additional file 1: Section S4.
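
The Total row of Table 1 can be cross-checked against the three per-dataset rows. The following is a minimal sketch for that check and is not part of the paper: the names `ROWS`, `REPORTED`, `column_sums`, and `differences` are hypothetical, and the values are transcribed by hand from the table. `Decimal` is used so that two-decimal figures are summed exactly rather than as binary floats.

```python
from decimal import Decimal

# Per-dataset values transcribed from Table 1 (hypothetical layout):
# (species, reads in millions, total size in GB, reads size in GB, number of files)
ROWS = [
    ("Homo sapiens", "507.40", "111.25", "36.78", 24),
    ("Cicer arietinum", "2060.96", "680.60", "178.54", 60),
    ("Salvelinus fontinalis", "757.46", "189.66", "58.14", 360),
]

# Values printed in the Total row of Table 1.
REPORTED = {
    "reads_millions": Decimal("3325.82"),
    "total_gb": Decimal("981.52"),
    "reads_gb": Decimal("273.46"),
    "files": 444,
}


def column_sums(rows):
    """Sum each numeric column over the per-dataset rows."""
    return {
        "reads_millions": sum(Decimal(r[1]) for r in rows),
        "total_gb": sum(Decimal(r[2]) for r in rows),
        "reads_gb": sum(Decimal(r[3]) for r in rows),
        "files": sum(r[4] for r in rows),
    }


def differences(rows, reported):
    """Return reported total minus recomputed sum for each column."""
    sums = column_sums(rows)
    return {key: reported[key] - sums[key] for key in reported}


if __name__ == "__main__":
    for key, diff in differences(ROWS, REPORTED).items():
        print(f"{key}: reported - summed = {diff}")
```

With the values as transcribed, the read count, reads size, and file count match the Total row exactly, while the summed total size is 981.51 GB against the reported 981.52 GB. A 0.01 GB gap of this kind is most likely due to the per-dataset sizes being rounded before display.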