PMFFRC: a large-scale genomic short reads compression optimizer via memory modeling and redundant clustering

BMC Bioinformatics

Table 1 Datasets used for optimizing large-scale genomic sequencing reads compression

Dataset name (species)	Sequencing platform (method)	Number of reads (millions)	Total size (GB)	Reads size (GB)	Length (bp)	Number of files
Homo sapiens	NextSeq-550 (SE)	507.40	111.25	36.78	75	24
Cicer arietinum	HiSeq-2000 (PE)	2060.96	680.60	178.54	90	60
Salvelinus fontinalis	Ion-Torrent (SE)	757.46	189.66	58.14	80	360
Total	–	3325.82	981.52	273.46	–	444

Datasets were downloaded using sra-tools (https://github.com/ncbi/sra-tools). For the complete list of NCBI accession numbers, see Additional file 1: Section S4.
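The Total row of Table 1 can be checked with simple column sums. The sketch below is illustrative only and is not part of the PMFFRC tooling. The row values are copied from the table, while the variable and function names are hypothetical. Because each per-dataset value is already rounded to two decimals, a difference of 0.01 between a recomputed and a reported total is treated as rounding noise. The final ratio assumes that the "Reads size" column is the portion of "Total size" occupied by the read sequences.

```python
# Rows copied from Table 1:
# (dataset, reads in millions, total size GB, reads size GB, number of files)
rows = [
    ("Homo sapiens", 507.40, 111.25, 36.78, 24),
    ("Cicer arietinum", 2060.96, 680.60, 178.54, 60),
    ("Salvelinus fontinalis", 757.46, 189.66, 58.14, 360),
]

# Total row as reported in Table 1.
reported = (3325.82, 981.52, 273.46, 444)


def column_totals(table_rows):
    """Sum each numeric column, rounded to two decimals as in the table."""
    columns = zip(*(row[1:] for row in table_rows))
    return tuple(round(sum(column), 2) for column in columns)


totals = column_totals(rows)
labels = ("reads (millions)", "total size (GB)", "reads size (GB)", "files")

for label, computed, expected in zip(labels, totals, reported):
    # Inputs are pre-rounded, so tolerate a 0.01 discrepancy.
    status = "ok" if abs(computed - expected) <= 0.011 else "MISMATCH"
    print(f"{label}: computed {computed}, reported {expected} [{status}]")

# Assumption: "Reads size" is the sequence-only part of "Total size".
reads_share = reported[2] / reported[1]
print(f"reads size / total size: {reads_share:.1%}")
```

The same check can be rerun whenever a dataset is added to or removed from the table.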

ISSN: 1471-2105