Table 1 RNA-Seq datasets and the computing resources used for each dataset

From: K-mer clustering algorithm using a MapReduce framework: application to the parallelization of the Inchworm module of Trinity

| Organism | # of reads | # of unique k-mers | Computing resource: MR-Inchworm | Computing resource: Original Inchworm | Data source |
|---|---|---|---|---|---|
| mouse | 105,290,476 | 746,811,557 | iDataplex-nextscale | iDataplex-nextscale: single node (64 GB mem) | [22] |
| sugarbeet | 129,832,549 | 2,213,519,875 | iDataplex-nextscale | iDataplex: single node (256 GB mem) | unpublished data |
| wheat | 1,468,701,119 | 5,775,799,648 | iDataplex-nextscale | iDataplex: single vSMP node (4 TB mem) created by ScaleMP | unpublished data |

  1. All datasets are paired-end; only the mouse dataset is strand-specific. The iDataplex-nextscale cluster, known as BlueWonder-NextScale, consists of 360 nodes, each with 2 × 12-core Intel Xeon processors (E5-2697v2, 2.7 GHz) and 64 GB RAM, for 8640 cores in total. The iDataplex cluster, known as “BlueWonder”, consists of 512 nodes, each with 2 × 8-core Intel SandyBridge processors (2.6 GHz), for 8192 cores in total. Original Inchworm was run on the sugarbeet dataset using a single iDataplex node with 256 GB memory, and on the wheat dataset using a single vSMP node with 4 TB memory created by ScaleMP software (http://www.scalemp.com) on iDataplex. ScaleMP creates a virtual symmetric multiprocessing (vSMP) node with shared memory by aggregating multiple compute nodes.
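
The core totals quoted in the footnote follow directly from the node, socket, and core counts. As a quick sanity check, the arithmetic can be sketched as below; the function and dictionary names are hypothetical and chosen only for illustration, and the figures are the ones given in the footnote.

```python
def total_cores(nodes: int, sockets_per_node: int, cores_per_socket: int) -> int:
    """Return the total core count of a homogeneous cluster."""
    return nodes * sockets_per_node * cores_per_socket


# (nodes, sockets per node, cores per socket), taken from the table footnote.
clusters = {
    "iDataplex-nextscale": (360, 2, 12),
    "iDataplex": (512, 2, 8),
}

totals = {name: total_cores(*spec) for name, spec in clusters.items()}

for name, cores in totals.items():
    print(f"{name}: {cores} cores")
```

Both results agree with the totals stated in the footnote (8640 and 8192 cores).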