Table 4 The number of similar Trinity transcripts between original Inchworm and MapReduce-Inchworm using the mouse RNA-seq data [22]

From: K-mer clustering algorithm using a MapReduce framework: application to the parallelization of the Inchworm module of Trinity

| Cutoff for transcript similarity (%) | Number of similar transcripts |
|---|---|
| 100 | 47,816 |
| 99 | 57,926 |
| 95 | 64,109 |
| 90 | 67,178 |
| 85 | 69,002 |
| 80 | 70,398 |
| 75 | 71,390 |
| 70 | 72,285 |

  1. The two sets of transcripts, from original Inchworm and from MapReduce-Inchworm, were compared using BLAT [42]. Transcripts from original Inchworm were used as the target and transcripts from MapReduce-Inchworm were used as the query. The Perl script blat_top_hit_extractor.pl, included in the Trinity pipeline, was used to extract the top hit in the target for each query transcript. The first column gives the cutoff for transcript similarity, which was quantified using two similarity scores defined as follows: (1) 1 - (query_sequence_size - number_of_matching_bases)/query_sequence_size and (2) 1 - (target_sequence_size - number_of_matching_bases)/target_sequence_size. If both similarity scores for a pair of transcripts were greater than or equal to the cutoff value, the two transcripts were considered similar. The second column gives the number of similar transcripts between original Inchworm and MapReduce-Inchworm at that cutoff. The total number of transcripts from each method can be found in Table 3.
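
The similarity test described in the footnote can be sketched in Python as follows. This is a minimal illustration, not the script used in the paper: the function names and the example numbers are hypothetical, and how number_of_matching_bases is read from the BLAT output is not specified in the source, so it is taken here as a plain input. Each score simplifies algebraically to number_of_matching_bases / sequence_size, so the cutoff comparison is done in integer arithmetic to avoid floating-point rounding at the boundary.

```python
def similarity_scores(matches: int, query_size: int, target_size: int) -> tuple[float, float]:
    """Return the (query, target) similarity scores as fractions in [0, 1].

    Follows the two definitions in the footnote:
      1 - (query_size  - matches) / query_size
      1 - (target_size - matches) / target_size
    """
    query_score = 1 - (query_size - matches) / query_size
    target_score = 1 - (target_size - matches) / target_size
    return query_score, target_score


def is_similar(matches: int, query_size: int, target_size: int, cutoff_percent: int) -> bool:
    """True if BOTH scores are >= cutoff_percent.

    Uses the equivalent integer form  matches * 100 >= cutoff * size
    so that a pair sitting exactly on the cutoff is not lost to rounding.
    """
    query_ok = matches * 100 >= cutoff_percent * query_size
    target_ok = matches * 100 >= cutoff_percent * target_size
    return query_ok and target_ok


def count_similar(hits: list[tuple[int, int, int]], cutoff_percent: int) -> int:
    """Count top hits (matches, query_size, target_size) that pass the cutoff."""
    return sum(is_similar(m, q, t, cutoff_percent) for m, q, t in hits)


# Hypothetical top hits, for illustration only (not data from the paper).
example_hits = [
    (1000, 1000, 1000),  # identical lengths, all bases match
    (950, 1000, 1000),   # 95% on both sides
    (950, 1000, 1200),   # 95% of query, about 79% of target
]
for cutoff in (100, 95, 75):
    print(cutoff, count_similar(example_hits, cutoff))
```

Because both scores must pass, a short query that matches perfectly inside a much longer target is not counted as similar at high cutoffs, which is consistent with the counts in the table rising as the cutoff is relaxed.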