Input: Block length L, n bps reference sequence S, m bps query sequence T, miss match error threshold M, gap error threshold G. |
---|
Let B1 = ⌊n/L⌋, B2 = ⌊m/L⌋, E = MAX(M, G); |
1. For offset = 1 to L do |
   1.1 Divide S [offset, n] into L bps blocks, as Soffset = {soffset,1,..., soffset, B1}; |
   1.2 Convert Soffsetto frequency vectors, as ESoffset = {esoffset,1,..., esoffset, B1}; |
2. For offset = 1 to L do |
   2.1 Sequentially choose B2 blocks from ESoffset, and set the start position as p; Using all possible combinations to get B2-E blocks. And combine them as ADDR variable. Set the remaining E blocks as r; |
   2.2 Mapping pair (ADDR,(r, p)) into a hash map M, and chaining possible conflicts; |
   2.3 Iteratively scan ESoffset for next B2 blocks in ESoffset; |
3 First level filtering process |
   3.1 Divide T into L bps blocks, as T = {t1, t2,..., tB2} and convert them to frequency vector as ET = {et1, et2,..., etB2}; |
   3.2 Choose B2-E blocks from ET and combining them as ADDR variable, set the remaining E blocks as t; |
   3.3 Query ADDR in M and pass all returned results as R = {(r1, p1), (r2, p2),...... } to step 4; |
4 For i = 1 to | R| do |
   if ECD(t, ri) < E then record pi; |
Output: All recorded p i from step 4. |