Table 5 A typical substring-cluster caused by low complexity sequence

From: Simultaneous identification of long similar substrings in large sets of sequences

CLU  SEQ  POS  identical substring
1    1    2         t|aaaaaaaaaaaaaaaaaaaa|aaaaat...
1    1    3        ta|aaaaaaaaaaaaaaaaaaaa|aaaat...
1    1    4       taa|aaaaaaaaaaaaaaaaaaaa|aaat...
1    1    5      taaa|aaaaaaaaaaaaaaaaaaaa|aat...
1    1    6     taaaa|aaaaaaaaaaaaaaaaaaaa|at...
1    1    7    taaaaa|aaaaaaaaaaaaaaaaaaaa|t...

  1. The sequence "taaaaaaaaaaaaaaaaaaaaaaaaat" generates the left maximal substring-cluster 1 for match length 20. The common substring is formed by a run of 20 letters "a", which occurs at positions 2 through 7 of the same sequence.
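
The effect described in the footnote can be reproduced with a brute-force sketch. This is not the method of the article, only an illustration: it slides a window of the match length over one sequence and records where each window content recurs. The function name `substring_positions` is hypothetical, positions are assumed to be 1-based to match the POS column, and the sequence is assumed to consist of a "t", 25 letters "a", and a "t", as the first table row suggests.

```python
def substring_positions(sequence: str, match_length: int) -> dict[str, list[int]]:
    """Map each substring of the given length to its 1-based start positions."""
    positions: dict[str, list[int]] = {}
    for start in range(len(sequence) - match_length + 1):
        window = sequence[start:start + match_length]
        positions.setdefault(window, []).append(start + 1)  # 1-based, as in POS
    return positions


# Assumed sequence: "t", then 25 letters "a", then "t".
sequence = "t" + "a" * 25 + "t"

# Keep only substrings that occur more than once (a substring-cluster).
clusters = {
    substring: starts
    for substring, starts in substring_positions(sequence, 20).items()
    if len(starts) > 1
}

for substring, starts in clusters.items():
    print(substring, starts)
```

Under these assumptions a single low complexity run yields one cluster with six overlapping occurrences, all within the same sequence, which is the pattern shown in the table.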