Table 5 Summary of features of the three cluster methods examined

From: Making sense of EST sequences by CLOBBing them

| Feature | UniGene | TIGR | CLOBB |
| --- | --- | --- | --- |
| Underlying clustering method | megaBLAST | WU-BLAST & CAP3 | NCBI BLAST |
| Stringency | Dependent on stage of clustering | Very high: >= 95% identity over > 40 bp | High: >= 95% identity over 30 bp |
| Overlap allowed | N/A | < 20 bp | < 10% of sequence length; those with > 10% of sequence length are allowed if they contain > 10% unassigned bases |
| Clusters are always contiguous? | No | Yes | Yes |
| Dealing with potential chimeric clusters | Initial clustering performed with gene sequences; merging of these initial distinct clusters rejected | CAP3 does not include identified chimeric sequences | Definition of type III matches and 'superclusters' prevents chimeric sequences from merging unsuitable clusters |
| Continuity (addition of new sequences) | New builds are compared with previous builds | Post processing | Incremental within algorithm |
| Historical information | Availability of previous builds | Notes showing retirement of clusters | 'superclusters' and merge events can be tagged |
| Portability and adaptability | Low | Low | High |
| Ease of retention of manual curation | Medium | Medium | High |
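
The CLOBB thresholds listed in the table can be read as a simple acceptance test on a pairwise match. The following is a minimal sketch of that reading, not the CLOBB implementation. The `Match` record, its field names, and the function `clobb_like_accept` are hypothetical. It also assumes that the "overlap allowed" row refers to a length measured in base pairs against the full sequence length, and that the unassigned-base fraction is computed over that same region. The table does not say how a value of exactly 10% is treated; here it falls into the "> 10%" branch.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical summary of one BLAST hit between an EST and a cluster member."""
    identity_pct: float     # percent identity over the aligned region
    aligned_bp: int         # length of the aligned region, in bp
    overlap_bp: int         # length of the region the table calls "overlap", in bp
    seq_len: int            # full length of the query sequence, in bp
    unassigned_frac: float  # fraction of unassigned (N) bases in that region, 0..1


def clobb_like_accept(m: Match) -> bool:
    """Apply the CLOBB column of the table as a yes/no filter (assumed reading)."""
    # Stringency: >= 95% identity over 30 bp.
    if m.identity_pct < 95.0 or m.aligned_bp < 30:
        return False
    # Overlap allowed: < 10% of sequence length.
    if m.overlap_bp < 0.10 * m.seq_len:
        return True
    # Longer regions are tolerated only if they contain > 10% unassigned bases.
    return m.unassigned_frac > 0.10


if __name__ == "__main__":
    good = Match(identity_pct=96.0, aligned_bp=50, overlap_bp=10,
                 seq_len=400, unassigned_frac=0.0)
    print(clobb_like_accept(good))
```

Keeping the thresholds in one small predicate makes it easy to swap in the TIGR values from the same table (> 40 bp aligned, < 20 bp overlap) for comparison.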