
Table 5 Performance of GSP algorithm

From: Modeling and mining term association for improving biomedical information retrieval performance

 

| Method | (k1, b) | Geno 2007 document | Geno 2007 passage | Geno 2007 passage2 | Geno 2006 document | Geno 2006 passage | Geno 2005 document | Geno 2004 document | HARD 2004 document | HARD 2004 passage |
|---|---|---|---|---|---|---|---|---|---|---|
| GSP | (0.4,2.0) | 0.1066 (-1.87%) | 0.0338 (-98.75%) | 0.0149 (-58.28%) | 0.1892 (-7.09%) | 0.0242 (-25.95%) | 0.1867 (-4.96%) | 0.2723 (-7.74%) | 0.2358 (-3.72%) | 0.2639 (-0.15%) |
| | (0.5,1.3) | 0.149 (-6.18%) | 0.0843 (-86.59%) | 0.0456 (-36.85%) | 0.2855 (-8.17%) | 0.0466 (-26.31%) | 0.2423 (-6.88%) | 0.3165 (-7.01%) | 0.2562 (-8.57%) | 0.3001 (-0.54%) |
| | (1.0,1.0) | 0.1839 (-3.32%) | 0.0898 (-0.60%) | 0.0357 (-9.21%) | 0.2757 (-5.46%) | 0.0402 (-19.40%) | 0.2385 (-6.36%) | 0.3166 (-7.55%) | 0.2501 (-0.83%) | 0.2842 (-4.56%) |
| | (1.2,0.75) | 0.1905 (-5.35%) | 0.0714 (-10.11%) | 0.0658 (-13.79%) | 0.3174 (-6.11%) | 0.0404 (-11.65%) | 0.2655 (-7.62%) | 0.3293 (-8.11%) | 0.2589 (-1.07%) | 0.2776 (-0.65%) |
| | (2.0,0.4) | 0.1931 (-4.62%) | 0.0657 (-3.79%) | 0.0667 (-4.02%) | 0.3203 (-7.85%) | 0.0403 (-11.40%) | 0.2588 (-6.89%) | 0.3206 (-7.96%) | 0.2567 (-8.65%) | 0.2916 (-0.73%) |
| | Best | 0.1931 | 0.0898 | 0.0667 | 0.3203 | 0.0466 | 0.2655 | 0.3293 | 0.2589 | 0.3001 |
| Baselines | Best | 0.2108 | 0.0963 | 0.0641 | 0.3529 | 0.0718 | 0.2874 | 0.3584 | 0.281 | 0.2985 |
| TA | Best | 0.2724 | 0.1611 | 0.0762 | 0.3549 | 0.101 | 0.3085 | 0.3606 | 0.2845 | 0.3031 |

  1. The GSP algorithm is adopted as a comparison to the proposed approach: (1) after mapping the GSP algorithm to our research problem, the candidate 1-sequences are all the keywords, and the candidate k-sequences are generated from the frequent (k-1)-sequences; (2) the candidate counts are modeled as a non-parametric distribution, and the lower bound of its 95% confidence interval is used as the minimum support value for the GSP algorithm; (3) only the paragraph index under five parameter settings of (k1, b) is considered; (4) the best results of the GSP algorithm are compared with the best results of the baselines and of the proposed term association approach; (5) "TA" stands for term association; (6) the values in parentheses are the relative rates of improvement over the original baselines.
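The level-wise procedure described in points (1) and (2) of the note can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`count_support`, `percentile_lower_bound`, `gsp`), the toy passages, and the `max_k` cutoff are all hypothetical. Two further assumptions are made here: support is counted as the number of passages that contain the candidate's terms in order, and the "lower bound of the 95% confidence interval" is read as the empirical 2.5th percentile of the candidate counts at each level.

```python
def count_support(candidate, sequences):
    """Count the sequences that contain `candidate` as an ordered subsequence."""
    def contains(seq):
        it = iter(seq)
        # Each membership test consumes the iterator, which enforces term order.
        return all(term in it for term in candidate)
    return sum(contains(seq) for seq in sequences)


def percentile_lower_bound(counts, q=0.025):
    """Empirical q-quantile of the counts (assumed reading of the CI lower bound)."""
    ordered = sorted(counts)
    return ordered[int(q * (len(ordered) - 1))]


def gsp(sequences, keywords, max_k=3):
    """Level-wise search: k-sequence candidates come from frequent (k-1)-sequences."""
    candidates = [(w,) for w in keywords]  # the 1-sequence candidates are the keywords
    frequent = {}
    k = 1
    while candidates and k <= max_k:
        counts = {c: count_support(c, sequences) for c in candidates}
        min_support = percentile_lower_bound(list(counts.values()))
        # Keep candidates that reach the threshold and occur at least once.
        level = {c: n for c, n in counts.items() if n >= min_support and n > 0}
        frequent.update(level)
        # Join step: a and b combine when a without its first term
        # equals b without its last term.
        candidates = sorted(
            {a + b[-1:] for a in level for b in level if a[1:] == b[:-1]}
        )
        k += 1
    return frequent


# Toy passages, invented for illustration only.
passages = [
    ["gene", "protein", "cell"],
    ["gene", "cell"],
    ["protein", "gene", "cell"],
]
patterns = gsp(passages, ["gene", "protein", "cell"])
```

On real data the percentile threshold would prune far more candidates than it does on this three-passage toy input, where most of the work is done by discarding candidates that never occur.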