Modeling and mining term association for improving biomedical information retrieval performance

BMC Bioinformatics

Table 4 Effect of the number k on retrieval performance

	k	document	passage	passage2
Genomics 2007	1	0.3012	0.0918	0.1436
	5	0.3349	0.1400	0.1588
	10	0.3438	0.1422	0.1635
	20	0.3438	0.1422	0.1635
	100	0.3438	0.1422	0.1635
Genomics 2006	1	0.3974	0.1401	-
	5	0.4049	0.1445	-
	10	0.4087	0.1467	-
	20	0.4083	0.1466	-
	100	0.4083	0.1466	-
Genomics 2005	1	0.3012	-	-
	5	0.3116	-	-
	10	0.3123	-	-
	20	0.3123	-	-
	100	0.3123	-	-
Genomics 2004	1	0.3470	-	-
	5	0.3555	-	-
	10	0.3584	-	-
	20	0.3584	-	-
	100	0.3584	-	-
HARD 2004	1	0.2015	0.2005	-
	5	0.2223	0.2197	-
	10	0.2250	0.2208	-
	20	0.2248	0.2208	-
	100	0.2248	0.2208	-
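The choice of k = 10 can be checked directly against the document-level column of Table 4. The sketch below copies those values and, for each data set, returns the smallest k that reaches the best score. The tie-breaking rule (prefer the smallest k among equal scores) and the helper name `smallest_best_k` are illustrative assumptions, not part of the original study; the passage columns could be checked the same way.

```python
# Document-level values copied from Table 4, indexed by k.
DOCUMENT = {
    "Genomics 2007": {1: 0.3012, 5: 0.3349, 10: 0.3438, 20: 0.3438, 100: 0.3438},
    "Genomics 2006": {1: 0.3974, 5: 0.4049, 10: 0.4087, 20: 0.4083, 100: 0.4083},
    "Genomics 2005": {1: 0.3012, 5: 0.3116, 10: 0.3123, 20: 0.3123, 100: 0.3123},
    "Genomics 2004": {1: 0.3470, 5: 0.3555, 10: 0.3584, 20: 0.3584, 100: 0.3584},
    "HARD 2004": {1: 0.2015, 5: 0.2223, 10: 0.2250, 20: 0.2248, 100: 0.2248},
}


def smallest_best_k(scores):
    """Hypothetical helper: smallest k whose score equals the maximum score."""
    best = max(scores.values())
    return min(k for k, v in scores.items() if v == best)


for name, scores in DOCUMENT.items():
    print(name, smallest_best_k(scores))
```

For all five data sets this selects k = 10, matching the depth used in the final experiments.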

The number k is the parameter of the recursive re-ranking algorithm. It denotes the top k term associations, as weighted by the factor analysis based model, and the algorithm re-ranks the baselines according to these k terms: the more of these terms a result contains, the higher its ranking score. Five values of k (1, 5, 10, 20 and 100) were tested on five original baselines, one from each of our five data sets: Genomics 2007, Genomics 2006, Genomics 2005, Genomics 2004 and HARD 2004. k affects performance substantially when it is smaller than 10 (for example, the document-level score on Genomics 2007 rises from 0.3012 at k = 1 to 0.3438 at k = 10), whereas for k larger than 10 the final performance is almost unchanged. Based on this empirical study, k = 10 was chosen as a locally optimal depth and used in the final experiments.
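The scoring idea described above (select the top k weighted terms, then favour results that contain more of them) can be sketched as follows. This is a simplified single-pass approximation, not the authors' recursive algorithm, whose exact recursion is not given here. The input format (a baseline list of `(doc_id, text)` pairs and a `term_weights` dictionary standing in for the factor-analysis weights) and the function names `top_k_terms` and `rerank` are hypothetical; whitespace tokenization is also an assumption.

```python
from typing import Dict, List, Tuple


def top_k_terms(term_weights: Dict[str, float], k: int) -> List[str]:
    """Return the k terms with the largest (hypothetical) association weights."""
    ranked = sorted(term_weights.items(), key=lambda item: -item[1])
    return [term for term, _ in ranked[:k]]


def rerank(baseline: List[Tuple[str, str]],
           term_weights: Dict[str, float], k: int) -> List[str]:
    """Re-rank (doc_id, text) pairs: results containing more top-k terms rank higher."""
    terms = top_k_terms(term_weights, k)

    def hits(text: str) -> int:
        # Count how many of the top-k terms occur in the result (naive tokenization).
        tokens = set(text.lower().split())
        return sum(1 for term in terms if term in tokens)

    # sorted() is stable, so results with equal hit counts keep their baseline order.
    ranked = sorted(baseline, key=lambda item: -hits(item[1]))
    return [doc_id for doc_id, _ in ranked]


baseline = [
    ("d1", "gene expression in yeast"),
    ("d2", "protein folding and BRCA1 mutation"),
    ("d3", "BRCA1 gene mutation in breast cancer"),
]
weights = {"brca1": 0.9, "mutation": 0.7, "cancer": 0.4, "yeast": 0.1}
print(rerank(baseline, weights, k=2))
print(rerank(baseline, weights, k=3))
```

With k = 2 the order becomes d2, d3, d1; raising k to 3 adds "cancer" to the term set and moves d3 to the top, a toy illustration of why small values of k can change the ranking noticeably.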

ISSN: 1471-2105