Modeling and mining term association for improving biomedical information retrieval performance

BMC Bioinformatics

Table 1 Performance of baselines

k₁	b	Indices	Genomics 2007			Genomics 2006		Genomics 2005	Genomics 2004	HARD 2004
			document	passage	passage2	document	passage	document	document	document	passage
0.4	2.0	word	0.1584	0.0675	0.0267	0.2662	0.0532	-	-	-	-
		sentence	0.1368	0.0406	0.0154	0.2378	0.0398	-	-	-	-
		paragraph	0.1086	0.0170	0.0094	0.2036	0.0192	0.1964	0.2952	0.2449	0.2635
		BEST	0.1584	0.0675	0.0267	0.2662	0.0532	0.1964	0.2952	0.2449	0.2635
0.5	1.3	word	0.2108	0.0963	0.0364	0.3140	0.0718	-	-	-	-
		sentence	0.1805	0.0700	0.0350	0.3030	0.0550	-	-	-	-
		paragraph	0.1588	0.0452	0.0333	0.3109	0.0369	0.2602	0.3404	0.2802	0.2985
		BEST	0.2108	0.0963	0.0364	0.3140	0.0718	0.2602	0.3404	0.2802	0.2985
1.0	1.0	word	0.1556	0.0434	0.0328	0.3097	0.0659	-	-	-	-
		sentence	0.1809	0.0758	0.0350	0.2918	0.0521	-	-	-	-
		paragraph	0.1902	0.0893	0.0327	0.2916	0.0337	0.2547	0.3425	0.2522	0.2718
		BEST	0.1902	0.0893	0.0350	0.3097	0.0659	0.2547	0.3425	0.2522	0.2718
1.2	0.75	word	0.1809	0.0780	0.0295	0.3045	0.0651	-	-	-	-
		sentence	0.1987	0.0814	0.0394	0.3202	0.0522	-	-	-	-
		paragraph	0.2013	0.0648	0.0578	0.3381	0.0362	0.2874	0.3584	0.2617	0.2758
		BEST	0.2013	0.0814	0.0578	0.3381	0.0651	0.2874	0.3584	0.2617	0.2758
2.0	0.4	word	0.1953	0.0844	0.0317	0.3152	0.0637	-	-	-	-
		sentence	0.2084	0.0758	0.0401	0.3529	0.0490	-	-	-	-
		paragraph	0.2025	0.0633	0.0641	0.3476	0.0362	0.2779	0.3483	0.2810	0.2895
		BEST	0.2084	0.0844	0.0641	0.3529	0.0637	0.2779	0.3483	0.2810	0.2895

The baseline results are organized as follows: (1) five parameter settings for (k₁, b), given in the first and second columns; (2) three indices, where "word" denotes the word-based index, "sentence" the sentence-based index, and "paragraph" the paragraph-based index; (3) three evaluation measures: document-level, passage-level, and passage2-level; (4) five TREC data sets: the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set; (5) only a paragraph-based index is built for the TREC 2005 and 2004 Genomics data sets and the TREC 2004 HARD data set, as described in the indexing section, so the word and sentence rows for those data sets are marked "-". Each BEST row reports, for every column, the highest score among the three indices under that parameter setting.
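
The BEST rows can be reproduced as the column-wise maximum over the three index rows of a given (k₁, b) setting; the values in Table 1 are consistent with this reading. The following is a minimal sketch under that assumption, using the scores for k₁ = 1.2, b = 0.75 copied from the table. The names `COLUMNS`, `ROWS`, and `best_row` are illustrative and do not come from the paper.

```python
# Scores for the (k1, b) = (1.2, 0.75) setting, copied from Table 1.
# None marks a cell shown as "-" (no index was built for that data set).
COLUMNS = [
    "Genomics 2007 document", "Genomics 2007 passage", "Genomics 2007 passage2",
    "Genomics 2006 document", "Genomics 2006 passage",
    "Genomics 2005 document", "Genomics 2004 document",
    "HARD 2004 document", "HARD 2004 passage",
]

ROWS = {
    "word":      [0.1809, 0.0780, 0.0295, 0.3045, 0.0651, None, None, None, None],
    "sentence":  [0.1987, 0.0814, 0.0394, 0.3202, 0.0522, None, None, None, None],
    "paragraph": [0.2013, 0.0648, 0.0578, 0.3381, 0.0362, 0.2874, 0.3584, 0.2617, 0.2758],
}


def best_row(rows):
    """Return the column-wise maximum over all index rows, skipping missing cells."""
    best = []
    for cells in zip(*rows.values()):
        present = [c for c in cells if c is not None]
        # A column with no scores at all stays missing (assumed convention).
        best.append(max(present) if present else None)
    return best


if __name__ == "__main__":
    for name, value in zip(COLUMNS, best_row(ROWS)):
        print(f"{name}: {value}")
```

For the data sets with only a paragraph-based index, the BEST value is simply the paragraph score, which matches the last four columns of each BEST row in the table.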

ISSN: 1471-2105