Table 2 The performance of different data transformation methods*.

From: Revealing and avoiding bias in semantic similarity scores for protein pairs

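| Measure | Estimated λ** | λ = 1 | Inverse (λ = -1) | Cube-root (λ = 1/3) | Square-root (λ = 1/2) | Square (λ = 2) | Log |
|---|---|---|---|---|---|---|---|
| Resnik(AVG) | 0.878 | 0 | -*** | 0.645 | 0.370 | 0 | - |
| Lin(AVG) | 0.890 | 0 | - | 0.659 | 0.474 | 0 | - |
| RS(AVG) | 0.925 | 0 | - | 0.632 | 0.355 | 0 | - |
| Jiang(AVG) | 0.812 | 0 | 0.081 | 0 | 0 | 0 | 0.002 |
| Resnik(BMA) | 0.938 | 0.661 | - | 0.025 | 0.248 | 0 | - |
| Lin(BMA) | 0.940 | 0.706 | - | 0.012 | 0.156 | 0.002 | - |
| RS(BMA) | 0.927 | 0.650 | - | 0.004 | 0.042 | 0.001 | - |
| Jiang(BMA) | 0.010 | 0 | 0 | 0 | 0 | 0 | 0 |
| TO | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NTO | 0.555 | 0.001 | 0 | 0.366 | 0.478 | 0 | 0.009 |
| Dice | 0.926 | 0.014 | 0 | 0.384 | 0.890 | 0 | 0.001 |
| Kappa | 0.896 | 0.010 | - | 0.518 | 0.866 | 0 | - |
| GIC | 0.552 | 0 | - | 0.096 | 0 | 0 | - |
| VSM | 0.291 | 0 | - | 0.006 | 0 | 0 | - |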
  1. * The numbers in the table are the proportions (on a 0 to 1 scale) of all group pairs, across the different combinations of group lengths, whose scores fitted a normal distribution after the data transformation.
  2. ** λ was estimated by the method described in the Methods section.
  3. *** "-" indicates the transformation method was not suitable for the similarity measure.
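The column headers read like members of the Box-Cox power family, y = (x^λ - 1)/λ, in which λ = -1, 1/3, 1/2 and 2 give the inverse, cube-root, square-root and square transforms, λ = 0 is defined as the log, and λ = 1 is only a shift (x - 1) of the raw scores. Assuming that reading, and assuming the "Estimated λ" column comes from a maximum-likelihood fit (the paper's own procedure is described in its Methods section and may differ), a minimal sketch of the transform and of estimating λ by maximizing the profile log-likelihood ℓ(λ) = -(n/2) ln σ̂²(λ) + (λ - 1) Σ ln x_i is shown below. The names `boxcox`, `boxcox_llf`, `estimate_lambda` and `COLUMN_LAMBDAS`, and the grid search over [-2, 2], are illustrative choices, not taken from the paper. Box-Cox requires strictly positive inputs.

```python
import math
from typing import Sequence

# Assumed mapping of the Table 2 column headers to Box-Cox lambdas (hypothetical).
COLUMN_LAMBDAS = {
    "λ = 1": 1.0,          # only shifts the raw scores by -1
    "Inverse": -1.0,
    "Cube-root": 1.0 / 3.0,
    "Square-root": 0.5,
    "Square": 2.0,
    "Log": 0.0,            # limiting case lambda -> 0
}


def boxcox(x: float, lam: float) -> float:
    """Box-Cox power transform of a strictly positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)
    # Affine in x**lam, so a normality test gives the same verdict as for x**lam.
    return (x ** lam - 1.0) / lam


def boxcox_llf(data: Sequence[float], lam: float) -> float:
    """Profile log-likelihood of lambda, up to an additive constant."""
    n = len(data)
    y = [boxcox(x, lam) for x in data]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # maximum-likelihood variance (divide by n)
    if var == 0:
        raise ValueError("constant data: log-likelihood is undefined")
    log_jacobian = (lam - 1.0) * sum(math.log(x) for x in data)
    return -0.5 * n * math.log(var) + log_jacobian


def estimate_lambda(data: Sequence[float], lo: float = -2.0,
                    hi: float = 2.0, steps: int = 401) -> float:
    """Return the grid value of lambda that maximizes the profile log-likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: boxcox_llf(data, lam))
```

A grid search keeps the sketch dependency-free; where SciPy is available, `scipy.stats.boxcox` performs the same maximum-likelihood fit of λ when no λ is supplied.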
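Each cell in the table is the share of group pairs whose transformed scores passed a normality check. A minimal sketch of that bookkeeping follows. It uses the Jarque-Bera test only because its asymptotic p-value has a closed form, exp(-JB/2) for a chi-square with 2 degrees of freedom, that needs nothing beyond the standard library; it is a stand-in and not necessarily the test the authors used. The function names `jarque_bera_pvalue` and `fraction_normal` and the default threshold `alpha = 0.05` are hypothetical.

```python
import math
from typing import Callable, Iterable, Sequence


def jarque_bera_pvalue(sample: Sequence[float]) -> float:
    """Asymptotic Jarque-Bera p-value (chi-square with 2 degrees of freedom)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    if m2 == 0:
        return 0.0  # constant sample: treat as not normally distributed
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return math.exp(-jb / 2.0)  # survival function of chi-square(2)


def fraction_normal(groups: Iterable[Sequence[float]],
                    transform: Callable[[float], float],
                    alpha: float = 0.05) -> float:
    """Proportion of groups whose transformed scores are not rejected as normal."""
    groups = list(groups)
    if not groups:
        raise ValueError("need at least one group of scores")
    passed = 0
    for scores in groups:
        transformed = [transform(x) for x in scores]
        if jarque_bera_pvalue(transformed) > alpha:
            passed += 1
    return passed / len(groups)
```

The chi-square approximation behind the Jarque-Bera p-value is poor for small groups, so with SciPy installed a Shapiro-Wilk test such as `scipy.stats.shapiro` would be the more usual choice inside `fraction_normal`.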