Table 4 Average AUC-ROC and AUC-PR scores over five repetitions of 5-fold cross-validation on each of the four data sets. Standard deviations are given in parentheses

From: A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction

| Data set | Method | AUC-ROC (std) | AUC-PR (std) | Time (sec) |
|---|---|---|---|---|
| Enzyme | SIMCOMP | 0.863 (0.016) | 0.303 (0.027) | 413.7 min |
| Enzyme | Edit | 0.833 (0.016) | 0.178 (0.004) | 6 |
| Enzyme | NLCS | 0.837 (0.014) | 0.228 (0.013) | 4 |
| Enzyme | CLCS | 0.834 (0.013) | 0.234 (0.019) | 331 |
| Enzyme | SMILES-based substring | 0.752 (0.006) | 0.169 (0.010) | 133 |
| Enzyme | SMIfp CBD (34D) | 0.846 (0.009) | 0.199 (0.008) | 1 |
| Enzyme | SMIfp Tanimoto (34D) | 0.832 (0.012) | 0.191 (0.012) | 1 |
| Enzyme | SMIfp CBD (38D) | 0.852 (0.009) | 0.205 (0.009) | 1 |
| Enzyme | SMIfp Tanimoto (38D) | 0.844 (0.012) | 0.201 (0.006) | 1 |
| Enzyme | LINGOsim (q = 3) | 0.846 (0.013) | 0.290 (0.013) | 3 |
| Enzyme | LINGOsim (q = 4) | 0.823 (0.010) | 0.294 (0.006) | 3 |
| Enzyme | LINGOsim (q = 5) | 0.819 (0.015) | 0.264 (0.013) | 3 |
| Enzyme | LINGO-based TF | 0.811 (0.017) | 0.259 (0.008) | 19 |
| Enzyme | LINGO-based TF-IDF | 0.822 (0.012) | 0.292 (0.031) | 47 |
| Enzyme | TF-IDF+SIMCOMP | 0.852 (0.010) | 0.348 (0.017) a | |
| Enzyme | LINGOsim+SIMCOMP | 0.852 (0.016) | 0.318 (0.019) a | |
| Ion Channels | SIMCOMP | 0.776 (0.012) | 0.224 (0.032) | 48.7 min |
| Ion Channels | Edit | 0.754 (0.013) | 0.199 (0.025) | 1 |
| Ion Channels | NLCS | 0.753 (0.007) | 0.189 (0.037) | 0.9 |
| Ion Channels | CLCS | 0.755 (0.018) | 0.185 (0.028) | 47 |
| Ion Channels | SMILES-based substring | 0.743 (0.004) | 0.197 (0.031) | 21 |
| Ion Channels | SMIfp CBD (34D) | 0.717 (0.019) | 0.136 (0.036) | 0.3 |
| Ion Channels | SMIfp Tanimoto (34D) | 0.698 (0.015) | 0.125 (0.022) | 0.3 |
| Ion Channels | SMIfp CBD (38D) | 0.722 (0.012) | 0.137 (0.024) | 0.3 |
| Ion Channels | SMIfp Tanimoto (38D) | 0.699 (0.028) | 0.156 (0.028) | 0.4 |
| Ion Channels | LINGOsim (q = 3) | 0.737 (0.015) | 0.192 (0.046) | 0.8 |
| Ion Channels | LINGOsim (q = 4) | 0.737 (0.011) | 0.197 (0.037) | 1 |
| Ion Channels | LINGOsim (q = 5) | 0.727 (0.009) | 0.188 (0.026) | 1 |
| Ion Channels | LINGO-based TF | 0.738 (0.018) | 0.204 (0.024) | 3 |
| Ion Channels | LINGO-based TF-IDF | 0.712 (0.014) | 0.178 (0.029) | 7 |
| Ion Channels | TF-IDF+SIMCOMP | 0.763 (0.010) | 0.234 (0.017) | |
| Ion Channels | LINGOsim+SIMCOMP | 0.773 (0.012) | 0.229 (0.018) | |
| GPCR | SIMCOMP | 0.867 (0.009) | 0.307 (0.018) | 71.2 min |
| GPCR | Edit | 0.844 (0.015) | 0.248 (0.030) | 1 |
| GPCR | NLCS | 0.853 (0.006) | 0.247 (0.013) | 1 |
| GPCR | CLCS | 0.855 (0.014) | 0.279 (0.030) | 52 |
| GPCR | SMILES-based substring | 0.782 (0.019) | 0.205 (0.032) | 21 |
| GPCR | SMIfp CBD (34D) | 0.852 (0.014) | 0.209 (0.018) | 0.3 |
| GPCR | SMIfp Tanimoto (34D) | 0.847 (0.006) | 0.213 (0.016) | 0.3 |
| GPCR | SMIfp Tanimoto (38D) | 0.856 (0.009) | 0.228 (0.015) | 0.3 |
| GPCR | LINGOsim (q = 3) | 0.875 (0.003) | 0.317 (0.015) | 1 |
| GPCR | LINGOsim (q = 4) | 0.876 (0.004) | 0.333 (0.020) a | 1 |
| GPCR | LINGOsim (q = 5) | 0.874 (0.006) a | 0.337 (0.019) a | 1 |
| GPCR | LINGO-based TF | 0.872 (0.004) | 0.335 (0.012) a | 3 |
| GPCR | LINGO-based TF-IDF | 0.871 (0.007) | 0.348 (0.018) a | 9 |
| GPCR | TF-IDF+SIMCOMP | 0.885 (0.006) a | 0.371 (0.017) a | |
| GPCR | LINGOsim+SIMCOMP | 0.879 (0.009) a | 0.335 (0.016) a | |
| Nuclear Receptors | SIMCOMP | 0.856 (0.015) | 0.435 (0.008) | 2.9 min |
| Nuclear Receptors | Edit | 0.828 (0.009) | 0.305 (0.029) | 0.2 |
| Nuclear Receptors | NLCS | 0.815 (0.018) | 0.302 (0.032) | 0.2 |
| Nuclear Receptors | CLCS | 0.813 (0.037) | 0.319 (0.039) | 10 |
| Nuclear Receptors | SMILES-based substring | 0.766 (0.028) | 0.335 (0.035) | 2 |
| Nuclear Receptors | SMIfp CBD (34D) | 0.809 (0.026) | 0.296 (0.015) | 0.1 |
| Nuclear Receptors | SMIfp Tanimoto (34D) | 0.784 (0.031) | 0.281 (0.020) | 0.1 |
| Nuclear Receptors | SMIfp CBD (38D) | 0.815 (0.017) | 0.307 (0.024) | 0.1 |
| Nuclear Receptors | SMIfp Tanimoto (38D) | 0.787 (0.030) | 0.322 (0.034) | 0.1 |
| Nuclear Receptors | LINGOsim (q = 3) | 0.800 (0.013) | 0.351 (0.036) | 0.2 |
| Nuclear Receptors | LINGOsim (q = 4) | 0.829 (0.013) | 0.414 (0.031) | 0.2 |
| Nuclear Receptors | LINGOsim (q = 5) | 0.834 (0.013) | 0.389 (0.023) | 0.2 |
| Nuclear Receptors | LINGO-based TF | 0.820 (0.013) | 0.373 (0.035) | 0.4 |
| Nuclear Receptors | LINGO-based TF-IDF | 0.855 (0.022) | 0.418 (0.016) | 0.8 |
| Nuclear Receptors | TF-IDF+SIMCOMP | 0.861 (0.008) | 0.436 (0.049) | |
| Nuclear Receptors | LINGOsim+SIMCOMP | 0.840 (0.015) | 0.399 (0.031) | |

  1. The best AUC-ROC and AUC-PR results for each data set are indicated in bold. Results that are significantly better than SIMCOMP according to the paired t-test (α = 0.05) are marked with "a". The p-values range between 0.0004 and 0.0329 and are provided in Additional file 1: Table S1.
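
The "mean (std)" entries and the paired t-test mentioned in the footnote can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the per-repetition scores are hypothetical and are not taken from the table, the pairing is assumed to be over the five cross-validation repetitions (giving 4 degrees of freedom), the sample standard deviation is assumed, and the function names `summarize` and `paired_t_statistic` are invented here. The source does not state how the authors paired the scores or which standard-deviation estimator they used.

```python
import math
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-repetition scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(candidate, baseline):
    """t statistic of the paired differences candidate[i] - baseline[i].

    The statistic has len(candidate) - 1 degrees of freedom.
    """
    if len(candidate) != len(baseline) or len(candidate) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [c - b for c, b in zip(candidate, baseline)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical AUC-PR values, one per repetition (NOT from the table).
candidate = [0.36, 0.38, 0.37, 0.39, 0.35]
baseline = [0.30, 0.31, 0.29, 0.33, 0.31]

# Two-sided critical value of Student's t at alpha = 0.05 with 4 degrees of freedom.
T_CRIT_DF4 = 2.776

t = paired_t_statistic(candidate, baseline)
# "Significantly better" is read here as a positive t above the critical value.
significant = t > T_CRIT_DF4

mean, std = summarize(candidate)
print(f"candidate: {mean:.3f} ({std:.3f}), t = {t:.2f}, significant = {significant}")
```

Comparing against a fixed critical value keeps the sketch free of third-party dependencies. An exact p-value, such as those reported in Additional file 1, would need the t-distribution CDF from a statistics library.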