
Table 1 Classification results of the baseline models, MLP with NS, and MLP with PLANS using ECFP for the representation of chemical structures

From: Exploration of chemical space with partial labeled noisy student self-training and self-supervised graph embedding

  

| Dataset | Model | Accuracy (%) | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Cyp450 | SVM | 56.37 ± 0.93 | 0.53 ± 0.01 | 0.81 ± 0.01 | 0.64 ± 0.01 |
| | RF | 54.44 ± 0.98 | 0.42 ± 0.01 | 0.81 ± 0.02 | 0.55 ± 0.01 |
| | AdaBoost | 52.40 ± 1.01 | 0.25 ± 0.02 | 0.89 ± 0.01 | 0.39 ± 0.02 |
| | XGBoost | 55.13 ± 1.21 | 0.57 ± 0.01 | 0.75 ± 0.01 | 0.65 ± 0.01 |
| | MLP | 51.97 ± 1.12 | 0.64 ± 0.03 | 0.72 ± 0.02 | 0.68 ± 0.02 |
| | MLP + mixup | 54.28 ± 0.79 | 0.60 ± 0.02 | 0.73 ± 0.01 | 0.66 ± 0.01 |
| | MLP + NS | 56.11 ± 1.63 | 0.64 ± 0.01 | 0.76 ± 0.02 | 0.69 ± 0.01 |
| | MLP + mixup + NS | 56.48 ± 1.45 | 0.60 ± 0.04 | 0.76 ± 0.02 | 0.67 ± 0.02 |
| | MLP + PLANS | 58.94 ± 0.96 | 0.72 ± 0.02 | 0.78 ± 0.01 | 0.75 ± 0.01 |
| | MLP + PLANS + mixup | 58.04 ± 0.70 | 0.69 ± 0.02 | 0.76 ± 0.01 | 0.72 ± 0.01 |
| | MLP + PLANS + balancing | 59.02 ± 1.12 | 0.73 ± 0.03 | 0.76 ± 0.02 | 0.75 ± 0.01 |
| | MLP + PLANS + balancing + mixup | 59.25 ± 1.04 | 0.68 ± 0.03 | 0.78 ± 0.01 | 0.73 ± 0.01 |

  

| Dataset | Model | AP | F1 |
|---|---|---|---|
| Tox21 | MLP | 0.03 ± 0.01 | – |
| | MLP + mixup | 0.12 ± 0.01 | 0.02 ± 0.01 |
| | MLP + NS | 0.03 ± 0.004 | – |
| | MLP + mixup + NS | 0.13 ± 0.01 | 0.08 ± 0.03 |
| | MLP + PLANS | 0.14 ± 0.01 | 0.04 ± 0.01 |
| | MLP + PLANS + mixup | 0.20 ± 0.03 | 0.25 ± 0.02 |
| | MLP + PLANS + balancing | 0.16 ± 0.02 | 0.09 ± 0.04 |
| | MLP + PLANS + balancing + mixup | 0.20 ± 0.02 | 0.20 ± 0.02 |

  1. The best performance is highlighted in bold. The upper part shows the results for the CYP450 datasets and the lower part shows the results for the Tox21 dataset. Note that AdaBoost (underlined) achieves the best recall. However, it is heavily affected by data imbalance: its precision and F1 scores were much lower than those of the other models.
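
The trade-off noted for AdaBoost can be checked directly from the reported means, since F1 is the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_from_precision_recall` is hypothetical, and it is applied to the rounded mean values from the table rather than to per-run predictions, so it only approximately reproduces the reported F1 column.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both precision and recall are 0
    return 2 * precision * recall / (precision + recall)


# Mean (precision, recall) pairs copied from the Cyp450 rows of Table 1.
reported = {
    "SVM": (0.53, 0.81),
    "RF": (0.42, 0.81),
    "AdaBoost": (0.25, 0.89),
}

for model, (p, r) in reported.items():
    print(f"{model}: F1 ~ {f1_from_precision_recall(p, r):.2f}")
```

Because the harmonic mean is dominated by the smaller of the two values, AdaBoost's high recall (0.89) cannot compensate for its low precision (0.25), which is consistent with its low F1 in the table.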