Fig. 4 | BMC Bioinformatics

Fig. 4

From: Protein language models can capture protein quaternary state

Accuracy of quaternary state prediction by different approaches. Each sequence in the test set receives a quaternary state (QS) annotation according to: A. the closest sequence in the train set (as determined by BLAST), where the 0-predicted column indicates the fraction of sequences without any significant BLAST hit, and consequently without a prediction; B. the train-set entry with the highest similarity in embedding space (i.e., cosine similarity between embedded vectors); and C. QUEEN, a deep learning model trained on the embeddings (see Text). Each confusion matrix shows the frequency of predicted vs. actual labels (on the x- and y-axes, respectively); a matrix occupying only the diagonal represents full success, while off-diagonal values represent wrong predictions. The balanced accuracy increases from left to right, as indicated by the darker diagonal, highlighting improved prediction when moving from sequence, to language model representation, to QUEEN, the MLP model. Results are shown for the test set, based on information learned from an independent training set. For corresponding confusion matrices on a redundant set that also contains information from sequence-similar proteins, see Additional file 1: Figure S2.
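The embedding-based transfer in panel B and the balanced accuracy read off the confusion matrices can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy 2-D vectors standing in for language model embeddings, and the integer class labels are all hypothetical, and the row-wise normalization of the confusion matrix is an assumption about how the plotted frequencies were computed.

```python
import numpy as np

def transfer_labels(train_emb, train_labels, test_emb):
    """Give each test embedding the label of its most cosine-similar train embedding."""
    train_unit = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    test_unit = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    sims = test_unit @ train_unit.T          # cosine similarity, shape (n_test, n_train)
    return train_labels[sims.argmax(axis=1)]

def confusion_matrix(actual, predicted, n_classes):
    """Rows are actual labels, columns are predicted labels; each row is normalized
    to frequencies (an assumption, see the lead-in)."""
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (actual, predicted), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def balanced_accuracy(cm):
    """Mean of the diagonal (per-class recall) over classes present in the actual labels."""
    present = cm.sum(axis=1) > 0
    return float(np.diag(cm)[present].mean())

# Toy data: hypothetical 2-D "embeddings" with three hypothetical QS classes (0, 1, 2).
train_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
train_labels = np.array([0, 1, 2])
test_emb = np.array([[2.0, 0.1], [0.1, 3.0], [1.0, 0.9], [0.9, 0.2]])
actual = np.array([0, 1, 2, 2])

predicted = transfer_labels(train_emb, train_labels, test_emb)
cm = confusion_matrix(actual, predicted, n_classes=3)
bal_acc = balanced_accuracy(cm)
print(predicted, bal_acc)
```

Balanced accuracy averages per-class recall, so a rare quaternary state counts as much as a common one; this is why a uniformly dark diagonal, rather than a single dark cell, signals a good predictor.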
