Table 1 The feature extraction methods used in this paper

From: Protein–protein interaction site prediction by model ensembling with hybrid feature and self-attention

| Feature | Abbreviation | Description |
| --- | --- | --- |
| One-hot vector | Seq | A protein sequence is composed of 20 types of amino acids, and each residue is encoded as a 20D one-hot vector |
| Position-specific scoring matrix | PSSM | It represents the probabilities of the 20 amino acids occurring at each position. It is generated with the PSI-BLAST algorithm by searching NCBI’s non-redundant sequence database with three iterations and an E-value threshold of 0.001 |
| Entropy density | Den | It represents the composition information of the protein sequence and is obtained by calculating the information entropy of the 20 amino acid residues |
| Physicochemical properties | PhyChem | It represents the physical and chemical attributes of different amino acid residues and is obtained by multivariate statistical analysis of 188 natural amino acid properties |
| Hydrophilicity and hydrophobicity index | HyIn | A larger hydrophilicity index means that the residue is more hydrophilic; a smaller one means that it is more hydrophobic. The hydrophobicity index is the opposite |
| Pseudo amino acid based on K-nearest neighbors | K-PseAA | A new feature proposed in this paper that combines K-nearest neighbors with PseAA. A subsequence of length 2K + 1 is formed from the targeted amino acid residue and the residues at most K positions before and after it. The PseAA of this subsequence is then calculated and used as the K-PseAA feature of the targeted residue |
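
As an illustration of the Seq and K-PseAA entries, the following is a minimal sketch of the 20D one-hot encoding and of the 2K + 1 subsequence extraction. It is not the authors' implementation. The residue ordering in `AMINO_ACIDS`, the function names `one_hot` and `window`, and the padding symbol `X` used near the sequence termini are all assumptions made for this example; the table does not say how residues closer than K positions to either end are handled, and the PseAA computation itself is not shown.

```python
# Assumed ordering of the 20 standard amino acids (alphabetical by one-letter code).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical padding symbol for positions that fall outside the sequence.
PAD = "X"


def one_hot(residue):
    """Return a 20D one-hot list; padding or unknown residues map to all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def window(sequence, index, k):
    """Return the subsequence of length 2k + 1 centred on sequence[index].

    Positions beyond either terminus are filled with PAD (an assumption).
    """
    padded = PAD * k + sequence + PAD * k
    # After padding, the target residue sits at position index + k,
    # so the window starts at index and spans 2k + 1 characters.
    return padded[index : index + 2 * k + 1]


if __name__ == "__main__":
    seq = "MKTAYIAK"  # made-up example sequence
    print(one_hot("C"))
    print(window(seq, 4, 2))  # window around the residue at index 4
    print(window(seq, 0, 2))  # window at the N-terminus, padded on the left
```

Under these assumptions, each window would then be passed to a PseAA routine to obtain the K-PseAA feature of the central residue.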