Feature | Abbreviation | Description |
---|---|---|
One-hot vector | Seq | A protein sequence is composed of 20 types of amino acids, and each residue is encoded as a 20-dimensional one-hot vector |
Position-specific scoring matrix | PSSM | It represents the probabilities of the 20 amino acids occurring at each position. It is generated with the PSI-BLAST algorithm by searching NCBI's non-redundant sequence database with three iterations and an E-value threshold of 0.001 |
Entropy density | Den | It represents the composition information of the protein sequence and is obtained by calculating the information entropy of the 20 amino acid residues |
Physicochemical properties | PhyChem | It represents the physical and chemical attributes of the different amino acid residues and is obtained by multivariate statistical analysis of 188 natural amino acid properties |
Hydrophilicity and hydrophobicity index | HyIn | A larger hydrophilicity index means that the residue is more hydrophilic, and a smaller one means that it is more hydrophobic. The hydrophobicity index behaves in the opposite way: a larger value means that the residue is more hydrophobic |
Pseudo amino acid based on K-nearest neighbors | K-PseAA | A new feature proposed in this paper that combines K-nearest neighbors with PseAA. A subsequence is formed from the target amino acid residue together with the residues no more than K positions before and after it, so the subsequence has length 2K + 1. The PseAA of this subsequence is then calculated and used as the K-PseAA feature of the target residue |
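
The Seq encoding and the windowing step behind K-PseAA can be illustrated with a minimal Python sketch. The residue ordering, the function names, and the padding symbol `X` used at the sequence termini are assumptions made for illustration only; the table does not specify how residues near the ends are handled, and the PseAA computation itself is not shown because it is not defined here.

```python
# Hypothetical helper names; the alphabet ordering is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue):
    """Return a 20-dimensional one-hot vector for a single residue.

    Symbols outside the 20 standard residues (e.g. the padding
    symbol) map to an all-zero vector; this is an assumption.
    """
    vec = [0] * len(AMINO_ACIDS)
    idx = AA_INDEX.get(residue)
    if idx is not None:
        vec[idx] = 1
    return vec


def encode_sequence(seq):
    """Encode a protein sequence as a list of 20-D one-hot vectors."""
    return [one_hot(residue) for residue in seq]


def k_window(seq, i, k, pad="X"):
    """Return the subsequence of length 2K + 1 centred on position i.

    Padding the termini with `pad` is an assumption so that every
    window has the same length 2K + 1.
    """
    padded = pad * k + seq + pad * k
    return padded[i : i + 2 * k + 1]


if __name__ == "__main__":
    seq = "MKTAY"
    print(len(encode_sequence(seq)), "residues encoded")
    print(k_window(seq, 2, 2))  # window centred on the third residue
```

In this sketch, each window returned by `k_window` would then be passed to a PseAA routine to obtain the K-PseAA feature of the central residue.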