
FreeProtMap: waiting-free prediction method for protein distance map



Protein residue–residue distance maps are used for remote homology detection, protein information estimation, and protein structure research. However, existing prediction approaches are time-consuming, and hundreds of millions of proteins are discovered each year, necessitating the development of a rapid and reliable prediction method for protein residue–residue distances. Moreover, because many proteins lack known homologous sequences, a waiting-free and alignment-free deep learning method is needed.


In this study, we propose a learning framework named FreeProtMap. For protein representation processing, the proposed group pooling mitigates the problems caused by the high-dimensional sparsity of protein representations. For the model structure, we introduce several design choices. First, the model is built around the locality of protein structures and the triangle-inequality constraint on distances to improve prediction accuracy. Second, inference is accelerated by additive attention and a lightweight design. Third, generalization is improved by bottleneck structures and a neural network block called the local microformer. As a result, FreeProtMap predicts protein residue–residue distances in tens of milliseconds and is more accurate than the best structure prediction method we compared against.


Several groups of comparative and ablation experiments verify the effectiveness of these designs. The results show that FreeProtMap significantly outperforms other state-of-the-art methods in residue–residue distance prediction, which benefits many lines of protein research. Notably, because computing structural similarity from distance maps is much less time-consuming than with algorithms based on 3D structures, FreeProtMap could be used to scan all proteins discovered each year for structurally similar proteins in a short time.



The protein distance map is a two-dimensional matrix in which each value is a residue–residue distance; its binary form is known as a contact map. Much information can be obtained directly from these maps, such as secondary structure, motifs, and the types of interactions between kinds of residues. Conserved patterns and structural motifs can also be found by analyzing distance or contact maps [1]. Predicted distance or contact maps are widely used in remote homology protein detection [2,3,4], protein information estimation [5, 6], and protein structure research [7,8,9].

Distance maps or contact maps are used in remote homology protein discovery because they are fast to predict and the similarity between them is fast to compute. Although structure similarity measures [10,11,12,13] and high-precision structure prediction [14, 15] are available, most existing structure alignment and structure prediction tools require substantial time and memory [16]. Given the rapid growth of protein structure databases, a fast method to detect remote homology is needed. A series of methods based on protein distance or contact maps have therefore been proposed [2,3,4]; they convert distance or contact maps into feature vectors and then use these vectors to calculate similarity.

Besides remote homologous protein detection, predicted distance or contact maps are widely used in protein information estimation: latent structural information is captured by convolutions over the distance or contact maps, and graph representations of proteins can also be derived from these maps. For example, Qiu et al. [5] integrate sequence, contact map, and GO labels to predict protein functions, and Chen et al. [6] use molecular docking simulation and contact-map-based graph representations of proteins to discover two candidate drugs. Other common applications include protein solubility prediction [17], key site prediction [18, 19], protein identification [20], and protein disorder region identification [21].

In protein structure studies, residue–residue distance or contact maps are commonly used as collective variables to describe conformational changes in biomolecular simulations. For example, Nassar et al. [7] employ residue–residue distances as biasing potentials in enhanced-sampling MD simulations, and Lubecka and Liwo [22] use residue–residue distances as restraints to improve structure simulations. Distance or contact maps are also used in protein design to assess whether a particular protein sequence can fold into a given structure [23]. Many protein domain segmentation methods use contact information to segment domains [9, 24], following the principle “as many intra-domain contacts as possible and as few inter-domain contacts as possible”. Distance or contact maps are also indispensable for many structure prediction algorithms; for example, Zheng et al. [25] fold non-homologous proteins by coupling contact maps with I-TASSER assembly simulations.

The wide range of applications of distance map prediction has attracted extensive attention from researchers. Barger et al. [26] and Rahman et al. [27] develop extended ResNets to predict distance maps; Si and Yan [28] combine 1D and 2D convolutions to enlarge the effective receptive field of the residual network. Madani et al. [29] develop an accurate protein predictor based on hybrid generative adversarial networks. Rahman et al. [30] use three ResNets to predict residue–residue distances in three ranges and a fourth ResNet to integrate their predictions. Guo et al. [31] derive multiple statistics from multiple sequence alignments (MSAs) and use them to construct four different feature sets for residue–residue distance prediction. Li et al. [32] train six ResNet models with the same architecture on different data subsets and ensemble them to make predictions. DeepDist [33] trains many models and ensembles them to predict real-valued distances and distance boundaries simultaneously, achieving higher prediction accuracy.

However, most existing distance map prediction methods rely on MSAs, whereas over half of all proteins are orphan proteins in standard sequence databases [34], and other related methods have their own drawbacks. For example, most contact map prediction techniques [35] for orphan proteins have two limitations: (1) contact maps provide insufficient information [36]; (2) because they rely on ensemble learning, most of them are time-consuming, whereas tasks such as remote homologous protein discovery are time-sensitive. In addition, AlphaFold-2 reliably predicts protein structure from MSAs using a variety of algorithms and engineering strategies, such as the invariant point attention (IPA) module and recycling, and ESMFold and OmegaFold extend this technology to orphan proteins by employing protein language models, but their inference is time-consuming. To address these challenges, we aim to develop a method that predicts distance maps for orphan proteins quickly and accurately.

First, we design a core model based on the properties of protein structure. Proteins contain many local structures, such as motifs and domains, and the distances among any three residues must satisfy the triangle inequality. To exploit locality and distance constraints, we design a model called R-former based on the triangular attention mechanism [14] and the proposed fast local microformer. Given the importance of the triangular attention mechanism in protein research, we also explain it in terms of residue–residue relationships, mathematical derivation, and feature representation.

Second, we propose group pooling to lower the dimensionality of protein representations and thereby reduce the method’s computational cost. Transformer models trained on masked protein sequences capture the affinity between residues in a protein [37, 38] and free us from relying on MSAs, but their representations are high-dimensional and sparse, which increases the computational cost of the prediction algorithm. Group pooling is designed to solve this problem.

Third, we provide error prediction and a new dataset to meet users’ needs. We design a deep learning framework named FreeProtMap, based on R-former and group pooling, that simultaneously outputs residue–residue distances and predicted errors. We also compiled a dataset, called the 4.05_release dataset, from recently published proteins (released on 2023.04.05) to help evaluate different prediction methods.

The contributions of this work are as follows:

  • To improve the accuracy of predicted distance maps for orphan proteins, we propose R-former, which combines the triangular attention mechanism with our proposed fast local microformer. R-former is designed to account for both the locality and the distance constraints of protein structures. We also provide a detailed explanation of the triangular attention mechanism.

  • Aiming to reduce computation cost, we propose group pooling to effectively reduce the dimensionality of protein deep representations.

  • Aiming to meet the needs of users, we design a deep learning framework called FreeProtMap based on R-former and group pooling, which can simultaneously predict residue–residue distances and errors. We also construct a dataset using newly released proteins to evaluate the methods’ effectiveness.

  • The proposed method predicts distance maps with a mean absolute error (MAE) of 2.32 Å and a root mean squared error (RMSE) of 3.63 Å on novel proteins. It predicts residue–residue distances more accurately than the state-of-the-art structure prediction method ESMFold while requiring only 3.74‰ of its inference time.

Datasets and metrics


We use the dataset curated by Yang et al. [39] as the base training dataset, which consists of 15051 protein sequences.

We adopt three test datasets from the CASP14 and CASP15 competitions and name them CASP_14D, CASP_14F, and CASP_15D, respectively. The CASP_14D dataset includes all 15 protein domain fragments from CASP14, the CASP_14F dataset includes all 35 complete proteins from CASP14, and the CASP_15D dataset includes all 44 protein domain fragments from CASP15. We could not find a complete-protein dataset for CASP15, so we do not report results on one.

To evaluate the model’s performance on newly discovered proteins, we construct a new dataset from recently released proteins (released on 2023.04.05) with a PDB resolution better than 2 Å. We call it the 4.05_release dataset.

To prevent overlap between training and test data from inflating the evaluation, homologous sequences are detected with BLAST at an E-value cutoff of \(1\times 10^{-3}\), which filters out nearly all potential homologs. We further removed redundancy between the training dataset and the test datasets at a 25% sequence identity threshold. After filtering, 14618 proteins remain in the training set and 90 proteins in the 4.05_release dataset. We train the model on 11000 sequences and use the remaining 3618 sequences as a validation dataset.
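The filtering and splitting steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `hits` is a hypothetical list of `(train_id, test_id, percent_identity, evalue)` tuples, for example built from the `pident` and `evalue` columns of BLAST's tabular output (`-outfmt 6`); a training sequence is assumed to be dropped if any hit passes the E-value cutoff or reaches the 25% identity threshold; and the random 11000/3618 split uses a fixed seed chosen here for reproducibility.

```python
import random


def drop_homologs(train_ids, hits, evalue_cutoff=1e-3, identity_cutoff=25.0):
    """Remove training sequences that resemble any test sequence.

    hits: hypothetical (train_id, test_id, percent_identity, evalue) tuples.
    A sequence is flagged if a hit is significant (E-value at or below the
    cutoff) or reaches the identity threshold (assumed criterion).
    """
    flagged = {
        train_id
        for train_id, _test_id, identity, evalue in hits
        if evalue <= evalue_cutoff or identity >= identity_cutoff
    }
    return [t for t in train_ids if t not in flagged]


def split_train_val(ids, n_train, seed=0):
    """Shuffle deterministically and split into training and validation sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

With the numbers in the text, `split_train_val(filtered_ids, 11000)` on the 14618 remaining proteins would leave 3618 for validation.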


Following the standard CASP definition [40], the distance between two residues is the distance between their \(C_{\beta }\) atoms (\(C_{\alpha }\) for Gly). Many distance map prediction methods consider only residue pairs with real distances below 16 Å [33, 36]. However, recent research shows that accurately predicting distances between residue pairs with real distances up to 36 Å is important for determining the 3D structure [27]. We therefore evaluate distance prediction on residue pairs whose real distance lies in (0, 36 Å).
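The CASP distance convention and the (0, 36 Å) evaluation window can be illustrated with a short sketch. The input layout (a list of dicts holding a three-letter residue name and `(x, y, z)` coordinates for `"CA"` and, for non-glycine residues, `"CB"`) is a hypothetical choice made here; a real pipeline would read the atoms from a PDB/mmCIF parser.

```python
import math


def residue_distance_map(residues):
    """Pairwise residue distances following the CASP convention.

    Each residue is a dict with "name" (three-letter code) and atom
    coordinates under "CA" and, for non-glycine residues, "CB"
    (hypothetical input layout).
    """
    points = []
    for res in residues:
        # Glycine has no C-beta atom, so its C-alpha stands in for it.
        atom = "CA" if res["name"] == "GLY" else "CB"
        points.append(res[atom])
    return [[math.dist(a, b) for b in points] for a in points]


def evaluated_pairs(dist_map, upper=36.0):
    """Index pairs whose real distance lies in the open interval (0, upper)."""
    n = len(dist_map)
    return [(i, j) for i in range(n) for j in range(n)
            if 0.0 < dist_map[i][j] < upper]
```

The open lower bound automatically excludes the zero-valued diagonal of the map.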

The evaluation metrics are the mean absolute error of the predicted distances (\(MAE_d\)) and of the predicted errors (\(MAE_p\)), the root mean squared error (RMSE), the coefficient of determination (\(R^2\)), and the mean deviation (\(M^d\)) of each of these statistics. They are calculated as follows:

$$\begin{aligned} MAE_{d}&= \frac{\sum _{i=1}^{n}\left| d_{i}-d_{i}^{p}\right| }{n} \end{aligned}$$
$$\begin{aligned} MAE_{p}&= \frac{\sum _{i=1}^{n}\left| AE_{i}-AE_{i}^{p}\right| }{n} \end{aligned}$$
$$\begin{aligned} RMSE&= \sqrt{\frac{\sum _{i=1}^{n} (d_{i}-d_{i}^{p})^2}{n}} \end{aligned}$$
$$\begin{aligned} R^2&= 1- \frac{\sum _{i=1}^{n}(d_i - d_i^p)^2}{\sum _{i=1}^{n}(d_i - \overline{d})^2} \end{aligned}$$
$$\begin{aligned} M^d_s&= \frac{\sum _{i=1}^{n}\left| s_i -\overline{s}\right| }{n} \end{aligned}$$

where \(d_{i}\) denotes the real residue–residue distance, \(d_{i}^{p}\) the predicted residue–residue distance, n the number of residue pairs with real distance below 36 Å, \(AE_{i} = |d_{i}-d_{i}^{p}|\) the absolute error of the predicted distance for the ith residue pair, \(AE_{i}^{p}\) the predicted absolute error for the ith residue pair, and \(\overline{d}\) the mean of the real residue–residue distances. In \(M^d_s\), s denotes a statistic such as MAE, RMSE, or \(R^2\), \(s_i\) its value on the ith test protein, \(\overline{s}\) its mean, and n the number of test proteins.
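The metrics above can be written out directly. This pure-Python sketch is for illustration only: `select_pairs` applies the (0, 36 Å) evaluation window to a pair of distance maps, and `mean_deviation` assumes its input is a list of per-protein values of one statistic, which is how \(M^d_s\) is read here.

```python
import math


def select_pairs(true_map, pred_map, upper=36.0):
    """Flatten the residue pairs whose real distance lies in (0, upper)."""
    true_sel, pred_sel = [], []
    for t_row, p_row in zip(true_map, pred_map):
        for t, p in zip(t_row, p_row):
            if 0.0 < t < upper:
                true_sel.append(t)
                pred_sel.append(p)
    return true_sel, pred_sel


def mae(true, pred):
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


def rmse(true, pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def r_squared(true, pred):
    mean_t = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


def mae_p(true, pred, pred_abs_err):
    """MAE_p: real absolute errors |d_i - d_i^p| versus predicted ones."""
    real_abs_err = [abs(t - p) for t, p in zip(true, pred)]
    return mae(real_abs_err, pred_abs_err)


def mean_deviation(per_protein_values):
    """M^d_s: mean absolute deviation of a statistic across test proteins."""
    mean_s = sum(per_protein_values) / len(per_protein_values)
    return sum(abs(s - mean_s) for s in per_protein_values) / len(per_protein_values)
```

For example, with real distances `[1, 2, 3, 4]` and predictions `[1, 2, 3, 5]`, MAE is 0.25, RMSE is 0.5, and \(R^2\) is 0.8.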


Model architecture

Fig. 1 illustrates the method, which takes a protein sequence as input and outputs the predicted distance map together with its predicted absolute error. FreeProtMap carries out three processes in sequence: (1) representation generation, (2) representation optimization, and (3) regression prediction. They are described in detail below.

Fig. 1

Illustration of our proposed framework, which consists of three modules: representation generation module, representation optimization module, and regression prediction module. In the representation generation module, ESM-2 generates attention maps of the input protein, which are then downscaled through group pooling to form the input representation (input feature maps). In the representation optimization module, the representation is further optimized by the R-former. The regression prediction module has two branches: one for distance map prediction and one for error prediction

Representation generation

Fig. 2

Common dimensionality reduction methods for attention maps of protein models. a Direct use: the attention maps are used without processing. b Truncation: a portion of the attention maps is extracted. c Sampling: the attention maps are randomly sampled. d Group pooling: the dimensionality of the attention maps is reduced separately within each feature subspace to retain as much information as possible

During representation generation, we use the ESM-2 model to generate the primary feature representations and propose a novel group pooling method to extract the informative ones.

ESM-2 produces high-dimensional, sparse attention maps that describe the relationships between residues in feature space [15]. High-dimensional sparsity refers to the presence of a large number of zero values in feature maps or feature vectors. It leads to high computational cost, while simple dimensionality reduction methods cause significant information loss (Fig. 2). To solve this problem, we propose a group pooling method based on the following principles:

  1. The residue–residue relationship representations in the attention maps contain considerable noise, so their components are reduced separately to limit interference from this noise.

  2. The residue–residue relationship representations potentially contain multiple types of residue–residue relationships, and each component of a representation may correspond to one kind of relationship. The representations are therefore divided into multiple sub-representations for processing.

  3. Each sub-representation is represented by its maximum response.

We represent the 3D tensor of attention maps as a set of 1D feature channels \(X = \{X_i\}, i = 1\ldots N\), where \(N = L \times L\) and L denotes the length of the protein sequence, so that each \(X_i\) collects the attention values of one residue pair. \(X_i = [X_i^1, X_i^2,\ldots ,X_i^j,\ldots ,X_i^M ]\), where \(X_i^j\) is the jth of the M components of \(X_i\), and \(X_i^j(p)\) is the response at channel position p within the set \(\Omega\) of channel positions in that component. The feature maps constructed by group pooling are therefore given by:

$$\begin{aligned} f_i^j&= \max _{p \in \Omega } X_i^j(p) \end{aligned}$$
$$\begin{aligned} f_i&= \mathrm{concat}\left( f_i^1,f_i^2,\ldots ,f_i^j,\ldots ,f_i^M\right) \end{aligned}$$
$$\begin{aligned} f&= \mathrm{Assemble}(f_i) \end{aligned}$$

For more specific details, please refer to Additional file 1: Appendix S1.
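A minimal pure-Python sketch of group pooling follows. It assumes the C attention channels (for example, the stacked layer/head attention maps of ESM-2) are split into M contiguous groups of equal size; the grouping actually used by FreeProtMap is specified in Appendix S1 and may differ, and `group_pool` is a hypothetical helper name. Each group is replaced by its element-wise maximum, which is \(f_i^j = \max_{p \in \Omega} X_i^j(p)\) evaluated for every residue pair at once.

```python
def group_pool(attn, num_groups):
    """Group pooling over attention channels (sketch).

    attn: C channel maps, each an L x L nested list. Channels are split
    into `num_groups` contiguous, equally sized groups (an assumption),
    and each group is reduced to its maximum response per residue pair.
    """
    channels = len(attn)
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    size = channels // num_groups
    length = len(attn[0])
    pooled = []
    for j in range(num_groups):
        group = attn[j * size:(j + 1) * size]
        # Maximum response of group j at every residue pair (r, s).
        pooled.append([[max(ch[r][s] for ch in group) for s in range(length)]
                       for r in range(length)])
    # Stacking the M pooled maps plays the role of Assemble(f_i).
    return pooled
```

With PyTorch tensors, the same contiguous-group variant amounts to reshaping a `(C, L, L)` tensor to `(M, C // M, L, L)` and taking `amax(dim=1)`.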

Group pooling has considerable potential for protein prediction applications. Large-scale pre-trained models are now widely used to obtain protein representations for downstream tasks, but these representations are often high-dimensional and sparse. At the input stage, preserving this high-dimensional sparsity is unnecessary, and effective dimensionality reduction can greatly decrease computational cost and ease model training.

Representation optimization

The R-former proposed in this paper optimizes the input representations for distance map prediction by exploiting two important properties of protein structure: locality and the triangle-inequality constraint.

Modeling based on locality. Locality manifests in two ways: (1) protein molecules contain many local structures, including \(\alpha\)-helices and \(\beta\)-sheets as well as larger local structures such as domains and motifs; (2) in biological molecules such as proteins, residue–residue distances vary continuously because of covalent bonds and non-covalent interactions, so the distance between \(x_i\) and \(x_j\) is close to the distance between \(x_{i \pm 1}\) and \(x_{j \pm 1}\), where \(x_i\) and \(x_j\) denote the ith and jth residues of the protein.

When designing a module based on the locality of protein structure, we also need to consider the properties of the task: the patterns in protein distance maps are relatively monotonous, so the module needs strong generalization ability. The local microformer [41] has three key features: much stronger generalization, much stronger local modeling, and a lightweight design. We therefore adopt the local microformer as the basic local modeling module. To meet the speed requirements of remote homologous detection, we replace its attention calculation with additive attention [42]. The resulting high-speed local microformer, which we call the former module, is illustrated in Fig. 1f.

The former module performs the following steps (Fig. 1f):

(1) The input feature map is divided into several area blocks, and each block is flattened into a one-dimensional sequence, denoted \(S_{i}\).

(2) These sequences are then fed into the multi-head attention module, which uses scaled multi-head additive attention to capture dependencies. The calculation is as follows:

(2.1) The query, key, and value vectors are obtained with three projections \(W^Q\), \(W^K\), and \(W^V\):

$$\begin{aligned} q_{i}=S_{i}W^Q; k_{i}=S_{i}W^K; v_{i}=S_{i}W^V \end{aligned}$$

(2.2) The query, key, and value vectors are each split into H heads:

$$\begin{aligned} & q_{i}^{0}, q_{i}^{1},\ldots ,q_{i}^{H-1}=split(q_{i}) \end{aligned}$$
$$\begin{aligned} & k_{i}^{0},k_{i}^{1},\ldots ,k_{i}^{H-1}=split(k_{i}) \end{aligned}$$
$$\begin{aligned} & v_{i}^{0},v_{i}^{1},\ldots ,v_{i}^{H-1}=split(v_{i}) \end{aligned}$$

(2.3) For each head m, the following operations are performed on \(q_{i}^{m}\), \(k_{i}^{m}\), and \(v_{i}^{m}\).

(2.3.1) To summarize the query vectors into a global query vector with additive attention, an attention weight is first computed for each query vector:

$$\begin{aligned} \alpha _ {i} = \frac{\exp (w_ {q}^ {T}q_{i}^{m}/\sqrt{d^h})}{\sum _ {j=1}^ {N}\exp (w_ {q}^ {T}q_{j}^{m}/\sqrt{d^h})} \end{aligned}$$

where \(w_q \in \mathbb{R}^{d^h}\) is a learnable parameter vector, \(d^h\) denotes the dimension of \(q_{j}^{m}\), and N is the number of positions in the sequence.

(2.3.2) The global attention query vector is computed as follows:

$$\begin{aligned} q^{m}= \sum _ {i=1}^ {N} \alpha _ {i} q_ {i}^{m} \end{aligned}$$

(2.3.3) The global query vector interacts with each key vector of the same head, and the result is scaled:

$$\begin{aligned} p_{i}^{m} = q^{m}*k_{i}^{m}/\sqrt{d^h} \end{aligned}$$

where \(*\) denotes the element-wise product.

(2.3.4) The additive attention weight of the ith interaction vector is computed as follows:

$$\begin{aligned} \beta _ {i} = \frac{\exp (w_ {k}^ {T}p_{i}^{m}/\sqrt{d^h})}{\sum _ {j=1}^ {N}\exp (w_ {k}^ {T}p_ {j}^{m}/\sqrt{d^h})} \end{aligned}$$

where \(w_k \in \mathbb{R}^{d^h}\) is a learnable parameter vector.

(2.3.5) The global key vector \(k^{m} \in \mathbb{R}^{d^h}\) is then computed as follows:

$$\begin{aligned} k^{m}= \sum _ {i=1}^ {N} \beta _ {i} p_ {i}^{m} \end{aligned}$$

(2.3.6) Each value vector is weighted element-wise by the global key vector:

$$\begin{aligned} u_ {i}^{m} = k^{m} * v_ {i}^{m}. \end{aligned}$$

(2.4) The outputs of all heads are concatenated for each position:

$$\begin{aligned} y_{i}=concat \left[ u_{i}^{0},\ldots , u_{i}^{H-1}\right] \end{aligned}$$

(2.5) A transposed convolution aggregates the attention results of the different heads contained in the output of the multi-head attention module. The area blocks are then merged back into a feature map, which is finally combined with the original feature map through a residual connection.
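Steps (2.2) to (2.4) can be traced with a small pure-Python sketch of additive attention in the style of [42]. It is an illustration, not the authors' implementation: the query, key, and value vectors are assumed to be already projected by \(W^Q\), \(W^K\), and \(W^V\), plain lists stand in for tensors, and the function names are hypothetical. Each head costs \(O(N d^h)\), rather than the \(O(N^2 d^h)\) of dot-product attention, which is where the speed-up comes from.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def additive_attention_head(q, k, v, w_q, w_k):
    """One head, following steps (2.3.1)-(2.3.6).

    q, k, v: lists of N vectors of length d_h; w_q, w_k: length-d_h vectors.
    """
    d_h = len(q[0])
    scale = math.sqrt(d_h)
    # (2.3.1) one weight per query vector
    alpha = softmax([dot(w_q, qi) / scale for qi in q])
    # (2.3.2) global query vector: weighted sum of the query vectors
    q_glob = [sum(a * qi[c] for a, qi in zip(alpha, q)) for c in range(d_h)]
    # (2.3.3) element-wise interaction with each key vector, scaled
    p = [[q_glob[c] * ki[c] / scale for c in range(d_h)] for ki in k]
    # (2.3.4) one weight per interaction vector
    beta = softmax([dot(w_k, pi) / scale for pi in p])
    # (2.3.5) global key vector: weighted sum of the interaction vectors
    k_glob = [sum(b * pi[c] for b, pi in zip(beta, p)) for c in range(d_h)]
    # (2.3.6) element-wise weighting of every value vector
    return [[k_glob[c] * vi[c] for c in range(d_h)] for vi in v]


def multihead_additive_attention(q, k, v, w_qs, w_ks):
    """Split into H heads (2.2), run each head, concatenate per position (2.4)."""
    heads = len(w_qs)
    d_h = len(q[0]) // heads
    outputs = []
    for m in range(heads):
        cols = slice(m * d_h, (m + 1) * d_h)
        outputs.append(additive_attention_head(
            [x[cols] for x in q], [x[cols] for x in k], [x[cols] for x in v],
            w_qs[m], w_ks[m]))
    return [sum((outputs[m][i] for m in range(heads)), []) for i in range(len(q))]
```

With all-zero `w_q` and `w_k`, both softmaxes are uniform, so the global query is simply the mean query vector.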

Modeling based on distance constraints. The distances among any three residues must satisfy the triangle inequality [14]. We analyze the working principle of the triangular attention module in AlphaFold-2 [14] and argue that this module can effectively introduce the triangle-inequality constraint into information modeling. The triangular attention module is illustrated in Additional file 1: Appendix S2. We believe that it works according to the following principles:

  1. The residue–residue relationship and the residue–residue distance are closely related, because residues that are close in space tend to mutate together to reach new stable states in which their physical and chemical properties remain coordinated. In addition, after processing, the attention maps that represent residue–residue relationships exhibit specific local patterns, indicating a close correlation between the residue–residue relationship representation and the residue–residue distance.

  2. By the geometric interpretation of the vector dot product, the term \(q_{(i,j)}k_{(i,k)}\) in the triangular attention mechanism can be written as \(|q_{(i,j)} ||k_{(i,k)} |\cos \theta _{i}\).

  3. Since the feature vectors in the triangular attention module represent residue–residue relationships, which are closely related to distances, \(|q_{(i,j)} ||k_{(i,k)} |\cos \theta _{i} + b_{(j,k)}\) can be approximated as

    $$\begin{aligned} dist_{(i,j)} \times dist_{(i,k)} \times \cos \theta _{i} + dist_{(j,k)} \end{aligned}$$

  4. By the law of cosines, \({dist_{(j,k)}}^2 = {dist_{(i,j)}}^2+{dist_{(i,k)}}^2-2\,dist_{(i,j)}\,dist_{(i,k)}\cos \theta _{i}\), so the expression in step (3) can be rewritten as

    $$\begin{aligned} \frac{{dist_{(i,j)}}^2+{dist_{(i,k)}}^2-{dist_{(j,k)}}^2}{2}+dist_{(j,k)} \end{aligned}$$

  5. In summary,

    $$\begin{aligned}&softmax(q_{(i,j)} k_{(i,k)}+b_{(j,k)}) \\&\quad \approx softmax \left( \frac{{dist_{(i,j)}}^2+{dist_{(i,k)}}^2-{dist_{(j,k)}}^2}{2}+dist_{(j,k)} \right) \end{aligned}$$

The above analysis shows that, in a triangle with three residues as vertices, if the sum of two sides is much smaller than the third side, the corresponding attention weight becomes very low after the softmax, which suppresses the propagation of incorrect distance relationships. The module therefore updates residue–residue distance relationships mainly from those relationships that satisfy the triangle inequality, and in this way introduces the triangle-inequality constraint into the modeling of distance relationships.
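The cosine-law step and the suppression argument can be checked numerically. The sketch below uses the idealized logit from step (5) rather than learned features, and the coordinates and distances are arbitrary illustrative values. For a fixed pair (i, j), it compares a candidate third residue k that forms a valid triangle with one whose distances violate the triangle inequality (3 + 1 is much smaller than 10).

```python
import math


def cosine_law_term(d_ij, d_ik, d_jk):
    """(d_ij^2 + d_ik^2 - d_jk^2) / 2, which equals d_ij * d_ik * cos(theta_i)."""
    return (d_ij ** 2 + d_ik ** 2 - d_jk ** 2) / 2


def approx_score(d_ij, d_ik, d_jk):
    """Idealized attention logit from step (5): cosine-law term plus bias d_jk."""
    return cosine_law_term(d_ij, d_ik, d_jk) + d_jk


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Check the cosine-law step on explicit 2D coordinates for residues i, j, k.
i, j, k = (0.0, 0.0), (3.0, 0.0), (1.0, 2.0)
d_ij, d_ik, d_jk = math.dist(i, j), math.dist(i, k), math.dist(j, k)
dot_at_i = (j[0] - i[0]) * (k[0] - i[0]) + (j[1] - i[1]) * (k[1] - i[1])
assert math.isclose(dot_at_i, cosine_law_term(d_ij, d_ik, d_jk))

# Softmax over two candidate residues k for the pair (i, j) with d_ij = 3:
# a consistent triangle (3, 4, 5) and an inconsistent one (3, 1, 10).
weights = softmax([approx_score(3.0, 4.0, 5.0), approx_score(3.0, 1.0, 10.0)])
```

The consistent candidate scores 5 and the inconsistent one scores -35, so the inconsistent relationship receives a weight of roughly \(e^{-40}\).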

Fig. 3 illustrates these principles.

Fig. 3

Illustration of the triangular attention mechanism. In the triangular attention mechanism, each feature vector represents a residue–residue relationship, which is largely a distance relationship. The definition of the vector dot product and the law of cosines explain how this module incorporates the triangle-inequality constraint into the modeling

Design of R-former. Based on these two modules, we designed a hybrid structure called R-former, which combines transformers and convolutional neural networks (CNNs). Its basic architecture consists of a bottleneck structure and residual structures. The bottleneck structure aims to obtain a robust, high-dimensional representation of the inputs and to minimize the risk of overfitting [43, 44]. The residual structure makes the model easier to train [45], as each neural network block only needs to learn a small residual. The triangular attention module is placed in the second layer of the R-former to balance computational cost and prediction accuracy. The remaining hyperparameters of the R-former were chosen empirically.

The R-former performs the following processes (Fig. 1a):

$$\begin{aligned} x_1&= conv^{f}(x_0) \end{aligned}$$
$$\begin{aligned} x_2&= tri\_attn(x_1) \end{aligned}$$
$$\begin{aligned} x_3&= former^{f}(x_2) \end{aligned}$$
$$\begin{aligned} x_4&= former^{m}(x_3,x_2) \end{aligned}$$
$$\begin{aligned} x_5&= conv^{m}(x_4,x_0) \end{aligned}$$

where \(conv^{f}\) denotes the forward CNN (forward conv), \(tri\_attn\) denotes the triangular attention module, \(former^{f}\) denotes the forward former, \(former^{m}\) denotes the mix former, \(conv^{m}\) denotes the mix CNN (mix conv), and \(x_0\) denotes the input representation generated in the first stage.

\(conv^{f}\) consists of a max pooling layer and a double-layer CNN (Fig. 1b). \(conv^{m}\) consists of a transposed CNN, a padding operation, a fusion operation (Add), and a double-layer CNN (Fig. 1e). \(former^{f}\) and \(former^{m}\) add a former module to \(conv^{f}\) and \(conv^{m}\) (Fig. 1c, d), respectively. All max pooling layers in Fig. 1 have a window size of 2. The double-layer CNNs of \(conv^{f}\), \(former^{f}\), \(former^{m}\), and \(conv^{m}\) have {64, 64}, {128, 128}, {64, 64}, and {36, 36} filters, respectively, all of size \(3\times 3\). The number and size of the filters of the transposed convolutions in \(former^{m}\) and \(conv^{m}\) are 1 and \(2\times 2\), respectively.
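The data flow \(x_0 \rightarrow x_5\) can be followed with a shape-bookkeeping sketch. It rests on assumptions not stated in the text: the \(3\times 3\) convolutions use "same" padding, the \(2\times 2\) transposed convolutions upsample by a factor of 2, the triangular attention and former modules keep the spatial shape, and the padding step pads the upsampled map to the size of the skip input it is added to. The channel counts are the double-layer CNN filter numbers given above; `in_channels` (the number of group-pooled channels) is a free, hypothetical parameter, and `r_former_shapes` is an illustrative name.

```python
def r_former_shapes(seq_len, in_channels):
    """(stage, channels, spatial size, rows/cols of padding added) per stage.

    Assumptions: 2x2 max pooling floors odd sizes, 3x3 convs keep size,
    transposed convs double size, padding restores the skip input's size.
    """
    stages = [("x0", in_channels, seq_len, 0)]
    s1 = seq_len // 2                 # conv^f: max pool, then {64, 64} filters
    stages.append(("x1", 64, s1, 0))
    stages.append(("x2", 64, s1, 0))  # tri_attn: shape assumed unchanged
    s3 = s1 // 2                      # former^f: max pool, {128, 128}, former
    stages.append(("x3", 128, s3, 0))
    pad4 = s1 - 2 * s3                # former^m: upsample x3, pad to x2, add
    stages.append(("x4", 64, s1, pad4))
    pad5 = seq_len - 2 * s1           # conv^m: upsample x4, pad to x0, add
    stages.append(("x5", 36, seq_len, pad5))
    return stages


for stage in r_former_shapes(101, in_channels=8):
    print(stage)
```

Under these assumptions the padding operation is only needed when pooling floors an odd size, which is why it adds at most one row and column per upsampling step.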

Regression prediction

We perform regression prediction for the distance map and the error in two stages. First, the regression prediction layers output normalized values between 0 and 1 for the residue–residue distance and the absolute error (AE). The regression prediction layers consist of a single-layer CNN with a kernel size of 1. Second, these values are denormalized by multiplying them by N to obtain the predicted residue–residue distances and predicted absolute errors. Since real residue–residue distances range between 0 and 100 Å, we set N to 100.

Loss function

Our training strategy has two stages: first, we train the R-former and the distance map prediction branch; then we freeze them and train the error prediction branch. We adopt a small-loss strategy, computing the loss only for residue pairs with real distances below 36 Å.

Because MAE is more robust to outliers than squared-error losses, we use it as the loss function for our tasks, which involve a wide numerical range and potential outliers. It is the mean of the absolute differences between the predicted and real values:

$$\begin{aligned} \mathcal {L}=\frac{\sum _{i=1}^{n}\left| y_{i}-y_{i}^{p}\right| }{n} \end{aligned}$$

where \(y_{i}\) denotes the true residue–residue distance or absolute error, \(y_{i}^{p}\) denotes the predicted residue–residue distance or absolute error, and n is the number of residue pairs with actual distances under 36 Å.
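The denormalization step and the small-loss MAE can be sketched together. This is a list-based illustration under assumptions: the loss is computed on denormalized values in Å (computing it on normalized values would only scale it by 1/100), the mask uses the real distance of each pair, and the function names are hypothetical.

```python
def denormalize(values, scale=100.0):
    """Map normalized outputs in [0, 1] back to Angstroms (N = 100)."""
    return [v * scale for v in values]


def small_loss_mae(real_dist, target, pred, cutoff=36.0):
    """MAE over residue pairs whose real distance is below `cutoff`.

    Stage 1: target = real distances, pred = predicted distances.
    Stage 2: target = real absolute errors of the frozen distance branch,
             pred = predicted absolute errors.
    """
    kept = [(t, p) for d, t, p in zip(real_dist, target, pred) if d < cutoff]
    if not kept:
        raise ValueError("no residue pairs below the cutoff")
    return sum(abs(t - p) for t, p in kept) / len(kept)


real = [5.0, 40.0, 10.0]
pred_dist = denormalize([0.06, 0.2, 0.08])             # about [6, 20, 8] Angstrom
loss_stage1 = small_loss_mae(real, real, pred_dist)    # the 40 A pair is ignored
abs_err = [abs(d - p) for d, p in zip(real, pred_dist)]
loss_stage2 = small_loss_mae(real, abs_err, [1.5, 0.0, 2.0])
```

Here the stage-one loss is (1 + 2) / 2 = 1.5 Å and the stage-two loss is (0.5 + 0) / 2 = 0.25 Å, since the pair at 40 Å is excluded from both.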


We use the Adam optimizer with a weight decay of 0.01 and train for 30 epochs in the first stage and 5 epochs in the second stage. The initial learning rate is \(1\times 10^{-3}\) and the batch size is 1. Our method is implemented in PyTorch and trained on one NVIDIA A100 GPU.

Results and discussions

We report the performance of FreeProtMap on distance map prediction tasks, along with the analysis of the model and results. Additionally, we report its performance on contact map prediction tasks.

Comparison with other methods

Comparison with other methods on prediction accuracy

Because no MSA-free distance prediction methods are available, we compare FreeProtMap with state-of-the-art structure prediction methods, whose predicted 3D structures are used to generate distance maps. Table 1 summarizes the experimental results; italic and bold mark the best and second-best results, respectively.

The results in Table 1 show that FreeProtMap substantially outperforms the best published method on the test datasets, supporting the effectiveness of our design. Specifically, on the 4.05_release dataset the proposed method achieves 2.32 Å in \(MAE_{d}\), 3.63 Å in RMSE, and 0.88 in \(R^2\), outperforming the best published method by 0.25 Å in \(MAE_{d}\), 1.77 Å in RMSE, and 0.21 in \(R^2\). Furthermore, the mean deviation of each statistic shows that FreeProtMap performs relatively stably on new proteins: on the 4.05_release dataset it achieves 0.55 Å in \(M^d_{MAE}\), 0.87 Å in \(M^d_{RMSE}\), and 0.06 in \(M^d_{R^2}\), outperforming the best published method by 1.69 Å in \(M^d_{MAE}\), 2.63 Å in \(M^d_{RMSE}\), and 0.14 in \(M^d_{R^2}\). In addition, FreeProtMap predicts errors with an \(MAE_{p}\) of 2.45 Å.

We also evaluate FreeProtMap and the compared methods on the CASP_15D dataset, where Table 1 shows that FreeProtMap again clearly outperforms the best published method. Specifically, on the CASP_15D dataset the proposed method achieves 2.50 Å in \(MAE_{d}\), 3.83 Å in RMSE, 1.26 Å in \(M^d_{MAE}\), 2.11 Å in \(M^d_{RMSE}\), and 0.18 in \(M^d_{R^2}\), outperforming the best published method by 0.61 Å in \(MAE_{d}\), 1.10 Å in RMSE, 0.88 Å in \(M^d_{MAE}\), 1.50 Å in \(M^d_{RMSE}\), and 0.03 in \(M^d_{R^2}\).

The exceptional performance of FreeProtMap can be attributed to five primary factors:

  1. The attention maps after group pooling better characterize residue-level information.

  2. The distance map has evident local patterns, and the former module enhances the modeling of local information.

  3. FreeProtMap benefits from AlphaFold-2’s triangular attention module, which exploits the triangle-inequality constraint in distance maps.

  4. The bottleneck structure of the R-former helps reduce overfitting [43, 44], and the R-former is far less overparameterized than ESMFold. Together, these two properties improve FreeProtMap’s performance on novel proteins.

  5. The small-loss strategy used during training mitigates the influence of problematic data.

One possible reason the proposed method does not achieve perfect accuracy is that experimentally determined PDB structures themselves contain deviations: (1) static structures are determined under non-physiological conditions; (2) different crystallization conditions, different structure determination techniques (NMR, X-ray, cryo-EM, etc.), and even different structure computation methods may lead to structural variation.

Table 1 Comparison of proposed approach with state-of-the-art methods for residue–residue distance prediction

Comparison with ESMFold on computational complexity

Because ESMFold achieves the second-best performance on most datasets in the comparative experiments, we further compare it with FreeProtMap in terms of computational complexity.

Table 2 reports the average inference time and maximum memory cost of FreeProtMap and ESMFold on a local server. FreeProtMap generates a distance map in 0.0295 s on average for input sequences with an average length of 376.6 AA, which is only 3.74‰ of ESMFold's inference time. This speed results from FreeProtMap's lightweight architecture and additive attention: ESMFold consists of 56 blocks and requires a recycling step, whereas FreeProtMap consists of only 5 blocks and skips recycling.
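To illustrate why additive attention scales better than dot-product attention, here is a minimal NumPy sketch of Fastformer-style additive attention [42] over a 1D sequence of N tokens. How FreeProtMap applies additive attention to 2D maps is not described in this section, so the layout, the parameter names (`Wq`, `Wk`, `Wv`, `wq`, `wk`, `Wo`), and any weights used with it are assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(x, Wq, Wk, Wv, wq, wk, Wo):
    """Fastformer-style additive attention over N tokens of width d.

    Cost is O(N * d) rather than the O(N^2 * d) of dot-product attention,
    because each token interacts only with two pooled global vectors.
    """
    d = x.shape[1]
    q, k, v = x @ Wq, x @ Wk, x @ Wv          # (N, d) each
    alpha = softmax(q @ wq / np.sqrt(d))      # (N,) attention weights over tokens
    global_q = alpha @ q                      # (d,) pooled global query
    p = k * global_q                          # query-aware keys (element-wise)
    beta = softmax(p @ wk / np.sqrt(d))       # (N,)
    global_k = beta @ p                       # (d,) pooled global key
    u = v * global_k                          # key-aware values (element-wise)
    return u @ Wo + q                         # output projection plus query residual
```

Because each token interacts only with the pooled vectors `global_q` and `global_k`, no N × N score matrix is ever built, which is where the time and memory savings come from.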

Table 2 Complexity of proposed FreeProtMap and ESMFold

Analysis of predicted results

The predicted distance map reflects the structural details

A distance map clearly displays the structural details of the queried protein. In Fig. 4a, the thick diagonal line in the green box indicates an alpha helix, and the line segment in the blue circle, perpendicular to the diagonal, indicates a parallel structure in the protein.

In Fig. 4b, three segments that are perpendicular or parallel to the diagonal indicate that three protein fragments are parallel to one another. The entire diagonal line in this figure is thick, indicating that the protein is composed of alpha helices.
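The link between alpha helices and a thick diagonal can be checked with a small worked example. The sketch below builds approximate Cα coordinates of an ideal helix from textbook geometry (about 2.3 Å radius, 1.5 Å rise, and 100° twist per residue; these values are assumptions, not taken from this paper) and measures distances along the sequence. Residues up to four positions apart stay within 8 Å, which produces a band of short distances several residues wide around the diagonal.

```python
import numpy as np

def ideal_helix_ca(n_res, radius=2.3, rise=1.5, twist_deg=100.0):
    """Approximate C-alpha coordinates of an ideal alpha helix (assumed geometry)."""
    idx = np.arange(n_res)
    angle = np.deg2rad(twist_deg) * idx
    return np.stack([radius * np.cos(angle), radius * np.sin(angle), rise * idx], axis=1)

def distance_map(coords):
    """All-pairs Euclidean distances between residues (L x L)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

dmap = distance_map(ideal_helix_ca(20))
# Distances from residue 0 to residues at sequence offsets 1..7
offsets = dmap[0, 1:8]
# Number of consecutive offsets that fall below 8 A (offsets 1-4 for this helix)
band_width = int(np.sum(offsets < 8.0))
```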

Based on the principle of “maximum intra-domain contacts and minimum inter-domain contacts”, it can be inferred that the corresponding protein in Fig. 4c is composed of two structural domains.
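The "maximum intra-domain contacts and minimum inter-domain contacts" principle can be turned into a toy single-cut domain splitter. The sketch below scans cut points k on a binary contact map and scores each split by its inter-domain contacts relative to its intra-domain contacts. The scoring formula, the function name, and `min_size` are illustrative assumptions rather than the procedure behind Fig. 4c, and a single cut can only separate two continuous domains, unlike dedicated domain parsers such as [9].

```python
import numpy as np

def best_two_domain_split(contact, min_size=10):
    """Pick a cut k splitting residues into [0, k) and [k, L) so that
    inter-domain contacts are few relative to intra-domain contacts.
    Hypothetical score: inter / sqrt(intra_a * intra_b); lower is better."""
    c = np.asarray(contact, dtype=float)
    L = c.shape[0]
    best_k, best_score = None, np.inf
    for k in range(min_size, L - min_size + 1):
        intra_a = c[:k, :k].sum()      # contacts within the first segment
        intra_b = c[k:, k:].sum()      # contacts within the second segment
        inter = c[:k, k:].sum()        # contacts between the two segments
        if intra_a == 0 or intra_b == 0:
            continue
        score = inter / np.sqrt(intra_a * intra_b)
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score
```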

To demonstrate the generalization and prediction capabilities of FreeProtMap on complex proteins, and to show the richer information contained in distance maps, we used FreeProtMap to predict the distance map of a Cas protein and present the results in Fig. 4d. Cas proteins, which are used for gene editing, contain multiple consecutive or non-consecutive domains. When multiple line segments appear in the vertical or horizontal direction of the distance map, the corresponding protein fragments are close together and may form a discontinuous domain.

In summary, the predicted distance maps capture rich structural information.

Fig. 4

Predicted distance maps and corresponding real 3D structures. a, Two parallel protein fragments are in green circles and highlighted in green. An alpha helix is in a blue box and highlighted in blue. b, Three parallel alpha helices are in green circles and highlighted in green. c, Two continuous domains are in a purple box and a green box and are highlighted in purple and green, respectively. d, A discontinuous domain is in a purple circle and highlighted in purple. It consists of four sub-domains, which are in green, blue, hot pink, and light pink circles, respectively

Analysis of error prediction

We report the distribution of the mean absolute error (MAE) between the real error and the predicted error (Fig. 5a), as well as the distribution of the predicted error (Fig. 5b). In 61.83% of the cases, the MAE of the predicted error is below 1.5 Å, and in 54.27% of the cases, the predicted error (predicted MAE) is below 1.5 Å. This suggests that FreeProtMap has some, but limited, ability to predict the error of its predicted residue–residue distances.
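The two statistics above can be computed as in the following sketch. It assumes that the model outputs a per-pair error map for each protein, that the "real error" is the absolute difference between predicted and true distances, and that the "predicted MAE" is the mean of the predicted error map. These are assumptions about the setup, and the function name is hypothetical.

```python
import numpy as np

def error_prediction_stats(true_maps, pred_maps, pred_err_maps, threshold=1.5):
    """Return (fraction of proteins whose real-vs-predicted error MAE is below
    threshold, fraction of proteins whose predicted MAE is below threshold)."""
    gaps, pred_maes = [], []
    for t, p, e in zip(true_maps, pred_maps, pred_err_maps):
        real_err = np.abs(p - t)                     # actual per-pair error (A)
        gaps.append(np.mean(np.abs(real_err - e)))   # MAE between real and predicted error
        pred_maes.append(np.mean(e))                 # predicted MAE of this protein
    gaps, pred_maes = np.array(gaps), np.array(pred_maes)
    return float(np.mean(gaps < threshold)), float(np.mean(pred_maes < threshold))
```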

Fig. 5

a, The distribution of the mean absolute error (MAE) between real error and predicted error. b, The distribution of the predicted error. Note. Errors are measured by MAE. All units are Å

Analysis of the model

The evolution of feature maps

We analyze the model further through its feature maps. The Pearson correlation coefficients between the distance map and the input feature maps show that the 2nd, 23rd, 31st, 33rd, and 35th layers of the input feature maps correlate relatively strongly with the distance map (Fig. 6a), and output attribution confirms their relatively large impact on the results (Fig. 6b). However, visualizing these feature maps (Fig. 6c) shows that even they correlate poorly with the distance map, which highlights the need for the R-former to refine the input feature maps.
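A sketch of the correlation analysis follows. It computes a Pearson coefficient between each input feature map and the distance map over residue pairs with i < j and returns the most correlated channels. Ranking by absolute correlation (attention tends to be high where distance is low, so strong negative correlations are informative too) and the pair mask are assumptions; the attribution method behind Fig. 6b is not reproduced.

```python
import numpy as np

def rank_channels_by_correlation(features, dist_map, top=5):
    """features: (C, L, L) input feature maps; dist_map: (L, L).
    Returns the indices of the `top` channels with the largest |r| and all r values."""
    L = dist_map.shape[0]
    iu = np.triu_indices(L, k=1)               # residue pairs with i < j
    d = dist_map[iu]
    r = np.array([np.corrcoef(f[iu], d)[0, 1] for f in features])
    order = np.argsort(-np.abs(r))[:top]       # strongest correlations first
    return order, r
```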

The mean feature maps produced by successive R-former layers show that the feature maps are gradually refined (Fig. 6d).

Fig. 6

Analysis of input feature maps. a, The correlation between the distance map and input feature maps. b, Importance of each input feature map for the prediction outcomes of our approach. c, Top 5 input feature maps with the highest correlation to the distance map. d, Optimization process of input feature maps. Note: ‘short protein’ denotes a protein with length in [0, 300) AA; ‘long protein’ denotes a protein with length in [300, +\(\infty\)) AA

Ablation experiment

We evaluate the effectiveness of three key components: the group pooling, the former module, and the triangular attention mechanism. The ablation experimental results are summarized in Table 3 and Fig. 7.

Baseline. We use the FreeProtMap model without the former module and the triangle attention mechanism, where the input feature maps are obtained by sampling attention maps.

P. We use the FreeProtMap model without the former module and triangle attention mechanism, where the input feature maps are group-pooled attention maps.

Tri. We use the FreeProtMap model without the former module, where the input feature maps are obtained by sampling attention maps.

L. We use the FreeProtMap model without the triangle attention mechanism, where the input feature maps are obtained by sampling attention maps.

TriL. We use the full FreeProtMap model, with both the former module and the triangle attention mechanism, where the input feature maps are obtained by sampling attention maps.

TriP. We use the FreeProtMap model without the former module, where the input feature maps are group-pooled attention maps.

LP. We use the FreeProtMap model without the triangle attention mechanism, where the input feature maps are group-pooled attention maps.

FreeProtMap. We use our proposed method, which combines group pooling, the former module, and the triangle attention mechanism.

Table 3 Results of ablation experiment
Fig. 7

Performance of methods in ablation experiment. Note. The unit of \(MAE_{d}\) is Å

Table 3 and Fig. 7 show that, without group pooling:

(1) Adding the triangular attention mechanism degrades model performance, and combining it with the former block degrades performance significantly. Without group pooling to filter the input feature maps, the added model complexity ultimately undermines performance.

(2) Adding the former block neither improves nor degrades performance. On the one hand, the former block's architecture, which improves generalization, prevents performance from declining as complexity increases. On the other hand, because the input feature maps are not adequately filtered, the former block cannot fulfill its role, so adding it has no effect on performance.

Group pooling alone significantly improves the model's performance, primarily by mitigating the challenges of high-dimensional sparse inputs. Combining it with the former block improves performance further, and combining group pooling, the former block, and the triangle attention mechanism yields the best performance.

Comparison of R-former and group pooling with similar methods

Comparison between group pooling and similar methods

To evaluate group pooling in FreeProtMap, we compare it with conventional dimensionality reduction methods, namely sampling and truncation.

FreeProtMap_T. It is identical to FreeProtMap except it employs truncation as the dimensionality reduction method.

FreeProtMap_S. It is identical to FreeProtMap except it employs sampling as the dimensionality reduction method.

Table 4 Comparison of dimensionality reduction methods

Group pooling significantly outperforms the other two methods (Table 4). Truncation extracts features from only one feature subspace, and although randomly sampled features span multiple feature subspaces, they do not cover them completely.
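The three dimensionality reduction strategies compared in Table 4 can be contrasted in a few lines of NumPy. The grouping below (consecutive channels, averaged within each group) is a hypothetical choice, since this section does not specify how FreeProtMap forms its groups or which pooling operator it uses. The key difference it illustrates is that every input channel contributes to some pooled output, whereas truncation and sampling discard channels.

```python
import numpy as np

def group_pool(maps, n_groups):
    """Average C input maps (C, L, L) within n_groups groups of consecutive
    channels (hypothetical grouping). C must be divisible by n_groups."""
    c, l1, l2 = maps.shape
    if c % n_groups:
        raise ValueError("channel count must be divisible by n_groups")
    return maps.reshape(n_groups, c // n_groups, l1, l2).mean(axis=1)

def truncate(maps, k):
    """Keep only the first k channels: one contiguous slice of the feature space."""
    return maps[:k]

def random_sample(maps, k, seed=0):
    """Keep k randomly chosen channels: spread out, but the rest are discarded."""
    idx = np.random.default_rng(seed).choice(maps.shape[0], size=k, replace=False)
    return maps[np.sort(idx)]
```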

Comparison between R-former and similar methods

To evaluate the R-former module in FreeProtMap, we compare it with several conventional networks. Like semantic segmentation, distance map prediction makes a prediction for every pixel of a 2D map, so we choose one of the most classical segmentation networks, U-Net [50], and one of the most advanced, UCTrans [51], for comparison. We also compare simple models, namely naive single- and multi-layer CNNs.

Fig. 8

Comparison between proposed R-former and other models. a, Experimental results of compared models on the 4.05_release dataset. b, Experimental results of compared models on the CASP_14D dataset. c, Experimental results of compared models on the CASP_14F dataset. d, Experimental results of compared models on the CASP_15D dataset. Note: F(U-Net), F(UCTrans) and F(LR-L) denote the proposed method, FreeProtMap, implemented with U-Net, UCTrans and an L-layer CNN, respectively, as the skeleton network instead of the R-former. F(R-former) denotes FreeProtMap implemented with the R-former as the skeleton network. LR-L denotes an L-layer CNN with \(1 \times 1\) kernels. The units of \(MAE_{d}\) and RMSE are Å

The results show that the R-former significantly outperforms the other models (Fig. 8). Specifically, it outperforms the best compared model by 1.09 Å in \(MAE_{d}\) and 1.80 Å in RMSE on the 4.05_release dataset; by 1.57 Å in \(MAE_{d}\) and 2.85 Å in RMSE on the CASP_14D dataset; by 1.20 Å in \(MAE_{d}\) and 2.10 Å in RMSE on the CASP_14F dataset; and by 0.90 Å in \(MAE_{d}\) and 1.59 Å in RMSE on the CASP_15D dataset.

U-Net-like networks outperform the simple models because they are better suited to segmentation-like dense prediction tasks. The R-former surpasses U-Net and UCTrans for the following reasons:

(1) The R-former models local information more richly, and locality is an important characteristic of distance maps. Moreover, the former block embedded in the R-former has stronger generalization capacity [41].

(2) The triangle attention module improves the quality of information modeling by exploiting triangle inequality constraints among distances.

(3) Although UCTrans combines information in a richer and more effective way, its weaker generalization capacity leads to inferior results.
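The triangle inequality that the triangle attention module exploits, \(d_{ij} \le d_{ik} + d_{kj}\), holds for any distance map derived from 3D coordinates, but an independently predicted map can violate it. The brute-force check below counts violating triples. It is only a diagnostic sketch: triangle attention learns from this structure rather than enforcing the inequality explicitly, and the O(L³) memory makes the check practical only for short proteins.

```python
import numpy as np

def triangle_violations(d, tol=1e-6):
    """Count ordered triples (i, k, j) with d[i, j] > d[i, k] + d[k, j] + tol."""
    d = np.asarray(d, dtype=float)
    # via_k[i, k, j] = d[i, k] + d[k, j]: length of the detour through residue k
    via_k = d[:, :, None] + d[None, :, :]
    direct = d[:, None, :]                  # broadcasts d[i, j] over every k
    return int(np.sum(direct > via_k + tol))
```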

Performance in contact map prediction

Evaluation metric

Following the standard CASP definition [40], two residues are considered to be in contact when the distance between their \(C_{\beta }\) atoms (\(C_{\alpha }\) for Gly) is less than 8.0 Å. To further evaluate FreeProtMap, we convert the generated distance maps into contact maps using this threshold and compare FreeProtMap with other state-of-the-art methods for residue–residue contact prediction.
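Converting a predicted distance map into a contact map is a single threshold, sketched below with NumPy. The strict "less than 8.0 Å" comparison follows the definition above; using the negative distance as a confidence score for ranking-based metrics is an assumption, not necessarily how the compared methods were scored. In practice, pairs with small sequence separation are usually excluded before evaluation.

```python
import numpy as np

CONTACT_CUTOFF = 8.0  # A, CASP definition (C-beta atoms, C-alpha for Gly)

def to_contact_map(dist_map, cutoff=CONTACT_CUTOFF):
    """Binary contact map: 1 where the predicted distance is strictly below the cutoff."""
    return (np.asarray(dist_map) < cutoff).astype(np.int8)

def contact_scores(dist_map):
    """Ranking scores for contact evaluation: closer pairs score higher.
    Negative distance is an assumed choice; any decreasing function gives the
    same ranking for rank-based metrics such as Top L/k, AUROC and AUPR."""
    return -np.asarray(dist_map, dtype=float)
```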

Because contact map prediction is a classification task, we adopt the commonly used Top L/k precision together with standard classification criteria: the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPR), and the F1-score (F1).
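Top L/k precision ranks residue pairs by confidence and checks what fraction of the top L/k pairs are true contacts. A minimal sketch follows; the minimum sequence separation of 6 is a common CASP-style choice assumed here, and the separation range used for Table 5 may differ.

```python
import numpy as np

def top_l_over_k_precision(dist_pred, contact_true, k=5, min_sep=6):
    """Precision of the L/k most confident predicted contacts.
    Only pairs with sequence separation j - i >= min_sep are ranked
    (min_sep=6 is an assumed, commonly used value)."""
    L = contact_true.shape[0]
    i, j = np.triu_indices(L, k=min_sep)
    order = np.argsort(dist_pred[i, j], kind="stable")  # smallest distance = most confident
    n_top = max(1, L // k)
    top = order[:n_top]
    return float(contact_true[i[top], j[top]].mean())
```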

Method comparison

We compare our method with existing advanced contact map prediction techniques (Tables 5, 6), where the best and second-best results are highlighted in italic and bold, respectively. FreeProtMap significantly outperforms the best-published methods on the test datasets.

Table 5 The Top L/k precision of our method and compared methods
Table 6 Comparison of proposed approach with state-of-the-art methods for contact map prediction

To evaluate FreeProtMap comprehensively, we generate receiver operating characteristic (ROC) and precision-recall (PR) curves for each protein, as well as mean ROC and PR curves over all proteins in each dataset (Figs. 9, 10). FreeProtMap produces favorable ROC curves for almost all tested proteins, and favorable PR curves for 93.33%, 86.67%, 85.71%, and 85.64% of the tested proteins in the 4.05_release, CASP_14D, CASP_14F, and CASP_15D datasets, respectively.
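Per-protein AUROC values like those summarized in Fig. 9 can be computed without a plotting library through the Mann–Whitney formulation sketched below. It assumes that higher scores mean more likely contacts (for example, negative predicted distances); the function name is illustrative.

```python
import numpy as np

def auroc(scores, labels):
    """Area under the ROC curve as the probability that a random positive
    scores higher than a random negative (ties count 1/2).
    Uses O(P * N) memory, which is fine for a single protein's contact map."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need both positive and negative pairs")
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))
```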

Fig. 9

ROC curves of our method. a, ROC curves of FreeProtMap on the 4.05_release dataset. b, ROC curves of FreeProtMap on CASP_14D dataset. c, ROC curves of FreeProtMap on CASP_14F dataset. d, ROC curves of FreeProtMap on CASP_15D dataset. Note. ROC curves for predicted contact map of each protein are indicated in green. The mean ROC curves are indicated in red

Fig. 10

PR curves of our method. a, PR curves of FreeProtMap on the 4.05_release dataset. b, PR curves of FreeProtMap on CASP_14D dataset. c, PR curves of FreeProtMap on CASP_14F dataset. d, PR curves of FreeProtMap on CASP_15D dataset. Note. PR curves for predicted contact map of each protein are indicated in green. The mean PR curves are indicated in red

Application prospect: remote homology protein full-scale search

Advances in high-throughput sequencing technology have led to exponential growth in protein sequence data; in particular, hundreds of millions of proteins are discovered each year. However, because protein structure prediction and comparison are time-consuming, a full-scale remote homology search based on 3D structures is difficult. A common, simple workaround is to construct a subset of candidate proteins whose sequences are similar to the reference protein and then search for remote homologs within this subset using structure prediction and comparison methods. As a result, researchers may overlook proteins that have similar structures but substantially different sequences from the reference protein, which makes it difficult to discover new target proteins that could bypass patent protection and reduce production costs.

However, FreeProtMap makes a full-scale search for remote homologs feasible, for the following reasons: (1) FreeProtMap needs only about 3.74‰ of the time ESMFold takes to predict a distance map (Table 2), while achieving higher prediction accuracy. (2) Distance maps are invariant to rotation and translation and are convenient to compare, so structural similarity can be computed directly from predicted distance maps.
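Property (2) is easy to verify numerically: applying a rotation and a translation to 3D coordinates leaves the distance map unchanged. The sketch below uses random coordinates and a random rotation purely for illustration; the helper names are hypothetical.

```python
import numpy as np

def distance_map(coords):
    """All-pairs Euclidean distances between residues (L x L)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def random_rotation(rng):
    """Random proper rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix column signs so the result is well distributed
    if np.linalg.det(q) < 0:         # flip one axis so det = +1 (rotation, not reflection)
        q[:, 0] = -q[:, 0]
    return q
```

One consequence of this design choice: a mirror reflection also preserves all pairwise distances, so a distance map alone does not distinguish a structure from its mirror image.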

After candidate proteins are selected by distance-map-based structural similarity, structure prediction models such as AlphaFold-2 can be used to predict their 3D structures for further analysis and selection.

Protein structure similarity calculation

To validate the feasibility of a full-scale protein search, we present specific examples and experimental results. Protein structure similarity can be computed from distance maps using image comparison or distance map comparison methods; in this section, we use the structural similarity index measure (SSIM) [52].
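As a rough illustration of distance-map comparison with SSIM [52], the sketch below computes the mean SSIM over uniform sliding windows of two equally sized maps. The uniform 7 × 7 window and population statistics simplify the Gaussian-window formulation of [52], and how proteins of different lengths are handled (as in Table 7) is not described in this section, so the maps are assumed to have been resized or cropped to the same shape beforehand.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ssim(x, y, win=7, k1=0.01, k2=0.03, data_range=None):
    """Mean SSIM over all win x win windows of two equally sized maps."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("maps must have the same shape")
    if data_range is None:
        data_range = max(x.max(), y.max()) - min(x.min(), y.min()) or 1.0
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    xw = sliding_window_view(x, (win, win))     # (H-win+1, W-win+1, win, win)
    yw = sliding_window_view(y, (win, win))
    mu_x, mu_y = xw.mean(axis=(-1, -2)), yw.mean(axis=(-1, -2))
    var_x, var_y = xw.var(axis=(-1, -2)), yw.var(axis=(-1, -2))
    cov = ((xw - mu_x[..., None, None]) * (yw - mu_y[..., None, None])).mean(axis=(-1, -2))
    s = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return float(s.mean())
```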

For a reference protein, we selected three proteins with similar structures and two proteins with dissimilar structures as test cases and report the results in Table 7, where the TM-score is calculated from 3D structures and SSIM from distance maps. Proteins with similar structures had higher TM-score and SSIM values despite differences in length. Conversely, proteins with dissimilar structures had lower TM-score and SSIM values even though their lengths were identical.

Table 7 Comparison of protein similarity calculation methods

Assuming an average query sequence length of 377 amino acids (AA), generating distance maps for one million proteins takes 46 days on an A100 GPU, and comparing the reference protein with these one million proteins based on their distance maps takes another 23 days. With 8 A100 GPUs, we can scan the hundreds of millions of proteins discovered each year for structurally similar proteins within a month. In contrast, a full-scale search that predicts and compares 3D structures would take 16 years on the same equipment.
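Scan-time estimates of this kind follow simple linear arithmetic: total time equals the number of proteins multiplied by the per-protein prediction and comparison cost, divided by the number of GPUs. The hypothetical helper below makes that arithmetic explicit; its inputs are illustrative, and it ignores I/O, batching effects, and load imbalance.

```python
def scan_days(n_proteins, sec_predict, sec_compare, n_gpus=1):
    """Wall-clock days to predict distance maps for n_proteins and compare each
    against a reference, assuming constant per-protein costs (in seconds) and
    work split evenly across n_gpus."""
    total_sec = n_proteins * (sec_predict + sec_compare)
    return total_sec / n_gpus / 86400.0
```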


We propose FreeProtMap for fast and accurate prediction of protein residue–residue distance maps. The proposed group pooling effectively mitigates issues arising from high-dimensional sparseness in protein representations, and the proposed R-former enhances the modeling of local information and of distance constraint relationships. In future work, we will further accelerate FreeProtMap with FlashAttention and extend this work to distance distribution prediction to assist molecular dynamics simulations. We also intend to improve FreeProtMap with techniques such as invariant risk minimization and error-aware losses to address sample imbalance.

Availability of data and materials

All data needed to evaluate the conclusions are present in the paper. The additional data and code related to this paper can be downloaded from


  1. Helices can be identified by a thickening of the diagonal line on the distance map, while parallel and antiparallel \(\beta\)-sheets are characterized by lines parallel and orthogonal to the diagonal line of the distance map, respectively.

  2. Two or more secondary structure units connected by polypeptide loops form a local spatial structure with a specific geometric arrangement.

  3. Hydrophobic interactions, hydrogen bonds, salt bridges, and so on.

  4. The binary form of distance map.


  1. Emerson IA, Amala A. Protein contact maps: a binary depiction of protein 3d structures. Phys A: Stat Mech Appl. 2017;465:782–91.

  2. Bedoya O, Tischer I. Reducing dimensionality in remote homology detection using predicted contact maps. Comput Biol Med. 2015;59:64–72.

  3. Faisal FE, Newaz K, Chaney JL, Li J, Emrich SJ, Clark PL, Milenković T. GRAFENE: graphlet-based alignment-free network approach integrates 3d structural and sequence (residue order) data to improve protein structural comparison. Sci Rep. 2017;7(1):14890.

  4. Hočevar T, Demšar J. A combinatorial approach to graphlet counting. Bioinformatics. 2014;30(4):559–65.

  5. Qiu X-Y, Hao W, Shao J. TALE-cmap: Protein function prediction based on a TALE-based architecture and the structure information from contact map. Comput Biol Med. 2022;149: 105938.

  6. Chen G, He H, Zhao L, Lv Q, Chen CY-C. GINCM-DTA: A graph isomorphic network with protein contact map representation for potential use against Covid-19 and omicron subvariants bq. 1, bq. 1.1, xbb. 1.5, xbb. 1.16. Expert Syst Appl. 2023;236:121274.

  7. Nassar R, Brini E, Parui S, Liu C, Dignon GL, Dill KA. Accelerating protein folding molecular dynamics using inter-residue distances from machine learning servers. J Chem Theory Comput. 2022;18(3):1929–35.

  8. Kandathil SM, Greener JG, Lau AM, Jones DT. Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins. Proc Natl Acad Sci. 2022;119(4):e2113348119.

  9. Zhu K, Hong S, Peng Z, Yang J. A unified approach to protein domain parsing with inter-residue distance matrix. Bioinformatics. 2023;39(2):btad070.

  10. Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005;33(7):2302–9.

  11. Holm L, Kääriäinen S, Wilton C, Plewczynski D. Using Dali for structural comparison of proteins. Curr Protoc Bioinform. 2006;14(1):1–5.

  12. Zhu J, Weng Z. Fast: a novel protein structure alignment algorithm. Proteins: Struct Funct Bioinform. 2005;58(3):618–27.

  13. Gligorijević V, Renfrew PD, Kosciolek T, Leman JK, Berenberg D, Vatanen T, Chandler C, Taylor BC, Fisk IM, Vlamakis H, et al. Structure-based protein function prediction using graph convolutional networks. Nat Commun. 2021;12(1):3168.

  14. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, et al. Highly accurate protein structure prediction with alphafold. Nature. 2021;596(7873):583–9.

  15. Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, Smetanin N, Verkuil R, Kabeli O, Shmueli Y, dos Santos Costa A, Fazel-Zarandi M, Sercu T, Candido S, Rives A. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379(6637):1123–30.

  16. van Kempen M, Kim SS, Tumescheit C, Mirdita M, Lee J, Gilchrist CLM, Söding J, Steinegger M. Fast and accurate protein structure search with foldseek. Nat Biotechnol. 2023;42:1–4.

  17. Gao J, Zheng S, Yao M, Wu P. Precise estimation of residue relative solvent accessible area from cα atom distance matrix using a deep learning method. Bioinformatics. 2022;38(1):94–8.

  18. Nie W, Deng L. TSNAPred: predicting type-specific nucleic acid binding residues via an ensemble approach. Brief Bioinform. 2022;23(4):bbac244.

  19. Wang R, Wang Z, Li Z, Lee T-Y. Residue–residue contact can be a potential feature for the prediction of lysine crotonylation sites. Front Genet. 2022;12:2550.

  20. Zhang Z, Zhao Y, Wang J, Guo M. DeepRCI: predicting ATP-binding proteins using the residue-residue contact information. IEEE J Biomed Health Inform. 2021;26:2822–9.

  21. Tang Y-J, Pang Y-H, Liu B. IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning. Bioinformatics. 2020;36(21):5177–86.

  22. Lubecka EA, Liwo A. Introduction of a bounded penalty function in contact-assisted simulations of protein structures to omit false restraints. J Comput Chem. 2019;40(25):2164–78.

  23. Anishchenko I, Pellock SJ, Chidyausiku TM, Ramelot TA, Ovchinnikov S, Hao J, Bafna K, Norn C, Kang A, Bera AK, et al. De novo protein design by deep network hallucination. Nature. 2021;600(7889):547–52.

  24. Mahmud S, Guo Z, Quadir F, Liu J, Cheng J. Multi-head attention-based u-nets for predicting protein domain boundaries using 1d sequence features and 2d distance maps. BMC Bioinform. 2022;23(1):283.

  25. Zheng W, Zhang C, Li Y, Pearce R, Bell EW, Zhang Y. Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. Cell Rep Methods. 2021;1(3): 100014.

  26. Barger J, Adhikari B. New labeling methods for deep learning real-valued inter-residue distance prediction. IEEE/ACM Trans Comput Biol Bioinf. 2021;19(6):3586–94.

  27. Rahman J, Newton MAH, Islam MKB, Sattar A. Enhancing protein inter-residue real distance prediction by scrutinising deep learning models. Sci Rep. 2022;12(1):787.

  28. Si Y, Yan C. Improved protein contact prediction using dimensional hybrid residual networks and singularity enhanced loss function. Brief Bioinform. 2021;22(6):bbab341.

  29. Madani M, Behzadi MM, Song D, Ilies HT, Tarakanova A. Improved inter-residue contact prediction via a hybrid generative model and dynamic loss function. Comput Struct Biotechnol J. 2022;20:6138–48.

  30. Rahman J, Newton MAH, Hasan MAM, Sattar A. A stacked meta-ensemble for protein inter-residue distance prediction. Comput Biol Med. 2022;148:105824.

  31. Guo Z, Wu T, Liu J, Hou J, Cheng J. Improving deep learning-based protein distance prediction in casp14. Bioinformatics. 2021;37(19):3190–6.

  32. Li J, Xu J. Study of real-valued distance prediction for protein structure prediction with deep learning. Bioinformatics. 2021;37(19):3197–203.

  33. Wu T, Guo Z, Hou J, Cheng J. Deepdist: real-value inter-residue distance prediction with deep residual convolutional network. BMC Bioinformatics. 2021;22:1–17.

  34. Vanni C, Schechter MS, Acinas SG, Barberán A, Buttigieg PL, Casamayor EO, Delmont TO, Duarte CM, Eren AM, Finn RD, et al. Light into the darkness: Unifying the known and unknown coding sequence space in microbiome analyses. bioRxiv. 2020.

  35. Singh J, Litfin T, Singh J, Paliwal K, Zhou Y. SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model. Bioinformatics. 2022;38(7):1888–94.

  36. Ding W, Gong H. Predicting the real-valued inter-residue distances for proteins. Adv Sci. 2020;7(19):2001314.

  37. Rao R, Meier J, Sercu T, Ovchinnikov S, Rives A. Transformer protein language models are unsupervised structure learners. In: International conference on learning representations. 2021. pp. 1–15.

  38. Vig J, Madani A, Varshney LR, Xiong C, Rajani N, et al. Bertology meets biology: interpreting attention in protein language models. In: International conference on learning representations. 2020. pp. 1–13.

  39. Yang J, Anishchenko I, Park H, Peng Z, Ovchinnikov S, Baker D. Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci. 2020;117(3):1496–503.

  40. Ezkurdia I, Grana O, Izarzugaza JMG, Tress ML. Assessment of domain boundary predictions and the prediction of intramolecular contacts in casp8. Proteins: Struct Funct Bioinform. 2009;77(S9):196–209.

  41. Huang J, Ge H, Sun L, Hou Y, Wang X. ICMiF: interactive cascade microformers for cross-domain person re-identification. Inf Sci. 2022;617:177–92.

  42. Wu C, Wu F, Qi T, Huang Y, Xie X. Fastformer: additive attention can be all you need. 2021. arXiv:2108.09084.

  43. Shwartz-Ziv R, Tishby N. Opening the black box of deep neural networks via information. In: International conference on learning representations. 2017. pp. 1–19.

  44. Zhang J, Liu T, Tao D. An information-theoretic view for deep learning. In: International conference on learning representations. 2018. pp. 1–25.

  45. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. pp. 770–8.

  46. Wu R, Ding F, Wang R, Shen R, Zhang X, Luo S, Su C, Wu Z, Xie Q, Berger B, Ma J, Peng J. High-resolution de novo structure prediction from primary sequence. bioRxiv. 2022.

  47. Fang X, Wang F, Liu L, He J, Lin D, Xiang Y, Zhang X, Wu H, Li H, Song L. Helixfold-single: Msa-free protein structure prediction by using protein language model as an alternative. 2022. arXiv:2207.13921.

  48. Chowdhury R, Bouatta N, Biswas S, Floristean C, Kharkar A, Roy K, Rochereau C, Ahdritz G, Zhang J, Church GM, et al. Single-sequence protein structure prediction using a language model and deep learning. Nat Biotechnol. 2022;40(11):1617–23.

  49. Wang W, Peng Z, Yang J. Single-sequence protein structure prediction using supervised transformer protein language models. Nat Comput Sci. 2022;2(12):804–14.

  50. Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III 18. 2015. pp. 234–41.

  51. Wang H, Cao P, Wang J, Zaiane OR. UCTransNet: rethinking the skip connections in U-Net from a channel-wise perspective with transformer. In: Proceedings of the AAAI conference on artificial intelligence. 2022. pp. 2441–9.

  52. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process. 2004;13(4):600–12.



There were no external sources of funding for this work.

Author information

Authors and Affiliations



Jiajian Huang: Investigation, Methodology, Writing—original draft, Writing—review and editing. Jinpeng Li: Methodology, Writing—original draft, Writing—review and editing. Qinchang Chen: Conceptualization, Methodology, Writing—original draft, Validation. Xia Wang: Investigation, Validation, Supervision. Guangyong Chen: Methodology, Writing—review and editing, Supervision. Jin Tang: Project administration, Resources, Supervision.

Corresponding authors

Correspondence to Jiajian Huang, Xia Wang, Guangyong Chen or Jin Tang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: 

Supplementary No. 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Huang, J., Li, J., Chen, Q. et al. Freeprotmap: waiting-free prediction method for protein distance map. BMC Bioinformatics 25, 176 (2024).

  • Received:

  • Accepted:

  • Published:

  • DOI: