NeuRank: learning to rank with neural networks for drug–target interaction prediction

Abstract

Background

Experimental verification in drug discovery is expensive and time-consuming. Consequently, the demand for more efficient and effective identification of drug–target interactions (DTIs) has intensified.

Results

We treat the prediction of DTIs as a ranking problem and propose a neural network architecture, NeuRank, to address it. We also assume that similar drug compounds are likely to interact with similar target proteins; thus, we add drug and target similarities to our model, which are very effective at improving the prediction of DTIs. We then develop NeuRank from a point-wise model to a pair-wise model, and further to a list-wise model.

Conclusion

Finally, results from extensive experiments on five public data sets (DrugBank, Enzymes, Ion Channels, G-Protein-Coupled Receptors, and Nuclear Receptors) show that, in identifying DTIs, our models achieve better performance than other state-of-the-art methods.

Introduction

In drug discovery, experimental verification of Drug–Target Interactions (DTIs) is so expensive and time-consuming that only a small fraction of DTIs have been verified [1,2,3,4,5,6]. Therefore, there is a great need for an effective and efficient computational method for identifying DTIs.

Recently, with the rapid development of high-throughput techniques, a great deal of drug–target interaction data has been generated [7, 8]. Traditional experimental verification limits the speed at which new drugs can be identified [9,10,11]. To meet the increasing need for rapid and effective drug discovery, machine learning methods have become more and more widely applied to detect potential DTIs from verified DTI information [12,13,14,15,16]. Matrix Factorization (MF) [17], one of the most successful methods in recommender systems [18], has been widely extended to DTI prediction. For example, Cobanoglu et al. [19] adopted Probabilistic Matrix Factorization (PMF) [20] to identify the potential drug–target association between chemicals and targets; Gönen [21] developed MF by adopting chemical and genomic kernels to predict DTI networks; Liu et al. [9] added neighborhood regularization to logistic MF to predict the probability that a drug will interact with a target. However, most existing MF-based methods only considered a linear and shallow relation between a drug and a target, which is insufficient to capture the complicated relationship between them.

Recently, great success has been achieved with deep learning models in Computer Vision (CV) [22, 23], Natural Language Processing (NLP) [24, 25], and recommender systems [26,27,28]. The goal of deep learning models is to capture the higher-order relations between input data through their hidden layers [3, 29, 30]. To overcome the limitation of traditional MF-based methods, many researchers have tried to apply deep learning models to the prediction of DTIs. For example, Wang et al. [31] adopted Restricted Boltzmann Machines (RBM) [32] to predict DTIs; Gao et al. [33] proposed a neural network combined with a two-way attention network to provide biological insights to interpret the drug–target predictions; Altae-Tran et al. [34] integrated Long Short-Term Memory (LSTM) and graph Convolutional Neural Networks (CNN) to obtain meaningful information from a few data points. Compared with MF, deep learning models have a greater ability to capture deep representations from raw input data.

Although many deep learning models have been proposed to predict potential DTIs, little effort has been devoted to exploring ranking learning in the prediction of DTIs. To comply with the DTI prediction setting, Peska et al. [35] extended Bayesian Personalized Ranking (BPR) [36], which has shown excellent performance in various learning tasks; Yuan et al. [37] designed a ranking-based ensemble learning method, DrugE–Rank, which builds on multiple well-known similarity-based methods to improve prediction performance. However, these methods are based on traditional machine learning methods, such as MF and k-Nearest Neighbor (kNN), and are insufficient to capture the drug–target latent structures because they do not consider any deep interactions between latent features.

Inspired by the good performance of deep learning models in various tasks, we designed a neural network architecture, NeuRank, which treats the identification of DTIs as a ranking task. Deep learning models are powerful and flexible for learning useful representations. Based on the Multilayer Perceptron (MLP) architecture, we added a new interaction module for drugs and targets to better model their relationship. Then, for better performance, we developed our model from a point-wise to a pair-wise and further to a list-wise method. In the pair-wise method, we assume that the observed DTIs, which have been experimentally verified, are more trustworthy and more important than the unknown ones. Thus, we model the relative ordering of each pair of targets to make predictions, and learn to rank by optimizing a pair-wise loss function to find the correct ranking for all targets. In the list-wise method, we seek to maximize the top-one probability of targets in the ranking list.

Many works have shown that drugs with similar chemical structures have similar therapeutic functions [38,39,40]. This information is used to enrich latent factors and strengthen the representation ability of the models. For example, Zheng et al. [38] proposed a model, Multiple Similarities Collaborative Matrix Factorization (MSCMF), which learns low-rank features first and then combines them with weighted similarity matrices over drugs and targets for prediction; Zhang et al. [41] adopted drug feature-based and disease semantic similarities as constraints for drugs and diseases; Laarhoven et al. [42], using chemical similarity and interaction information about known compounds, applied the nearest neighbor algorithm to construct an interaction score for drugs. Methods that use similarity information are able to make better predictions than methods without any additional information. Thus, to better capture drug–drug and target–target relationships, we use a similarity calculation method to learn the links within these data.

Our contributions are summarized as follows:

  1. We solved the DTI problem by using neural networks with a strong ability to capture non-linearity from raw data and learn deep features from a ranking learning perspective;

  2. To better predict DTIs, especially for new drugs and targets, we added drug–drug and target–target similarities to our model;

  3. For different applications, we developed three neural networks from point-wise to pair-wise learning and further to list-wise learning.

The rest of the paper is organized as follows: “Related work” section briefly reviews the background and some related work. “Proposed methods” section presents our proposed models in detail. “Experiments” section describes the experimental results for several data sets to show the performance of our models. “Conclusion” section gives the conclusion and provides future directions.

Related work

First, we discuss the problem to be solved and define the notations that are used in the rest of the paper. Then, we introduce two MF-based methods that are closely related to our model: a traditional one, Collaborative Matrix Factorization (CMF), and a pair-wise ranking learning one, BPR.

Problem definition

Given a set of n drugs, \(\varvec{D}\), and a set of m targets, \(\varvec{T}\), the DTI matrix is \(\varvec{Y} \in {\mathbb {R}}^{n\times m}\) with elements \(y_{dt} \in \left\{ 0,1 \right\}\). If drug, d, has been experimentally verified to interact with target, t, then \(y_{dt}=1\); otherwise, \(y_{dt}=0\). \(\varvec{P}\in {\mathbb {R}}^{n\times k}\) and \(\varvec{Q} \in {\mathbb {R}}^{m\times k}\) denote the low-rank latent features of drugs and targets, respectively, where k denotes the number of latent features. \(\varvec{p}_d\) and \(\varvec{q}_t\) denote the latent features of drug, d, and target, t, respectively. The goal of MF for DTIs is to learn \(\varvec{P}\) and \(\varvec{Q}\) to reconstruct \(\varvec{Y}\):

$$\begin{aligned} arg \min \limits _ { \varvec{p},\varvec{q}} \sum _{(d,t) \in \varvec{V}} \left( y_{dt}-\varvec{p}_{d}\varvec{q}_{t}^T \right) ^2 + \lambda \left( \left\| \varvec{P} \right\| ^2_{F} + \left\| \varvec{Q} \right\| ^2_{F} \right) \end{aligned}$$
(1)

where \(\varvec{V}\) denotes the set of interactions that have been experimentally verified; \(\left\| \cdot \right\| ^2_F\) denotes the squared Frobenius norm; and \(\lambda\) denotes a regularization coefficient.
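As a minimal sketch of how the objective in Eq. 1 can be evaluated, the following standard-library Python computes the squared reconstruction error over the verified pairs plus the Frobenius penalty. The function name `mf_objective`, the list-of-lists layout, and the toy values are assumptions made for illustration only; they are not taken from the paper.

```python
def mf_objective(Y, P, Q, observed, lam):
    """Eq. 1: squared error over the pairs in V plus lam * (||P||_F^2 + ||Q||_F^2)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Reconstruction error, summed only over the verified pairs (d, t) in V.
    fit = sum((Y[d][t] - dot(P[d], Q[t])) ** 2 for d, t in observed)
    # Squared Frobenius norms of the two latent feature matrices.
    reg = sum(x * x for row in P for x in row) + sum(x * x for row in Q for x in row)
    return fit + lam * reg


# Toy example (made-up values): 2 drugs, 2 targets, k = 2 latent features.
Y = [[1, 0], [0, 1]]
P = [[1.0, 0.0], [0.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
observed = [(0, 0), (1, 1)]  # hypothetical verified set V
loss = mf_objective(Y, P, Q, observed, lam=0.1)
```

In this toy case both verified pairs are reconstructed exactly, so only the regularization term contributes to the loss.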

CMF

CMF, proposed in [38], adopts multiple kinds of drug–drug and target–target similarities. The objective function of CMF is defined as follows:

$$\begin{aligned} \begin{aligned} arg \min \limits _ { \varvec{p},\varvec{q}}&\sum _{(d,t) \in \varvec{V}} \left( y_{dt}-\varvec{p}_{d}\varvec{q}_{t}^T \right) ^2 + \lambda \left( \left\| \varvec{P} \right\| ^2_{F} + \left\| \varvec{Q} \right\| ^2_{F} \right) \\&+ \lambda _{d} \left\| \varvec{S}^d - \varvec{PP}^T\right\| ^2_{F} + \lambda _{t} \left\| \varvec{S}^t - \varvec{QQ}^T\right\| ^2_{F} \end{aligned} \end{aligned}$$
(2)

where \(\lambda\), \(\lambda _d\), and \(\lambda _t\) denote regularization coefficients; \(\varvec{S}^d \in {\mathbb {R}}^{n\times n}\) denotes the similarity matrix for drugs, and \(\varvec{S}^t \in {\mathbb {R}}^{m\times m}\) denotes the similarity matrix for targets.

The first term, MF, learns the low-rank latent features \(\varvec{P}\) and \(\varvec{Q}\) to reconstruct \(\varvec{Y}\); the second term is L2 regularization to prevent the model from over-fitting; the last two terms are regularizations that minimize the squared error between \(\varvec{S}^d\) and \(\varvec{PP}^T\), and between \(\varvec{S}^t\) and \(\varvec{QQ}^T\). The key idea is that the similarity between two drugs or two targets should be approximated by the inner product of the corresponding two feature vectors.
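The similarity terms of Eq. 2 can be illustrated with a short sketch. The helper `frob_sq_diff` below is a hypothetical name, and the similarity matrix and features are made-up toy values; the code only shows how \(\left\| \varvec{S}^d - \varvec{PP}^T\right\| ^2_{F}\) compares each similarity entry with the inner product of two feature vectors.

```python
def frob_sq_diff(S, F):
    """Return ||S - F F^T||_F^2 for a square similarity matrix S and feature rows F."""
    n = len(F)
    total = 0.0
    for a in range(n):
        for b in range(n):
            # Inner product of the feature vectors of items a and b.
            inner = sum(x * y for x, y in zip(F[a], F[b]))
            total += (S[a][b] - inner) ** 2
    return total


# Toy drug similarity matrix and latent features (illustrative values only).
S_d = [[1.0, 0.5], [0.5, 1.0]]
P = [[1.0, 0.0], [0.0, 1.0]]
penalty = frob_sq_diff(S_d, P)
```

Here the two drugs have orthogonal features while their similarity is 0.5, so each off-diagonal entry contributes 0.25 to the penalty; the same helper applies to \(\varvec{S}^t\) and \(\varvec{Q}\).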

BPR

DTIs provide only very few verified instances to train; therefore, it is inherently difficult to uncover the interaction probability between drugs and targets. Instead of directly predicting the absolute probability of DTIs, BPR uses pair-wise ranking loss to model the relative order between observed and unobserved interactions.

Peska et al. [35] developed a DTI prediction model based on BPR, a method that has shown promising power in personalized recommendation. The key idea of BPR is that observed interactions should be ranked higher than unobserved ones [36]. The goal of BPR for DTI prediction is to learn the probability that a drug will interact with a target. Given a drug, d, and a pair of targets, t and i, BPR aims to maximize the posterior probability \(p\left( \varvec{\theta } | t>_d i \right)\), where \(t>_d i\) means that d is more likely to interact with t than with i, and \(\varvec{\theta }\) is the set of learning parameters. The posterior probability is defined as follows:

$$\begin{aligned} p\left( \varvec{\theta } | t>_d i \right) \propto p\left( t>_d i \right| \varvec{\theta }) \cdot p\left( \varvec{\theta } \right) \end{aligned}$$

Then, the probability that drug, d, interacts with target, t, rather than i is defined as follows:

$$\begin{aligned} \begin{aligned} p\left( t>_d i \right| \varvec{\theta })&=\sigma \left( {\widehat{y}}_{dti} \right) ,\\ {\widehat{y}}_{dti}&={\widehat{y}}_{dt}-{\widehat{y}}_{di}. \end{aligned} \end{aligned}$$
(3)

where \(\sigma (x)=1/\left( 1+exp(-x)\right)\) is the sigmoid function, and \({\widehat{y}}_{dt}\) and \({\widehat{y}}_{di}\) are the predicted scores for targets t and i with drug, d, respectively. \({\widehat{y}}_{dt}\), estimated by MF, linearly combines drug and target features as follows:

$$\begin{aligned} {\widehat{y}}_{dt}=\varvec{p}_{d}\varvec{q}_{t}^T \end{aligned}$$
(4)

where \(\varvec{p}_{d}\) and \(\varvec{q}_{t}\) denote the latent features of drug, d, and target, t, respectively.

Finally, based on Bayesian inference, the objective function of BPR, which minimizes the pair-wise ranking loss for all pair instances, is defined as follows:

$$\begin{aligned} {\mathcal {L}}=-\sum _{\left( d,t,i\right) \in \varvec{F} }ln\sigma \left( {\widehat{y}}_{dt}-{\widehat{y}}_{di}\right) + \lambda \Vert \varvec{\theta } \Vert ^2_F \end{aligned}$$
(5)

where \(\varvec{F}=\left\{ (d,t,i) |d \in \varvec{D} \wedge t \in \varvec{V}_d^+ \wedge i \in \varvec{V}_d^- \right\}\) denotes that drug, d, tends to interact with target, t, rather than i, where, when given a drug, d, \(\varvec{V}_d^+=\{t \in \varvec{T}|y_{dt}=1 \}\) denotes a set of targets that have been experimentally verified to interact with d. \(\varvec{V}_d^-\) is the rest, and \(\lambda\) is the regularization parameter.
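To make Eqs. 3 to 5 concrete, the sketch below builds the triple set \(\varvec{F}\) from a toy interaction matrix and evaluates the pair-wise ranking loss with MF scores (Eq. 4). The names `build_triples` and `bpr_loss`, and all numeric values, are illustrative assumptions rather than the implementation of [35] or [36]; the regularizer is taken over \(\varvec{P}\) and \(\varvec{Q}\) only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_triples(Y):
    """F = {(d, t, i)}: t is a verified target of drug d, i is an unverified one."""
    return [
        (d, t, i)
        for d, row in enumerate(Y)
        for t, y_t in enumerate(row) if y_t == 1
        for i, y_i in enumerate(row) if y_i == 0
    ]


def bpr_loss(P, Q, triples, lam):
    """Eq. 5 with MF scores (Eq. 4) and an L2 penalty on P and Q."""
    def score(d, t):
        return sum(a * b for a, b in zip(P[d], Q[t]))

    rank = -sum(math.log(sigmoid(score(d, t) - score(d, i))) for d, t, i in triples)
    reg = sum(x * x for row in P for x in row) + sum(x * x for row in Q for x in row)
    return rank + lam * reg


# Toy data (made-up): 2 drugs, 2 targets, identity latent features.
Y = [[1, 0], [0, 1]]
P = [[1.0, 0.0], [0.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
triples = build_triples(Y)
loss = bpr_loss(P, Q, triples, lam=0.0)
```

The loss decreases as the score gap between the verified and unverified target grows, which is the sense in which BPR models relative order instead of absolute probability.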

Both CMF and BPR are MF-based methods, which are linear in nature. Therefore, when compared to nonlinear methods, they have limited performance [27, 43]. Inspired by the idea of BPR for ranking learning in DTI prediction and the good performance of NeuMF [43] in recommender systems, we developed a neural network to improve DTI prediction from a ranking perspective.

Proposed methods

Methods for one-class data, i.e. data with only positive examples, are classified into three categories: point-wise regression, pair-wise, and list-wise methods. Point-wise regression methods directly optimize the absolute value of the binary interaction. Pair-wise ranking methods assume that drugs are more likely to interact with verified targets than with unverified ones. List-wise ranking methods seek to maximize the top-one probability of targets in the ranking list.

In this section, we build our NeuRank to learn simultaneously the latent features of DTIs and similarity information. First, we introduce in detail the framework of the point-wise method, NeuRank. Then, we develop our model from point-wise to pair-wise learning and further to list-wise learning. The purpose of our models is to predict the probability that a drug will interact with a target from observed DTIs.

Framework

Fig. 1 Framework of NeuRank. NeuRank, a point-wise network, consists of five layers: input, embedding, interaction, hidden, and prediction

Point-wise methods, which consider unobserved interactions to be inherently negative, combine the latent features of drugs and targets to predict the score used to rank. Figure 1 illustrates the network framework of NeuRank, which consists of the following five layers: input, embedding, interaction, hidden, and prediction.

Input and embedding layers The role of the embedding layer is to transfer drug and target IDs from the input layer to latent representation space and map the sparse features to dense features as follows:

$$\begin{aligned} \varvec{p}_d=\varvec{P}^T \varvec{E}_d \end{aligned}$$
(6)
$$\begin{aligned} \varvec{q}_t=\varvec{Q}^T \varvec{E}_t \end{aligned}$$
(7)

where \(\varvec{P}\in {\mathbb {R}}^{n\times k}\) and \(\varvec{Q} \in {\mathbb {R}}^{m\times k}\) denote the embedding matrices for drugs and targets, respectively; \(\varvec{E}_d\) and \(\varvec{E}_t\) denote the one-hot encodings of the IDs of drug d and target t, respectively; and \(\varvec{p}_d \in {\mathbb {R}}^{1\times k}\) and \(\varvec{q}_t\in {\mathbb {R}}^{1\times k}\) are the corresponding embedding vectors.

Interaction layer The role of the interaction layer is to model the interactions between drugs and targets in the shallow layer. The interaction layer, which captures the low-rank relations between drugs and targets, is defined as follows:

$$\begin{aligned} \varvec{h}_0=f\left( \varvec{p}_{d},\varvec{q}_{t} \right) \end{aligned}$$
(8)

where \(f(\cdot )\) denotes the interaction function between \(\varvec{p}_d\) and \(\varvec{q}_t\), such as concatenation, element-wise product, or element-wise sum. We chose the element-wise product as our interaction function.

Hidden layers The role of the hidden layers is to learn nonlinear correlations between drugs and targets. Hidden layers provide neural networks a powerful ability to model the high-rank relationships between features as follows:

$$\begin{aligned} \begin{aligned} \varvec{h}_1&=a\left( \varvec{W}_{1}^T\varvec{h}_{0}+\varvec{b}_1 \right) \\&\cdots \\ \varvec{h}_{L}&=a\left( \varvec{W}_{L}^T\varvec{h}_{L-1}+\varvec{b}_L \right) \end{aligned} \end{aligned}$$
(9)

where \(\varvec{W}_l\), \(\varvec{b}_l\), \(\varvec{h}_{l}\) and \(a(\cdot )\) denote weight, bias, output, and activation functions of the l-th (\(0< l\le L\)) layer, respectively. The ReLU function is used as our activation function.

Prediction layer The role of the prediction layer is to compute the probability that a drug will interact with a target. The output, \({\widehat{y}}_{dt }\), is defined as follows:

$$\begin{aligned} {\widehat{y}}_{dt}=\sigma (\varvec{W}_{L+1}^T\varvec{h}_{L}+\varvec{b}_{L+1}) \end{aligned}$$
(10)

where \(\sigma (\cdot )\) denotes the sigmoid function.

In NeuRank, the square loss function is used to evaluate loss and the L2 norm is used to regularize all learning parameters:

$$\begin{aligned} {\mathcal {L}}_1=\sum _{(d,t) \in \varvec{V}}\left( y_{dt} - {\widehat{y}}_{dt} \right) ^2 + \lambda \Omega (\varvec{\Theta }), \end{aligned}$$
(11)

where \(\varvec{\Theta }\) denotes the learning parameter set of NeuRank.
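The forward pass described by Eqs. 6 to 10 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names (`affine`, `init_layer`, `neurank_forward`), the layer widths, and the random initialization are all hypothetical, and the embedding lookup is done by row indexing, which is equivalent to multiplying by a one-hot vector.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, h):
    """Compute W^T h + b, where W has shape (len(h), len(b))."""
    return [sum(W[i][j] * h[i] for i in range(len(h))) + b[j] for j in range(len(b))]


def init_layer(rng, n_in, n_out):
    """Hypothetical initializer: small uniform weights, zero biases."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return W, [0.0] * n_out


def neurank_forward(p_d, q_t, hidden, out):
    h = [a * b for a, b in zip(p_d, q_t)]       # Eq. 8: element-wise product
    for W, b in hidden:                         # Eq. 9: ReLU hidden layers
        h = relu(affine(W, b, h))
    W_out, b_out = out
    return sigmoid(affine(W_out, b_out, h)[0])  # Eq. 10: sigmoid prediction


rng = random.Random(0)
k = 4                                                                # embedding size
P = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(3)]       # 3 toy drugs
Q = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(5)]       # 5 toy targets
hidden = [init_layer(rng, k, 8), init_layer(rng, 8, 4)]              # L = 2 hidden layers
out = init_layer(rng, 4, 1)
y_hat = neurank_forward(P[0], Q[2], hidden, out)  # Eqs. 6-7: row lookup
squared_error = (1 - y_hat) ** 2                  # one term of Eq. 11 for y_dt = 1
```

Training would then minimize the sum of such squared errors plus the L2 penalty over all parameters; the optimizer step is omitted here.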

Pair-wise NeuRank

To make predictions, pair-wise methods model the relative ordering of each pair of targets. In contrast to the point-wise method, pair-wise methods assume that observed interactions are more trustworthy than unobserved ones. We therefore extend NeuRank from point-wise learning to pair-wise learning, yielding pair-wise NeuRank (pNeuRank). Figure 2 illustrates the network framework of pNeuRank.

Fig. 2 Framework of pNeuRank. pNeuRank, a pair-wise method, assumes that observed interactions are more trustworthy than unobserved ones. It consists of five layers: input, embedding, interaction, hidden, and prediction

In pNeuRank, we assume that an experimentally verified target that interacts with a drug will be assigned a higher value than an unverified target. Thus, the objective function is defined as follows:

$$\begin{aligned} {\mathcal {L}}_2=-\sum _{\left( d,t,i\right) \in \varvec{F} }ln\sigma \left( {\widehat{y}}_{dt}-{\widehat{y}}_{di}\right) + \lambda _p \Omega \left( \varvec{\Theta }_p\right) , \end{aligned}$$
(12)

where \(\varvec{F}=\left\{ (d,t,i) |d \in \varvec{D} \wedge t \in \varvec{V}_d^+ \wedge i \in \varvec{V}_d^- \right\}\) denotes that drug, d, tends to interact more with target, t, than with i; \(\lambda _p\) is the regularization parameter (\(\lambda _d\) and \(\lambda _t\) weight the similarity terms introduced in Eq. 17); and \(\varvec{\Theta }_p\) denotes the learning parameter set of pNeuRank.

In pNeuRank, the first four layers (input, embedding, interaction, and hidden) are the same as in the previous NeuRank framework. The key difference is the final output layer, \({\widehat{y}}_{dti}\), defined as follows:

$$\begin{aligned} {\widehat{y}}_{dti}=\sigma ({\widehat{y}}_{dt}-{\widehat{y}}_{di}) \end{aligned}$$
(13)

where \({\widehat{y}}_{dt}\) is the output of the final hidden layer when given an observed interaction between drug, d, and target, t; \({\widehat{y}}_{di}\) is the output when given an unobserved interaction between drug, d, and target, i; and \(\sigma (\cdot )\) denotes the sigmoid function to bound the gap between the two values.

List-wise NeuRank

Finally, we design a list-wise framework, lNeuRank, to predict the potential DTIs. In lNeuRank, we seek to maximize the top-one probability of targets in the ranking list. The framework is shown in Fig. 3. In Fig. 3, the list of \(\left( K+1\right)\) targets used for training contains one positive instance and K negative instances sampled for drug d. \({\varvec{q}}_i^{-}\), where \(i \in \left[ 1,K\right]\), denotes the embeddings of the negative instances.

Fig. 3 Framework of lNeuRank. lNeuRank seeks to maximize the top-one probability of targets in the ranking list. It consists of five layers: input, embedding, interaction, hidden, and prediction

Similarly, in lNeuRank, the first four layers (input, embedding, interaction, and hidden) are the same as in the previous NeuRank framework. The key difference is the final output layer, \({\widehat{y}}_{dt}\), defined as follows:

$$\begin{aligned} {\hat{y}}_{dt}=softmax(x_{dt}), \end{aligned}$$
(14)

where \(x_{dt}\) denotes the output from the final hidden layer. We chose the softmax function to map the results from the hidden layer to prediction. The probability \({\hat{y}}_{dt}\) that target t ranks at the top-one for drug d is defined as follows:

$$\begin{aligned} {\hat{y}}_{dt}=\frac{e^{x_{dt}}}{\sum _{i=1}^{K+1} e^{x_{di}}}. \end{aligned}$$
(15)

Then, the loss is evaluated by cross entropy, which measures the difference between the true list distribution and the list distribution predicted by the ranking model. It is defined as follows:

$$\begin{aligned} {\mathcal {L}}_3=-\sum _{d=1}^n \left( \sum _{t \in l_d^+} log{\hat{y}}_{dt} + \sum _{i \in l_d^-}log\left( 1-{\hat{y}}_{di}\right) \right) + \lambda _l \Omega \left( \varvec{\Theta }_l\right) , \end{aligned}$$
(16)

where \(l_d^+\) and \(l_d^-\) denote the verified and unverified interaction list of drug d, respectively; and \(\varvec{\Theta }_l\) denotes the learning parameter set of lNeuRank.
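A small numeric sketch may help to see how Eqs. 15 and 16 behave for a single drug. The function names and the score values below are assumptions for illustration; the regularization term is omitted, and the softmax subtracts the maximum score purely for numerical stability, which does not change the result of Eq. 15.

```python
import math


def softmax(xs):
    """Eq. 15: top-one probability of each of the K + 1 targets in the list."""
    m = max(xs)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, labels):
    """Data term of Eq. 16 for one drug: log-loss on the top-one probabilities."""
    probs = softmax(scores)
    return -sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for p, y in zip(probs, labels)
    )


# One positive target followed by K = 2 sampled negatives (toy scores).
labels = [1, 0, 0]
probs = softmax([2.0, 0.0, 0.0])
good = listwise_loss([5.0, 0.0, 0.0], labels)  # positive target scored highest
bad = listwise_loss([0.0, 0.0, 5.0], labels)   # a negative target scored highest
```

The loss is small when the verified target receives most of the top-one probability mass and grows when a sampled negative does.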

Similarity information

Based on the assumption that similar drugs will interact with similar targets, and vice versa, we added drug–drug similarity and target–target similarity networks to our model. The chemical structure similarity between compounds and the sequence similarity between target proteins are critical for improving the prediction of DTIs, especially when few DTIs are available. Therefore, to predict the interactions of new drugs/targets, we added this similarity information to our models. Similarity regularization is defined as follows:

$$\begin{aligned} {\mathcal {L}}_s=\lambda _{d}\Omega ( \theta ^d) + \lambda _{t}\Omega \left( \theta ^t \right) \end{aligned}$$
(17)

where \(\Omega \left( \cdot \right)\) is the function that measures the distance between the predicted and true similarities. It is defined as follows:

$$\begin{aligned} \Omega ( \theta ^d ) =\left\| \varvec{S}^d - \varvec{PP}^T\right\| ^2_{F}\end{aligned}$$
(18)
$$\begin{aligned} \Omega \left( \theta ^t \right) = \left\| \varvec{S}^t - \varvec{QQ}^T\right\| ^2_{F} \end{aligned}$$
(19)

Finally, the objective function is defined as follows:

$$\begin{aligned} {\mathcal {L}}'={\mathcal {L}}_i+{\mathcal {L}}_s \end{aligned}$$
(20)

where \({\mathcal {L}}_i\) is the loss function of NeuRank (Eq. 11), pNeuRank (Eq. 12), or lNeuRank (Eq. 16).

Sampling for imbalance data

Only a small fraction of DTIs has been verified, which causes a data imbalance problem: the number of known DTIs is much smaller than the number of unknown DTIs. Training a model on such imbalanced data leads to poor performance.

To alleviate this problem, we use negative sampling, an effective remedy. In general, the number of negative samples is proportional to the number of positive samples for each drug/target. The negative DTIs are randomly selected from the set of unobserved DTIs with equal probability.
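The sampling scheme described above can be sketched as follows. The function name `sample_negatives`, the per-drug `ratio` parameter, and the choice to sample without replacement are assumptions for illustration; the text only states that negatives are drawn uniformly from the unobserved DTIs in proportion to the positives.

```python
import random


def sample_negatives(Y, ratio, rng):
    """Return (drug, target, label) triples with `ratio` negatives per positive."""
    samples = []
    for d, row in enumerate(Y):
        pos = [t for t, y in enumerate(row) if y == 1]
        neg = [t for t, y in enumerate(row) if y == 0]
        samples.extend((d, t, 1) for t in pos)
        # Uniform sampling without replacement, capped by the available negatives.
        n_neg = min(len(neg), ratio * len(pos))
        samples.extend((d, t, 0) for t in rng.sample(neg, n_neg))
    return samples


# Toy interaction matrix (made-up): 2 drugs, 5 targets.
Y = [
    [1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
samples = sample_negatives(Y, ratio=2, rng=random.Random(0))
```

For the second drug only three unobserved targets exist, so the requested four negatives are capped at three.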

Experiments

First, we introduce the data sets used in our experiments; then, we present the baselines we used as comparisons with our models and the metrics we adopted for evaluation; finally, we conduct the experiments in detail and make a detailed analysis.

Experimental setting

Data sets We performed experiments on five public data sets: DrugBank, Nuclear Receptors, G-Protein-Coupled Receptors (GPCRs), Ion Channels and Enzymes. The first data set, which contains information on drugs and targets created and maintained by the University of Alberta and The Metabolomics Innovation Centre, is available from the DrugBank Database (Footnote 1). As both a bioinformatics and a cheminformatics resource, DrugBank combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information [44]. The remaining data sets, whose observed DTIs were extracted from the public databases KEGG BRITE [45], BRENDA [46], SuperTarget [47], and DrugBank [48], are available at: http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/. The drug chemical structure information is retrieved from KEGG LIGAND [45], and the three-dimensional structures of target proteins are retrieved from PDB [49]. Each data set contains three types of information: 1) verified DTIs; 2) drug similarities; and 3) target similarities [50]. Table 1 lists some statistics about the verified DTIs in all the data sets.

Table 1 Statistics about data sets

Drug–drug similarities are computed by SIMCOMP [51], which uses a graph method to model the size of the common substructures between two compounds. Target–target similarities are computed by normalized Smith-Waterman [52], which measures the similarity scores between the amino acid sequences of two proteins.

Evaluation metrics Following previous works [1, 9, 35, 38], two popular metrics, the Area Under the Precision–Recall curve (AUPR) and the Area Under the receiver operating characteristic Curve (AUC), are used for performance evaluation in the prediction of DTIs. To evaluate our proposed methods, we used 10–fold Cross Validation (CV) and compared them with the baseline approaches. In 10–fold CV, the data set is randomly divided into 10 equal-sized subsets. Of the 10 subsets, a single subset is retained as the validation data for testing the model; the remaining 9 subsets are used as training data. CV is then repeated 10 times, with each of the 10 subsets used exactly once as the validation data. The 10 results are then averaged to produce a single estimate. An AUC score is estimated in each repetition of CV; finally, the average score over all repetitions is determined. The AUPR score is estimated in the same way.
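For readers unfamiliar with AUC, the sketch below computes it through its pair-counting interpretation: the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; this quadratic-time version is meant for clarity, not for large data sets.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive is scored above a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy predictions (made-up): two verified and two unverified pairs.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
auc = roc_auc(labels, scores)
```

Of the four positive/negative pairs, three are ordered correctly, so the toy AUC is 0.75. In a cross-validation run, one such value would be computed per fold and the values averaged.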

In DTI tasks, the main purposes are to effectively detect potential DTIs and to discover new drugs. Thus, we conducted CV under the following two different settings:

\(CV_{dt}\): CV on drug–target pairs In this case, we randomly chose 90% of the drug–target pairs in \(\varvec{Y}\) as training data and the remaining 10% as testing data;

\(CV_{nd}\): CV on new drugs In this case, we randomly chose 90% of the rows in \(\varvec{Y}\) as training data and the remaining 10% as testing data.
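The difference between the two settings can be sketched for a single fold as follows. The function names `split_pairs` and `split_new_drugs` and the matrix size are hypothetical; the point is only that \(CV_{dt}\) holds out individual entries of \(\varvec{Y}\), whereas \(CV_{nd}\) holds out whole rows, so a test drug has no interactions in the training data.

```python
import random


def split_pairs(n, m, test_frac, rng):
    """CV_dt: hold out a random fraction of all drug-target pairs."""
    pairs = [(d, t) for d in range(n) for t in range(m)]
    rng.shuffle(pairs)
    cut = int(round(len(pairs) * test_frac))
    return pairs[cut:], pairs[:cut]          # (train, test)


def split_new_drugs(n, test_frac, rng):
    """CV_nd: hold out a random fraction of the rows (drugs) of Y."""
    drugs = list(range(n))
    rng.shuffle(drugs)
    cut = int(round(n * test_frac))
    return sorted(drugs[cut:]), sorted(drugs[:cut])  # (train, test)


rng = random.Random(0)
n, m = 10, 5  # toy matrix size: 10 drugs, 5 targets
train_pairs, test_pairs = split_pairs(n, m, 0.1, rng)
train_drugs, test_drugs = split_new_drugs(n, 0.1, rng)
```

Repeating either split over ten disjoint folds gives the 10-fold procedure described above.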

Baseline approaches To illustrate the effectiveness of our models, we compared our models with the following methods:

  • PMF, the probabilistic MF, uses dot products on the latent features of drugs and targets to make predictions [19];

  • CMF, the state-of-the-art MF-based method, models not only DTIs but also drug–drug and target–target similarities [38];

  • BRDTI, the state-of-the-art BPR-based method, extends the BPR method by adding similarity information and target bias [35];

  • RBM, a shallow neural network-based method for DTI prediction, whose visible units encode observed types of DTIs and whose hidden units represent latent features describing DTIs [31];

  • DeepDTIs, the state-of-the-art deep learning method, uses Deep Belief Networks (DBN) to predict DTIs, without taking similarity information into consideration [29].

Parameter settings Our models have seven key parameters: latent feature size (k), learning rate (\(\tau\)), the number of hidden layers (l), batch size (b), one regularization parameter for learning parameters (\(\lambda\)), and two regularization parameters for similarity information (\(\lambda _d\) and \(\lambda _t\)). These parameters were determined by grid search on the validation error. In the grid search, k is chosen from \(\{8, 16, 32, 64, 128\}\); \(\tau\) is chosen from \(\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}\); l is chosen from \(\{1, 2, 3, 4, 5\}\); b is chosen from \(\{64, 128, 256, 512\}\); \(\lambda\), \(\lambda _d\), and \(\lambda _t\) are chosen from \(\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1\}\). The Adam optimizer is used to optimize our objective function.
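The search space above can be written down as a small configuration sketch. The dictionary keys and the helper `iter_configs` are hypothetical names; the candidate values are those listed in the text. Enumerating the full Cartesian product shows how large an exhaustive search over all seven parameters would be.

```python
from itertools import product

# Candidate values as listed in the parameter settings; key names are illustrative.
grid = {
    "k": [8, 16, 32, 64, 128],
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "hidden_layers": [1, 2, 3, 4, 5],
    "batch_size": [64, 128, 256, 512],
    "lam": [1e-4, 1e-3, 1e-2, 1e-1, 1],
    "lam_d": [1e-4, 1e-3, 1e-2, 1e-1, 1],
    "lam_t": [1e-4, 1e-3, 1e-2, 1e-1, 1],
}


def iter_configs(grid):
    """Yield one dict per point of the Cartesian product of the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


n_configs = sum(1 for _ in iter_configs(grid))
first = next(iter_configs(grid))
```

The full product contains 50,000 configurations; tying \(\lambda _d\) and \(\lambda _t\) to a common value would reduce it to 10,000.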

Table 2 AUC and AUPR values of all methods on five data sets under the setting \(CV_{dt}\)

Results and analysis

Overall performance First, we conducted experiments to verify the performance of our methods on different data sets. Table 2 shows the AUC and AUPR scores obtained from all the methods under the setting \(CV_{dt}\).

As shown in Table 2, in most cases, performances of all our models are higher compared with the results of other baseline approaches on the same data set. Also, lNeuRank attains the best AUC and AUPR values over the large data sets (DrugBank, Enzymes, and Ion Channels). On DrugBank, Enzymes, and Ion Channels, in terms of AUC, lNeuRank achieves 2.81%, 5.21% and 2.86% higher than the best baseline method, DeepDTIs, respectively; and in terms of AUPR, lNeuRank achieves 0.94%, 1.14% and 0.18% higher than DeepDTIs, respectively. These results indicate that, in the large data sets, when using neural networks, our model makes high quality predictions.

From the results shown in Table 2, we conclude the following: (1) on the large data sets, lNeuRank >pNeuRank >NeuRank, which indicates that large data sets contain sufficient ranking information for our models to learn accurate features; (2) on the two smallest data sets (GPCRs and Nuclear Receptors), our models achieve worse results than DeepDTIs, and the common trend is NeuRank >pNeuRank >lNeuRank. The most likely reason is that both data sets are too small to contain enough information to make a ranking comparison of DTIs; (3) PMF and CMF exhibit inferior performance on all data sets, indicating that the inner product is insufficient to capture the complex relations between drug and target; (4) BRDTI achieves higher AUPR values than CMF, and pNeuRank higher than NeuRank, over all data sets, illustrating that adding pair-wise information can boost the performance of the models; (5) on all data sets, RBM has the worst results, indicating that shallow networks without similarity information do not make good predictions; (6) NeuRank and pNeuRank capture the nonlinear correlations of latent features via their deep learning strategies; therefore, NeuRank and pNeuRank generally outperform PMF and BRDTI, respectively. Because our models capture the non-linear correlations of the features, they outperform the other baselines in most cases. In summary, on most data sets, our methods outperform other competitive approaches, which suggests that the deep learning technique is an effective tool to extract more meaningful features to detect true DTIs.

Effect of similarity information. Next, we study how similarity information benefits the prediction of DTIs under the setting \(CV_{nd}\). In this experiment, we set the same value for both \(\lambda _d\) and \(\lambda _t\). The results obtained under the setting \(CV_{nd}\) for new drugs are shown in Table 3. The best results are shown in bold.

Table 3 AUC and AUPR values of all methods on five data sets under the setting \(CV_{nd}\)

The results in Table 3 show that our methods, compared with other methods under different settings, yield the best AUC and AUPR values, indicating that our method, with similarity information, achieves consistently accurate prediction results across all data sets. Compared with the performance in the setting \(CV_{dt}\), after including similarity metrics, our models, BRDTI, and CMF achieve comparable results in the setting \(CV_{nd}\), indicating that adding similarity information to the models is very effective for finding new DTIs. Therefore, considering multiple similarities is critical for optimal prediction performance.

To further illustrate the effect of similarity information on the prediction of DTIs, we conducted experiments using the DrugBank data set. In these experiments, we randomly selected one interaction of each drug as testing data and used the remainder as training data. Then, we ranked all unobserved DTIs with our trained models. We compared NeuRank with its simplified version without similarity information and selected three examples. The experimental results are shown in Table 4.

Table 4 Example prediction of similarity effect

From Table 4, it is seen that, compared with the simplified version without similarity information, the predictions of NeuRank are more accurate in all cases. Without similarity information, the simplified model not only incorrectly predicts a target in the top-4 results in the first case, but also achieves worse results in the other cases. In summary, similarity regularization substantially improves our method.
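
The hold-out protocol used for this comparison (one randomly chosen interaction per drug as test data, then ranking all unobserved targets) can be sketched as follows. This is an illustrative outline under stated assumptions: interactions are stored as a dictionary from drug to a set of targets, drugs with a single known target stay entirely in the training set, and `score` is a placeholder for any trained model. None of these names come from the paper.

```python
import random


def hold_out_one_per_drug(interactions, seed=0):
    """Split interactions into train and test sets.

    interactions: dict mapping drug -> set of known targets.
    Returns (train, test): train maps drug -> remaining targets,
    test maps drug -> the single held-out target.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for drug, targets in interactions.items():
        ordered = sorted(targets)  # fixed order so the seed is reproducible
        if len(ordered) < 2:
            # Assumption: a drug with one known target is not tested,
            # otherwise it would have no training interactions left.
            train[drug] = set(ordered)
            continue
        held_out = rng.choice(ordered)
        test[drug] = held_out
        train[drug] = set(ordered) - {held_out}
    return train, test


def rank_unobserved(drug, all_targets, train, score):
    """Rank targets not observed in training, highest score first.

    score: callable (drug, target) -> float, a stand-in for a trained model.
    """
    seen = train.get(drug, set())
    candidates = [t for t in all_targets if t not in seen]
    return sorted(candidates, key=lambda t: score(drug, t), reverse=True)
```

A prediction is then judged by where the held-out target appears in the ranked list, for example whether it falls within the top 4.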

Effect of hidden layer depth (l). In addition, we studied the impact of hidden layer depth on the prediction of DTIs for our models. In this experiment, the number of hidden layers ranges from one to five in steps of one under the setting \(CV_{dt}\) on all data sets. Figure 4 shows how AUC and AUPR change with the number of hidden layers.

Fig. 4 Effect of hidden layer depth of our models on five data sets under the setting \(CV_{dt}\). It shows the performance of AUC and AUPR as the number of hidden layers goes from one to five in steps of one

Table 5 Effect of embedding size

As seen in Fig. 4, on the large data sets, DrugBank and Enzymes, the performance of NeuRank remains stable as depth increases; on the small data sets, Ion Channels, GPCRs and Nuclear Receptors, the performance of NeuRank decreases as depth increases. Deep neural networks have a strong ability to express features; however, for the small data sets, too many parameters can easily lead to over-fitting. Therefore, we conclude that an appropriate number of hidden layers, matched to the size of the data set, helps to improve the model.
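
The link between depth and over-fitting can be made concrete by counting trainable parameters. The sketch below assumes a plain multilayer perceptron whose input is the concatenation of a drug embedding and a target embedding (each of size k), followed by hidden layers of equal width and a single output unit. The layer widths and the function name are illustrative assumptions; the paper's actual layer sizes are not given in this section.

```python
def mlp_parameter_count(embedding_size, hidden_width, depth):
    """Count weights and biases of an assumed MLP.

    Layout: input of size 2 * embedding_size, then `depth` hidden layers
    of size `hidden_width`, then one output unit. Embedding tables are
    not included in the count.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    sizes = [2 * embedding_size] + [hidden_width] * depth + [1]
    # Each layer contributes n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Parameter growth for one to five hidden layers (k = 32, width = 64
# are arbitrary illustrative values).
counts = [mlp_parameter_count(32, 64, depth) for depth in range(1, 6)]
```

Under these assumptions each extra hidden layer adds a fixed block of weights, so a data set with only a few hundred known interactions quickly has far fewer training examples than parameters.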

Effect of embedding size (k). Finally, we illustrate the effects that different embedding sizes (latent feature sizes) have on prediction under the setting \(CV_{dt}\) in our proposed models. For simplicity, we conducted experiments on the two largest data sets, DrugBank and Enzymes, and used AUC for evaluation. In this experiment, the embedding size was selected within the range \(\{8, 16, 32, 64, 128\}\). The effect of embedding size on the performance of our models is shown in Table 5.

As seen from Table 5, our methods achieve the best results when \(k=32\). As k increases, there is a clear increasing trend in the AUC values until the maximum is reached at \(k=32\); then, at \(k=64\), there is a slight decrease. Thus, an embedding size that is too large causes the model to over-fit, whereas an embedding size that is too small causes the model to under-fit. Consequently, an appropriate size is important for the model to learn meaningful and accurate features and perform well.
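
A sweep of this kind amounts to a one-dimensional grid search. The helper below is a minimal sketch: `evaluate` is a hypothetical user-supplied callable that trains a model with a given embedding size and returns its validation AUC. It is not part of any published NeuRank code.

```python
def select_embedding_size(candidates, evaluate):
    """Return the candidate with the highest score and all scores.

    candidates: iterable of embedding sizes, e.g. (8, 16, 32, 64, 128)
    evaluate:   callable k -> float (for example, validation AUC)
    Ties are resolved in favour of the earliest candidate.
    """
    results = {k: evaluate(k) for k in candidates}
    best_k = max(results, key=results.get)
    return best_k, results
```

In practice the score should come from a validation split rather than the test folds, so that the chosen k does not leak test information.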

Conclusion

Prediction of DTIs plays an important role in the drug discovery process. We proposed three novel methods, NeuRank, pNeuRank, and lNeuRank, to predict the interaction probability. Our models are neural network architectures, which can effectively learn nonlinear and deep features for predicting DTIs. In addition, similarity information is added to our models to improve performance, especially for new drugs and targets. Experimental results show that, compared with baseline approaches, our methods achieve better prediction performance. Moreover, our methods can provide useful hits for further biological study in drug discovery and development.

In future work, first, we plan to integrate more biological information to further improve our models; second, because similarity computation plays a critical role in learning accurate latent features, we plan to explore other nonlinear techniques to combine similarity matrices for drugs and targets; finally, for wider application, we will try to combine our models with other deep learning models.

Availability of data and materials

The DrugBank database is available at: http://www.drugbank.ca. The Nuclear Receptors, G-Protein-Coupled Receptors (GPCRs), Ion Channels and Enzymes data sets are available at: http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/.

Notes

  1. http://www.drugbank.ca.

Abbreviations

DTIs: Drug–target interactions

AUC: Area under the receiver operating characteristic curve

AUPR: Area under the precision-recall curve

References

  1. Ezzat A, Zhao P, Wu M, Li X-L, Kwoh C-K. Drug-target interaction prediction with graph regularized matrix factorization. IEEE/ACM Trans Comput Biol Bioinf. 2016;14(3):646–56.

  2. Sachdev K, Gupta MK. A comprehensive review of feature based methods for drug target interaction prediction. J Biomed Inform. 2019;93:103159.

  3. You J, McLeod RD, Hu P. Predicting drug-target interaction network using deep learning model. Comput Biol Chem. 2019;80:90–101.

  4. Meng F-R, You Z-H, Chen X, Zhou Y, An J-Y. Prediction of drug-target interaction networks from the integration of protein sequences and drug chemical structures. Molecules. 2017;22(7):1119.

  5. Chen H, Zhang Z. A semi-supervised method for drug-target interaction prediction with consistency in networks. PLoS ONE. 2013;8(5):62975.

  6. Shaikh N, Sharma M, Garg P. An improved approach for predicting drug-target interaction: proteochemometrics to molecular docking. Mol BioSyst. 2016;12(3):1006–14.

  7. Chen B, Li M, Wang J, Shang X, Wu F-X. A fast and high performance multiple data integration algorithm for identifying human disease genes. BMC Med Genomics. 2015;8(3):1–11.

  8. Volkamer A, Rarey M. Exploiting structural information for drug-target assessment. Future Med Chem. 2014;6(3):319–31.

  9. Liu Y, Wu M, Miao C, Zhao P, Li X-L. Neighborhood regularized logistic matrix factorization for drug-target interaction prediction. PLoS Comput Biol. 2016;12(2):1004760.

  10. Che J, Chen L, Guo Z-H, Wang S, et al. Drug target group prediction with multiple drug networks. Combin Chem High Throughput Screen. 2020;23(4):274–84.

  11. Zhou M, Chen Y, Xu R. A drug-side effect context-sensitive network approach for drug target prediction. Bioinformatics. 2019;35(12):2100–7.

  12. Chen R, Liu X, Jin S, Lin J, Liu J. Machine learning for drug-target interaction prediction. Molecules. 2018;23(9):2208.

  13. Wang W, Yang S, Li J. Drug target predictions based on heterogeneous graph inference. In: Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 2013;53–64.

  14. Zong N, Kim H, Ngo V, Harismendy O. Deep mining heterogeneous networks of biomedical linked data to predict novel drug-target associations. Bioinformatics. 2017;33(15):2337–44.

  15. Hao M, Bryant SH, Wang Y. Predicting drug-target interactions by dual-network integrated logistic matrix factorization. Sci Rep. 2017;7(1):1–11.

  16. Zhang W, Chen Y, Li D. Drug-target interaction prediction through label propagation with linear neighborhood information. Molecules. 2017;22(12):2056.

  17. Koren Y, Bell R, Volinsky C. Matrix factorization techniques for recommender systems. Computer. 2009;42(8):30–7.

  18. Li K, Zhou X, Lin F, Zeng W, Wang B, Alterovitz G. Sparse online collaborative filtering with dynamic regularization. Inf Sci. 2019;505:535–48.

  19. Cobanoglu MC, Liu C, Hu F, Oltvai ZN, Bahar I. Predicting drug-target interactions using probabilistic matrix factorization. J Chem Inf Model. 2013;53(12):3399–409.

  20. Mnih A, Salakhutdinov RR. Probabilistic matrix factorization. In: Advances in Neural Information Processing Systems, 2008;1257–1264.

  21. Gönen M. Predicting drug-target interactions from chemical and genomic kernels using bayesian matrix factorization. Bioinformatics. 2012;28(18):2304–10.

  22. Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E. Deep learning for computer vision: a brief review. Comput Intell Neurosci. 2018;2018:1–13.

  23. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Comput Vis Pattern Recognit, 2014;1556.

  24. Deselaers T, Hasan S, Bender O, Ney H. A deep learning approach to machine transliteration. In: Proceedings of the 4th workshop on statistical machine translation, 2009;233–241.

  25. Otter DW, Medina JR, Kalita JK. A survey of the usages of deep learning for natural language processing. IEEE Trans Neural Netw Learn Syst, 2020;1–21. https://doi.org/10.1109/TNNLS.2020.2979670

  26. Chen M, Li Y, Zhou X. Conet: Co-occurrence neural networks for recommendation. Futur Gener Comput Syst. 2021;124:308–14.

  27. Chen M, Zhou X. Deeprank: Learning to rank with neural networks for recommendation. Knowl-Based Syst. 2020;209:106478.

  28. Li K, Zhou X, Lin F, Zeng W, Alterovitz G. Deep probabilistic matrix factorization framework for online collaborative filtering. IEEE Access. 2019;7:56117–28.

  29. Wen M, Zhang Z, Niu S, Sha H, Yang R, Yun Y, Lu H. Deep-learning-based drug-target interaction prediction. J Proteome Res. 2017;16(4):1401–9.

  30. Lu S, Chen H, Zhou X, Wang B, Wang H, Hong Q. Graph-based collaborative filtering with mlp. Math Prob Eng. 2018;2018.

  31. Wang Y, Zeng J. Predicting drug-target interactions using restricted Boltzmann machines. Bioinformatics. 2013;29(13):126–34.

  32. Salakhutdinov R, Mnih A, Hinton G. Restricted Boltzmann machines for collaborative filtering. In: Proceedings of the 24th international conference on machine learning, 2007;791–798.

  33. Gao KY, Fokoue A, Luo H, Iyengar A, Dey S, Zhang P. Interpretable drug target prediction using deep neural representation. In: Proceedings of the 27th international joint conference on artificial intelligence, 2018:2018;3371–3377.

  34. Altae-Tran H, Ramsundar B, Pappu AS, Pande V. Low data drug discovery with one-shot learning. ACS Cent Sci. 2017;3(4):283–93.

  35. Peska L, Buza K, Koller J. Drug-target interaction prediction: a Bayesian ranking approach. Comput Methods Programs Biomed. 2017;152:15–21.

  36. Rendle S, Freudenthaler C, Gantner Z, Schmidt-Thieme L. Bpr: Bayesian personalized ranking from implicit feedback. In: Proceedings of the 25th conference on uncertainty in artificial intelligence, 2012;452–461.

  37. Yuan Q, Gao J, Wu D, Zhang S, Mamitsuka H, Zhu S. Druge-rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics. 2016;32(12):18–27.

  38. Zheng X, Ding H, Mamitsuka H, Zhu S. Collaborative matrix factorization with multiple similarities for predicting drug-target interactions. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, 2013;1025–1033.

  39. Wang L, You Z-H, Chen X, Xia S-X, Liu F, Yan X, Zhou Y, Song K-J. A computational-based method for predicting drug-target interactions by using stacked autoencoder deep neural network. J Comput Biol. 2018;25(3):361–73.

  40. Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, Zhou W, Huang J, Tang Y. Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol. 2012;8(5):1002503.

  41. Zhang W, Yue X, Lin W, Wu W, Liu R, Huang F, Liu F. Predicting drug-disease associations by using similarity constrained matrix factorization. BMC Bioinformatics. 2018;19(1):1–12.

  42. Van Laarhoven T, Marchiori E. Predicting drug-target interactions for new drug compounds using a weighted nearest neighbor profile. PLoS ONE. 2013;8(6):66952.

  43. He X, Liao L, Zhang H, Nie L, Hu X, Chua T-S. Neural collaborative filtering. In: Proceedings of the 26th international conference on world wide web, 2017;173–182 .

  44. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, Assempour N, Iynkkaran I, Liu Y, Maciejewski A, Gale N, Wilson A, Chin L, Cummings R, Le D, Pon A, Knox C, Wilson M. Drugbank 5.0: a major update to the drugbank database for 2018. Nucl Acids Res. 2018;46(D1):1074–82.

  45. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M. From genomics to chemical genomics: new developments in kegg. Nucl Acids Res. 2006;34(suppl_1):354–7.

  46. Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, Schomburg D. Brenda, the enzyme database: updates and major new developments. Nucl Acids Res. 2004;32(suppl_1):431–3.

  47. Günther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, Ahmed J, Urdiales EG, Gewiess A, Jensen LJ, et al. Supertarget and matador: resources for exploring drug-target relationships. Nucl Acids Res. 2007;36(suppl_1):919–22.

  48. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M. Drugbank: a knowledgebase for drugs, drug actions and drug targets. Nucl Acids Res. 2008;36(suppl_1):901–6.

  49. Rose PW, Prlić A, Altunkaya A, Bi C, Bradley AR, Christie CH, Costanzo LD, Duarte JM, Dutta S, Feng Z et al. The rcsb protein data bank: integrative view of protein, gene and 3d structural information. Nucl Acids Res, 2016;1000.

  50. Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics. 2008;24(13):232–40.

  51. Hattori M, Okuno Y, Goto S, Kanehisa M. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc. 2003;125(39):11853–65.

  52. Smith TF, Waterman MS, et al. Identification of common molecular subsequences. J Mol Biol. 1981;147(1):195–7.

Acknowledgements

The authors thank Michael McAllister for proofreading this paper.

Funding

No funding was obtained for this study.

Author information

Contributions

FL and WZ initiated the research project and designed the experiments; XW performed the experiments and wrote the paper; XZ designed software and performed the experiments. All authors reviewed the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Wenhua Zeng or Fan Lin.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Wu, X., Zeng, W., Lin, F. et al. NeuRank: learning to rank with neural networks for drug–target interaction prediction. BMC Bioinformatics 22, 567 (2021). https://doi.org/10.1186/s12859-021-04476-y

  • DOI: https://doi.org/10.1186/s12859-021-04476-y

Keywords