Skip to main content

A new insight into underlying disease mechanism through semi-parametric latent differential network model

Abstract

Background

In genomic studies, to investigate how the structure of a genetic network differs between two experiment conditions is a very interesting but challenging problem, especially in high-dimensional setting. Existing literatures mostly focus on differential network modelling for continuous data. However, in real application, we may encounter discrete data or mixed data, which urges us to propose a unified differential network modelling for various data types.

Results

We propose a unified latent Gaussian copula differential network model which provides deeper understanding of the unknown mechanism than that among the observed variables. Adaptive rank-based estimation approaches are proposed with the assumption that the true differential network is sparse. The adaptive estimation approaches do not require precision matrices to be sparse, and thus can allow the individual networks to contain hub nodes. Theoretical analysis shows that the proposed methods achieve the same parametric convergence rate for both the difference of the precision matrices estimation and differential structure recovery, which means that the extra modeling flexibility comes at almost no cost of statistical efficiency. Besides theoretical analysis, thorough numerical simulations are conducted to compare the empirical performance of the proposed methods with some other state-of-the-art methods. The result shows that the proposed methods work quite well for various data types. The proposed method is then applied on gene expression data associated with lung cancer to illustrate its empirical usefulness.

Conclusions

The proposed latent variable differential network models allows for various data-types and thus are more flexible, which also provide deeper understanding of the unknown mechanism than that among the observed variables. Theoretical analysis, numerical simulation and real application all demonstrate the great advantages of the latent differential network modelling and thus are highly recommended.

Background

In genomic studies, graphical model has been an important tool to capture dependence among different genes. Particularly, Gaussian graphical model has been widely applied to infer the relationship between genes at the transcriptional level [14]. Under the Gaussian assumption, estimating the structure of the graphical model is equivalent to recover the support of precision matrix which is defined to be the inverse of the covariance matrix. However, in some cases, compared to focusing on a particular network, it is of greater interest to investigate how the network of connected gene pairs change from one experimental condition to another, which provides deeper insights on an underlying biological process such as identification of pathways that correspond to such a change. For instance, medical experiment usually involves two groups: the patient group and the control group. The analysis of group difference in biological networks or pathways may offer us a new insight into the underlying disease mechanism, which have extensive biomedical and clinical applications, such as identifying effective targets for drug development in a cost-effective and timely manner. Indeed, differential networking modelling has recently emerged as an important tool to analyze a set of changes in graph structure between two conditions (see, for example; [517]). In the context of genomic analysis, it is reasonable to assume that two genes are defined to be connected in the differential network if the magnitude of their conditional dependency relationship changes between two conditions. The precision matrix which is defined as the inverse of covariance matrix can capture the conditional dependency relationship. Thus the differential network is typically modelled as the difference of two precision matrices and this type of modelling has been widely used [79, 14, 15]. Figure 1a, b, c illustrate the definition of differential network. Each node represents a gene. For two groups depicted in (a) and (b), there is an edge between genes (i,j) if and only if (i,j)-th element of Ω is nonzero. For each edge, there exists a weight which is the magnitude of (i,j)-th element of Ω. Gene pair (i,j) is defined to be connected in the differential network in (c) if the magnitudes of (i,j)-th elements of two precision matrices change between two groups.

Fig. 1
figure 1

Illustration of latent differential network. a Network in group 1. b Network in group 2. c Differential network. d Data sources. e Data type. f Latent distribution

One straightforward approach to estimate the difference of two precision matrices is to separately estimate the precision matrices and then subtract the estimates. In the high dimensional setting where the dimension p is much larger than the sample size n, which is often the case for genomic study, many estimation approaches for the precision matrix have been proposed and proved to enjoy nice theoretical properties and computation advantage under the key assumption of sparsity. And this topic has been an active area of research in recent years [1822].

Another type of approach to estimate the difference of two precision matrices is to jointly estimate the precision matrices. Guo et al. [23] penalized the joint loglikelihood with a hierarchical penalty that targets the removal of common zeros in the inverse covariance matrices across categories. Danaher et al. [24] proposed the joint graphical Lasso, which is based upon maximizing a penalized log-likelihood with generalized fused Lasso or group Lasso penalty. Motivated by the constrained 1 minimization approach to precision matrix estimation of [22], Zhao et al. [7] proposed an estimation approach to directly estimate the difference of the precision matrices.

For the separately estimating methods, Liu et al. [25] proposed the nonparanormal family to relax the Gaussian assumption. While the nonparanormal family is much larger than the standard parametric Gaussian family, the independence relations among the variables are still encoded in the precision matrix. In addition, Liu et al. [26] proposed a semiparametric approach called nonparanormal SKEPTIC to estimate high dimensional undirected graphical models efficiently and robustly and proved that the nonparanormal SKEPTIC achieves the optimal parametric rates of convergency in terms of precision matrix estimation and graph recovery. Xue and Zou [27] proposed a similar regularized rank-based estimation idea for estimating nonparanormal graphical models and analyzed adaptive versions of rank-based Dantzig selector and CLIME estimators. He et al. [28] proposed a multiple testing procedure to estimate high-dimensional nonparanormal graphical model and proved that the proposed procedure can control the false discovery rate (FDR) asymptotically.

The disadvantage of Gaussian or nonparanormal graphical models lies in that they are only tailored for modeling continuous data. However, in genomic studies, we may encounter discrete data (e.g. CNV data and SNP data), continuous data (e.g. gene expression and methylation data) or data of hybrid types with both discrete and continuous variables. Besides, in some circumstances, even if the data are continuous, we still need to transform the data into discrete data to remove the heterogeneity (e.g. batch effect, outliers and population stratification). For instance, in the analysis of gene expression data collected from different platforms, to remove the unwanted variation among different experiments known as the batch effects, numerical expression data are often transformed into 0/1 binary data, where lower expression values are encoded as 0 and higher expression values are encoded as 1. In this setting, it is reasonable to assume that the discrete variable is obtained by discretizing a latent variable. Fan et al. [29] proposed a general model named the latent Gaussian copula graphical model, assuming that the observed discrete data are generated by discretizing a latent continuous variable at some unknown cutoff.

In this paper, we consider estimating differential network for various types of biological data in a joint way. We propose a unified semi-parametric latent variable differential network model. The latent differential network model is illustrated in Fig. 1e-f. For biological data, there exist continuous data, discrete data or data of hybrid types with both continuous and discrete data. It is assumed that these data are collected by transforming latent continuous variables which are unobservable. We are interested in the differential network of the latent variables, which provide deeper understanding of the unknown mechanism than that among the observed variables. To the best of our knowledge, our work provides the first method for differential network estimation for binary or mixed data with theoretical guarantees under the high dimensional scaling. The advantages of the proposed methods lie in the following aspects: (I) Our method provides a way to infer the differential network structure among latent variables, which provides deeper understanding of the unknown mechanism than that among the observed variables. (II) Theoretical analysis shows that the proposed methods achieve the same parametric rates of convergence for both difference matrix estimation and differential graph recovery, as if the latent variables were observed. (III) The proposed methods are much more robust to outliers due to the rank-based correlation matrix estimator. (IV) The proposed approaches do not require precision matrices to be sparse, and thus can allow the individual networks to contain hub nodes. Simulation result shows that the proposed method performs much better and more robustly than several state-of-the-art methods. The proposed methods are applied on a gene expression data set associated with lung cancer. A target gene WIF1 stands out by the proposed method, which indeed is verified as a frequent target for epigenetic silencing in various human cancers [30]. The real data example illustrates the great usefulness of the current work.

Methods

In this part, we propose novel definitions of latent differential network model for various types of data. In essence, we define the differential network as the difference of two precision matrices of the latent variables, which greatly generalizes the applicability in areas such as bioinformatics, medical research and so on.

Gaussian copula differential graphical model

We first review the definition of the Gaussian copula distribution. Let f={f1,…,fp} be a set of strictly increasing univariate functions. A p dimensional random variable X=(X1,…,Xp) is said to follow the Gaussian copula distribution if and only if f(X):=(f1(X1),…,fp(Xp)) :=ZNp(μ,Σ) and is noted as XNPN(μ,Σ,f), where μ=(μ1,…,μp),Σ=[Σjk] are respectively the mean vector and the correlation matrix of the Gaussian variate Z. The conditional independence structure of X is encoded by the sparsity pattern of Ω=Σ−1. Specifically, it can be shown that Xi is conditionally independent of Xj given all other variables if and only if ωij=0, where ωij is the (i,j)-th element of Ω. Therefore, the differential network of the Gaussian copula variables can be defined to be the difference between the two precision matrices, just the same as for the parametric Gaussian case.

Assume Xi=(Xi1,…,Xip) for i=1,…,nX are independent observations of the expression levels of p genes from one group denoted by X and Yi=(Yi1,…,Yip) for i=1,…,nY from the other denoted by Y, XNPN(μX,ΣX,fX) and YNPN(μY,ΣY,fY). The differential network is defined to be the difference between the two precision matrices, denoted by Δ0=ΩYΩX, where ΩY and ΩX are the inverse matrices of ΣY and ΣX separately.

We propose a rank-based estimator of ΣX. It is known that if ZNPN(μ,Σ,f), then we have \(\Sigma _{jk}=\sin \left (\frac {\pi }{2}\tau _{jk}\right)\), where τjk is Kendall’s tau correlation between Zj and Zk. Thus we can estimate the unknown correlation matrix ΣX by:

$$ \hat{S}^{X}_{jk}=\left\{ \begin{array}{ccc} \sin\left(\frac{\pi}{2}\hat{\tau}^{X}_{jk}\right) & & {j\neq k}\\ 1 & & {j=k} \end{array} \right., $$
(1)

where \(\hat {\tau }^{X}_{jk}\) is the sample Kendall’s tau correlation between Xj and Xk. Similarly, we can estimate ΣY in the same way and obtain the estimator \(\hat {\boldsymbol {S}}^{Y}\). Motivated by the direct estimation method of the difference of two precision matrices proposed by [7], one can obtain the estimator of Δ0 by solving

$$\text{arg\ min} |\boldsymbol{\Delta}|_{1}, \ \ \text{subject to} \ \ |\hat{\boldsymbol{S}}^{X}\boldsymbol{\Delta}\hat{\boldsymbol{S}}^{Y}-\hat{\boldsymbol{S}}^{X}+\hat{\boldsymbol{S}}^{Y}|_{\infty}\leq \lambda_{n}, $$

which is equivalent to the optimization problem:

$$ \begin{array}{ll} \text{arg\ min} |\boldsymbol{\Delta}|_{1}, \ \ \ \text{subject to} \\ \\ \left|\left(\hat{\boldsymbol{S}}^{X}\otimes \hat{\boldsymbol{S}}^{Y}\right)\text{Vec}(\boldsymbol{\Delta})-\text{Vec} \left(\hat{\boldsymbol{S}}^{X}-\hat{\boldsymbol{S}}^{Y}\right)\right|_{\infty}\leq \lambda_{n}, \end{array} $$
(2)

where denotes the Kronecker product, \(|\boldsymbol {\Delta }|_{1}=\sum _{jk}\delta _{jk}\) is the element-wise 1 norm of the matrix Δ. Here, for a matrix A=[Ajk], |A|= maxjk|Ajk| and for a vector a=(aj), |a|= maxj|aj|.

As seen from Eq. (2), the proposed approach can directly estimate the difference matrix without implicitly estimating the individual precision matrices. Thus there is no need to assume the sparsity of (ΣY)−1 and (ΣX)−1. We only need to assume that Δ0 is sparse. Besides, compared to the sample covariance matrix, the rank-based estimators here can enjoy modelling flexibility and estimation robustness, especially when outliers exist.

Latent Gaussian copula differential graphical model for binary data

In the analysis of gene expression data, to remove the batch effects, numerical expression data are often transformed into 0/1 binary data, where lower expression values are encoded as 0 and higher expression values are encoded as 1. To estimate the underlying differential network for the binary data from two different groups, we assume that the observed discrete data are generated by discretizing a latent continuous variable at some unknown cutoff. To make the model more flexible, we assume the latent continuous variable is Gaussian copula distributed instead of Gaussian. Let B=(B1,B2,…,Bp){0,1}p be a p-dimensional 0/1-random vector. The 0/1-random vector B satisfies the latent Gaussian copula model (LGCM) for binary data, if there exists a p dimensional random vector XNPN(0,Σ,f) such that

$$B_{j}=I\left(X_{j}>C_{j}\right), \ \ \text{for} \ \ \text{all} \ \ j=1,\ldots,p, $$

where I(·) is the indicator function and the cutoff C=(C1,…,Cp) is a vector of constants. Then we denote B LGCM(Σ,f,C). We call Σ the latent correlation matrix. The latent Gaussian copula model involves parameters (Σ,f,C). Merely based on the binary random vector B, only fj(Cj),j=1,…,p are identifiable. Denote Λ=(Λ1,…,Λp), where Λj=fj(Cj). For notational simplicity, we write LGCM(Σ,Λ) for LGCM(Σ,f,C).

Assume \({\boldsymbol {B}}^{1}_{i}=\left (B^{1}_{i1},\ldots,B^{1}_{ip}\right)^{\top }\) for i=1,…,n1 are independent observations of the binary expression levels of p genes from one group denoted by B1 and \(\boldsymbol {B}^{2}_{i}=\left (B^{2}_{i1},\ldots,B^{2}_{ip}\right)^{\top }\) for i=1,…,n2 from the other denoted by B2, where B1LGCM(Σ1,Λ1) and B2LGCM(Σ2,Λ2). The differential network is defined to be the difference between the two precision matrices, denoted by \(\boldsymbol {\Delta }_{0}^{\mathrm {B}}=\left ({\boldsymbol {\Sigma }^{2}}\right)^{-1}-\left ({\boldsymbol {\Sigma }^{1}}\right)^{-1}\). Motivated by Eq. (2), we should first derive estimators for Σ1 and Σ2. For ease of presentation, we only present the procedure to construct the estimator for Σ1, estimator for Σ2 can be obtained similarly. Denote the Kendall’s tau correlation between \(B_{j}^{1}\) and \(B_{k}^{1}\) by \(\tau _{jk}^{1}\), it can be shown that \(\tau _{jk}^{1}\) satisfies:

$$\tau_{jk}^{1}=2\left\{\Phi_{2}\left(\Lambda_{j}^{1},\Lambda_{k}^{1},\Sigma^{1}_{jk}\right)-\Phi\left(\Lambda_{j}^{1}\right)\Phi\left(\Lambda_{k}^{1}\right)\right\}, $$

where

$$\Phi_{2}(u,v,t)=\int_{x_{1}< u}\int_{x_{2}< v}\phi_{2}(x_{1},x_{2};t){dx}_{1}{dx}_{2}, $$

is the cumulative distribution function of the standard bivariate normal distribution, ϕ2(x1,x2;t) is the probability density function of the standard bivariate normal distribution with correlation t. Denote by

$$ F\left(t;\Lambda_{j}^{1},\Lambda_{k}^{1}\right)=2\left\{\Phi_{2}\left(\Lambda_{j}^{1},\Lambda_{k}^{1},t\right)-\Phi \left(\Lambda_{j}^{1}\right)\Phi\left(\Lambda_{k}^{1}\right)\right\}. $$

For any fixed \(\Lambda _{j}^{1}\) and \(\Lambda _{k}^{1}\), it can be shown that \(F\left (t;\Lambda _{j}^{1},\Lambda _{k}^{1}\right)\) is a strictly monotonic increasing function on t(−1,1) and thus is invertible. Given \(\Lambda _{j}^{1}\) and \(\Lambda _{k}^{1}\), one can estimate \(\Sigma _{jk}^{1}\) by \(F^{-1}\left (\hat {\tau }_{jk}^{1};\Lambda _{j}^{1},\Lambda _{k}^{1}\right)\). However, the cutoff values are unknown in practice. As \(E\left (B^{1}_{ij}\right)=1-\Phi \left (\Lambda _{j}^{1}\right)\), we can estimate \(\Lambda _{j}^{1}\) by \(\hat {\Lambda }_{j}^{1}=\Phi ^{-1}\left (1-\bar {B}^{1}_{j}\right)\), where \(\bar {B}_{j}^{1}=\sum _{i=1}^{n}B^{1}_{ij}/n\). Thus the Kendall’s tau rank-based correlation matrix estimator \(\hat {\boldsymbol {b}}^{1}=\left [\hat {\mathrm {R}}^{1}_{jk}\right ]\) for Σ1 is a p×p matrix with element entry given by

$$ \hat{\mathrm{R}}_{jk}^{1}=\left\{ \begin{array}{ccc} F^{-1}\left(\hat{\tau}_{jk}^{1};\hat{\Lambda}_{j}^{1},\hat{\Lambda}_{k}^{1}\right) & & {j\neq k},\\ 1, & & {j=k}. \end{array} \right. $$
(3)

Similarly, the Kendall’s tau rank-based correlation matrix estimator \(\hat {\boldsymbol {b}}^{2}=\left [\hat {\mathrm {R}}^{2}_{jk}\right ]\) for Σ2 is a p×p matrix with element entry given by

$$ \hat{\mathrm{R}}_{jk}^{2}=\left\{ \begin{array}{ccc} F^{-1}\left(\hat{\tau}_{jk}^{2};\hat{\Lambda}_{j}^{2},\hat{\Lambda}_{k}^{2}\right) & & {j\neq k},\\ 1, & & {j=k}. \end{array} \right. $$
(4)

Motivated by Eq. (2), we can obtain an estimator of \(\boldsymbol {\Delta }_{0}^{\mathrm {B}}\) by solving the following optimization problem:

$$ \begin{array}{ll} \text{arg\ min} |\boldsymbol{\Delta}|_{1}, \ \ \text{subject to} \\ \left|\left(\hat{\boldsymbol{b}}^{1}\otimes \hat{\boldsymbol{b}}^{2}\right)\text{Vec}(\boldsymbol{\Delta})-\text{Vec}\left(\hat{\boldsymbol{b}}^{1}-\hat{\boldsymbol{b}}^{2}\right)\right|_{\infty}\leq \lambda_{n}. \end{array} $$
(5)

For the binary data, we aim to infer the differential network among latent variables, which provides deeper understanding of the unknown mechanism than that among the observed binary variables. Thus, our model complements the existing work on high dimensional differential network estimation, which mostly focused on learning differential network among observed variables including, for example, the Ising model.

Latent Gaussian copula differential graphical model for mixed data

In the analysis of biological data, there also exists the case where some biological data are discrete while some others are continuous. For instance, multi-level omics data integrative analysis involves gene mutation, expression, methylation, metabolome and phenome data. In this case, mixed data appear naturally. We start with the definition of the latent Gaussian copula model for mixed data. Assume that M=(M1,M2), where M1 represents the p1-dimensional binary variables and M2 represents the p2-dimensional continuous variables. The random vector M satisfies the latent Gaussian copula model for mixed data, if there exists a p1 dimensional random vector X1 such that X=(X1,M2) NPN(0,Σ,f) and

$$M_{j}=I\left(X_{j}>C_{j}\right) \ \ \text{for} \ \ \text{all} \ \ j=1,\ldots,p_{1}, $$

where \(\boldsymbol {C}=\left (C_{1},\ldots,C_{p_{1}}\right)\) is a vector of constants. Then we denote M LGCM(0,Σ,f,C), and call Σ the latent correlation matrix. In the latent Gaussian copula regression model, the binary components M1 are generated by a latent continuous random vector X1 truncated at C, and combining with the continuous components M2, X=(X1,M2) satisfies the Gaussian copula model. For the binary data M1, only Λj=fj(Cj),j=1,…,p1 are identifiable. For the continuous components M2, the marginal transformations fj(·),j=p1+1,…,p are identifiable.

Assume \({{\boldsymbol {M}}^{1}_{i}=\left (M^{1}_{i1},\ldots,M^{1}_{ip}\right)^{\top }}\) for i=1,…,n1 are independent observations of the expression levels of p genes from one group denoted by M1 and \({\boldsymbol {M}^{2}_{i}=\left (M^{2}_{i1},\ldots,M^{2}_{ip}\right)^{\top }}\) for i=1,…,n2 from the other denoted by M2, where M1LGCM(Σ1,Λ1) and M2LGCM(Σ2,Λ2). The differential network is defined to be the difference between the two precision matrices, denoted by \(\boldsymbol {\Delta }_{0}^{\mathrm {M}}=\left ({\boldsymbol {\Sigma }^{2}}\right)^{-1}-\left ({\boldsymbol {\Sigma }^{1}}\right)^{-1}\). Similar to the discussions in the last sections, we first need to construct estimators for Σ1 and Σ2. For ease of presentation, we only present the procedure to construct the estimator for Σ1, estimator for Σ2 can be obtained similarly. For discrete components \(M^{1}_{ij},M^{1}_{ik}(1\leq j,k\leq p_{1})\), as what we have discussed in the last subsection with a slight change of notation, we can estimate \(\Sigma ^{1}_{jk}\) by:

$$ \hat{\mathrm{T}}^{1}_{jk}=\left\{ \begin{array}{ccc} F^{-1}\left(\hat{\tau}^{1}_{jk};\hat{\Lambda}_{j}^{1},\hat{\Lambda}_{k}^{1}\right) & & 1\leq{j\neq k}\leq p_{1},\\ 1, & & 1\leq {j=k}\leq p_{1}. \end{array} \right. $$
(6)

For continuous components \(M_{ij}^{1},M_{ik}^{1}\), as what we have discussed, we can estimate \(\Sigma ^{1}_{jk}\) by:

$$ \hat{\mathrm{T}}^{1}_{jk}=\left\{ \begin{array}{ccc} \text{sin}\left(\frac{\pi}{2}\hat{\tau}_{jk}\right) & & p_{1}+1\leq {j\neq k}\leq p,\\ 1, & & p_{1}+1\leq {j=k}\leq p. \end{array} \right. $$
(7)

where \(\hat {\tau }_{jk}\) is defined as follows:

$$ \hat{\tau}_{jk}^{1}\,=\,\frac{2}{n_{1}(n_{1}\,-\,1)}\sum_{1\leq i\leq i^{\prime}\leq n_{1}}\text{sign}\!\left(\!M^{1}_{ij}\,-\,M^{1}_{i^{\prime} j}\!\right)\cdot\text{sign}\!\left(M^{1}_{ik}\,-\,M^{1}_{i^{\prime} k}\right)\!. $$

We still need to consider the mixed case. Without loss of generality, we assume that \(M^{1}_{ij}\) is binary and \(M^{1}_{ik}\) is continuous. In this case, the Kendall’s tau correlation can be expressed by

$$ \hat{\tau}_{jk}^{1}=\frac{2}{n_{1}(n_{1}-1)}\sum_{1\leq i\leq i^{\prime}\leq n_{1}}\left(M^{1}_{ij}-M^{1}_{i^{\prime} j}\right)\cdot\text{sign}\left(M^{1}_{ik}\,-\,M^{1}_{i^{\prime} k}\right). $$

The population version of Kendall’s tau correlation \(\tau _{jk}^{1}=E\left (\hat {\tau }^{1}_{jk}\right)\) can be expressed by \(\tau ^{1}_{jk}=H\left (\Sigma ^{1}_{jk};\Lambda ^{1}_{j}\right)\), where

$$H\left(t;\Lambda_{j}^{1}\right)=4\Phi_{2}\left(\Lambda_{j}^{1},0,t/\sqrt{2}\right)-2\Phi\left(\Lambda_{j}^{1}\right). $$

Moreover, for fixed \(\Lambda _{j}^{1}\), \(H\left (t;\Lambda _{j}^{1}\right)\) is an invertible function of t. The parameter \(\Lambda _{j}^{1}\) could be estimated by \(\Lambda _{j}^{1}=\Phi ^{-1}\left (1-\bar {M}^{1}_{j}\right)\), where \(\bar {M}^{1}_{j}=\sum _{i=1}^{n}M^{1}_{ij}/n\). Thus when \(M^{1}_{ij}\) is binary and \(M^{1}_{ik}\) is continuous, \(\Sigma ^{1}_{jk}\) can be estimated by the Kendall’ tau rank-based estimator:

$$ \hat{\mathrm{T}}_{jk}^{1}= H^{-1}\left(\hat{\tau}^{1}_{jk};\hat{\Lambda}^{1}_{j}\right), \ \ 1\leq j\leq p_{1} < k\leq p, $$
(8)

where \(H^{-1}\left (\tau,\Lambda ^{1}_{j}\right)\) is the inverse function of \(H\left (t,\Lambda ^{1}_{j}\right)\) for fixed \(\Lambda ^{1}_{j}\). Thus the Kendall’s tau rank-based correlation matrix estimator \(\hat {\mathbf {T}}^{1}=\left [\hat {\mathrm {T}}^{1}_{jk}\right ]\) for Σ1 is a p×p matrix with corresponding element entry given by Eqs. (6), (7), and (8) respectively. Similarly, we can obtain estimator \(\hat {\mathbf {T}}^{2}\) for Σ2. Motivated by Eq. (2), we can obtain an estimator of Δ0 by solving the following optimization problem:

$$ \begin{array}{ll} \text{arg\ min} |\boldsymbol{\Delta}|_{1}, \ \ \text{subject to}\\ \\ \left|\left(\hat{\mathbf{T}}^{1}\otimes \hat{\mathbf{T}}^{2}\right)\text{Vec}(\boldsymbol{\Delta})-\text{Vec}\left(\hat{\mathbf{T}}^{1}-\hat{\mathbf{T}}^{2}\right)\right|_{\infty}\leq \lambda_{n}. \end{array} $$
(9)

We show that the rank-based covariance matrix estimators achieve the same parametric rate of convergence for both difference matrix estimation and differential graph recovery in the Additional file 1. Thus the extra modelling flexibility comes at almost no cost of statistical efficiency. Besides, for the binary data or data of hybrid types with both binary and continuous variables, the differential network among latent variables can be well estimated, which provides deeper understanding of the unknown mechanism than that among the observed variables.

Implementation

In this section we will present how to solve the optimization problems in Eqs. (2), (5), and (9). For ease of presentation, we only present the procedure to obtain the solution to optimization problem in Eq. (2) and solutions to optimization problems in Eqs. (5) and (9) can be obtained in the similar way.

Recall that in Eq. (2), the optimization problem is

$$\begin{array}{ll} \text{arg\ min} |\boldsymbol{\Delta}|_{1}, \ \ \text{subject to} \\ \\ \left|\left(\hat{\boldsymbol{S}}^{X}\otimes \hat{\boldsymbol{S}}^{Y}\right)\text{Vec}(\boldsymbol{\Delta})-\text{Vec}\left(\hat{\boldsymbol{S}}^{X}-\hat{\boldsymbol{S}}^{Y}\right)\right|_{\infty}\leq \lambda_{n}. \end{array} $$

Let Δ= [δjk]1≤j,kp and define θ to be the p(p+1)/2×1 vector with θ=(δjk)1≤jkp. Estimating a symmetric Δ is thus equivalent to estimating θ, which alleviates the computation burden especially when p is large. Define the p2×p(p+1)/2 matrix Γ with columns indexed by 1≤jkp and with rows indexed by l=1,…,p and m=1,…,p, so that each entry is labeled by Γlm,jk. For jk, let Γjk,jk=Γkj,jk=1 and set all other entries of Γ equal to zero. With these notations, one may consider the following optimization problem:

$$ \begin{array}{ll} \hat{\boldsymbol{\theta}}=\text{arg\ min}|\boldsymbol{\theta}|_{1} \ \ \text{subject to} \\ \left\{ \begin{array}{l} \left|\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}}\boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}\right|_{O\infty}\leq \lambda_{n},\\ \left|\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}}\boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}\right|_{D\infty}\leq \lambda_{n}/2, \end{array} \right. \end{array} $$
(10)

where \(\hat {\boldsymbol {S}}=\hat {\boldsymbol {S}}^{X}\otimes \hat {\boldsymbol {S}}^{Y}\), \(\hat {\boldsymbol {s}}=\text {Vec}\left (\hat {\boldsymbol {S}}^{X}-\hat {\boldsymbol {S}}^{Y}\right)\) and for a p(p+1)/2×1 vector c, |c|O denotes the sup-norm of the entries of c corresponding to the off diagonal elements of its matrix form, while |c|D denotes the sup-norm of the entries of c corresponding to the diagonal elements. The matrix form of \(\hat {\boldsymbol {\theta }}\) will be denoted by \(\hat {\boldsymbol {\Delta }}\) in the following sections. The optimization problem in Eq. (10) can be solved by the alternating direction method of multipliers (ADMM), for a thorough discussion, we refer to [31]. For the optimization problem in Eq. (10), to apply the ADMM algorithm, we rewrite it as:

$$\begin{array}{ll} \hat{\boldsymbol{\theta}}=\text{arg\ min}_{\boldsymbol{\theta}, \boldsymbol{z}} \left\{|\boldsymbol{\theta}|_{1}+g(\boldsymbol{z})\right\} \\ \text{subject to} ~~\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}}\boldsymbol{\Gamma}\boldsymbol{\theta}+\boldsymbol{z}=\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}, \end{array} $$

where the function g(·) is defined by

$$g(\boldsymbol{z})=\left\{ \begin{array}{ccc} \infty & &|\boldsymbol{z}_{O\infty}|>\lambda_{n}\ \ \text{or} \ \ |\boldsymbol{z}_{D\infty}|>\lambda_{n}/2.\\ 0, & & \text{otherwise}. \end{array} \right.$$

The augmented Lagrangian can be written as

$$ \begin{array}{ll} L_{\rho}(\boldsymbol{\theta},\boldsymbol{z},\boldsymbol{u})&=\boldsymbol{u}^{\top}\left(\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}}\boldsymbol{\Gamma}\boldsymbol{\theta}+\boldsymbol{z}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}\right)+|\boldsymbol{\theta}|_{1}\\ &\quad+\frac{\rho}{2} \left|\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}}\boldsymbol{\Gamma}\boldsymbol{\theta}+\boldsymbol{z}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}\right|_{2}^{2}+g(\boldsymbol{z}), \end{array} $$
(11)

where u is the Lagrange multiplier and ρ is a positive penalty parameter which can be specified by users. The ADMM algorithm is based on minimizing the augmented Lagrangian in (11) over θ and z and then applying a dual variable update to the Lagrange multiplier u, which yields the updates

$$\begin{array}{ccl} \boldsymbol{z}^{(t+1)}& = &\underset{\boldsymbol{z}}{\text{arg\ min}}\left|\boldsymbol{u}^{(t)}/\rho + \boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}} \boldsymbol{\Gamma}\boldsymbol{\theta}^{(t)}-\boldsymbol{z}\right|_{2}^{2}\\ \\ &&+2{g(\boldsymbol{z})}/{\rho}\\ \boldsymbol{\theta}^{(t+1)}& = &\underset{\boldsymbol{\theta}}{\text{arg\ min}}\Big|\frac{\boldsymbol{u}^{(t)}}{\rho} + \boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{S}} \boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{z}^{(t+1)}\Big|_{2}^{2}\\ \\ &&+2{|\boldsymbol{\theta}|_{1}}/{\rho}\\ \boldsymbol{u}^{(t+1)}& = &\boldsymbol{u}^{(t)}+\rho\left(\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{s}}-\boldsymbol{\Gamma}^{\top} \hat{\boldsymbol{S}}\boldsymbol{\Gamma}\boldsymbol{\theta}^{(t+1)}-\boldsymbol{z}^{(t+1)}\right)\\ \end{array} $$

for iterations t=0,1,2…. As for the tuning parameter λn in (10), it can be chosen by an approximate Akaike information criterion (AIC). λn is chosen to minimize

$$(n_{X}+n_{Y})L(\lambda_{n})+2k, $$

where k is the effective degrees of freedom that can be approximated by \(|\hat {\boldsymbol {\theta }}|_{0}\) and L(λn) represents the loss function either L or LF which are defined by

$$L_{\infty}(\lambda_{n})=\left|\hat{\boldsymbol{S}}^{X}\hat{\boldsymbol{\Delta}} (\lambda_{n})\hat{\boldsymbol{S}}^{Y}-\hat{\boldsymbol{S}}^{X}+\hat{\boldsymbol{S}}^{Y}\right|_{\infty}, $$
$$\,\, L_{\mathrm{F}}(\lambda_{n})=\left\|\hat{\boldsymbol{S}}^{X}\hat{\boldsymbol{\Delta}}(\lambda_{n}) \hat{\boldsymbol{S}}^{Y}-\hat{\boldsymbol{S}}^{X}+\hat{\boldsymbol{S}}^{Y}\right\|_{\mathrm{F}}. $$

In this paper we focus on the loss functions with the supremum and Frobenius norms for further theoretical development. One may also use other matrix norms, such as spectral norm:

$$L_{\text{sp}}(\lambda_{n})=\left\|\hat{\boldsymbol{S}}^{X}\hat{\boldsymbol{\Delta}}(\lambda_{n})\hat{\boldsymbol{S}}^{Y}-\hat{\boldsymbol{S}}^{X}+\hat{\boldsymbol{S}}^{Y}\right\|_{2}. $$

Similarly, for the latent Gaussian copula model for binary data, one can solve the following optimization problem:

$$ \begin{array}{ll} \hat{\boldsymbol{\theta}}^{\mathrm{B}}=\text{arg\ min}|\boldsymbol{\theta}|_{1} \ \ \text{subject to} \\ \left\{ \begin{array}{l} \left|\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{b}}\boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{r}}\right|_{O\infty}\leq \lambda_{n},\\ \left|\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{b}}\boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{r}}\right|_{D\infty}\leq \lambda_{n}/2, \end{array} \right. \end{array} $$
(12)

where \(\hat {\boldsymbol {b}}=\hat {\boldsymbol {b}}^{1}\otimes \hat {\boldsymbol {b}}^{2}\), \(\hat {\boldsymbol {r}}=\text {Vec}\left (\hat {\boldsymbol {b}}^{1}-\hat {\boldsymbol {b}}^{2}\right)\). The matrix form of \(\hat {\boldsymbol {\theta }}^{\mathrm {B}}\) will be denoted by \(\hat {\boldsymbol {\Delta }}^{\mathrm {B}}\) in the following sections. For the latent Gaussian copula model for mixed data, one can solve the following optimization problem:

$$ \begin{array}{ll} \hat{\boldsymbol{\theta}}^{\mathrm{M}}=\text{arg\ min}|\boldsymbol{\theta}|_{1} \ \ \text{subject to} \\ \left\{ \begin{array}{l} \left|\boldsymbol{\Gamma}^{\top}\hat{\mathbf{T}}\boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{t}}\right|_{O\infty}\leq \lambda_{n},\\ \left|\boldsymbol{\Gamma}^{\top}\hat{\mathbf{T}}\boldsymbol{\Gamma}\boldsymbol{\theta}-\boldsymbol{\Gamma}^{\top}\hat{\boldsymbol{t}}\right|_{D\infty}\leq \lambda_{n}/2, \end{array} \right. \end{array} $$
(13)

where \(\hat {\mathbf {T}}=\hat {\mathbf {T}}^{1}\otimes \hat {\mathbf {T}}^{2}\), \(\hat {\boldsymbol {t}}=\text {Vec}\left (\hat {\mathbf {T}}^{1}-\hat {\mathbf {T}}^{2}\right)\). The matrix form of \(\hat {\boldsymbol {\theta }}^{\mathrm {M}}\) will be denoted by \(\hat {\boldsymbol {\Delta }}^{\mathrm {M}}\) in the following sections. Besides, corresponding Akaike information criterion can be proposed to choose the tuning parameter λn.

Simulation

Simulation for Gaussian copula differential graphical model In this part, we conduct simulation study for differential network estimation under Gaussian copula model. We mainly focus on the graphs that contain hub nodes. First we generate the edge set EX for the group X. We partition p features into 5 equally-sized and non-overlapping sets: C1C2C5={1,…,p}, |Ck|=p/5, CiCj=. For the smallest iCk, we set (i,j)Ck for all {ji:jCk}. The non-zero entries of ΩX is then determined by the edge set EX, where ΩX=(ΣX)−1. Next, the value of each nonzero entry of ΩX was generated from a uniform distribution with support [−0.75,−0.25][0.25,0.75]. To ensure positive definiteness of ΩX, let ΩX=ΩX+(0.2+|λmin(ΩX)|)I. At last the ΩX is rescaled such that ΣX is a correlation matrix. Then we proceed to generate the differential network Δ0. We randomly select two hub nodes from the 5 equally-sized and non-overlapping sets. The differential network Δ0 is generated such that the connections of these two hub nodes change sign between ΩX and ΩY. The correlation matrix ΣX and ΣY are generated by (ΩX)−1 and (ΩY)−1 respectively. Finally we generate nX i.i.d observations of ZX from the N(0,ΣX) distribution and nY i.i.d observations of ZY from the N(0,ΣY) distribution. Next we sample nX i.i.d samples from the nonparanormal distribution NPN(0,ΣX,fX) and nY i.i.d samples from the nonparanormal distribution NPN(0,ΣY,fY). For simplicity, we use the same univariate transformations on each dimension: \(f_{1}^{X}=f_{2}^{X}=\cdots =f_{p}^{X}=f\) and fX=fY. To sample data from the nonparanormal distribution, we also need g:=f−1. We consider the Gaussian CDF Transformation of g which is used in [26].

In the simulation study, we let p= 50,80,100,120 and nX=nY=100. The simulation result is based on 100 replications. For each simulated data set, we apply three estimation methods. That is, the direct differential network estimator (DDN) in [7], the rank-based differential network estimator (RDN) and the direct differential network estimator based on the latent variable Z and Pearson correlation (ZP-DDN). In ZP-DDN, we assume that ZX and ZY are observed and the Pearson correlation estimator of cov(ZX) and cov(ZY) are plugged into the direct estimation procedure. While ZP-DDN are often not available in real applications, we use ZP-DDN as benchmarks for quantifying the information loss of the remaining estimators.

We evaluate the performance of the estimation methods from two aspects: support recovery and estimation error. The support recovery results are evaluated by true positive rate (TPR) and true negative rate (TNR) along a range of tuning parameter λ. Suppose the true difference matrix Δ0 has the support \(\mathcal {S}_{0}=\{(j,k):\delta _{jk}^{0}\neq 0, \ \text {and} \ j\neq k\}\) and its estimator \(\hat {\boldsymbol {\Delta }}\) has the support set \(\hat {\mathcal {S}}\). TPR and TNR are defined as follows:

$$\text{{tpr}}=\frac{\text{{tp}}}{{|\mathcal{S}_{0}|}}, \ \ \ \text{{tnr}}=\frac{\text{{tn}}}{p(p-1)-|\mathcal{S}_{0}|}, $$

where TP and TN are the numbers of true positives and true negatives respectively, which are defined as

$$\begin{array}{*{20}l} \text{{tp}}&=\#\left\{(i,j):(i,j)\in \mathcal{S}_{0}\cap\hat{\mathcal{S}}\right\}, \\ \text{{tn}}&=\#\left\{(i,j):(i,j)\in \mathcal{S}_{0}^{c}\cap\hat{\mathcal{S}}^{c}\right\}. \end{array} $$

To evaluate the support recovery performance, we use the true discovery rate, which is defined as TD =TP\(/{{|\hat {\mathcal {S}}_{0}|}}\). As for the estimation error, we calculate the element-wise L norm and Frobenius norm of \(\hat {\boldsymbol {\Delta }}-\boldsymbol {\Delta }_{0}\).

Simulation for latent Gaussian copula differential graphical model In this part, we conduct simulation study for differential network estimation under Latent Gaussian copula model. We assume that the cutoff vector CUnif [0,1] and let Σ1 and Σ2 be generated in the same way as ΣX and ΣY described in the last subsection. We consider the following three Scenarios:

Scenario 1 Generate data \(\boldsymbol {B}^{1}=\left (B_{1}^{1},\ldots,B_{p}^{1}\right)^{\top }\), where \(B^{1}_{j}=I(X_{j}>C_{j}),j=1,\ldots,p\) and XNPN(0,Σ1,f1); Generate data \(\boldsymbol {B}^{2}=\left (B_{1}^{2},\ldots,B_{p}^{2}\right)^{\top }\), where \(B^{2}_{j}=I(Y_{j}>C_{j}),j=1,\ldots,p\) and YNPN(0,Σ2,f2). The transformation functions f1 and f2 are Gaussian CDF transformation.

Scenario 2 Generate data \(\boldsymbol {M}^{1}=\left (M_{1}^{1},\ldots,M_{p}^{1}\right)^{\top }\), where \(M^{1}_{j}=I\left (X_{j}>C_{j}\right),j=(p/2+1),\ldots,p\), XNPN(0,Σ1,f1) and \(M^{1}_{j}=X_{j},j=1,\ldots,p/2\);

Generate data \(\boldsymbol {M}^{2}=\left (M_{1}^{2},\ldots,M_{p}^{2}\right)^{\top }\), where \(M^{2}_{j}=I\left (Y_{j}>C_{j}\right),j=p/2+1,\ldots,p\), and YNPN(0,Σ2,f2) and \(M^{2}_{j}=Y_{j},j=1,\ldots,p/2\). The transformation functions f1 and f2 are Gaussian CDF transformation.

Scenario 3 Generate data \(\boldsymbol {B}^{1}=\left (B_{1}^{1},\ldots,B_{p}^{1}\right)^{\top }\), where \(B^{1}_{j}=I\left (Z^{1}_{j}>C_{j}\right),j=1,\ldots,p\) and Z1N(0,Σ1), where 10 entries in each Z1 is randomly sampled and replaced by -5 or 5;

Generate data \(\boldsymbol {B}^{2}=\left (B_{1}^{2},\ldots,B_{p}^{2}\right)^{\top }\), where \(B^{2}_{j}=I\left (Z^{2}_{j}>C_{j}\right),j=1,\ldots,p\) and Z2N(0,Σ2), where 10 entries in each Z2 is randomly sampled and replaced by -5 or 5.

In Scenario 1 and Scenario 3, we generate binary data. Scenario 1 corresponds to the latent Gaussian copula model and Scenario 3 corresponds to the setting where the binary data can be misclassified due to the outliers of the latent Gaussian variable. Scenario 3 is designed to investigate the robustness of the proposed approach. Scenario 2 corresponds to the mixed data generated from the latent Gaussian copula model.

Application to gene expression data sets related to lung cancer

In this section we consider the differential network estimation for a gene expression data set related to lung cancer. The data set is publicly available from the Gene Expression Omnibus at accession number GDS2771 and was studied in [24]. It includes 22,283 microarray-derived gene expression measurements from large airway epithelial cells sampled from 97 patients with lung cancer and 90 controls in the data set. It is of interest to investigate how the structure of the gene co-expression network differs between the group of patients with lung cancer and the control group. It may shed light on underlying lung cancer mechanisms. In this real example study, we limited our analysis to the 122 genes in the Wnt signaling pathway. The Wnt signaling pathway has recently emerged as a critical pathway in lung carcinogenesis as already demonstrated in many cancers and particularly in colorectal cancer [32]. The Gene expression levels were analyzed on a logarithmic scale. Each gene feature was standardized to have mean zero and standard deviation 1 within the cancer samples and the controls separately.

Results

Simulation results for Gaussian copula differential graphical model

The receiver operating characteristic (ROC) curves of the three estimation methods are depicted in Fig. 2. It shows that the proposed method RDN compares favourably with the benchmark method ZP-DDN, which means that the information loss is negligible. Besides, Fig. 2 also shows that DDN performs pretty bad in the non-Gaussian case.

Fig. 2
figure 2

Receiver operating characteristic curves under Gaussian copula model with dimensionality varying from 50 to 120. The red line represents the proposed RDN method, the black dotted represents the benchmark method ZP-DDN, the blue dotted line represents DDN method. a Scenario 3, p = 50. b Scenario 3, p = 80. c Scenario 3, p = 100. d Scenario 3, p = 120

Table 1 gives the true discovery rates with different loss functions. The results also show the method RDN compares favourably with the benchmark method ZP-DDN. For all the methods, tuning using the LF gives better true discovery rates than tuning using the L. Table 1 depicts the elementwise L norm estimation accuracies of the thresholded estimators tuned using the loss functions L and LF. From Table 1, we can see that the LF loss function gives slightly better results than the L loss function. For all the methods, the elementwise L norm estimation accuracy are comparable. We point out that it is possible for RDN to simultaneously give better support recovery but similar estimation than DDN. The reason is that estimation error depends on the magnitudes of the estimated entries, while support recovery depends only on whether the entries are nonzero. Besides, RDN has comparable performance with the benchmark method ZP-DDN in terms of both support recovery and estimation accuracy, which indicates that the information loss of the estimator RDN is negligible.

Table 1 Average true discovery rates (%) and average estimation errors over 100 simulations

Simulation results for Latent Gaussian copula differential graphical model

The ROC curves for Scenario 1 and Scenario 2 with different dimensionality p (varying from 50 to 120) is presented in Fig. 3. Table 2 give the true discovery rates with different loss functions and the elementwise L norm estimation accuracies of the thresholded estimators tuned using the loss functions L and LF, respectively. For method ZR-RDN, we assume that the latent Gaussian copula variables are observed. In particular, the rank-based correlation matrix estimator of the latent Gaussian copula variables are plugged into the direct estimation procedure. With a slight abuse of notation, the RDN method here refers to either the rank-based method for binary data or for mixed data. The ROC curves in Fig. 3 show that the rank-based methods RDN proposed for latent Gaussian copula model (binary and mixed) perform pretty well even when the dimensionality is larger than the sample size.

Fig. 3
figure 3

Receiver operating characteristic curves for Scenario 1 and Scenario 2 under latent Gaussian copula model, with dimensionality varying from 50 to 120. a Scenario 1. b Scenario 2

Table 2 Simulation results over 100 replications for Scenario 1 and Scenario 2

By the ROC curves in Fig. 4, we can find that RDN is more robust to the data misclassification than the benchmark estimator ZP-DDN. The robustness of RDN to outliers illustrates the advantage of the dichotomization method. In the absence of misclassification, it is seen that the ROC curves of RDN and ZR-RDN are similar, which indicates little information loss for differential network recovery due to the dichotomization procedure. Table 3 gives the true discovery rates with different different loss functions for Scenario 3 and presents the elementwise L norm estimation accuracies of the thresholded estimators tuned using the loss functions L and LF for Scenario 3. From Table 3, we can see that the LF loss function gives slightly better results than the L loss function. Besides, we can see that the elementwise L norm estimation accuracy are comparable. This is also true for Scenario 1 and Scenario 2.

Fig. 4
figure 4

Receiver operating characteristic curves for Scenario 3 under latent Gaussian copula model, with dimensionality varying from 50 to 120. The red line represents the proposed RDN method, the black dotted represents the benchmark method ZP-DDN, the blue dotted line represents DDN method. a Scenario 3, p = 50. b Scenario 3, p = 80. c Scenario 3, p = 100. d Scenario 3, p = 120

Table 3 Simulation results over 100 replications for Scenario 3

Theoretical results

The estimators \(\hat {\boldsymbol {\Delta }}\), \(\hat {\boldsymbol {\Delta }}^{\mathrm {B}}\) and \(\hat {\boldsymbol {\Delta }}^{\mathrm {M}}\), after an additional threshold step, are shown to be able to recover not only the support of the true Δ0 but also the signs of its nonzero entries as long as those entries are sufficiently large. Besides, under mild conditions, the estimation errors bounds in terms of matrix Frobenius norm and elementwise norm both achieve the parametric rate \(\sqrt {{\log p}/{\min (n_{X},n_{Y})}}\), see details in Additional file 1. It indicates that the extra modeling flexibility and robustness come at almost no cost of statistical efficiency and it seems as if the latent variable can be observed. Thus these new estimators can be used as a safe replacement of Gaussian estimators even when the data are truly Gaussian. Compared to the separate and joint approaches to estimating differential networks (e.g. [22, 23],) which require sparsity on each Σ−1, the proposed direction estimation methods for different types of data only require the sparsity of the difference matrix Δ0. The detailed theorems and proofs are in the Additional file 1 available online.

Results of application

In the real application part, we compare three estimation methods. The first method is the Gaussian copula RDN method, which we denote as C-RDN. The second method is the latent Gaussian copula RDN method, which we denote as B-RDN. In specific, we first apply the adaptive dichotomization method implemented by the ArrayBin package in R to remove the batch effect in the gene expression data. The adaptive dichotomization method transforms the numerical gene expression data into 0/1 binary data. The genes with high expression level are encoded as 1 and the genes with lower expression level are encoded as 0. Then we apply the B-RDN to the 0/1 binary data. The third method is the direct differential network estimation method proposed by [7] with Gaussian assumption, which we denote as DDN.

We conduct Shapiro-Wilk test on the gene data set and 63% of the genes reject the normality null hypothesis. Therefore, the Gaussian assumption of DDN method is violated in this real data example. Thus we expect that C-RDN which relaxes the Gaussian assumption may provide a more reliable result. The deficiency of the C-RDN method lies in that it does not take the batch effect of the genes expression data from different platforms into consideration. For the B-RDN method, it removes the batch effect.

Figure 5 depicts the differential network estimated by the three methods. Table 4 gives the hub genes selected out by different estimation methods. For method C-RDN, the tuning parameter λ is selected by the AIC criterion with the elementwise 1 norm loss function. To ensure a fair comparison, the tuning parameter λ for method B-RDN and DDN are selected such that the number of edges in the estimated differential graphs by all three methods are almost the same. The number of edges selected by the three methods are 56, 59 and 52, respectively. From Fig. 5, we can see that B-RDN identifies an obvious hub gene WIF1 that is an extracellular antagonist of WNT. WIF1 is a frequent target for epigenetic silencing in various human cancers [30]. WIF1 promoter is frequently methylated in non-small cell lung cancer (NSCLC) cells to down-regulate its mRNA expression [33]. Both C-RDN and B-RDN select out a common hub gene APC. APC expression in lung cancer are associated with survival time and is also related to cancer metastasis [34]. Both C-RDN and DDN select out a common hub gene, MAPK8, which plays a significant role in the promotion of lung inflammation and tumorigenesis subsequent to tobacco smoke exposure [35]. The expression level of DVL2 was reported significantly higher in lung adenocarcinomas than in squamous carcinomas, and was associated with poor tumor differentiation [36]. Winn et al. [37] reported that the restoration of FZD9 signaling inhibited both cell proliferation and anchorageindependent growth, promoted cellular differentiation, and reversed the transformed phenotype in NSCLC. The overexpression of MMP7 was associated with tumor proliferation, and a poor prognosis in NSCLC [38]. RAC1 generally plays an important role in cancer progression and metastasis [39].

Fig. 5
figure 5

Differential network estimated by different methods. Orange edges show an increase in conditional dependency from control group to lung cancer patient group; grey edges show a decrease. Red points stand for hub genes which have edges with more than 3 other genes. a C-RDN. b B-RDN. c DDN

Table 4 Hub genes selected by different methods

By comparing (a) and (b) in Fig. 5, we can see that the estimated differential network can be very different with/without considering the batch effect. Although it is inevitable to result in information loss in the discretization procedure for method B-RDN, [40] argued that this procedure can potentially improve the accuracy of the statistical analysis. In real data example, we recommend to use the B-RDN method to remove the batch effect despite the little information loss. At last we argue that statistical comparison of group difference in this biological network or pathway can provide new insight into the underlying lung cancer mechanism, which may further offer more effective targets for drug development.

To further interpret the underlying biological implications of the identified hub genes, we conducted Gene Ontology (GO) enrichment analysis. Table 5 shows the common GO terms enriched by C-RDN, B-RDN and DDN. The GO enrichment analysis is performed using R package “clusterProfiler” with the P-value adjusted by Benjamini-Hochberg method. It shows that our methods (C-RDN, B-RDN) have smaller P-value than DDN. The common molecular function and cellular component suggest that the change of frizzled binding, Wnt-protein binding and beta-catenin destruction complex are important in the etiology of lung cancer. These predictions are supported by the literatures [4143], which indicates that the proposed differential network model can provide biological meaningful underlying signals.

Table 5 Gene Ontology (GO) enrichment analysis result

Discussion

A complex disease phenotype (e.g. diabetes, cancer) often reflects various pathobiological processes that interact in a network rather than the abnormality of a single gene. Such interactions are not static processes, instead they are dynamic in response to changing genetic, epigenetic and environmental factors, which further entails the analysis of differential network. In this paper, we propose adaptive estimation approaches for latent variable differential network model with the assumption that the true differential network is sparse, which do not require precision matrices to be sparse. The latent variable differential network model is fundamentally different from the existing ones in the literature in the sense that the differential structure in the unobserved latent variables are of primary interest. Theoretical analysis shows that the proposed methods achieve the same parametric convergence rate for both the difference of the precision matrices estimation and differential structure recovery, which means that the extra modelling flexibility comes at almost no cost of statistical efficiency. The unified latent variable differential network model provides deeper understanding of the unknown genomic mechanism than that among the observed variables.

The current work could be extended in the following two aspects. First, in this paper, we consider the following optimization problem to directly estimate the difference matrix Δ:

$$\begin{array}{ll} \text{arg\ min} |\boldsymbol{\Delta}|_{1}, \ \ \ \text{subject to} \\ \left|\hat{\boldsymbol{S}}^{X}\boldsymbol{\Delta}\hat{\boldsymbol{S}}^{Y}-\hat{\boldsymbol{S}}^{X}+\hat{\boldsymbol{S}}^{Y}\right|_{\infty}\leq \lambda_{n}, \end{array} $$

where \(\hat {\boldsymbol {S}}^{X}\) and \(\hat {\boldsymbol {S}}^{Y}\) denote the rank-based estimators of the covariance matrices. The D-trace loss function [15, 44] can also be applied to to directly estimate the precision matrix difference. Thus, we may also consider the D-trace loss function to estimate the Gaussian copula and latent Gaussian copula differential graphical models. In specific, the difference matrix Δ could be eatimated by:

$$ \text{arg\ min}_{\boldsymbol{\Delta}}\frac{1}{2}\text{Tr} \left(\boldsymbol{\Delta}\hat{\boldsymbol{S}}^{X}\boldsymbol{\Delta}\hat{\boldsymbol{S}}^{Y}\right) - \text{Tr}\left(\boldsymbol{\Delta}\left(\hat{\boldsymbol{S}}^{X}-\hat{\boldsymbol{S}}^{Y}\right)\right) +\mathcal{G}_{\lambda}(\boldsymbol{\Delta}), $$

where λ>0 is a regularization parameter and \(\mathcal {G}_{\lambda }\) is a decomposable non-convex penalty function which has the form \(\mathcal {G}_{\lambda }=\sum _{j,k}g_{\lambda }\left (\Delta _{jk}\right)\), such as smoothly clipped absolute deviation (SCAD) penalty [45]. The theoretical guarantees are still needed to be investigated, but we expect that the empirical performance could be comparable.

Second, for the latent Gaussian copula differential graphical model, we focus on the binary data. In fact, the methods can be extended to the discrete data with more than two categories. The properties of this procedure are left for future investigation as there are a lot of work still needed to be done.

Conclusions

The proposed latent variable differential network models are very flexible and provide deeper understanding of the unknown biological mechanism. It is demonstrated latent differential network models enjoy great advantages over existing models and thus are highly recommended in real application.

Abbreviations

CNV:

Copy Number Variation

FDR:

False Discovery Rate

GO:

Gene Ontology

ROC:

Receiver Operating Characteristic

SNP:

Single Nucleotide Polymorphisms

TDR:

True Discovery Rate

TNR:

True Negative Rate

TPR:

True Positive Rate

References

  1. Li H, Gui J. Gradient directed regularization for sparse gaussian concentration graphs, with applications to inference of genetic networks. Biostatistics. 2006; 7(2):302–17.

    Article  PubMed  Google Scholar 

  2. Segal E, Friedman N, Kaminski N, Regev A, Koller D. From signatures to models: understanding cancer using microarrays. Nat Genet. 2005; 37:38–45.

    Article  CAS  Google Scholar 

  3. Peng J, Wang P, Zhou N, Zhu J. Partial correlation estimation by joint sparse regression models. J Am Stat Assoc. 2009; 104(486):735.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Cai T, Li H, Liu W, Xie J. Covariate-adjusted precision matrix estimation with an application in genetical genomics. Biometrika. 2013; 100(1):139–56.

    Article  PubMed  Google Scholar 

  5. Fuente ADL. From differential expression to differential networking–identification of dysfunctional regulatory networks in diseases. Trends Genet. 2010; 26(7):326–33.

    Article  PubMed  CAS  Google Scholar 

  6. Ideker T, Krogan NJ. Differential network biology. Mol Syst Biol. 2012; 8(1):565.

    Article  PubMed  PubMed Central  Google Scholar 

  7. Zhao SD, Cai T, Li H. Direct estimation of differential networks. Biometrika. 2014; 101(2):253–68.

    Article  PubMed  PubMed Central  Google Scholar 

  8. Tian D, Gu Q, Jian M. Identifying gene regulatory network rewiring using latent differential graphical models. Nucleic Acids Res. 2016; 44(17):140.

    Article  CAS  Google Scholar 

  9. Xia Y, Cai T, Cai T. Testing differential networks with applications to detecting gene-by-gene interactions. Biometrika. 2015; 102(2):247–66.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Ji J, Yuan Z, Zhang X, Li F, Xu J, Liu Y, Li H, Wang J, Xue F. Detection for pathway effect contributing to disease in systems epidemiology with a case-control design. Bmj Open. 2015; 5(1):006721.

    Article  Google Scholar 

  11. Ji J, Yuan Z, Zhang X, Xue F. A powerful score-based statistical test for group difference in weighted biological networks. BMC Bioinformatics. 2016; 17(1):86.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  12. Yuan Z, Ji J, Zhang T, Liu Y, Zhang X, Chen W, Xue F. A novel chi-square statistic for detecting group differences between pathways in systems epidemiology. Stat Med. 2016; 35(29):5512–24.

    Article  PubMed  Google Scholar 

  13. Yuan Z, Ji J, Zhang X, Xu J, Ma D, Xue F. A powerful weighted statistic for detecting group differences of directed biological networks. Sci Rep. 2016; 6:34159.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Liu W. Structural similarity and difference testing on multiple sparse gaussian graphical models. Ann Stat. 2017; 45(6):2680–2707.

    Article  Google Scholar 

  15. Yuan H, Xi R, Chen C, Deng M. Differential network analysis via the lasso penalized d-trace loss. Biometrika. 2017; 104:755–70.

    Article  Google Scholar 

  16. He Y, Zhang X, Ji J, Liu B. Joint estimation of multiple high-dimensional gaussian copula graphical models. Aust N Z J Stat. 2017; 59:289–310.

    Article  Google Scholar 

  17. Ji J, He D, Feng Y, He Y, Xue F, Xie L. Jdinac: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data. Bioinformatics. 2017; 33(19):3080–87.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso. Ann Stat. 2006; 34:1436–62.

    Article  Google Scholar 

  19. Yuan M, Lin Y. Model selection and estimation in the gaussian graphical model. Biometrika. 2007; 94(1):19–35.

    Article  Google Scholar 

  20. Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008; 9(3):432–41.

    Article  PubMed  Google Scholar 

  21. Yuan M. High dimensional inverse covariance matrix estimation via linear programming. J Mach Learn Res. 2010; 11(12):2261–86.

    Google Scholar 

  22. Cai T, Liu W, Luo X. A constrained 1 minimization approach to sparse precision matrix estimation. J Am Stat Assoc. 2011; 106(494):594–607.

    Article  CAS  Google Scholar 

  23. Guo J, Levina E, Michailidis G, Zhu J. Joint estimation of multiple graphical models. Biometrika. 2011; 98(1):1–15.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Danaher P, Wang P, Witten DM. J R Stat Soc Ser B (Stat Methodol). 2014; 76(2):373–97.

  25. Liu H, Lafferty J, Wasserman L. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J Mach Learn Res. 2009; 10(3):2295–328.

    Google Scholar 

  26. Liu H, Han F, Yuan M, Lafferty J, Wasserman L. High-dimensional semiparametric gaussian copula graphical models. Ann Stat. 2012; 40(4):2293–326.

    Article  Google Scholar 

  27. Xue L, Zou H. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Ann Stat. 2012; 40(5):2541–71.

    Article  Google Scholar 

  28. He Y, Zhang X, Wang P, Zhang L. High dimensional Gaussian copula graphical model with FDR control. Comput Stat Data Anal. 2017; 113:457–74.

    Article  Google Scholar 

  29. Fan J, Liu H, Ning Y, Zou H. High dimensional semiparametric latent graphical model for mixed data. J R Stat Soc. 2017; 79(2):405–21.

    Article  Google Scholar 

  30. Ying Y, Tao Q. Epigenetic disruption of the wnt/ß-catenin signaling pathway in human cancers. Epigenetics. 2009; 4(5):307–12.

    Article  CAS  PubMed  Google Scholar 

  31. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn. 2011; 3(1):1–122.

    Article  Google Scholar 

  32. Mazieres J, He B, You L, Xu Z, Jablons DM. Wnt signaling in lung cancer. Cancer Lett. 2005; 222(1):1–10.

    Article  CAS  PubMed  Google Scholar 

  33. Lee SM, Park J, Kim DS. Wif1 hypermethylation as unfavorable prognosis of non-small cell lung cancers with egfr mutation. Mol Cells. 2013; 36(1):69–73.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Brabender J, Usadel H, Danenberg KD, Metzger R, Schneider PM, Lord RV, Wickramasinghe K, Lum CE, Park J, Salonga D, et al. Adenomatous polyposis coli gene promoter hypermethylation in non-small cell lung cancer is associated with survival. Oncogene. 2001; 20(27):3528–32.

    Article  CAS  PubMed  Google Scholar 

  35. Takahashi H, Ogata H, Nishigaki R, Broide DH, Karin M. Tobacco smoke promotes lung tumorigenesis by triggering ikkbeta- and jnk1-dependent inflammation. Cancer Cell. 2010; 17(1):89.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Wei Q, Zhao Y, Yang ZQ, Dong QZ, Dong XJ, Han Y, Zhao C, Wang EH. Dishevelled family proteins are expressed in non-small cell lung cancer and function differentially on tumor progression. Lung Cancer. 2008; 62(2):181–92.

    Article  PubMed  Google Scholar 

  37. Winn RA, Marek L, Han SY, Rodriguez K, Rodriguez N, Hammond M, Scoyk MV, Acosta H, Mirus J, Barry N. Restoration of wnt-7a expression reverses non-small cell lung cancer cellular transformation through frizzled-9-mediated growth inhibition and promotion of cell differentiation. J Biol Chem. 2005; 280(20):19625.

    Article  CAS  PubMed  Google Scholar 

  38. Liu D, Nakano J, Ishikawa S, Yokomise H, Ueno M, Kadota K, Urushihara M, Huang CL. Overexpression of matrix metalloproteinase-7 (mmp-7) correlates with tumor proliferation, and a poor prognosis in non-small cell lung cancer. Lung Cancer. 2007; 58(3):384–91.

    Article  PubMed  Google Scholar 

  39. Kaneto N, Yokoyama S, Hayakawa Y, Kato S, Sakurai H, Saiki I. Rac1 inhibition as a therapeutic target for gefitinib-resistant non-small-cell lung cancer. Cancer Sci. 2014; 105(7):788–94.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. McCall MN, Irizarry RA. Thawing frozen robust multi-array analysis (frma). BMC Bioinformatics. 2011; 12(1):1.

    Article  Google Scholar 

  41. Stewart DJ. Wnt signaling pathway in non-small cell lung cancer. J Natl Cancer Inst. 2014; 106(1):356.

    Article  CAS  Google Scholar 

  42. Nakayama S, Sng N, Carretero J, Welner R, Hayashi Y, Yamamoto M, Tan AJ, Yamaguchi N, Yasuda H, Li D. β-catenin contributes to lung tumor development induced by egfr mutations. Cancer Res. 2014; 74(20):5891–902.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Rapp J, Jaromi L, Kvell K, Miskei G, Pongracz JE. Wnt signaling-lung cancer is no exception. Respir Res. 2017; 18(1):167.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  44. Wadsworth JL, Tawn JA. Sparse precision matrix estimation via lasso penalized d-trace loss. Biometrika. 2014; 1(1):103–20.

    Google Scholar 

  45. Fan J, Li R. Variable selection via nonconvave penalized likelihood and its oracle properties. J Am Stat Assoc. 2001; 96(456):1348–60.

    Article  Google Scholar 

Download references

Funding

This work was supported by grants from the National Natural Science Foundation of China (grant number 81803336, 11801316, 11571080 and 81573259) and Natural Science Foundation of Shandong Province (ZR2018BH033). Publication of this article was sponsored by 81803336 grant. The funding body played no role in the design, writing or decision to publish this manuscript.

Availability of data and materials

The gene expression data set related to lung cancer is publicly available from the Gene Expression Omnibus at accession number GDS2771.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 19 Supplement 17, 2018: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM) 2018: bioinformatics. The full contents of the supplement are available online at https://0-bmcbioinformatics-biomedcentral-com.brum.beds.ac.uk/articles/supplements/volume-19-supplement-17.

Author information

Authors and Affiliations

Authors

Contributions

YH and JJ contributed to the study design, analytical preparation and the writing of the manuscript. YH and JJ performed the simulation studies. JJ analyzed the data, LX, XZ and FX wrote and revised the manuscript. All authors read and approved this version of the manuscript.

Corresponding author

Correspondence to Jiadong Ji.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1

Contains the theoretical guarantee of of the proposed methods and proofs. (PDF 284 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

He, Y., Ji, J., Xie, L. et al. A new insight into underlying disease mechanism through semi-parametric latent differential network model. BMC Bioinformatics 19 (Suppl 17), 493 (2018). https://0-doi-org.brum.beds.ac.uk/10.1186/s12859-018-2461-2

Download citation

  • Published:

  • DOI: https://0-doi-org.brum.beds.ac.uk/10.1186/s12859-018-2461-2

Keywords