Skip to main content

Stability of building gene regulatory networks with sparse autoregressive models

Abstract

Background

Biological networks are constantly subjected to random perturbations, and efficient feedback and compensatory mechanisms exist to maintain their stability. There is an increased interest in building gene regulatory networks (GRNs) from temporal gene expression data because of their numerous applications in life sciences. However, because of the limited number of time points at which gene expressions can be gathered in practice, computational techniques of building GRN often lead to inaccuracies and instabilities. This paper investigates the stability of sparse auto-regressive models of building GRN from gene expression data.

Results

Criteria for evaluating the stability of estimating GRN structure are proposed. Thereby, stability of multivariate vector autoregressive (MVAR) methods - ridge, lasso, and elastic-net - of building GRN were studied by simulating temporal gene expression datasets on scale-free topologies as well as on real data gathered over Hela cell-cycle. Effects of the number of time points on the stability of constructing GRN are investigated. When the number of time points are relatively low compared to the size of network, both accuracy and stability are adversely affected. At least, the number of time points equal to the number of genes in the network are needed to achieve decent accuracy and stability of the networks. Our results on synthetic data indicate that the stability of lasso and elastic-net MVAR methods are comparable, and their accuracies are much higher than the ridge MVAR. As the size of the network grows, the number of time points required to achieve acceptable accuracy and stability are much less relative to the number of genes in the network. The effects of false negatives are easier to improve by increasing the number time points than those due to false positives. Application to HeLa cell-cycle gene expression dataset shows that biologically stable GRN can be obtained by introducing perturbations to the data.

Conclusions

Accuracy and stability of building GRN are crucial for investigation of gene regulations. Sparse MVAR techniques such as lasso and elastic-net provide accurate and stable methods for building even GRN of small size. The effect of false negatives is corrected much easier with the increased number of time points than those due to false positives. With real data, we demonstrate how stable networks can be derived by introducing random perturbation to data.

Background

Biological networks are constantly perturbed randomly and there exist efficient and compensatory mechanisms to withstand such instabilities. Constructing gene regulatory networks (GRN) from time-series gene expression data plays a vital role in understanding complex biological mechanisms and in the development of novel drugs. Though microarrays allow measurement of thousands of genes simultaneous, gene expressions in practice can be gathered only over a few time points due to high cost and time involved, and limitations of the experiments. This makes building GRN inherently an ill-posed problem in practice, leading such networks unstable and irreproducible. Moreover, variable and complex nature of biological sources, and measurement noise and artifacts, add to the challenges of constructing accurate and stable GRN.

A wide range of techniques for inferring GRN from microarray datasets have been proposed in the literature [14], including Boolean networks [5], linear networks [6, 7] differential equation models [8], and stochastic methods [912]. Despite the plethora of such techniques, stability of building GRN from gene expression data has not been addressed by the research community so far. In this paper, we investigate the stability of building GRN by using sparse linear models with simulations and real data.

The linear multivariate vector autoregression (MVAR) provides a simple and efficient technique to estimate regulatory relationships among genes. However, due to less number of time points compared to the number of genes whose expressions involved in gene expression datasets, several penalized MVAR techniques using regularization [3, 7, 1316] and priors [17] have been considered for building GRN. The sparse MVAR techniques such as lasso and elastic-net uses a penalty function to drive small regulatory coefficient to zero, thereby producing computationally stable and sparse yet biologically plausible GRNs. Recently, their efficacy in building GRN have been demonstrated [6, 7].

A good technique of building GRN should not only be accurate but also be reproducible and stable; biologists will then be able to test complex hypothesis with confidence on in silico networks. Stability means that the network construction is robust to changes of network topology and parameters, and biological and instrumental noise. In this paper, we first introduce novel criteria for evaluating stability of building GRN at the level of connections and networks. MVAR models of ridge, lasso, and elastic-net penalty are evaluated with respect to their accuracies and stabilities by using synthetic gene expression datasets. In particular, we investigate how many time points of gene expressions are needed for a network of given topology and size. Using a real data set gathered in HeLa cell cycle [18], we demonstrate how random perturbations in the data could be induced to determine stable genes in the network.

Methods

Suppose dataset X = {x i (t)}I × T consists of time-series of expressions of I genes, taken over equally spaced T time points, where x i (t) denotes the expression of gene i at time t. Let vector represent expressions of all genes at time t. Consider a network of I genes, represented by an r-order multivariate vector autoregressive (MVAR) model:

(1)

where denotes the matrix of regression coefficients corresponding to a model of order τ, and denotes residuals that are assumed to be i.i.d. and zero mean Gaussian. The regression coefficient represents the interaction between genes i and j. In general, the model has I2r coefficients. This scope of this work is restricted to first-order systems (i.e., r = 1). Suppose a vector of gene expressions at time t is denoted by row vector and let zt = yt–1 denote the vector of gene expressions at the previous time-point, β = [β1, β2, … βI]T a matrix of size I × I of regression coefficients, and εt = [ε1(t), ε2(t),…ε I (t)] the corresponding innovations. The multivariate model can be written in standard multivariate vector autoregressive (MVAR) form:

(2)

In vector form, (2) becomes:

(3)

where t-th row of Y, Z, and E, are yt, zt, and εt, respectively, and there are T – 1 samples; Y is a (T – 1) × I matrix, Z a (T – 1) × I matrix, β a I × I matrix, and E a (T – 1) × I matrix. MVAR coefficients are estimated using standard least squares as:

(4)

The above matrix cannot be determined if the number of genes are larger than number of samples (that is, T <<I).

To handle such instances, penalized regression methods such as ridge regression, lasso, or elastic net regression penalties have been proposed [19]. The general penalized regression loss function is given by:

(5)

where λ and α are regularization parameters. When α = 0 the loss function represents ridge MVAR, α = 1 represents lasso, and α (0,1) represents an elastic-net penalty. In other words, lasso uses L1-norm, ridge regression uses L2-norm, and elastic-net uses both L1-norm and L2-norm in its penalty term. The penalty terms attempt to drive small regression coefficients to zero, making networks sparse and computationally well-posed. The lasso penalty is more capable of driving regression coefficients to zero than the ridge penalty.

Synthetic data

Most of the real world networks, such as biological networks, are scale-free. In this study, GRN of various sizes and topology are built with scale-free networks using Barabasi-Albert model [20]. First, assuming that large GRNs adhere to the topology of scale-free networks, synthetic structures of GRN were built by initiating a small number of nodes. New nodes and edges were added with preferential attachment as the probability of addition of new nodes to the existing node is not uniform. A node with high number of edges attracts higher number of new nodes compared to a node with no connection. This phenonmena in fact leads to power-law distribution: the probability p i of preferential attachment of a new gene to existing gene i is given by:

(6)

where di denotes the number of adjacent edges not initiated by gene i itself (which approximates to the in-degree of gene i). The parameters γ denotes the power of preferential attachment and b the attractiveness of the gene with no adjacent edges.

Time-series datasets of gene expressions were generated for a specific network topology with the preferential attachment γ = 1.2. For a given network topology, the coefficients corresponding to no interactions among genes were set to zero. For other connections, true MVAR coefficients were obtained by drawing samples from a uniform distribution on the interval [-1,-0.8] and [0.8,1] such that the number of positive and negative coefficients are approximately equal. The initial (t = 0) gene expression values were drawn from a uniform distribution on the interval [10,15]. For successive time points, expression were generated using MVAR model with added i.i.d. Gaussian random noise Σ = I[7]. Parameters used for generating synthetic networks are given in Table 1. For a given network and number of time points, we generated 100 time-series datasets by randomly initializing the gene expressions.

Table 1 Parameters used for generation of synthetic network

Real data

To investigate to the stability of real biological networks, we use lasso and elastic-net methods on Human cancer cell Line (HeLa) cell-cycle gene expression dataset [18]. The dataset used in this study is described as experiment 3 (http://genome-www.stanford.edu/Human-CellCycle/Hela/) and contains 48 time points where gene expressions were measured at the interval of one hour and synchronized by double thymidine block. Based on relevance to cell cycle and tumor development, 91 genes were selected. This dataset was previously used to demonstrate the efficacy of sparse lasso MVAR techniques in building GRN [6]. In order to investigate stability random perturbations, we added noise perturbations at each time point by adding random samples from Gaussian noise N(0, (σ + δ)2); we define where σ is the standard deviation of residuals of real data and δ is a perturbation constant. For real dataset, the standard deviation of residuals was, σ = 0.225 and σ = 0.23, respectively, for models built with lasso and elastic-net MVAR models. By generating 100 gene expression datasets for each of δ such that SNR [0.01, 4], the stability of GRN built were studied against noise perturbations.

Stability of structure

In this section, we present a Hamming distance based criteria for evaluating the stability of building GRN. Let be a set of B sub-samples of original data and sb be the GRN derived from b-th sample Xb. The structure of each network sb is represented by a connectivity matrix where represents the presence of a regulatory interaction between genes i and j, and otherwise 0. Consider two GRNs sb and ; the similarity of the two networks is obtained by the average Hamming distance over all the regulatory connections:

(7)

where d denotes the Hamming distance and |sb| denotes the number of connections in the network sb. In (7), the stability ρ [0, 1], where higher values denote high stability of the GRN inference algorithm. Here, stability is measured with respect to the whole network and the Hamming distance takes into account both the presence and absence of a regulatory connection.

Using the above measures, the stability of complete GRNs is obtained by averaging over B number of structures obtained using a particular method. The average of pair-wise stability, such as denoted in (7), is then used as the overall stability performance of the given method. This is given by:

(8)

The stability of an independent edge is equally important. With a small perturbation, an edge is either established between two genes i and j, or is not detected. Hence, based on the number of times a regulatory relationship occurs between the two specific genes, the stability of the edge can be computed. Such edge stability is defined as:

(9)

Implementation

Scale-free network topologies were generated using igraph R package [21]. The stability of networks are studied with penalized regression methods. Each of these MVAR method has regularization parameters. Although computationally expensive, we used the leave-one-out cross-validation to estimate the λ for ridge MVAR and α and/or λ for sparse models. For ridge estimate, the λ was selected from the set of {0.001, 0.1, 1, 10, 100}. For lasso and elastic net, the glmnet package [22] is used which could generate the whole solution path for λ for a given α value (α = 1 for lasso). Hence, the α values are selected from a set of {0.1, 0.2, …, 0.9} for elastic-net model.

The presence of statistically significant edges were determined using t- distribution over regression coefficients corrected for multiple comparisons using false discovery rate (FDR) [23]. In this method, let P(1)P(2) ≤ … ≤ P(n) denotes sorted p-values where n is the number of hypotheses. If k* is the largest n* for which then we reject all the hypotheses n* = 1, 2,…, k*. Here, q is level of significance. For a given regression, the p-value for a regression coefficient β could be approximated using:

(10)

Here, t(d) is t-distribution with d degree of freedom (dof); σ2 is an estimate of the error of variance ; and w k is the j-th diagonal element of ((Z′Z + λW)–1Z′Z(Z′Z + λW)–1) where W denotes generalized inverse of W, a diagonal matrix with diagonal elements |β k |. To achieve closed form solution, this matrix is estimated using ridge coefficients using the tuned λ for lasso penalty [24]. For lasso, the number of nonzero coefficients is an unbiased estimate for the degrees of freedom [25]. The dof for elastic-net were also assumed to be the number of nonzero coefficients. Only the statistically significant edges were evaluated for stability analysis of sparse MVAR on real datasets.

In this paper, we use fast coordinate descent algorithm for optimizing lasso and elastic net loss function. The codes are provided with glmnet MATLAB package. Ridge regression is performed using standard MATLAB function.

Results on synthetic dataset

To study the effects of the number of time points on the stability, GRN consisting of I {10, 50, 100} number of genes were simulated using scale-free networks with topologies having power-law coefficient γ [ 2.10, 2.30 ]. Thus, temporal gene expressions for T {10, 30, 50, 70} number of time points for a given network topology were generated. For given I, γ, and T, B = 100 number of bootstrapped datasets were generated by randomly initiating gene expressions at the first time point.

Table 2 shows accuracies and stabilities of sparse linear models at different numbers of genes and time points. True positives (TP), false positives (FP), and F-measure (which is equal to , where Precision = TP/(TP+FP) and Recall = TP/(TP + FN) were used to indicate the accuracy. The performances measures reported are the averages over 100 bootstrap datasets. As seen, the accuracies of elastic-net and lasso were good and they need only a number of time points approximately equaling the number of genes to achieve good accuracy in small networks (for example, 10 genes). But as the network size grows, the number of time points required were relatively less compared to the size of the network. Ridge shows relatively poor accuracy, so is not considered further for evaluation. If the number of time points are small, false negatives were easier to correct by increasing the number of time points than false positives.

Table 2 Performances of lasso, elastic-net (EN), and ridge MVAR models on synthetic datasets with varying numbers of genes and time points.

Figure 1 shows the stability of lasso and elastic-net MVAR methods of building GRN of different sizes with different number of time points. The accuracy and stability were similar for the two methods. To achieve decent accuracy, the number of time points needed were at least about the size of the network. But to achieve good stability, much more time points are needed. At the individual edge level, figure 2 and 3 represent distribution of stability of edges with increasing number of genes and time points. Increase in number of time samples reduces the number of edges of low stability.

Figure 1
figure 1

Stability of networks. Illustration of the effects of the number of time points on accuracy (F-measure) and stability of sparse MVAR models of building networks of varying number of genes. Figures (a) and (b) denote accuracy and stability of elastic-net penalty respectively while Figure (c) and (d) show accuracy and stability with Lasso penalty respectively.

Figure 2
figure 2

Distribution of stability of edges. Distributions of stability of edges with increase in time points on a 50 gene network. Figure (a), (b), (c), and (d) show distribution with 10, 30, 50, and 70 time points respectively.

Figure 3
figure 3

Distribution of stability of edges. Distribution of stability of edges with increase in number of genes with 30 time points. Figure (a) and (b) show distribution with 10 and 100 genes respectively.

Results on real dataset

Figures 4a and 5a show statistically significant connections of GRN obtained using elastic-net and lasso MVAR, respectively, for Hela cell-cycle dataset. Figures 4b and 5b show the stable networks whose edges that have stability values greater than 0.5 after perturbations at SNR=4.0. As seen, the network derived using elastic-net penalty has higher number of edges compared to GRNs derived using lasso. This is because lasso can only select a very few number of edges in the presence of correlated gene expressions. As seen, after perturbation, the number of in-degree or out-degree of many critical hubs are reduced while few new edges are also detected. Elastic-net produced a more stable network than the lasso.

Figure 4
figure 4

Gene networks obtained using elastic-net penalty. GRNs obtained using elastic-net penalty: networks showing (a) all the edges that are statistically significant, and (b) all the edges that have stability greater than 0.5 for SNR = 4.

Figure 5
figure 5

Gene networks obtained using lasso penalty. GRNs obtained using lasso penalty:networks showing (a) all the edges that are statistically significant, and (b) all the edges that have stability greater than 0.5 for SNR = 4.

Both the networks were able to detect many important biological hubs, such as Noxa, IL-6, c-myc, IAP, TSP1 etc. The biological importance of these hubs in cell cycle regulation and tumor developments are well documented in various studies. Briefly, Noxa mediates cell cycle control of homeostasis of B cells and by repressing Noxa, induction of G1 arrest by p18 bypasses a homeostatic cell-cycle checkpoint in intermediate Plasma cells for their differentiation [26]. IL-6 plays importance role in induction of apoptosis and cell cycle regulation [27]. This is also shown by large number of out-degree edges. Thrombospondin-1 (TSP1) curtails tumor growth and acts as an inhibitor of angiogenesis [28]. Inhibitors of apoptosis (IAPs) have important role in cell division and regulates apoptosis [29]. c-Myc oncoprotein prevents cell cycle arrest in response to growth-inhibitory signals, differentiation stimuli, or mitogen withdrawal [30]. Hubs such as P53, NFkB, FGFR3, etc. were not detected in GRN obtained using Lasso penalty, but were detected with elastic-net penalty. However, many edges and hubs are not recognized when perturbations were induced (for example, at SNR = 4.0) and edge stability is set to a thresold of 0.5. Genes like Noxa, NFkB, P53 etc. are severely affected in GRNs obtained using both sparse MVAR methods. Biologically, this could mean that these genes are less important for the biological processes underlying the network, compared to more stable genes. The results indicate that random perturbation of data indirectly helps the process of building accurate and stable GRNs. Figure 6 shows stability of individual connections produced by lasso and elastic-net methods. As seen, lasso produces less number of unstable edges than elastic-net. Figure 7 shows the effect of SNR on stability and F-measure. With increase in SNR level, both the stability and F-measures improves. Generally, the GRNs obtained using elastic-net penalty have higher stability and F-measure compared to GRNs derived using lasso. To compute the F-measure, it is assumed that the GRN derived on real data using lasso (or elastic-net) MVAR is true.

Figure 6
figure 6

Distribution of stability of edges on real dataset. Distribution of stability of edges over perturbed real dataset with SNR = 4 perturbation level

Figure 7
figure 7

Stability and F-measure of GRNs derived using puerturbed real dataset. Figure (a) shows the effects of perturbation on stability while Figure (b) depicts such effect on F-measure of GRNs derived using lasso and elastic-net MVAR on real data collected over Hela cell-cycle.

Discussion

Novel measures of evaluating stability of building GRN were introduced. Thereby, stability and accuracy of sparse MVAR models in building GRN structure were studied using synthetic datasets. The results suggest that to achieve decent accuracy and stability with sparse MVAR methods, at least a number of time points equal to the number of genes are required. But as the network size grows, the number of time points required is less compared to the number of genes in the network. It is easier to ameliorate the effects of false negatives than the false positives by increasing the number of time samples of gene expressions. The results indicates that lasso MVAR and elastic-net perform equally on datasets in general, though lasso handles false positives better. However, elastic-net performed better than lasso on real dataset. This is because elastic-net penalty has an ability to predict regulatory relationship between highly correlated genes while lasso will only predict one of them [31]. As genes are highly correlated, elastic-net or their improved versions proves better making inferences on large scale GRNs [7]. In simulated datasets, correlations among gene expression were not simulated.

To the best of our knowledge, this is the first study introducing stability criteria or studying methods building GRN. Our investigations here were focused on the stability of sparse MVAR models. Our work could be extended for other approaches as well. For small networks, sparse linear models to build networks were stable and accurate. Furthermore, effects of the number of time points on the stability were studied by experimenting on scale-free networks of different topologies. As the network size grows, the number of time points required for building GRN is less than the number of genes in the GRN. Stability is inherent problem in practice as real datasets consist of a large number of genes and short time-series. With an application of real dataset, we demonstrate how stable GRN can be derived by introducing perturbations to the gene expression datasets. Only a few statistically significant edges and associated gene hubs are stable and can withstand small amount of perturbation. Biologically, more stable genes have preserve more significant roles of the biological process of the network. This research emphasizes the need for building GRN that are accurate, stable, and reproducible, so that the structures derived are robust against noise and perturbation of data. In addition by perturbing gene expressions, more accurate and stable core genes and subnetworks can be inferred from temporal gene expression data.

References

  1. Huang Y, Tienda-Luna I, Wang Y: Reverse engineering gene regulatory networks. IEEE Signal Processing Magazine 2009, 26: 76–97.

    Article  PubMed Central  PubMed  Google Scholar 

  2. Chao S, Hua J, Jung S: Inference of gene regulatory networks using time-series data: a survey. A Survey. Current Genomics 2009, 10: 416–429.

    Article  Google Scholar 

  3. Fogelberg C, Palade V: Machine learning and genetic regulatory networks: a review and a roadmap. In Foundations of Computational Intelligence. Edited by: Hassanien AE, Abraham A, Vasilakos A, Pedrycz W. Stoneham: Butterworth-Heinemann, Springer Verlag; 2009.

    Google Scholar 

  4. Kim S, Imoto S, Miyano S: Inferring gene networks from time series microarray data using dynamic Bayesian networks. Briefings in Bioinformatics 2003, 4(3):228–235. 10.1093/bib/4.3.228

    Article  CAS  PubMed  Google Scholar 

  5. Li P, Zhang C, Perkins E, Gong P, Deng Y: Comparison of probabilistic Boolean network and dynamic Bayesian network approaches for inferring gene regulatory networks. BMC Bioinformatics 2007, 8(Suppl 7):S13. 10.1186/1471-2105-8-S7-S13

    Article  PubMed Central  PubMed  Google Scholar 

  6. Fujita A, Sato J, Garay-Malpartida H, Yamaguchi R, Miyano S, Sogayar M, Ferreira C: Modeling gene expression regulatory networks with the sparse vector autoregressive model. BMC Systems Biology 2007, 1: 39. 10.1186/1752-0509-1-39

    Article  PubMed Central  PubMed  Google Scholar 

  7. Shimamura T, Imoto S, Yamaguchi R, Fujita A, Nagasaki M, Miyano S: Recursive regularization for inferring gene networks from time-course gene expression profiles. BMC Systems Biology 2009, 3: 41. 10.1186/1752-0509-3-41

    Article  PubMed Central  PubMed  Google Scholar 

  8. de Jong H: Modeling and simulation of genetic regulatory systems : a literature review. Journal of Computational Biology 2002, 9: 67–103. 10.1089/10665270252833208

    Article  CAS  PubMed  Google Scholar 

  9. Friedman N, Linia M, Nachman I, Peér D: Using Bayesian networks to analyze expression data. Journal of Computational Biology 2000, 7(3–4):601–620. 10.1089/106652700750050961

    Article  CAS  PubMed  Google Scholar 

  10. Ferrazzi F, Sebastiani P, Ramoni M, Bellazzi R: Bayesian approaches to reverse engineer cellular systems: a simulation study on nonlinear Gaussian networks. BMC Bioinformatics 2007, 8(Suppl 5):S2. 10.1186/1471-2105-8-S5-S2

    Article  PubMed Central  PubMed  Google Scholar 

  11. Murphy K, Mian S: Modelling gene expression data using dynamic bayesian networks. Tech. rep 1999.

    Google Scholar 

  12. Chaturvedi I, Rajapakse JC: Building gene networks with time-delayed regulations. Pattern Recognition Letters 2010, 31: 2133–2137. 10.1016/j.patrec.2010.03.002

    Article  Google Scholar 

  13. Zhao W, Serpedin E, Dougherty E: Inferring gene regulatory networks from time series data using the minimum description length principle. Bioinformatics 2006, 22(17):2129–2135. 10.1093/bioinformatics/btl364

    Article  CAS  PubMed  Google Scholar 

  14. Menéndez P, Kourmpetis Y, ter Braak C, van Eeuwijk F: Gene regulatory networks from multifactorial perturbations using Graphical Lasso: application to the DREAM4 Challenge. PLOS One 2010, 5(12):e14147. 10.1371/journal.pone.0014147

    Article  PubMed Central  PubMed  Google Scholar 

  15. Grzegorczyk M, Husmeier D: Improvements in the reconstruction of time-varying gene regulatory networks: dynamic programming and regularization by information sharing among genes. Bioinformatics 2011, 27(5):693–699. 10.1093/bioinformatics/btq711

    Article  CAS  PubMed  Google Scholar 

  16. Fujita A, Sato J, Garay-Malpartida H, Morettin P, Sogayar M, Ferreira C: Time-varying modeling of gene expression regulatory networks using the wavelet dynamic vector autoregressive method. Bioinformatics 2007, 23(13):1623–1630. 10.1093/bioinformatics/btm151

    Article  CAS  PubMed  Google Scholar 

  17. Geier F, Timmer J, Fleck C: Reconstructing gene-regulatory networks from time series, knock-out data, and prior knowledge. BMC Systems Biology 2007, 1: 11. 10.1186/1752-0509-1-11

    Article  PubMed Central  PubMed  Google Scholar 

  18. Whitfield M, Sherlock G, Saldanha A, Murray J, Ball C, Alexander K, Matese J, Perou C, Hurt M, Brown P, Botstein D: Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Molecular Biology of the Cell 2002, 13(6):1977–2000. 10.1091/mbc.02-02-0030.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  19. Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd edition. Springer; 2001.

    Chapter  Google Scholar 

  20. Barabasi A, Albert R: Emergence of scaling in random networks. Science 1999, 286: 509–512. 10.1126/science.286.5439.509

    Article  PubMed  Google Scholar 

  21. Csardi G, Nepusz T: igraph: Network analysis and visualization.[http://cran.r-project.org/web/packages/igraph/index.html]

  22. Friedman J, Hastie T, Tibshirani R: glmnet: Lasso and elastic-net regularized generalized linear models.[http://cran.r-project.org/web/packages/glmnet/index.html]

  23. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of Royal Statistical Society: Series B 1995, 57: 289–300.

    Google Scholar 

  24. Tibshirani R: Regression shrinkage and selection vias the Lasso. Journal of Royal Statistical Society: Series B 1996, 58: 267–288.

    Google Scholar 

  25. Zou H, Hastie T, Tibshirani R: On the degrees of freedom of the lasso. Annals of Statistics 2007, 35(5):2173–2192. 10.1214/009053607000000127

    Article  Google Scholar 

  26. Bretz J, Garcia J, Huang X, Kang L, Zhang Y, Toellner KM, Chen-Kiang S: Noxa mediates p18INK4c cell-cycle control of homeostasis in B cells and plasma cell precursors. Blood 2011, 117(7):2179–2188. 10.1182/blood-2010-06-288027

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  27. Petrucci M, Ricciardi M, Ariola C, Gregorj C, Ribersani M, Savino R, Ciliberto G, Tafuri A: Cell cycle regulation and induction of apoptosis by IL-6 variants on the multiple myeloma cell line XG-1. Annals of Hematology 1999, 78: 13–18. 10.1007/s002770050465

    Article  CAS  PubMed  Google Scholar 

  28. Lawler J: Thrombospondin-1 as an endogenous inhibitor of angiogenesis and tumor growth. J. cellular and molecular medicine 2002, 6: 1–12. 10.1111/j.1582-4934.2002.tb00307.x

    Article  CAS  Google Scholar 

  29. Uren A, Beilharz T, O’connell M, Bugg S, Driel RV, Vaux D, Lithgow T: Role for yeast inhibitor of apoptosis (IAP)-like proteins in cell division. Proceedings of National Academy of Science 1999, 96: 10170–10175. 10.1073/pnas.96.18.10170

    Article  CAS  Google Scholar 

  30. Amati B, Alevizopoulos K, Vlach J: Myc and the cell cycle. Frontiers in Bioscience 1998, 3: d250–268.

    CAS  PubMed  Google Scholar 

  31. Zou H, Hastie T: Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B 2005, 67(2):301–320. 10.1111/j.1467-9868.2005.00503.x

    Article  Google Scholar 

Download references

Acknowledgements

We thank Prof Andre Fujita for providing the reduced HeLa cell cycle dataset. This work was partly supported by a ARC 9/10 grant to J. C. Rajapakse by Ministry of Education, Singapore.

This article has been published as part of BMC Bioinformatics Volume 12 Supplement 13, 2011: Tenth International Conference on Bioinformatics – First ISCB Asia Joint Conference 2011 (InCoB/ISCB-Asia 2011): Bioinformatics. The full contents of the supplement are available online at http://0-www-biomedcentral-com.brum.beds.ac.uk/1471-2105/12?issue=S13.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jagath C Rajapakse.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

JR initiated the study, analyzed the results, and wrote the manuscript. PM helped in formulating the experiments, performed all the experiments, and wrote the initial draft. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Rajapakse, J.C., Mundra, P.A. Stability of building gene regulatory networks with sparse autoregressive models. BMC Bioinformatics 12 (Suppl 13), S17 (2011). https://0-doi-org.brum.beds.ac.uk/10.1186/1471-2105-12-S13-S17

Download citation

  • Published:

  • DOI: https://0-doi-org.brum.beds.ac.uk/10.1186/1471-2105-12-S13-S17

Keywords