- Open Access
Inferring cellular networks – a review
BMC Bioinformatics volume 8, Article number: S5 (2007)
In this review we give an overview of computational and statistical methods to reconstruct cellular networks. Although this area of research is vast and fast developing, we show that most currently used methods can be organized by a few key concepts. The first part of the review deals with conditional independence models including Gaussian graphical models and Bayesian networks. The second part discusses probabilistic and graph-based methods for data from experimental interventions and perturbations.
The success of genome sequencing projects has led to the identification of almost all the genes responsible for the biological complexity of several organisms. The next important task is to assign a function to each of these genes. Genes and gene products do not work in isolation; rather, they are connected in highly structured networks of information flow through the cell. The inference of such cellular networks using computational and statistical methods is the topic of this review.
The first two sections describe models based on correlation and statistical dependence using the concept of conditional independence. The objective of this kind of modeling is to explain observed correlations between genes by the presence of other genes. The set of models can be ordered according to how deeply correlations are resolved: correlation graphs are the most simple example, and Bayesian networks the most sophisticated.
The last section discusses computational approaches building on observed effects of external interventions into the cell. In modern biology, perturbation experiments are key to inferring gene function and regulatory pathways. A common biological technique is to perturb a gene of interest and to study which other genes' activities are affected. Either, the effects of interventions can be included into conditional independence models, or the observed cause-effect relations can be used directly. We also discuss the situation when instead of direct observations of intervention effects on pathway components, only secondary effects on downstream genes are available.
Let V be a set of p network components. The biological meaning of a "network component" depends on what kind of data we analyze. We mostly speak of network components as genes, since the primary data for inference is microarray data and the network is a transcriptional gene regulatory network. However, the methods are general and can also be applied to protein data [1–3]. In probabilistic models we treat each component v ∈ V as a random variable X v and the set of all components in the model as a random vector X= (X1,..., X p ). The dataset D consists of N measurements, that is, realizations x1,..., xNof the random vector X. Network components are identified with nodes in a graph. The goal is to find an edge set representing the dependency structure of the network components. We call the graph T = (V, ) the topology of the cellular network. Depending on the model, T can be directed or undirected, cyclic or acyclic. In the important special case, where T is a directed acyclic graph (DAG), we call it G.
Conditional independence models
The first section describes statistical models to resolve the correlation structure of genes. The methods can be distinguished by the way they remove influences of other genes from the observed correlations. We first introduce coexpression networks, then discuss models building on statements of full conditional independence or low-order conditional independence.
Biological processes result from the concerted action of interacting molecules. This general observation suggests a simple idea, which has already motivated the first approaches to clustering expression profiles [4, 5] and is still widely used in functional genomics. It is called the guilt-by-association heuristic: if two genes show similar expression profiles, they are supposed to follow the same regulatory regime. To put it more pointedly: coexpression hints at coregulation. Coexpression networks (also known as relevance networks) are constructed by computing a similarity score for each pair of genes. If similarity is above a certain threshold, the gene pair gets connected in the graph, if not, it remains unconnected. Wolfe et al.  argue that networks of coexpressed genes provide a widely applicable framework for assigning gene function. They show that coexpression agrees well with functional similarity as it is encoded in the Gene Ontology . Examining coexpression is an integral part of most methods combining diverse data sources to uncover biological relationships between genes or proteins [8, 9].
Building coexpression networks
The first critical point in building a coexpression network is how to formalize the notion of similarity of expression profiles. Several measures have been proposed, the most simple of which is correlation. In a Gaussian model, zero correlation corresponds to statistical independence. Correlation networks are easy to interpret and can be accurately estimated even if p ≫ N, that is, the number of genes is much larger than the number of samples. Stuart et al.  have used this approach to build a graph from coexpression across multiple organisms (humans, flies, worms and yeast), finding that many coexpression relationships are conserved over evolution. Correlation networks can be enhanced in several ways. Bickel  generalizes them to time series data by introducing a time-lag for correlation. Kostka and Spang  introduce the concept of differential coexpression, which can be interpreted as the gain or loss of a regulatory mechanism. They formulate a method to find sets of genes, which are highly correlated under one condition (e.g. in healthy cells), but show random behavior under a second condition (e.g. in tumor cells).
Correlation is a linear measure of independence, non-linear dependencies between genes are not necessarily found. This problem can be addressed by using networks built from more flexible similarity measures like pair-wise mutual information , or non-linear kernel-functions . Yamanishi et al.  use kernel functions for supervised network reconstruction by tuning kernel parameters in known parts of a protein-interaction graph and then using them to infer unknown parts. Kato et al.  weight different data sources according to noise and information content and combine them into a single kernel function.
The second critical step in building coexpression networks is assessing the significance of results. Many pairs of genes show similar behavior in expression profiles by chance even though they are not biologically related. A practical, though time-consuming strategy consists of permuting the data matrix and comparing the network obtained from real data with the distribution of similarity scores achieved in the permutations, as Bickel  does to estimate the false discovery rate of spurious connections. In the supervised setting of Yamanishi et al.  cross-validation can be applied to choose optimal parameters.
Problems of coexpression based approaches
Even high similarity of expression tells us little about the underlying biological mechanisms. Coexpression networks include regulatory relationships, but we cannot distinguish direct from indirect dependencies based on the similarity of expression patterns. Fig. 1 exemplifies this problem on a small set of three highly coexpressed genes, which form a clique (a completely connected subgraph) in a coexpression network. The figure shows that several regulatory mechanism can explain this observation, and from coexpression data alone we have no way of choosing between them. There are two possible solutions. Functional genomics has a long tradition of perturbing the natural state of a cell and inferring a gene's function from the observed effects. These interventions allow us to distinguish between the three models in Fig. 1, because each model results in different predictions of effects, which can be compared to those obtained in experiments. For example, perturbing gene Y in the cascade X → Y → Z will only have an effect on gene Z but none on gene X. In the case where Y regulates both X and Z, perturbing it will result in changes at both regulatees. In the last case, where all three genes are regulated by a hidden regulator, perturbing one of them will not lead to changes at the other two. Methods formalizing these considerations are covered in the second part of this review. In the absence of perturbation data statistical methods may be used to find which of the possibilities is most likely. The theoretical background is the concept of conditional independence.
Let X, Y, Z be random variables with joint distribution P. We say that X is conditionally independent of Y given Z (and write X ⊥ Y | Z) if and only if
P(X = x, Y = y | Z = z) = P(X = x | Z = z)·P(Y = y | Z = z).
This is the same as saying
P(X = x | Y = y, Z = z) = P(X = x | Z = z)
and is a direct generalization of the independence condition for X and Y,
P(X = x, Y = y) = P(X = x)·P(Y = y).
The same definitions hold if conditioning is not on a single variable Z but on a set of variables Z. For an interpretation, we can think of random variables as abstract pieces of knowledge obtained from, say, reading books . Then X ⊥ Y | Z means: "Knowing Z, reading Y is irrelevant for reading X"; or in other words: "If I already know Z, then Y offers me no new information to understand X." Variable Z can explain the correlation between X and Y.
The statistical models we discuss in the following all build on the concept of conditional independence. Instead of retrieving sets of coexpressed genes – as coexpression networks do – they try to recover the regulatory relationships between genes. To decide on an edge between X and Y in the graph, statistical models ask questions of the form "Is X independent of Y given Z?", but differ with respect to what Z stands for: either all other variables except for X and Y, or single third variables, or any subset of all the other variables. Coexpression networks can be seen as the special case Z= ∅, which encodes marginal dependencies.
Full conditional models
Full conditional models (also called Markov networks, undirected graphical models, or Markov random fields) ask: "Can the correlation observed between two genes be explained by all other genes in the model?" Nodes i and j are connected by an edge if and only if
where "rest" denotes the set of all variables in V without i and j. Full conditional models become especially simple in a Gaussian setting. Assume that X~N (μ, Σ), where μ, is the mean vector and the covariance matrix Σ is invertible. Let K = Σ-1 be the concentration matrix of the distribution (also called the precision matrix). The value -k ij / is called the partial correlation coefficient between genes i and j . Then it holds for i, j ∈ V with i ≠ j that
X i ⊥ X j | Xrest ⇔ k ij = 0.
This relation is used to define Gaussian graphical models (GGMs) [15, 16]. A GGM is an undirected graph on vertex set V. Each vertex i ∈ V corresponds with a random variable X i ∈ X. The edge set of a GGM is defined by non-zero partial correlations. Vertices i and j are adjacent if and only if k ij ≠ 0. An example is shown in Fig. 2.
To estimate a GGM from data we need to know which elements of the precision matrix K are zero. This can be either done jointly for all edges, or by using non-exhaustive search algorithms like forward and backward search [15–17]. Alternatively, hypothesis testing-based model selection can be pursued by testing each edge separately for inclusion .
Comparison to correlation networks
Correlation graphs visualize the structure encoded in the correlation matrix Σ, which tells us about the similarity of expression profiles. In GGMs, we model using the precision matrix K = Σ-1, which shows correlation after correcting for the influence of all other genes. GGMs not only filter out all high correlations, which can be attributed to other genes, but may also draw attention to genes which are only very weakly correlated with a gene of interest, but highly related in terms of partial correlations in the context of the other neighboring genes in the GGM. These genes can be overlooked in correlation networks [19, 20]. GGMs have a second clear advantage over correlation networks. Whether directly or indirectly, almost all genes will be correlated. Thus, the correlation coefficient is a weak criterion for dependence, but zero correlation is a strong indicator for independence. On the other hand, partial correlation coefficients usually vanish. They provide a strong measure of dependence and, correspondingly, only a weak criterion of independence .
Problems with GGMs
Full conditional relationships can only be accurately estimated if the number of samples N is relatively large compared to the number of variables p. If the number of genes to be analyzed exceeds the number of distinct expression measurements (that is, if p ≫ N), the correlation matrix of expression profiles between genes does not have full rank and cannot be inverted . The p ≫ N-situation is true for almost all genomic applications of graphical models. Therefore, one must either improve the estimators of partial correlations or resort to a simpler model. The basic idea in all of these approaches is that biological data are high-dimensional but sparse in the sense that only a small number of genes will regulate one specific gene of interest.
Several papers suggest ways to estimate GGMs in a p ≫ N-situation. Kishino and Waddell  propose gene selection by setting very low partial correlation coefficents to zero. As they state, the estimate still remains unstable. In one study, Schäfer and Strimmer  use bootstrap resampling together with the Moore-Penrose pseudoinverse and false discovery rate multiple testing, while in another , they discuss a linear shrinkage approach to regularization. Li and Gui  prospose a threshold gradient descent regularization procedure for estimating a sparse precision matrix.
Heuristic regression-based estimation
Full conditional independence models are closely related to a class of graphical models called dependency networks . Dependency networks are built using sparse regression models to regress each gene X i onto the remaining genes XV \ i. The genes, which predict the state of gene X i well, are connected to it by (directed) edges in the graph. In general, dependency networks may be inconsistent, i.e. the local regression models may not consistently specify a joint distribution over all genes. Thus, the resulting model is only an approximation of the true full conditional model. Still, dependency networks are widely used because of their flexibility and the computational advantage compared to structure learning in full conditional independence models. When learning dependency networks, a variety of sparse classification/regression techniques may be used to estimate the local distributions, including linear models with an L1-penalty on model parameters [20, 26], classification trees [25, 27], or sparse Bayesian regression [19, 28]. We will see later that these approaches are very similar to the local regression models used in Bayesian networks. The difference is that in Bayesian networks an additional order of the variables is enforced.
Low-order conditional independence
Full conditional models are hard to estimate if the number of samples is small compared to the number of genes. While the last section described different statistical techniques for p ≫ N-situations, this section introduces the complementary approach. Instead of enhancing the estimation procedure, one can ask a simpler question: "Can the correlation between two genes be explained by a single third gene?" In contrast to GGMs, low-order conditional independence models condition not on the rest of the genes, but only on single third genes. An edge between vertices i and j(i ≠ j) is drawn if the correlation coefficient ρ ij ≠ 0 and no third gene can explain the correlation:
This general idea can be implemented in different ways: In a Gaussian setting, first order conditional independence models were proposed by several authors [29–32]. Testing for first order conditional independence involves only triples of genes at a time; thus, the problem for GGMs in high dimensions no longer exists. Wille et al.  use sparse Gaussian graphical modelling to identify modules of closely related genes and candidate genes for cross-talk between pathways in the Isoprenoid gene network in Arabidopsis thaliana.
In another approach, Margolin et al.  use conditional mutual information to test for first-order independence. The resulting method is called ARACNe and has been applied to expression profiles of human B cells . The advantage of this approach is that the Gaussian assumption is dropped. However, this increased flexibility comes at a prize: ARACNe involves a number of computational approximations and Monte Carlo simulations, which could make the method unstable.
In the last sections we have seen methods to build graphs from marginal dependencies (X i X j ), full conditional dependencies (X i X j | Xrest), or first order dependencies (X i X j | X k for all k ∈ rest). The logical next step is to ask for independencies of all orders. In the resulting graph, two vertices i and j are connected if no subset of the other variables can explain the correlation, that is, if
This includes testing marginal, first order and full conditional independencies. Thus, the number of edges will be smaller compared to the models in the previous sections. The graph encoding the above independence statements for all pairs of nodes is still undirected. It can be shown that knowing independences of all orders gives an even higher resolved representation of the correlation structure. The collection of independence statements already implies directions of some of the edges in the graph [35–37]. The resulting directed probabilistic model is called a Bayesian network.
Definition of a Bayesian network
A (static) Bayesian network is a graphical representation of the dependency structure between the components of a random vector X. The individual random variables are associated with the vertices of a directed acyclic graph (DAG) G, which describes the dependency structure. Each node is described by a local probability distribution (LPD) and the joint distribution p(x) over all nodes factors as
where θ v denotes the parametrization of the local distribution and xpa(v)is the vector of parent states denoting the activity levels of a gene's regulators. The DAG structure implies an ordering of the variables. The parents of each node are those variables that render it independent of all other predecessors. The factorization of the joint distribution is the key property of Bayesian networks. It allows to segment the set of variables into families, which can be treated individually. This basic definition of Bayesian networks poses a number of further questions, which are addressed in the following: (1.) How do the local probability distributions p(x v | xpa(v), θ v ) look like? (2.) How is conditional independence defined for DAGs? (3.) How can we learn a Bayesian network structure from data? (4.) Are there natural limits to structure learning?
Local probability distributions (LPDs)
Bayesian network models differ with respect to assumptions about the local probability distributions p(x v | xpa(v), θ v ) attached to each node v ∈ V. There are two types of parametric LPDs used in practice: multinomial distributions for discrete nodes and Gaussian distributions (normal distributions) for continuous nodes. A discrete node with discrete parents follows a multinomial distribution parametrized by a set of probability vectors, one for each parent configuration. A continuous node with continuous parents follows a Gaussian distribution, where the mean is a linear combination of parent states. Conditional Gaussian (CG) networks are a combination of discrete and Gaussian networks. Continuous nodes follow a Gaussian distribution and are allowed discrete and continuous parents, while discrete nodes follow a multinomial distribution and are restricted to discrete parents. CG networks constitute the general class of graphical models studied in statistics .
Another kind of parametric LPDs are regression trees, as used by Segal et al. [38, 39]. Each regression tree is a rooted binary tree with parents in the DAG as internal nodes and leaf nodes associated with univariate Gaussian distributions. The regression trees capture the local structure in the data, whereas the DAG describes the global structure [40, 41].
Instead of the parametric approaches discussed so far, the relationship between parents and children in the DAG can also be modeled by non-parametric regression models [42–45]. The result is a non-linear continuous model. This is an advantage over multinomial or Gaussian Bayesian networks, which are either discrete or linear.
Bulashevska and Eils  constrain LPDs to noisy logic functions modelling activatory or inhibitory parent-child relations. This has the advantage of simplifying and regularizing the model, while at the same time making it easier to interpret.
Nachman et al.  use non-linear Michaelis-Mentens dynamics to model how the transcription rate of a gene depends on its regulators. This approach combines Bayesian networks with a biochemically realistic quantitative model of gene regulation.
Conditional independence in directed graphs
In Fig. 2 we saw how to read off independence statements from a full conditional independence graph. How does this work in the case of Bayesian networks? The answer is given by the definition of d-separation  ("d" for directed), also called the directed Global Markov condition . The three archetypical situations of d-separation (chain, fork, and collider) can be seen in Fig. 3. In a chain X → Y → Z, the middle node Y blocks the information flow between X and Z and thus it holds that X ⊥ Z | Y. In a fork, where X and Z are both regulated by Y, knowing the state of the regulator renders the regulatees conditionally independent and thus again X ⊥ Z | Y. The last case is more surprising: If X and Z are independent regulators with a common target Y, then the state of Y gives us information about X and Z. For example, imagine that Y is only expressed if only one of its regulators is active, then seeing Y expressed and X active implies Z being inactive. Thus, in the collider X → Y ← Z the middle node Y "unblocks" the path between X and Z and thus X Z | Y.
Many Bayesian networks may represent the same statements of conditional independence. They are statistically undistinguishable and we call them Markov equivalent. All equivalent networks share the same underlying undirected graph (called the skeleton) but may differ in the direction of edges that are not part of a collider (also called a v-structure) . Markov equivalence poses a theoretical limit on structure learning from data: even with infinitely many samples, we cannot resolve the structures in an equivalence class. In biological terms this means: even if we find two genes to be related it may not be clear which one is the regulator and which one is the regulatee. Without perturbation experiments this situation can not be further resolved.
Acyclicity in a cyclic world
Bayesian networks allow the highest resolution of correlation structure. Still, they suffer from a severe shortcoming: they are acyclic. With cycles, we cannot decompose the joint distribution as in the definition of Bayesian networks. Biological networks are all known to contain feedback loops and cycles . Modeling the cell cycle with an acyclic model  can only be a preliminary step. One extension of Bayesian networks that encompasses cyclic structures is the factor graph network model of Gat-Viks et. al. . A second way to address the cycle problem is by assuming that the system evolves over time. This is shown in Fig. 4. We no longer model a static random vector X but a time series X,..., X[T] of observing X at T timepoints. If we assume that X v at time t + 1 can only have parents at time t, then cycles "unroll" and the resulting model is again acyclic and tractable: it is called a Dynamic Bayesian network (DBN) [52, 53]. DBNs have found many applications in network reconstruction [54–56]. They are often augmented with hidden nodes , which can describe transcription factor activity  or any other kind of environmental or non-transcriptional effects in the cell [58–61]. In summary, DBNs provide one of the most flexible frameworks for modeling cellular networks.
Score based structure learning
In correlation networks, GGMs and sparse GGMs we use statistical tests for each gene pair to decide whether the data support an edge or not. The number of tests to be done in these models is limited, even though it can be big in the case of sparse GGMs. For Bayesian networks we need to test independence of a gene pair for every subset of the other genes. This is called constraint-based learning of Bayesian networks [36, 37]. For problems with more than a handful of variables testing becomes infeasible very quickly. In applications in computational biology the network structure is therefore mostly estimated by score based techniques. In the following we review maximum likelihood scores and Bayesian scores to evaluate model fit to data. Once the score is defined, model selection is posed as an optimization problem over the discrete space of possible model structures. Additional topics are how to fight overfitting and how to encode prior information.
A straight-forward idea for model selection is to choose the DAG G, which allows the best fit to data D. The best fit for a given DAG G is determined by maximizing the likelihood p(D|G, θ) as a function of θ, the parameters of the local probability distributions. A score for DAG G is then given by
Unfortunately, the likelihood is not an appropriate score to decide between models since it tends to overfitting. Richer models with more edges will have a better likelihood than simpler ones, since the additional number of parameters allows a better fit to the data. A standard solution to this problem is to penalize the maximum likelihood score according to model complexity. An often used example of this general strategy is scoring with the Bayesian information criterion.
Bayesian information criterion (BIC)
Contrary to what the name suggests, the BIC score  is not a Bayesian score. It is a regularized maximum likelihood estimate, which controls overfitting by penalizing the maximal likelihood of the model with respect to the number of model parameters. It is defined as
where d is the number of parameters and the factor log N scales the penalty with respect to the likelihood. The BIC score can also be used to learn Bayesian networks with missing values or hidden variables. The likelihood has then to be maximized via the Expectation-Maximization (EM) algorithm. In such a scenario, the BIC score is used by Nachman et al.  to learn kinetic models of transcription factors and their targets. They treat protein activities and kinetic constants as hidden variables.
In most cases a full Bayesian approach is preferred over ML or BIC. In Bayesian structure learning we evaluate the posterior probability of model topology G given data D:
The denominator p(D) is an average of data likelihoods over all possible models. This normalizing constant is the same for all models, and thus we do not need compute it to decide between competing models. The two main terms to consider in the Bayesian score are the prior over model structures, p(G), and the marginal likelihood p(D|G).
Marginal likelihood of network structure
The marginal likelihood p(D|G) is the key component of Bayesian scoring metrics. It equals the full model likelihood averaged over parameters of local probability distributions, that is,
Marginalization is the reason why the LPD parameters θ do not enter the definition of the posterior above. They are treated as nuisance parameters and have been integrated out. It is important to note that the LPD parameters were not maximized as it would be done in a maximum likelihood estimate or in a BIC score. Averaging instead of maximizing prevents the Bayesian score from overfitting.
Computation of marginal likelihood depends on the choice of local probability distributions and local priors in the Bayesian network model. To compute the marginal likelihood analytically, the prior p(θ|G) must fit to the likelihood p(D|G, θ). Statistically, this fit is called "conjugacy". A prior distribution is called conjugate to a likelihood, if the posterior is of the same distributional form as the prior . If no conjugate prior is available, the marginal likelihood has to be approximated.
The marginal likelihood for discrete Bayesian networks was first computed by Cooper and Herskovits  and is further discussed by Heckerman et al. . The conjugate prior for the multinomial distribution is the Dirichlet prior . Assuming independence of the prior for each node and each parent configuration, the score decomposes into independent contributions for each family of nodes. Corresponding results exist for Gaussion networks using a Normal-Wishart prior . The marginal likelihood again decomposes into node-wise contributions. Conditional Gaussian networks are a mix of discrete and Gaussian nodes . The marginal likelihood decomposes into a Gaussian part and a discrete part.
For tree LPDs, the marginal likelihood at each node of the DAG further splits into independent components for each leaf of the local regression tree. Conjugate analysis and analytic results are possible using normal-gamma priors for each leaf node [40, 41].
For non-parametrix LPDs, Boolean logic LPDs, and kinetic modelling LPDs, conjugate analysis and analytic computation of the marginal likelihood are not possible. Imoto et al.  use a Laplace approximation to approach the true marginal likelihood. Bulashevska and Eils  use Gibbs sampling to estimate the model posterior p(G|D) and the parameter posterior p(θ|D). Nachman et al.  use the BIC score for model selection.
Structure priors p(G) help to focus inference on reasonable models by including biological prior knowledge or integrating different data sources. Very often the biological prior knowledge can be encoded in a prior network, which is then to be refined by statistical structure learning. The first idea is to restrict the search space to a – conveniently defined – vicinity () of the prior network . All the DAGs in the restricted search space are considered equally likely. This can be interpreted as a rigid structure prior of the form
A smoother way to guarantee that DAGs similar to the prior network get higher prior probability is the following. We measure the confidence of edge (v, w) by a value 0 <κ vw ≤ 1. A structure prior can then be defined proportional to a product of weights κ vw over all edges (v, w):
The normalization constant, which would be necessary to make the right-hand side a density, is the same for all models and can be ignored when computing relative posterior probabilities. What are smart choices of κ vw ? There are several approaches suggested in the literature. Heckerman et al.  assume constant penalty κ vw ≡ κ for all edges in which G and differ. Thus, p(D) ∝ κεwhere ε is the number of edges in which G differs from the prior DAG . Another approach by Imoto et al.  and Tamada et. al.  uses a network prior in an iterative scheme. They construct a Bayesian network from microarray data, propose putative transcription factors from the network structure, and search for common motifs in the DNA sequences of children and grand-children of transcription factors. They then re-learn the network by penalizing edges without motif evidence harder than edges with motif evidence. Bernard et al.  define weights from p-values of binding location data. They assume that p-values follow an exponential distribution if the edge is present and a uniform distribution if it is not. By Bayes' rule they derive probabilities for an edge to be present given the p-values from the location data. The free parameter of the exponential distribution is then integrated out and the final probabilities vw are used as weights in a structure prior. Fig. 5 shows a comparison of these three prior definitions. They can be organized by the weights κ vw they give for the presence or absence of an edge given prior information.
In most applications the Bayesian score for discrete data is used. When learning gene regulatory networks from microarray data, it is necessary to preprocess the continuous gene expression values and discretize them. In general, discretization may be carried out for computational efficiency, or because background knowledge suggests that the underlying variables are indeed discrete. Discretizing continuous variables results in a loss of information, although it can also reduce noise, since discretized data can be more stable with respect to random variations of the mRNA measurements. Several methods to discretize microarray data have been proposed in the literature: Friedman et al.  discretize expression values into three categories, depending on whether the expression rate is significantly lower than, similar to, or greater than control, respectively. Pe'er et al.  introduce an adaptive discretization procedure. They model the expression level of a gene in different experiments as samples from a mixture of normal distributions, where each normal component corresponds to a specific state. They then use standard k-means clustering for inference. Hartemink et al.  use a discretization coalescence method, which incrementally reduces the number of discretization levels for each gene while preserving as much total mutual information between genes as possible. In the previous three approaches, expression levels were discretized before and independently of structure learning. Suboptimal discretization algorithms can lead to degraded network structure. To avoid this, Steck and Jaakkola  derive a scoring function to efficiently jointly optimize the discretization policy and the structure of the graphical model.
Regularization is a technique used in machine learning to ensure uniqueness of solution and to fight overfitting by constraining admissible models [14, 71]. Regularization is always needed in p ≫ N-situations. We already saw examples of regularization when Gaussian graphical models were adapted to the p ≫ N-situation. Different methods were proposed for Bayesian networks. Steck and Jaakkola  show that scaling down the parameters of the Dirichlet prior used for the computation of the marginal likelihood leads to a strong regularization of the model structure and a sparse graph. Another way to regularize Bayesian networks is to constrain the local probability distributions. Bulashevska and Eils  suggest learning noisy logic gates instead of unconstrained multinomial LPDs. The drawback is that Bayesian conjugate analysis, which leads to the analytic solution of the marginal likelihood, is no longer possible and Gibbs sampling has to be applied. Module networks [38, 39] constrain the number of parameters in the model by assuming that groups of genes (so called modules) share the same dependence on regulators. Learning module networks involves an iteration of assigning genes to modules and searching for dependencies between modules.
Model selection and assessment
To search for the DAG with highest score is mathematically trivial: compute the score for every possible DAG and choose the one that achieves the highest value. What makes exhaustive search computationally infeasible in almost all applications is the huge number of DAGs. The number of DAGs on n edges is
with a0 = 1 . The number of DAGs increases explosively, as the first few steps in the recursion show: 1, 1, 3, 25, 543, 29 281, 3 781 503, 1 138 779 265. That means, we must use heuristic strategies to find high-scoring Bayesian networks without enumerating all possible DAGs.
Defining search space
First we need to decide how to describe models of interest. This defines the model space, in which we search for models describing the data well. To apply search heuristics we have to equip the search space with a neighborhood relation, that is, operators to move from one point of the search space to the next one. The most simple search space results from defining a neighborhood relation on DAGs. Two DAGs are neighbors if they differ by one edge, which is either missing in one of them or directed the other way round. Madigan et al.  and Chickering  restrict the search space to Markov equivalence classes of DAGs which uniquely describe a joint distribution. Thus, no time is lost in evaluating DAG models which are equivalent anyway. Friedman and Koller  search over orders of nodes rather than over network structures. They argue that the space of orders is smaller and more regular than the space of structures and has a much smoother posterior landscape.
Most of the following search algorithms can be applied to all search spaces, even though they are usually applied to DAGs. They return a single best network. A simple and fast but still powerful method is hill-climbing by greedy search. First, choose a point in search space to start from, e.g. a random graph or the empty graph. Compute the posterior probability for all graphs in the neighborhood of the current graph and select the graph with highest score. Iterate until no graph in the neighborhood has a larger score than the current graph. This procedure finds local maxima of the Bayesian scoring metric. The K2-algorithm  is a variant of greedy search, which assumes that the order of nodes is known. Several approaches have been suggested to speed up model search. The sparse candidate algorithm  restricts the number of possible parents for each node by searching for pairs of nodes which are highly dependent. The ideal parent algorithm [47, 78] constructs a parent profile perfectly explaining the child behaviour and uses it to guide parent selection and to restrict the search space. Peña et al.  grow Bayesian networks starting from a target gene of interest. They iteratively add to the Bayesian network parents and children of all the genes already included in it. The algorithm stops after a predefined number of steps and thus, intuitively, highlights the surrounding area of the seed gene without having to compute the complete Bayesian network over all genes. Friedman [80, 81] introduces the structural EM algorithm to learn Bayesian networks in the presence of missing values or hidden variables. It is an extension of the Expectation-Maximization (EM) algorithm that performs structure search inside the EM procedure and shows improvements in terms of speed and accuracy.
The problem with optimal models is, as Edwards  puts it: "Any method (or statistician) that takes a complex multivariate dataset and, from it, claims to identify one true model, is both naive and misleading". The emphasis is on "one true model". A better strategy than choosing a single best model is exploring the whole posterior distribution. Unfortunately, direct sampling from the posterior is intractable. The most we know about the data distribution is the empirical distribution of observations in the dataset. A classical approach to assess variability in the data is bootstrapping . The strategy is to sample with replacement from the observations in the data set to get a number of bootstrap datasets, and then learn a network on every bootstrap dataset. The relative frequency of network features in the resulting network structures can be used as a measure of reliability [50, 68]. Bootstrap samples can contain multiple copies of identical data points. This implies strong statistical dependencies between variables when given a small dataset. As a consequence, the resulting network structure can be considerably biased towards denser graphs. Steck and Jaakkola  propose a correction for this bias. As another simple way to avoid the bootstrap-bias Steck and Jaakkola  use the leave-k-out method. Instead of resampling with replacement, k cases are left out of the dataset when estimating a model. Repeating this many times also gives an estimate of model variability.
Markov Chain Monte Carlo (MCMC) is a simulation technique which can be also used to sample from the posterior p(G| D). Given a network structure, a new neighboring structure is proposed. This new structure is accepted with the Metropolis Hastings acceptance criterion . The iteration of this procedure produces a Markov chain that under fairly general conditions converges in distribution to the true posterior. MCMC is used by Husmeier  to learn dynamic Bayesian networks. Madigan et al.  use MCMC over Markov equivalence classes and Friedman and Koller  over orders of nodes.
Graphical models visualize a multivariate dependency structure. They can only answer biological questions if they succeed in reliably and accurately reconstructing biologically relevant features of cellular networks. Unfortunately, rigorous assessment and benchmarking of methods are still rare. One of the first evaluation studies is by Smith et al. . They sample data from a songbird's brain model and report excellent recovery success when learning a Bayesian network from it. Zak et al.  develop a realistic 10 gene network, where the biological processes at the different levels of transcription, translation and post-translational modifications were modeled with systems of differential equations. They show that linear and log-linear methods fail to recover the network structure. Husmeier  uses the same simulation network  to specify sensitivity and specificity of dynamic Bayesian networks. He demonstrates how the network inference performance varies with the training set size, the degree of inadequacy of prior assumptions, and the experimental sampling strategy. By analyzing ROC curves Husmeier can show fair performance of DBNs. Wimberly et al.  test 10 algorithms, including Boolean and Bayesian networks, on a simulation  of the genetic network of the sea urchin embryo . They report that reconstruction is unreliable with all methods and that the performance of the better algorithms quickly degrades as simulations become more realistic. Basso et al.  show that their own method, ARACNe, compares favorably against static Bayesian networks on a simulated network with 19 nodes  – but only if the dataset includes several hundreds of observations. On the other hand, Hartemink  finds dynamic Bayesian networks to be even more accurate than ARACNe on datasets of the same size. In a recent comparative evaluation, Werhli et al.  find no significant differences between coexpression networks, Gaussian graphical models and Bayesian networks, if they are used on nonlinear simulated data and real data. They conclude that the higher computational cost of inferring Bayesian networks over GGMs and coexpression networks is often not justified.
All in all the results show severe limitations of graphical models for microarray data: They need a large sample size and capture only parts of biologically relevant networks. One reason for this shortcoming is that the models we discussed so far all use purely observational data, where the cellular network was not perturbed experimentally. In simulations [92–94] and on real data [3, 92] it was shown that data from perturbation experiments greatly improve performance in network reconstruction. In Bayesian networks this improvement is especially pronounced . Accordingly, the following section introduces methodology for learning from effects of interventions in a probabilistic framework.
Learning from experimental interventions
Physicist Richard Feynman once said: "What I cannot create, I do not understand". This quote stresses the importance of action for understanding. A complex system is not understood solely by passive contemplation, it needs active manipulation by the researcher. In biology this fact is long known. Functional genomics has a long tradition of inferring the inner working of a cell by breaking it. "What I cannot break, I do not understand" is the credo of functional genomics research.
Linking causes with effects
Rung et al.  build a directed disruption graph by drawing an edge (i,j) if perturbing gene i results in a significant expression change at gene j. The authors focus on features of the disruption network that are robust over a range of significance cutoffs. Disruption networks do not distinguish between direct and indirect effects (and are in this sence similar to co-expression networks). Fig. 6 shows the difference between a causal network and a disruption network.
Distinguishing direct from indirect effects
Disruption networks can be used as a starting point for further analysis. Wagner [96–98] uses graph-theoretic methods of transitive reduction [99, 100] to find parsimonious subgraphs explaining all the effects in the disruption network. These methods are deterministic and do not take measurement noise into account. Wang and Cooper  describe a Bayesian generalization of the Wagner algorithm  yielding a distribution over possible causal relationships between genes.
A simple deterministic model of regulatory networks are Boolean networks: they are defined by a directed (and possibly cyclic) graph. For each node exists a boolean function relating parent states to the child state. Perturbations allow us to infer the structure and the logic of Boolean networks [102–104].
Unfortunately, deterministic models can not compensate for the noise inherent in biological systems. This is the reason why also for perturbation data probabilistic models are preferred.
Rice et al.  build correlation graphs using knockout data. They assume that the data contain measurements of the unperturbed cell and several replicates of measurements for every gene knockout. For each gene i, they concatenate the wild-type data with the intervention data of this gene and compute on the joint data the correlation of gene i to all other genes. In the final graph, there is an arrow (i, j) if gene j was highly correlated to gene i. Since the correlation was computed on knockout data, the graph encodes causation and not only correlation. The big disadvantage of this method is the need for many (> 10) replicates of knockout experiments for every gene in the model, which is unrealistic for most real-world applications. Several regression methods make more efficient use of data.
Rogers and Girolami  use sparse Bayesian regression based on a Gaussian linear model to estimate a dependency network from knock-out data. They regress each gene onto all other genes by combining all the data corresponding to knockouts of genes other than the particular gene of interest. The measurements of the knockout gene are ignored when predicting this gene's expression from the other genes. We will see below that this strategy is the same as Pearl's ideal interventions used in Bayesian networks . A prior on model parameters constrains most regression coefficients to zero and enforces a sparse solution. Non-zero regression coefficients are indicated by arrows in the regulation network. Other regression methods for network reconstruction are derived from a branch of engineering called system identification . Functional relations between network components are inferred from measurements of system dynamics. Several papers [107–110] use multiple regression to model the response of genes and proteins to external perturbations.
Pearl  proposes an idealized model of interventions in Bayesian networks. He assumes that an external manipulation completely controls the target node v. The influence of parent nodes pa(v) is cut and the LPD of X v degenerates to a point mass at the target state x' v , that is,
Ideal interventions are easily introduced into the computation of marginal likelihood. The key observation is that fixing a variable to a state tells us nothing about its "natural" behaviour. Cooper and Yoo  show that only cases in which a node was not fixed by an external manipulation enter into this node's contribution to the marginal likelihood. Ideal interventions were applied in Bayesian networks [3, 68, 94], factor graphs  and dependency networks . In simulations [92–94] and on real data [3, 92] it was found that interventions are critical for effective inference, particularly to establish directionality of the connections.
Pearl's model of ideal interventions contains a number of idealizations. The most important of these are that manipulations only affect single genes and that results can be controlled deterministically. The first assumption may not be true if there are compensatory effects involving other genes. The second assumption is also very limiting in realistic biological scenarios. Often the experimentalist lacks knowledge about the exact size of perturbation effects. To cope with this uncertainty, Markowetz et al.  introduced soft interventions as a generalization of ideal interventions. Variables are "pushed" in the direction of target states without fixing them. This idea is formalized in a Bayesian framework based on Conditional Gaussian networks.
Physical network models
Yeang et al.  find the most likely annotated molecular interaction graph given a variety of data sources including gene-knockouts. Knock-out data is functional in nature and provides only indirect evidence about network structure. Yeang et al. associate each observed knock-out effect in the deletion mutant data with molecular cascades that could in principle explain the effect.
Genetic interactions are defined by comparing the phenotypes of two single gene perturbations with the phenotype of the double gene perturbation. One example for a genetic interaction is epistasis ; it is defined as one gene masking the effect of another gene. Driessche et al.  use expression profiles as phenotypes and partly reconstruct a developmental pathway in D. discoideum. Another example of a genetic interaction is synthetic lethality, where two genes with a viable phenotype show a lethal phenotype in a double perturbation . Wong et al.  propose a classification method to predict synthetic lethal interactions. Epistasis and synthetic lethality are just two examples of a broad range of possible genetic interactions. Drees et al.  define nine modes of genetic interactions for quantitative phenotypes that can be described by inequality constraints between the phenotypic values. They show that all modes of genetic interactions can be identified in agar-invasion phenotypes of mutant yeast.
Nested effects models
A key obstacle to inferring genetic networks from perturbation screens is that phenotypic profiles generally offer only indirect information on how genes interact. Cell morphology or sensitivity to stresses are global features of the cell, which are hard to relate directly to the genes contributing to them. Gene expression phenotypes also offer only an indirect view of pathway structure due to the high number of non-transcriptional regulatory events like protein modifications. For example, when silencing a kinase we might not be able to observe changes in the activation states of other proteins involved in the pathway; the only information we may get is that genes downstream of the pathway show expression changes. Thus, phenotypic profiles may provide only indirect information about information flow and pathway structure.
A recent approach  especially designed to learning from indirect information and high-dimensional phenotypes are Nested Effects Models that reconstruct features of the internal organization of the cell from the nested structure of observed perturbation effects. Perturbing some genes may have an influence on a global process, while perturbing others affects sub-processes of it. Imagine, for example, a signaling pathway activating several transcription factors. Blocking the entire pathway will affect all targets of the transcription factors, while perturbing a single downstream transcription factor will only affect its direct targets, which are a subset of the phenotype obtained by blocking the complete pathway. Figure 7 shows a schematic plot of how the position of perturbed genes in a pathway corresponds to a nested structure of observed effects. Markowetz et al.  demonstrate the power of Nested Effects Models in the controlled setting of simulation studies and explain its practical use in the context of an RNAi data set investigating the response to microbial challenge in Drosophila melanogaster.
Statistical analysis of high-throughput data sets aims at generating hypotheses about the functions and regulatory roles of genes and proteins. Small-scale traditional experimental techniques are needed to verify the statistical predictions and inferred pathways. Experimental design or active learning deals with deciding which interventions to perform to discriminate optimally between alternative models. For reconstruction of regulatory networks, a number of methods have been proposed in different frameworks: for Bayesian networks [120, 121], physical network models , Boolean networks , and dynamical modeling .
Summary and Outlook
Fig. 8 organizes the network reconstruction methods we discussed in this review with respect to a few basic questions: Does the data include gene knockout or knockdown experiments? If not, we call it purely observational data; if yes, we call it interventional data. Is the model probabilistic or deterministic? Does the model allow for changes over time? If yes, we call it dynamic, otherwise static. Does the model describe transcriptional regulatory networks? And if so, are additional non-transcriptional effects taken into account? In the leaf nodes of the decision tree modeling techniques fall together that are methodologically similar. Fig. 8 shows representative examples. Some branches in the tree are missing, where we found no research on a given approach.
The need for a holistic view
The internal organization of the cell comprises many layers. The genome refers to the collection of information stored in the DNA, while the transcriptome includes all gene transcripts. On the next level the proteome covers the set of all proteins. The metabolome contains small molecules – sugars, salts, amino acids, and nucleotides – that participate in metabolic reactions required for the maintenance and normal function of a cell. Results of internal reactions are features of the cell like growth or viability, which can be taken as phenotypes to study gene function. To understand the complexity of living cells future research will need to build models including all these layers. Statistical inference on parts of the system will not provide the mechanistic insights functional genomics is seeking for. Recent research concentrates on combining information from genome, transcriptome and proteome, e.g. for reconstructing signaling pathways [113, 124] and networks of functionally related proteins [8, 9, 125]. This is a necessary step in the right direction. However, these models will still be fragmentary if they do not include and predict phenotypical changes of interventions perturbing the normal course of action in the cell. We will only understand what we can break.
Yamanishi Y, Vert JP, Kanehisa M: Protein network inference from multiple genomic data: a supervised approach. Bioinformatics. 2004, 20 (Suppl 1): i363-370. 10.1093/bioinformatics/bth910.
Kato T, Tsuda K, Asai K: Selective integration of multiple biological data for supervised network inference. Bioinformatics. 2005, 21 (10): 2488-95. 10.1093/bioinformatics/bti339.
Sachs K, Perez O, Pe'er D, Lauffenburger DA, Nolan GP: Causal protein-signaling networks derived from multiparameter single-cell data. Science. 2005, 308 (5721): 523-9. 10.1126/science.1105809.
Eisen M, Spellman P, Brown P, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998, 95 (25): 14863-8. 10.1073/pnas.95.25.14863.
Spellman P, Sherlock G, Zhang M, Iyer V, Anders K, Eisen M, Brown P, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998, 9 (12): 3273-97.
Wolfe CJ, Kohane IS, Butte AJ: Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics. 2005, 6: 227-10.1186/1471-2105-6-227.
Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, Davis A, Dolinski K, Dwight S, Eppig J, Harris M, Hill D, Issel-Tarver L, Kasarskis A, Lewis S, Matese J, Richardson J, Ringwald M, Rubin G, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-9. 10.1038/75556.
Myers CL, Robson D, Wible A, Hibbs MA, Chiriac C, Theesfeld CL, Dolinski K, Troyanskaya OG: Discovery of biological networks from diverse functional genomic data. Genome Biology. 2005, 6 (R114):
Troyanskaya OG, Dolinski K, Owen AB, Altman RB, Botstein D: A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae). Proc Natl Acad Sci USA. 2003, 100 (14): 8348-8353. 10.1073/pnas.0832373100.
Stuart JM, Segal E, Koller D, Kim SK: A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003, 302 (5643): 249-55. 10.1126/science.1087447.
Bickel DR: Probabilities of spurious connections in gene networks: application to expression time series. Bioinformatics. 2005, 21 (7): 1121-8. 10.1093/bioinformatics/bti140.
Kostka D, Spang R: Finding disease specific alterations in the co-expression of genes. Bioinformatics. 2004, 20 (Suppl 1): I194-I199. 10.1093/bioinformatics/bth909.
Butte A, Kohane I: Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac Symp Biocomput. 2000, 418-29.
Schölkopf B, Smola AJ: Learning with kernels. 2002, Cambridge, MA: The MIT Press
Lauritzen SL: Graphical Models. 1996, Oxford: Clarendon Press
Edwards D: Introduction to Graphical Modelling. 2000, Springer
Smith PWF, Whittaker J: Edge exclusion tests for graphical Gaussian models. Learning in Graphical Models. Edited by: Jordan M. 1999, MIT Press, 555-574.
Drton M, Perlman MD: Model Selection for Gaussian Concentration Graphs. Biometrika. 2004, 91 (3):
Dobra A, Hans C, Jones B, Nevins JR, Yao G, West M: Sparse graphical models for exploring gene expression data. Journal of Multivariate Analysis. 2004, 90: 196-212. 10.1016/j.jmva.2004.02.009.
Meinshausen N, Bühlmann P: High dimensional graphs and variable selection with the Lasso. Annals of Statistics. 2005, 34 (3): 1436-1462. 10.1214/009053606000000281.
Schäfer J, Strimmer K: An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics. 2005, 21 (6): 754-64. 10.1093/bioinformatics/bti062.
Kishino H, Waddell PJ: Correspondence Analysis of Genes and Tissue Types and Finding Genetic Links from Microarray Data. Genome Informatics. Edited by: Dunker A, Konagaya A, Miyano S, TTakagi. 2000, Tokyo: Universal Academy Press
Schäfer J, Strimmer K: A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics. Stat Appl Genet Mol Biol. 2005, 4: Article32-10.2202/1544-6115.1175.
Li H, Gui J: Gradient Directed Regularization for Sparse Gaussian Concentration Graphs, with Applications to Inference of Genetic Networks. Biostatistics. 2006, 7 (2): 302-317. 10.1093/biostatistics/kxj008.
Heckerman D, Chickering DM, Meek C, Rounthwaite R, Kadie C: Dependency Networks for Inference, Collaborative Filtering, and Data Visualization. Journal of Machine Learning Research. 2000, 1 (Oct): 49-75.
Bonneau R, Reiss D, Shannon P, Facciotti M, Hood L, Baliga N, Thorsson V: The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol. 2006, 7 (5): R36-10.1186/gb-2006-7-5-r36.
Soinov LA, Krestyaninova MA, Brazma A: Towards reconstruction of gene networks from expression data by supervised learning. Genome Biology. 2003, 4: R6-10.1186/gb-2003-4-1-r6.
Rogers S, Girolami M: A Bayesian regression approach to the inference of regulatory networks from gene expression data. Bioinformatics. 2005, 21 (14): 3131-7. 10.1093/bioinformatics/bti487.
Wille A, Bühlmann P: Low-Order Conditional Independence Graphs for Inferring Genetic Networks. Stat Appl Genet Mol Biol. 2006, 5: Article1-10.2202/1544-6115.1170.
Wille A, Zimmermann P, Vranová E, Fürholz A, Laule O, Bleuler S, Hennig L, Prelic A, von Rohr P, Thiele L, Zitzler E, Gruissem W, Bühlmann P: Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biol. 2004, 5 (11): R92-10.1186/gb-2004-5-11-r92.
Magwene PM, Kim J: Estimating genomic coexpression networks using first-order conditional independence. Genome Biol. 2004, 5 (12): R100-10.1186/gb-2004-5-12-r100.
de la Fuente A, Bing N, Hoeschele I, Mendes P: Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics. 2004, 20 (18): 3565-3574. 10.1093/bioinformatics/bth445.
Margolin A, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera R, Califano A: ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context. BMC Bioinformatics. 2006, 7 (Suppl 1): S7-10.1186/1471-2105-7-S1-S7.
Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A: Reverse engineering of regulatory networks in human B cells. Nat Genet. 2005, 37 (4): 382-90. 10.1038/ng1532.
Pearl J: Probabilistic Reasoning in Intelligent Systems: networks of plausible inference. 1988, Morgan Kaufmann
Pearl J: Causality: Models, Reasoning and Inference. 2000, Cambridge: Cambridge University Press
Spirtes P, Glymour C, Scheines R: Causation, Prediction, and Search. 2000, Cambridge MA: MIT Press, 2
Segal E, Pe'er D, Regev A, Koller D, Friedman N: Learning Module Networks. Journal of Machine Learning Research. 2005, 6 (Apr): 557-588.
Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N: Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genetics. 2003, 34 (2): 166-176.
Friedman N, Goldszmidt M: Learning Bayesian Networks with Local Structure. Learning in Graphical Models. Edited by: Jordan MI. 1998, Cambridge, MA: MIT Press, 421-459.
Chickering DM, Heckerman D, Meek C: A Bayesian approach to learning Bayesian networks with local structure. Proceedings of Thirteenth Conference on Uncertainty in Artificial Intelligence. 1997, Providence, RI: Morgan Kaufmann
Imoto S, Goto T, Miyano S: Estimation of genetic networks and functional structures between genes by using Bayesian network and nonparametric regression. Pac Symp Biocomput. 2002, : 175-186.
Imoto S, Higuchi T, Goto T, Tashiro K, Kuhara S, Miyano S: Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks. Proc 2nd Computational Systems Bioinformatics. 2003, 104-113.
Imoto S, Kim S, Goto T, Miyano S, Aburatani S, Tashiro K, Kuhara S: Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network. J Bioinform Comput Biol. 2003, 1 (2): 231-52. 10.1142/S0219720003000071.
Tamada Y, Kim S, Bannai H, Imoto S, Tashiro K, Kuhara S, Miyano S: Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection. Bioinformatics. 2003, 19 (Suppl 2): II227-II236. 10.1093/bioinformatics/btg1082.
Bulashevska S, Eils R: Inferring genetic regulatory logic from expression data. Bioinformatics. 2005, 21 (11): 2706-13. 10.1093/bioinformatics/bti388.
Nachman I, Regev A, Friedman N: Inferring quantitative models of regulatory networks from expression data. Bioinformatics. 2004, 20 (suppl 1): i248-256. 10.1093/bioinformatics/bth941.
Verma TS, Pearl J: Equivalence and Synthesis of Causal Models. Proc Sixth Conf on Uncertainty in Artificial Intelligence. Edited by: Bonissone PB, Henrion M, Kanal LN, Lemmer JF. 1990, Amsterdam: North-Holland, 255-268.
Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P: Molecular Biology of the Cell. 2002, New York: Garland Science, 4
Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. Journal of Computational Biology. 2000, 7 (3): 601-620. 10.1089/106652700750050961.
Gat-Viks I, Tanay A, Raijman D, Shamir R: A probabilistic methodology for integrating knowledge and experiments on biological networks. J Comput Biol. 2006, 13 (2): 165-181. 10.1089/cmb.2006.13.165.
Friedman N, Murphy K, Russell S: Learning the Structure of Dynamic Probabilistic Networks. Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98). 1998, San Francisco, CA: Morgan Kaufmann Publishers, 139-147.
Murphy K, Mian S: Modelling gene expression data using dynamic Bayesian networks. 1999, Tech. rep., Computer Science Division, University of California, Berkeley, CA
Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis ED: Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics. 2004, 20 (18): 3594-603. 10.1093/bioinformatics/bth448.
Bernard A, Hartemink AJ: Informative structure priors: joint learning of dynamic regulatory networks from multiple types of data. Pac Symp Biocomput. 2005, 459-70.
Zou M, Conzen SD: A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics. 2005, 21: 71-79. 10.1093/bioinformatics/bth463.
Perrin BE, Ralaivola L, Mazurie A, Bottani S, Mallet J, d'Alche Buc F: Gene networks inference using dynamic Bayesian networks. Bioinformatics. 2003, 19 (Suppl 2): II138-II148. 10.1093/bioinformatics/btg1071.
Beal MJ, Falciani F, Ghahramani Z, Rangel C, Wild DL: A Bayesian approach to reconstructing genetic regulatory networks with hidden factors. Bioinformatics. 2005, 21 (3): 349-356. 10.1093/bioinformatics/bti014.
Rangel C, Angus J, Ghahramani Z, Lioumi M, Sotheran E, Gaiba A, Wild DL, Falciani F: Modeling T-cell activation using gene expression profiling and state-space models. Bioinformatics. 2004, 20 (9): 1361-1372. 10.1093/bioinformatics/bth093.
Rangel C, Wild DL, Falciani F, Ghahramani Z, Gaiba A: Modeling biological responses using gene expression profiling and linear dynamical systems. Proceedings of the 2nd International Conference on Systems Biology. 2001, Madison, WI: Omnipress, 248-256.
Ong IM, Glasner JD, Page D: Modelling regulatory pathways in E. coli from time series expression profiles. Bioinformatics. 2002, 18 (Suppl 1): S241-8.
Schwarz G: Estimating the Dimension of a Model. Annals of Statistics. 1978, 6 (2): 461-464.
Gelman A, Carlin JB, Stern HS, Rubin DB: Bayesian Data Analysis. 1996, Chapman and Hall-CRC
Cooper GF, Herskovits E: A Bayesian Method for the Induction of Probabilistic Networks from Data. Machine Learning. 1992, 9: 309-347.
Heckerman D, Geiger D, Chickering DM: Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Machine Learning. 1995, 20 (3): 197-243.
Geiger D, Heckerman D: Learning Gaussian Networks. Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence. Edited by: de Mántaras RL, Poole D, Seattle. 1994, Washington, USA: Morgan Kaufmann, 235-243.
Bøttcher SG: Learning Bayesian Networks with Mixed Variables. PhD thesis. 2004, Aalborg University, Denmark
Pe'er D, Regev A, Elidan G, Friedman N: Inferring subnetworks from perturbed expression profiles. Bioinformatics. 2001, 17 (Suppl 1): S215-S224.
Hartemink AJ, Gilford DK, Jaakkola TS, Young RA: Combining Location and Expression Data for Principled Discovery of Genetic Regulatory Network Models. Pac Symp Biocomput. 2002, : 437-449.
Steck H, Jaakkola T: (Semi-)Predictive Discretization during Model Selection. 2003, Tech. Rep. AI Memo AIM-2003-002, MIT
Markowetz F, Spang R: Molecular Diagnosis: classification, model selection, and performance evaluation. Methods of Information in Medicine. 2005, 44 (3): 438-43.
Steck H, Jaakkola T: On the Dirichlet Prior and Bayesian Regularization. Advances in Neural Information Processing Systems 15. 2002, Cambridge, MA: MIT Press
Robinson RW: Counting labeled acyclic digraphs. New Directions in the Theory of Graphs. Edited by: Harary F. 1973, New York: Academic Press, 239-273.
Madigan D, Andersson S, Perlman M, Volinsky C: Bayesian model averaging and model selection for Markov equivalence classes of acyclic graphs. Communications in Statistics: Theory and Methods. 1996, 25: 2493-2519. 10.1080/03610929608831853.
Chickering DM: Learning Equivalence Classes of Bayesian Network Structures. Proceedings of Twelfth Conference on Uncertainty in Artificial Intelligence, Portland, OR, Morgan Kaufmann. 1996, 150-157.
Friedman N, Koller D: Being Bayesian about Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning. 2003, 50: 95-126. 10.1023/A:1020249912095.
Friedman N, Nachman I, Peer D: Learning Bayesian Network Structures from Massive Datasets: The Sparse Candidate Algorithm. Proc of Uncertainty in Artificial Intelligence. 1999
Nachman I, Elidan G, Friedman N: "Ideal Parent" structure learning for continuous variable networks. Proceedings of the 20th conference on Uncertainty in artificial intelligence. 2004, Arlington, Virginia, United States: AUAI Press, 400-409.
Peña J, Björkegren J, Tegnér J: Growing Bayesian network models of gene networks from seed genes. Bioinformatics. 2005, 21 (Suppl 2): ii224-ii229. 10.1093/bioinformatics/bti1137.
Friedman N: Learning Belief Networks in the Presence of Missing Values and Hidden Variables. Proc of the Fourteenth Inter Conf on Machine Learning (ICML97). Edited by: Fisher D. 1997, San Francisco, CA: Morgan Kaufmann, 125-133.
Friedman N: The Bayesian Structural EM Algorithm. Proc of the Fourteenth Conf on Uncertainty in Artificial Intelligence (UAI'98). Edited by: Cooper GF, Moral S. 1998, San Francisco, CA: Morgan Kaufmann, 129-138.
Efron B, Tibshirani RJ: An introduction to the boostrap. 1993, Chapman and Hall
Steck H, Jaakkola TS: Bias-Corrected Bootstrap and Model Uncertainty. Advances in Neural Information Processing Systems 16. Edited by: Thrun S, Saul L, Schölkopf B. 2004, Cambridge, MA: MIT Press
Hastings W: Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970, 57: 97-109. 10.1093/biomet/57.1.97.
Husmeier D: Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics. 2003, 19 (17): 2271-2282. 10.1093/bioinformatics/btg313.
Smith VA, Jarvis ED, Hartemink AJ: Evaluating functional network inference using simulations of complex biological. Bioinformatics. 2002, 18 (Suppl 1): S216-S224.
Zak DE, Doyle FJ, Gonye GE, Schwaber JS: Simulation studies for the identification of genetic networks from cDNA array and regulatory activity data. Proceedings of the Second International Conference on Systems Biology. 2001, 231-238.
Wimberly FC, Heiman T, Ramsey J, Glymour C: Experiments on the Accuracy of Algorithms for Inferring the Structure of Genetic Regulatory Networks from Microarray Expression Levels. In. Proc IJCAI 2003 Bioinformatics Workshop. 2003
Brown C, Rust A, Clarke P, Pan Z, Schilstra M, Buysscher T, Griffin G, Wold B, Cameron R, Davidson E, Bolouri H: New computational approaches for analysis of cis-regulatory networks. Developmental Biology. 2002, 246: 86-102. 10.1006/dbio.2002.0619.
Davidson EH, Rast JP, Oliveri P, Ransick A, Calestani C, Yuh CH, Minokawa T, Amore G, Hinman V, Arenas-Mena C, Otim O, Brown CT, Livi CB, Lee PY, Revilla R, Rust AG, jun Pan Z, Schilstra MJ, Clarke PJC, Arnone MI, Rowen L, Cameron RA, McClay DR, Hood L, Bolouri H: A Genomic Regulatory Network for Development. Science. 2002, 295 (5560): 1669-1678. 10.1126/science.1069883.
Hartemink AJ: Reverse engineering gene regulatory networks. Nat Biotechnol. 2005, 23 (5): 554-5. 10.1038/nbt0505-554.
Werhli AV, Grzegorczyk M, Husmeier D: Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks. Bioinformatics. 2006, 22: 2523-2531. 10.1093/bioinformatics/btl391.
Zak DE, Gonye GE, Schwaber JS, Doyle FJ: Importance of Input Perturbations and Stochastic Gene Expression in the Reverse Engineering of Genetic Regulatory Networks: Insights From an Identiflability Analysis of an In Silico Network. Genome Res. 2003, 13 (11): 2396-2405. 10.1101/gr.1198103.
Markowetz F, Spang R: Evaluating the Effect of Perturbations in Reconstructing Network Topologies. Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003). Edited by: Hornik K, Leisch F, Zeileis A. 2003
Rung J, Schlitt T, Brazma A, Freivalds K, Vilo J: Building and analysing genome-wide gene disruption networks. Bioinformatics. 2002, 18 (Suppl 2): S202-S210.
Wagner A: Reconstructing Pathways in Large Genetic Networks from Genetic Perturbations. Journal of Computational Biology. 2004, 11: 53-60. 10.1089/106652704773416885.
Wagner A: Estimating Coarse Gene Network Structure from Large-Scale Gene Perturbation Data. Genome Res. 2002, 12 (2): 309-315. 10.1101/gr.193902.
Wagner A: How to reconstruct a large genetic network from n gene perturbations in fewer than n2 easy steps. Bioinformatics. 2002, 17 (12): 1183-1197. 10.1093/bioinformatics/17.12.1183.
Aho AV, Garey M, Ullman JD: The transitive reduction of a directed graph. SIAM J Comput. 1972, 1 (2): 131-137. 10.1137/0201008.
van Leeuwen J: Graph algorithms. Handbook of Theoretical Computer Science. Edited by: van Leeuwen J, Elsevier. 1990, 525-632.
Wang W, Cooper GF: An Bayesian Method for Biological Pathway Discovery from High-Throughput Experimental Data. Proc 3rd International IEEE Computer Society Computational Systems Bioinformatics Conference (CSB 2004). 2004, Stanford, CA, USA: IEEE Computer Society, 645-646.
Ideker T, Thorsson V, Karp RM: Discovery of regulatory interactions through perturbation: inference and experimental design. Pac Symp Biocomput. 2000, : 302-313.
Akutsu T, Kuhara S, Maruyama O, Miyano S: Identification of Gene Regulatory Networks by Strategic Gene Disruptions and Gene Overexpressions. Proc 9th Annual ACM-SIAM Symposium on Discrete Algorithms. 1998, 695-702.
Akutsu T, Kuhara S, Maruyama O, Miyano S: A System for Identifying Genetic Networks from Gene Expression Patterns Produced by Gene Disruptions and Overexpressions. Genome Informatics 9. Edited by: Miyano S, Takagi T. 1998, Tokyo: Universal Academy Press, 151-160.
Rice JJ, Tu Y, Stolovitzky G: Reconstructing biological networks using conditional correlation analysis. Bioinformatics. 2005, 21: 765-773. 10.1093/bioinformatics/bti064.
Ljung L: System Identification – Theory for the User. 1999, Prentice Hall, 2
Yeung S, Tegnér J, Collins JJ: Reverse engineering gene networks using singular value decomposition and robust regression. Proc Natl Acad Sci USA. 2002, 99 (9): 6163-8. 10.1073/pnas.092576199.
Gardner TS, di Bernardo D, Lorenz D, Collins JJ: Inferring Genetic Networks and Identifying Compound Mode of Action via Expression Profiling. Science. 2003, 301 (5629): 102-105. 10.1126/science.1081900.
di Bernardo D, S GT, Collins JJ: Robust identification of large genetic networks. Pac Symp Biocomput. 2004, 486-97.
di Bernardo D, Thompson MJ, Gardner TS, Chobot SE, Eastwood EL, Wojtovich AP, Elliott SJ, Schaus SE, Collins JJ: Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. Nat Biotechnol. 2005, 23 (3): 377-83. 10.1038/nbt1075.
Cooper GF, Yoo C: Causal Discovery from a Mixture of Experimental and Observational Data. Proc. Fifthteenth Conference on Uncertainty in Artificial Intelligence (UAI '99). Edited by: Laskey K, Prade H. 1999, San Francisco, Calif.: Morgan Kaufman, 116-125.
Markowetz F, Grossmann S, Spang R: Probabilistic soft interventions in Conditional Gaussian networks. Proc Tenth International Workshop on Artificial Intelligence and Statistics. Edited by: Cowell R, Ghahramani Z. 2005
Yeang CH, Ideker T, Jaakkola T: Physical Network Models. Journal of Computational Biology. 2004, 11 (2): 243-262. 10.1089/1066527041410382.
Avery L, Wasserman S: Ordering gene function: the interpretation of epistasis in regulatory hierarchies. Trends Genet. 1992, 8 (9): 312-6. 10.1016/0168-9525(92)90263-4.
Driessche NV, Demsar J, Booth EO, Hill P, Juvan P, Zupan B, Kuspa A, Shaulsky G: Epistasis analysis with global transcriptional phenotypes. Nat Genet. 2005, 37 (5): 471-7. 10.1038/ng1545.
Tong AHY, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M, Chen Y, Cheng X, Chua G, Friesen H, Goldberg DS, Haynes J, Humphries C, He G, Hussein S, Ke L, Krogan N, Li Z, Levinson JN, Lu H, Ménard P, Munyana C, Parsons AB, Ryan O, Tonikian R, Roberts T, Sdicu AM, Shapiro J, Sheikh B, Suter B, Wong SL, Zhang LV, Zhu H, Burd CG, Munro S, Sander C, Rine J, Greenblatt J, Peter M, Bretscher A, Bell G, Roth FP, Brown GW, Andrews B, Bussey H, Boone C: Global mapping of the yeast genetic interaction network. Science. 2004, 303 (5659): 808-813. 10.1126/science.1091317.
Wong SL, Zhang LV, Tong AHY, Li Z, Goldberg DS, King OD, Lesage G, Vidal M, Andrews B, Bussey H, Boone C, Roth FP: Combining biological networks to predict genetic interactions. PNAS. 2004, 101 (44): 15682-15687. 10.1073/pnas.0406614101.
Drees BL, Thorsson V, Carter GW, Rives AW, Raymond MZ, Avila-Campillo I, Shannon P, Galitski T: Derivation of genetic interaction networks from quantitative phenotype data. Genome Biol. 2005, 6 (4): R38-10.1186/gb-2005-6-4-r38.
Markowetz F, Bloch J, Spang R: Non-transcriptional pathway features reconstructed from secondary effects of RNA interference. Bioinformatics. 2005, 21 (21): 4026-4032. 10.1093/bioinformatics/bti662.
Pournara I, Wernisch L: Reconstruction of gene networks using Bayesian learning and manipulation experiments. Bioinformatics. 2004, 20 (17): 2934-2942. 10.1093/bioinformatics/bth337.
Yoo C, Cooper GF: Evaluation of a System that Recommends Microarray Experiments to Perform to Discover Gene-Regulation Pathways. Artif Intell Med. 2004, 31 (2): 169-182. 10.1016/j.artmed.2004.01.018.
Yeang CH, Mak HC, McCuine S, Workman C, Jaakkola T, Ideker T: Validation and refinement of gene-regulatory pathways on a network of physical interactions. Genome Biol. 2005, 6: R62-10.1186/gb-2005-6-7-r62.
Tegner J, Yeung MKS, Hasty J, Collins JJ: Reverse engineering gene networks: integrating genetic perturbations with dynamical modeling. Proc Natl Acad Sci USA. 2003, 100 (10): 5944-9. 10.1073/pnas.0933416100.
Hwang D, Smith JJ, Leslie DM, Weston AD, Rust AG, Ramsey S, de Atauri P, Siegel AF, Bolouri H, Aitchison JD, Hood L: A data integration methodology for systems biology: experimental verification. Proc Natl Acad Sci USA. 2005, 102 (48): 17302-17307. 10.1073/pnas.0508649102.
Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M: A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data. Science. 2003, 302 (5644): 449-453. 10.1126/science.1087361.
Papin JA, Hunter T, Palsson BO, Subramaniam S: Reconstruction of cellular signalling networks and analysis of their properties. Nat Rev Mol Cell Biol. 2005, 6 (2): 99-111. 10.1038/nrm1570.
Segal E, Friedman N, Kaminski N, Regev A, Koller D: From signatures to models: understanding cancer using microarrays. Nat Genet. 2005, 37 (Suppl): S38-45. 10.1038/ng1561.
Friedman N: Inferring Cellular Networks Using Probabilistic Graphical Models. Science. 2004, 303 (5659): 799-805. 10.1126/science.1094068.
Aittokallio T, Schwikowski B: Graph-based methods for analysing networks in cell biology. Brief Bioinform. 2006, 7 (3): 243-255. 10.1093/bib/bbl022.
Needham CJ, Bradford JR, Bulpitt AJ, Westhead DR: Inference in Bayesian networks. Nat Biotechnol. 2006, 24: 51-3. 10.1038/nbt0106-51.
Carter GW: Inferring network interactions within a cell. Brief Bioinform. 2005, 6 (4): 380-389. 10.1093/bib/6.4.380.
Filkov V: Gene Network Inference From Large-Scale Expression Data. Handbook of Computational Molecular Biology (Chapman & All/Crc Computer and Information Science Series). Edited by: Aluru S. 2005, Chapman & Hall/CRC
Pe'er D: Bayesian Network Analysis of Signaling Networks: A Primer. Science's STKE. 2005, 2005 (281): pl4.-10.1126/stke.2812005pl4.
Mandel J, Palfreyman NM, Lopez JA, Dubitzky W: Representing bioinformatics causality. Brief Bioinform. 2004, 5 (3): 270-83. 10.1093/bib/5.3.270.
De Jong H: Modeling and Simulation of Genetic Regulatory Systems: A Literature Review. Journal of Computational Biology. 2002, 9: 67-103. 10.1089/10665270252833208.
Wessels L, van Someren E, Reinders M: A comparison of genetic network models. Pac Symp Biocomput. 2001, 508-519.
D'haeseleer P, Liang S, Somogyi R: Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics. 2000, 16 (8): 707-726. 10.1093/bioinformatics/16.8.707.
Markowetz F: A bibliography on learning causal networks of gene interactions. 2006, [http://genomics.princeton.edu/~florian/]
This paper grew out of lectures given at the Institute for Theoretical Physics and Mathematics (IPM) in Tehran, Iran, in April 2005 and a tutorial given at the German Conference on Bioinformatics (GCB) in Hamburg, October 2005. F.M. thanks the organizers of both conferences for inviting him and giving him the opportunity to present. The authors thank Stefanie Scheid, Dennis Kostka and Patrick Bradley for their thorough readings of drafts of the manuscript.
This article has been published as part of BMC Bioinformatics Volume 8 Supplement 6, 2007: Otto Warburg International Summer School and Workshop on Networks and Regulation. The full contents of the supplement are available online at http://0-www.biomedcentral.com.brum.beds.ac.uk/1471-2105/8?issue=S6
About this article
Cite this article
Markowetz, F., Spang, R. Inferring cellular networks – a review. BMC Bioinformatics 8, S5 (2007). https://0-doi-org.brum.beds.ac.uk/10.1186/1471-2105-8-S6-S5
- Bayesian Network
- Directed Acyclic Graph
- Conditional Independence
- Coexpression Network
- Marginal Likelihood