- Research article
- Open Access
- Published:

# Predicting biological system objectives de novo from internal state measurements

*BMC Bioinformatics*
**volume 9**, Article number: 43 (2008)

## Abstract

### Background

Optimization theory has been applied to complex biological systems to interrogate network properties and develop and refine metabolic engineering strategies. For example, methods are emerging to engineer cells to optimally produce byproducts of commercial value, such as bioethanol, as well as molecular compounds for disease therapy. Flux balance analysis (FBA) is an optimization framework that aids in this interrogation by generating predictions of optimal flux distributions in cellular networks. Critical features of FBA are the definition of a biologically relevant objective function (e.g., maximizing the rate of synthesis of biomass, a unit of measurement of cellular growth) and the subsequent application of linear programming (LP) to identify fluxes through a reaction network. Despite the success of FBA, a central remaining challenge is the definition of a network objective with biological meaning.

### Results

We present a novel method called **Biological Objective Solution Search (BOSS)** for the inference of an objective function of a biological system from its underlying network stoichiometry as well as experimentally-measured state variables. Specifically, **BOSS** identifies a system objective by defining a putative stoichiometric "objective reaction," adding this reaction to the existing set of stoichiometric constraints arising from known interactions within a network, and maximizing the putative objective reaction via LP, all the while minimizing the difference between the resultant *in silico* flux distribution and available experimental (e.g., isotopomer) flux data. This new approach allows for discovery of objectives with previously unknown stoichiometry, thus extending the biological relevance from earlier methods. We verify our approach on the well-characterized central metabolic network of *Saccharomyces cerevisiae*.

### Conclusion

We illustrate how **BOSS** offers insight into the functional organization of biochemical networks, facilitating the interrogation of cellular design principles and development of cellular engineering applications. Furthermore, we describe how growth is the best-fit objective function for the yeast metabolic network given experimentally-measured fluxes.

## Background

Systems-based approaches coupled with experimental data have facilitated greater understanding of large-scale biological systems [1, 2]. For example, optimization procedures have recently been used to characterize systemic properties in biology, including phenotypic properties like growth rates and effects of gene knockouts [3–7]. One quantitative measure of a biological phenotype is the set of fluxes through all reactions within a biochemical network [8]. Specifically, flux balance analysis (FBA) is a constraints-based approach that calculates steady-state flux distributions [9–11].

FBA has traditionally been based on the premise that prokaryotes such as *Escherichia coli* have maximized their growth performance as a response to selective pressure [12]. Consequently, a common objective function in FBA of metabolic networks is the maximization of the rate of synthesis of biomass, a unit of measurement of cellular growth. However, as other types of networks and higher-order systems are interrogated, other objectives may be more accurate in predicting phenotypes. For example, other objective functions that have been previously considered in FBA include optimization of energy production or consumption [13] and byproduct synthesis [14]. By inferring objective functions of biological systems, cellular design principles may be studied and systems may be exploited for engineering of metabolic byproducts of commercial or medical value [15–20].

*In silico* frameworks for determining a most-likely objective function have previously been proposed. One such tool, named **ObjFind**, attempts to identify weightings, termed coefficients of importance (CoIs), on reaction fluxes within a network while minimizing the difference between the resultant flux distribution and known experimental fluxes [3]. In the **ObjFind** framework, a high CoI indicates a reaction that is more likely a component of the cellular objective function, given available experimental fluxes. However, **ObjFind** is unable to *a priori* define objectives, since in FBA the objective function is defined as a single reaction within the system (and represented within the stoichiometric matrix) and not a weighting on multiple reactions (i.e., a set of CoIs). For example, if the true objective reaction has not been experimentally characterized and is not included within the network reconstruction, **ObjFind** is unable to assign the highest CoI to it and instead chooses an alternate (and consequently suboptimal) reaction or set of reactions as constituting the objective function. Two recent efforts have further attempted to identify the most probable objective of a metabolic system from a set of possible objectives, in one case via a Bayesian-based probability ranking [21] and in the other case using an Euclidean metric [20]. However, like **ObjFind**, each of these methods requires that the stoichiometric network reconstruction include the true objective function as an existing reaction in order to yield meaningful predictions.

We present a novel framework, **Biological Objective Solution Search** (**BOSS**), for identifying objective functions of biological systems based on the stoichiometry of the underlying biochemical network(s) and known experimental flux data. In this framework, the biological objective function is a *de novo* reaction (column) that is added to the matrix **S** representing the stoichiometry of the underlying system. Subsequently, the flux through this particular objective reaction is optimized (maximized) as the objective function using the standard FBA approach. Notably, the objective reaction is not confined to be one of a subset of existing reactions, but rather is allowed to take on any form (e.g., an existing reaction, a combination of existing reactions, or a previously uncharacterized reaction) as specified by an optimization procedure. It is, however, confined to be *one* linear stoichiometric reaction whose coefficients are determined by the framework.

To illustrate this novelty of **BOSS** over existing methods (including the three described above), consider the simple system drawn in Figure 1(a). This system is comprised of three components and five reactions. The objective reaction is denoted as *r*_{obj}. Assume that a network reconstruction based on available literature includes all three components and four of the five reactions but excludes the actual system objective, *r*_{obj}, governing the resultant flux distribution. As shown in Figure 1(b), analyzing the network with an approach like **ObjFind** generates a rank-ordered list of the reactions that best describe the objective function. However, since *r*_{obj} is unknown, the calculated objective will be some combination of weightings on *r*_{1} through *r*_{4}, and the approach will fundamentally fail to capture the actual objective of the system *r*_{obj}. However, as shown in Figure 1(c), by generating a *de novo* stoichiometric reaction as its objective function, **BOSS** would in theory be able to recapitulate the actual system objective.

We applied **BOSS** to the previously reconstructed *S. cerevisiae* central metabolic network [22, 23] (see Additional file 1 for details of the network) to evaluate which objective reaction it infers for that system given a set of isotopomer flux data [23, 24]. We compared the **BOSS**-derived objective reaction to the hypothesized system objective of precursor biomass synthesis. We considered two cases, one in which the hypothesized objective reaction (i.e., precursor biomass synthesis) was excluded from the system and another in which it was included within the set of known stoichiometric reactions. We further assessed how **BOSS** handles noise in experimental flux distributions. Finally, we compared our results to a negative control in which flux distributions comprised of randomly generated values were inputted into **BOSS**. Ultimately, we illustrate how **BOSS** extends existing objective function identification tools by inferring the objective reaction of a biochemical network *de novo* from internal state measurements, and how this tool therefore facilitates future study of cellular design principles and cellular engineering approaches.

## Methods

The framework that we present for identifying a putative objective function for a given biological system constitutes a single-level optimization problem. Specifically, **BOSS** integrates network stoichiometry, physico-chemical constraints such as reaction bounds, and experimental flux data to generate a novel stoichiometric reaction corresponding to the most likely objective function of the system. The framework and implementation strategies are described here.

### Flux Balance Analysis (FBA)

A key feature in the application of FBA is the optimization of an objective function subject to fundamental constraints on cellular function [9, 11, 25]. FBA requires a stoichiometric network reconstruction, usually represented as a stoichiometric matrix, **S**, in which components are delineated as rows, component interactions as columns, and stoichiometric relationships as coefficients (see Figure 1 for examples of biochemical networks and the corresponding **S** matrices). This reconstruction, coupled with reaction fluxes, comprises the principal constraint in FBA, i.e., at steady-state, for each component, the sum of the stoichiometric coefficients multiplied by the corresponding reaction fluxes must equal zero to ensure that mass is balanced within the system.

Commonly used objective functions (i.e., stoichiometric objectives or "objective reactions") in FBA of metabolic networks include biomass production [26, 27], energy production or consumption [13], and byproduct production [14]. The objective function used in FBA is assumed to be linear by a first-order approximation, since this form simplifies computation and reduces the number of parameters to be defined in the objective experimentally. Changes in a linear objective under varying growth conditions have been observed [12].

FBA-predicted flux distributions have displayed agreement with experimentally-measured flux data in some cases [12]. Furthermore, FBA has yielded insights into optimal targets for metabolic engineering [15] and adaptive evolution of *E. coli* strains [28], among other applications [29]. Additionally, FBA has successfully predicted metabolic phenotypes of cellular systems by hypothesizing biomass production as the objective function [7, 9, 10, 12, 19, 27, 30–36]. However, other objectives may be equally successful at predicting phenotypes for particular conditions [20]. Furthermore, a metabolic objective may change at different stages of an organism's life cycle [37]. Some studies suggest a greater role for environmental factors in determining cellular objectives than is assumed when FBA is employed with a biomass production objective [38]. Ultimately, as additional network reconstructions are completed and experimental flux data are obtained, there is increasing interest in determining realistic objective functions to explain systemic behavior [18–20, 39].

### Formulation of the Objective Function-Finding Algorithm BOSS

The **BOSS** framework initially takes the form of a bi-level optimization problem that minimizes the sum-squared error between experimentally-measured (*in vivo*) fluxes and framework-computed (*in silico*) fluxes ("outer problem") (see Figure 2(a), line 1), subject to the condition that a cellular objective is simultaneously maximized ("inner problem") (lines 2–5). The inner problem takes the canonical form of a FBA problem, wherein an objective reaction introduced to the network is maximized (line 2) subject to thermodynamic (line 3), mass balance (line 4), and uptake (line 5) constraints. The coefficients of the objective reaction are unknown and part of the solution space.

In order to make this bi-level optimization problem computationally tractable, we reformulated it as a single-level optimization problem via the duality theorem of LP, as previously described [3, 40] (see Figure 2(b)). Specifically, the revised form is comprised of an objective that aims to minimize the sum-squared error between experimentally-measured and framework-computed fluxes (line 1) subject to a set of primal (lines 3 through 5) and dual constraints (lines 7 through 9). The objective in line 1 is equivalent to the outer objective of the bi-level optimization problem (see Figure 2(a), line 1), and the primal constraints in lines 3 through 5 are equivalent to the constraints in the original bi-level optimization problem (see Figure 2(a), lines 3 through 5). Two additional constraints are included in the single-level optimization framework. Specifically, the value of the primal and dual problems are set equivalent to one another (line 2), and the flux distribution corresponding to the new objective reaction is normalized to a specific value (line 6). As described in the caption for Figure 2, the decision variables in this optimization are: (1) the stoichiometric coefficients of the objective reaction, *S*_{i, objective}where *i* ∈ **N**, or the set of metabolites, and "objective" denotes the new objective reaction; (2) the framework-computed fluxes **v**^{experimental}; (3) the dual variable *g* associated with the uptake constraint; and (4) the dual variables **u** indicating shadow prices on the mass balance constraints for each metabolite in the system.

Due to the large number of decision variables, large solution space, existence of multiple local optima, and inherent non-convexity of the problem, we instituted a multiple restart approach, wherein the optimization is run multiple times, each time with different randomly-selected starting values for the decision variables (within a specified range) (see Figure 3(b), item 3). The output of this approach is a series of putative objective reactions, one for each restart. Objective reactions that yield a poor value in the outer optimization (i.e., for which the sum-squared error between experimental and calculated fluxes is relatively high) are removed from the solution set. The remaining putative objective reactions are then clustered into groups based on similarity, and the most populous cluster is chosen as the consensus objective reaction (see Figure 3(b), item 4, as well as Additional file 2). This last step seeks to eliminate solutions to the outer problem that constitute suboptimal local minima. It is assumed that the global minimum to the outer problem (i.e., the best match between **BOSS**-derived and experimental fluxes, thereby leading to the likely objective reaction) will draw from a larger portion of the initial state space than any local minimum, and thus will gather the greatest proportion of the restarts when solutions are clustered.

This single-level optimization with multiple restarts, followed by clustering and subsequent averaging of the most populous cluster, ultimately yields a stoichiometrically-weighted reaction for which the metabolic network is optimized. This reaction represents a best hypothesis for the network objective function based on available data about the system. Determination of the objective reaction in this manner offers a means to gain insight about network and sub-network objectives and behavior.

### Biological System Evaluated: The *S. cerevisiae* Central Metabolic Network

The previously reconstructed *S. cerevisiae* central metabolic network [22, 23] was used to assess the **BOSS** framework (see Additional file 2). This network is a subset of the genome-scale *S. cerevisiae* metabolic reconstruction [41, 42] and comprises the major carbohydrate metabolism pathways of glycolysis, pentose phosphate, and the citrate cycle, the principal energy metabolism pathway of oxidative phosphorylation, and a precursor biomass synthesis reaction. The network is comprised of 60 metabolites participating in 62 reactions, including six exchange reactions, 55 intracellular reactions, and one precursor biomass synthesis reaction.

A flux distribution for the central metabolic network was previously characterized experimentally [22, 23]. This distribution was obtained by GC-MS tracing of ^{13}C-labeled isotopomers, and corresponds to yeast cells cultivated in batch culture with glucose as the limiting substrate and at a maximum specific growth rate, *μ*_{max}, of 0.37 h^{-1}. (See [23] for complete experimental details.) The **BOSS** framework was applied to this data set.

In addition, the precursor biomass synthesis reaction, which balances the major metabolites from central metabolism that contribute to biomass, was hypothesized to be the "true" objective function of the system for testing the **BOSS** framework. This hypothesis was based on the underlying premise described previously that organisms have maximized their growth performance through natural selection [12]. The precursor biomass synthesis reaction was previously characterized experimentally [22, 23]. It is important to note that **BOSS** does not require an assumption of an objective function, and we demonstrate below the success of **BOSS** at generating a hypothesized objective function *de novo*.

### Implementation Details

The general implementation scheme for **BOSS** is illustrated in Figure 3. Specifically, we inputted the stoichiometric network reconstruction and isotopomer flux data for the *S. cerevisiae* central metabolic network into **BOSS** (see Figure 3(a)) and assessed the objective reaction that it derived (see Figure 3(b)). We evaluated two conditions, one in which the hypothesized objective function of precursor biomass synthesis was included in the network reconstruction inputted into **BOSS** and another in which it was excluded. In the second case, **BOSS** yielded a zero-vector for the objective reaction, suggesting that the true objective was already a part of the network reconstruction. Therefore, we altered **BOSS** slightly to identify which of the reactions was the likely objective: we removed, one by one, each reaction flux from the set of experimental flux data (**v**^{experimental} in Figure 2), and identified in each case the sum-squared error between the **BOSS**-derived objective reaction and the reaction corresponding to the removed flux. The reaction that exhibited the smallest sum-squared error through this method was identified as the objective function. This approach was utilized to ensure that, if the actual system objective is already in the stoichiometric network reconstruction, the outer optimization problem of **BOSS** does not force flux to go through that reaction and rather allows the flux corresponding to the new objective function column to be maximized as part of **BOSS**.

We considered one additional *in silico* experiment to evaluate how well our framework handles the kind of variability that is characteristic of actual flux measurements. Specifically, we assessed how a varying amount of noise introduced to the set of experimental flux data affects the objective function derived by **BOSS**. (See Additional file 3 as well for a discussion of how **BOSS** handles a paucity of flux data, as applied to a prototypic system.) To perform this test, additional steps were performed during data preparation (see Figure 3(a), item 2). First, each experimental flux for the *S. cerevisiae* metabolic network was varied randomly via a normal distribution, with standard deviation equal to a set percent of the initial value of the flux. Different percent variances, ranging from five percent to 25 percent in increments of five percent, were considered. Fluxes with an initial value of zero were varied with a standard deviation equal to that of the smallest nonzero flux in the system. **BOSS** was fed these varied fluxes as "experimental" fluxes. To avoid biasing results toward one particular variant of the "experimental" fluxes, we repeated multiple times this process of randomly varying the experimental flux distribution. In other words, for any given percent variance, we tested multiple noisy flux data sets (n = 20). Consequently, when considering the results of any **BOSS** run on noisy flux data, we evaluated the coefficients of the objective reaction for each noisy flux data set independently.

#### Assessing the Results of BOSS

To assess the accuracy and efficacy of our **BOSS**-derived results, we considered two statistical measures. First, we computed the sum-squared error between the coefficients of the **BOSS**-derived objective reaction and the coefficients of the hypothesized objective reaction (in this case, precursor biomass synthesis), normalized to the magnitude of the hypothesized objective reaction column vector. This term is abbreviated SSE_{
S
}throughout the manuscript. Second, we computed the sum-squared error between the reaction fluxes outputted by **BOSS** and the experimental fluxes inputted to the framework and specified as constraints. This term, which is equivalent to the function being minimized in the **BOSS** outer optimization problem, is abbreviated SSE_{
v
}throughout the manuscript. Ideally, smaller values of SSE_{
v
}should correspond to smaller values of SSE_{
S
}, as smaller values of SSE_{
S
}imply more accurate objective reactions.

#### Technical Implementation

Simulations of **BOSS** were run with GAMS v.22.5 on Microsoft Windows XP Professional and Linux (Centos5) machines. The MINOS5 NLP (non-linear programming) solver was utilized. Results were analyzed using code written in MATLAB v.7.5 (part of the MathWorks R2007b release package), with clustering functionality provided by the MATLAB Statistics Toolbox (as detailed in Additional file 2).

## Results and Discussion

The *S. cerevisiae* central metabolic network was used as an experimental system to evaluate the ability of **BOSS** to identify objective functions of biological systems. Additionally, the network was used to evaluate how well the framework handles noise in flux distributions, which is a characteristic of actual experimental measurements. These results were contrasted with randomly-generated flux data for the *S. cerevisiae* central metabolic network as a further validation of the **BOSS** framework. We summarize the results here.

### Validation of Objective Function Identification

In inferring the objective function for the *S. cerevisiae* metabolic network with **BOSS**, two conditions were evaluated. In the first one, the hypothesized objective reaction (i.e., precursor biomass synthesis) was excluded from the stoichiometric network reconstruction. **BOSS** identified coefficients approximately equal to that of the precursor biomass synthesis reaction (see Figure 4(a)). The sum-squared error between the **BOSS**-computed objective reaction and the experimentally-derived precursor biomass synthesis reaction, normalized to the magnitude of the precursor biomass synthesis reaction (SSE_{
S
}), was 8.242 × 10^{-29}. The magnitude of this SSE_{
S
}suggests that **BOSS** derives an objective reaction equivalent to the precursor biomass synthesis reaction, well within the limits of numerical tolerance.

We also implemented **BOSS** on the complete *S. cerevisiae* central metabolic network, i.e., without removing the precursor biomass synthesis reaction. When the precursor biomass synthesis reaction was thus included in the set of stoichiometric reactions, **BOSS** yielded a zero-vector for the objective reaction (SSE_{
S
}= 655.0), suggesting that the actual system objective was already a part of the stoichiometric network reconstruction (see Figure 4(b)). Subsequently, we individually removed each of the reaction fluxes from the pool of experimental fluxes defined in the outer optimization problem of **BOSS** (i.e., **v**^{experimental} in Line 1 of Figure 2(a)) and observed the resultant stoichiometric coefficients that **BOSS** derived for the objective function. SSE_{
S
}values between the reaction whose flux was removed from **v**^{experimental} and the **BOSS**-derived objective reaction given removal of that flux were compared for each reaction in the system (see Figure 4(c)). The reaction with the smallest SSE_{
S
}(between the **BOSS**-derived objective and the experimentally-characterized reaction whose flux was excluded) was the hypothesized system objective of precursor biomass synthesis (SSE_{
S
}= 4.210 × 10^{-4}). This result further validated the ability of **BOSS** to identify objective reactions with previously known as well as unknown stoichiometry (see Figures 4(c) and 4(d)). Interestingly, the reaction with the next smallest SSE_{
S
}(= 42.20) was the ATP maintenance reaction, a reaction that has also been evaluated as an objective of metabolic networks under some conditions [3, 21, 43]. All other reactions in the network had SSE_{
S
}values that were at least an order of magnitude higher. This key result is also a direct validation of the hypothesis that biomass production is a reasonable objective function for many metabolic networks; with the experimentally-generated flux data, **BOSS** derived an objective function highly correlated with biomass production although there was no such assumption *a priori*.

It is important to note that application of FBA to the *S. cerevisiae* metabolic network does *not* yield the experimental flux distribution measured by [23]. Because FBA problems have large convex solution spaces, many different flux distributions yield equally optimal results. Therefore, it is significant that **BOSS** is able to infer the correct coefficients for the precursor biomass synthesis reaction in spite of this difference. Furthermore, although maximization of biomass synthesis is hypothesized as the objective of a metabolic network, the experimental flux data are calculated based on isotopomer measurements and the underlying system stoichiometry but is not biased toward this objective. Thus, the characterization of biomass as the system objective by **BOSS** is a novel validation of this theoretical assumption.

### Accounting for Limitations in Experimental Measurements: Noisy Flux Data

To account for current limitations in experimental (isotopomer) technologies for measuring reaction fluxes, the *S. cerevisiae* central metabolic network was evaluated by **BOSS** when different levels of noise were introduced into the experimental flux data. Different percentages of deviation for the flux data were evaluated to determine how quickly the accuracy of the **BOSS**-computed objective reaction degrades with noisy data.

The results for the *S. cerevisiae* network when varying levels of noise are introduced into the individual flux values are illustrated in Figure 5. For the purposes of this evaluation, based on the results described above, the hypothesized objective reaction of precursor biomass synthesis was included as part of the underlying network stoichiometry inputted into **BOSS**, and the set of experimental fluxes consisted of all fluxes except for the flux corresponding to this reaction. Figure 5(a) illustrates the normalized sum-squared error between the **BOSS**-derived objective reaction and the hypothesized objective function of precursor biomass synthesis (SSE_{
S
}) for different percent variances in experimental flux data, ranging from five percent to 25 percent in increments of five percent. (As described above, for each increment, 20 unique "initial" flux distributions, each with the same level of "noise," were computed, and the results for each of these 20 flux distributions were treated independently of the others. Therefore, the plot in Figure 5(a) illustrates the mean SSE_{
S
}across these 20 distributions, as well as the associated standard deviation within SSE_{S} across these 20 distributions.) Up to about 15 percent variance in the flux data, SSE_{
S
}< 1000, i.e., **BOSS** is able to generate reconstructions of the hypothesized objective reaction orders of magnitude more accurately than when random flux data are inputted into **BOSS** (see "Control Case: Random Flux Data" below). As expected, with increasing noise, the results of **BOSS** exhibit increasing mean SSE_{
S
}values and the variability of SSE_{
S
}across different initial flux sets is higher. These data support the utility of our framework in identifying objective functions for biological systems given the current state of experimental flux measurements.

To further highlight **BOSS**'s performance on noisy flux data, snapshots of the objective reactions that it generated at different flux variances are illustrated. Specifically, Figures 5(b), 5(c), and 5(d) correspond to the coefficients identified by **BOSS** (blue error bars) overlaid on the precursor biomass synthesis reaction (gray bars) for flux variances of 5, 15, and 25 percent, respectively. (Note that, for each flux variance, 20 unique "initial" flux distributions were computed. Consequently, the values for the coefficients derived by **BOSS** correspond to a representative result for one of these 20 initial flux sets.) For smaller flux variances, the pattern of the coefficients identified by **BOSS** follows that of the hypothesized network objective of precursor biomass synthesis that was used to generate the noisy flux data (see Figure 5(b)). As expected, the more noisy the flux data, the less precise the objective reaction identified by the framework (see Figure 5(c)).

It is important to note that this analysis involves artificially adding noise to an experimental flux distribution. Although the original experimental fluxes are based on a steady-state metabolic model of yeast central metabolism, it is likely that there is some degree of noise already within the data set as part of the experimental protocol. Consequently, the studies with noise presented here include even higher degrees of noise than the five to 25 percent spectrum evaluated. Nevertheless, the results suggest that **BOSS** performs well, to an extent, with noisy experimental data.

#### Control Case: Random Flux Data

To place these results in context, randomly-generated flux values coupled with the underlying network stoichiometry for the *S. cerevisiae* central metabolic network were inputted into the **BOSS** framework as a negative control. These fluxes were generated by specifying a normal distribution statistically similar to the characteristics of the experimental flux distribution. Specifically, the randomly-generated fluxes were centered at a mean of 6.697 mmol/g·h with a standard deviation of 9.521 mmol/g·h, identical to the mean and standard deviation of the experimental fluxes reported in [23, 24]. For this test of random flux data, no reaction was removed from the stoichiometric network reconstruction inputted into **BOSS** to avoid biasing results in any way toward a particular objective. However, to ensure that flux was driven through the objective reaction, thus minimizing SSE_{
v
}(see discussion above pertaining to Figures 4(b), 4(c), and 4(d)), the flux for the precursor biomass synthesis objective reaction was not included in the pool of experimental flux data inputted into **BOSS**, as described earlier. A total of 100 randomly-generated flux distributions, each with mean and standard deviation consistent with the experimental flux data, were evaluated, and the SSE_{
S
}between the hypothesized objective reaction of precursor biomass synthesis and the **BOSS**-derived objective reaction was 2.076 × 10^{4}. In addition, the SSE_{
v
}was 962.9. These results are orders of magnitude higher than when actual experimental flux data, or even noisy experimental data, are specified (see "Validation of Objective Function Identification" above), thus further confirming the utility of **BOSS** in inferring objective functions of biological systems with internal state measurements.

## Conclusion

We present a novel framework called **Biological Objective Solution Search** (**BOSS**) that integrates network stoichiometry and experimental flux data to determine the most likely objective function for a given biological system. We illustrate the utility of **BOSS** on a model of *S. cerevisiae* central metabolism with a variety of input conditions, including experimental (isotopomer) flux data, varied experimental flux data meant to mimic noise in experimental measurements, and randomly-generated fluxes. When the underlying network stoichiometry and all reaction fluxes are specified, **BOSS** identifies an objective reaction with a normalized sum-squared error between the computed objective and the hypothesized objective (i.e., precursor biomass synthesis) (SSE_{
S
}) of 4.210 × 10^{-4}. Additionally, when noise is introduced into the flux data to simulate the types of error observed in experimental flux measurements, **BOSS** identifies the objective reaction with a SSE_{
S
}of less than 1000 for up to 15 percent experimental flux noise. Thus, we show how **BOSS** can integrate network stoichiometry and internal state measurements, including in the case of noisy data, to predict the objective function of a biological system.

Interestingly, researchers have theorized that, through evolution, metabolic systems have optimized for biomass production [12]. Here we illustrate that our framework derives this hypothesized objective reaction, precursor biomass synthesis, based on the underlying network stoichiometry and a set of experimentally-measured flux data. This observation validates the **BOSS** framework and it further confirms the precursor biomass synthesis reaction as an objective of yeast central metabolism.

Current challenges remain in the implementation of the **BOSS** algorithm. The transformation of a bi-level to a single-level optimization via the strong duality theorem (see "Methods") introduces a high degree of nonlinearity into the problem and non-convexity into the solution space, rendering the **BOSS** algorithm difficult for nonlinear solvers to tackle. Further, the structure of the inner and outer problems suggests that the solution space for a **BOSS** problem is not smooth, but rather likely contains discontinuous rifts in the value of the optimization parameter (SSE_{
v
}) as one moves smoothly through the state space. These factors complicate the ability of **BOSS** to determine a global optimum with a small number of restarts, resulting in several hours to days of simulation time depending on the system and input conditions. Perhaps stronger nonlinear solvers, more powerful dedicated computers, and more run-time would yield better predictions of objective functions. These challenges will need to be addressed as **BOSS** is scaled up to genome-scale applications.

Another hurdle to the meaningful application of **BOSS** is the paucity of large-scale experimentally-derived flux sets in literature. Isotopomer studies, as well as ^{13}C-constrained fluxomics studies, tend to use simplified models of central metabolism that lump reactions together and thus fail to define a distinct flux for each reaction in the system [44–47]. These simplifications have been necessary due to experimental and computational hurdles, but they limit the degree to which experimental flux data can be used by **BOSS**. As more powerful fluxomics techniques are developed, more complete flux distribution maps will become available for analysis by **BOSS** [48].

**BOSS** is distinguished from preexisting objective-searching algorithms in its ability to define an objective reaction absent previous knowledge of the stoichiometric structure of the objective function [3, 21]. This attribute could prove useful in analyzing signaling and transcriptional regulatory networks, as their objectives are likely to be more loosely defined and poorly understood than those of metabolic networks [32, 49]. Likewise, studying objectives of multiple metabolically-interacting species or compartmentalized metabolic processes (e.g., mitochondrial metabolism) offers fruitful challenges that cannot be described as simply biomass production. Such analyses could lead to a greater appreciation of the driving evolutionary forces that govern these cellular processes, and could help explain how the whole-cell objective originates. A framework that predicts objectives of a biological system based on known properties of the system (e.g., the stoichiometry of the system and associated reaction fluxes) thus has the potential to address many critical and timely questions in biology.

## References

- 1.
Stelling J: Mathematical models in microbial systems biology.

*Curr Opin Microbiol*2004, 7(5):513–518. 10.1016/j.mib.2004.08.004 - 2.
Klamt S, Stelling J: Two approaches for metabolic pathway analysis?

*Trends Biotechnol*2003, 21(2):64–69. 10.1016/S0167-7799(02)00034-3 - 3.
Burgard AP, Maranas CD: Optimization-based framework for inferring and testing hypothesized metabolic objective functions.

*Biotechnol Bioeng*2003, 82(6):670–677. 10.1002/bit.10617 - 4.
Heinrich R, Schuster S, Holzhutter HG: Mathematical analysis of enzymic reaction systems using optimization principles.

*Eur J Biochem*1991, 201(1):1–21. 10.1111/j.1432-1033.1991.tb16251.x - 5.
Maynard Smith J: Optimization Theory in Evolution.

*Ann Rev Ecol Syst*1978, 9: 31–56. 10.1146/annurev.es.09.110178.000335 - 6.
Deutscher D, Meilijson I, Kupiec M, Ruppin E: Multiple knockout analysis of genetic robustness in the yeast metabolic network.

*Nat Genet*2006, 38(9):993–998. 10.1038/ng1856 - 7.
Mahadevan R, Edwards JS, Doyle FJ 3rd: Dynamic flux balance analysis of diauxic growth in Escherichia coli.

*Biophys J*2002, 83(3):1331–1340. - 8.
Sauer U: Metabolic networks in motion: 13C-based flux analysis.

*Mol Syst Biol*2006, 2: 62. 10.1038/msb4100109 - 9.
Lee JM, Gianchandani EP, Papin JA: Flux balance analysis in the era of me-tabolomics.

*Brief Bioinform*2006, 7(2):140–150. 10.1093/bib/bbl007 - 10.
Kauffman KJ, Prakash P, Edwards JS: Advances in flux balance analysis.

*Curr Opin Biotechnol*2003, 14(5):491–496. 10.1016/j.copbio.2003.08.001 - 11.
Price ND, Reed JL, Palsson BO: Genome-scale models of microbial cells: evaluating the consequences of constraints.

*Nat Rev Microbiol*2004, 2(11):886–897. 10.1038/nrmicro1023 - 12.
Segre D, Vitkup D, Church GM: Analysis of optimality in natural and perturbed metabolic networks.

*Proc Natl Acad Sci USA*2002, 99(23):15112–15117. 10.1073/pnas.232349399 - 13.
Ramakrishna R, Edwards JS, McCulloch A, Palsson BO: Flux-balance analysis of mitochondrial energy metabolism: consequences of systemic stoichiometric constraints.

*Am J Physiol Regul Integr Comp Physiol*2001, 280(3):R695–704. - 14.
Varma A, Boesch BW, Palsson BO: Stoichiometric interpretation of Escherichia coli glucose catabolism under various oxygenation rates.

*Appl Environ Microbiol*1993, 59(8):2465–2473. - 15.
Bro C, Regenberg B, Forster J, Nielsen J: In silico aided metabolic engineering of Saccharomyces cerevisiae for improved bioethanol production.

*Metab Eng*2006, 8(2):102–111. 10.1016/j.ymben.2005.09.007 - 16.
Ro DK, Paradise EM, Ouellet M, Fisher KJ, Newman KL, Ndungu JM, Ho KA, Eachus RA, Ham TS, Kirby J, Chang MC, Withers ST, Shiba Y, Sarpong R, Keasling JD: Production of the antimalarial drug precursor artemisinic acid in engineered yeast.

*Nature*2006, 440(7086):940–943. 10.1038/nature04640 - 17.
Raman K, Rajagopalan P, Chandra N: Flux balance analysis of mycolic acid pathway: targets for anti-tubercular drugs.

*PLoS Comput Biol*2005, 1(5):e46. 10.1371/journal.pcbi.0010046 - 18.
Bailey JE: Toward a science of metabolic engineering.

*Science*1991, 252(5013):1668–1675. 10.1126/science.2047876 - 19.
Patil KR, Akesson M, Nielsen J: Use of genome-scale microbial models for metabolic engineering.

*Curr Opin Biotechnol*2004, 15(1):64–69. 10.1016/j.copbio.2003.11.003 - 20.
Schuetz R, Kuepfer L, Sauer U: Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli.

*Mol Syst Biol*2007, 3: 119. 10.1038/msb4100162 - 21.
Knorr AL, Jain R, Srivastava R: Bayesian-based selection of metabolic objective functions.

*Bioinformatics*2006. - 22.
Wang L, Hatzimanikatis V: Metabolic engineering under uncertainty – II: analysis of yeast metabolism.

*Metab Eng*2006, 8(2):142–159. 10.1016/j.ymben.2005.11.002 - 23.
Gombert AK, Moreira dos Santos M, Christensen B, Nielsen J: Network identification and flux quantification in the central metabolism of Saccharomyces cerevisiae under different conditions of glucose repression.

*J Bacteriol*2001, 183(4):1441–1451. 10.1128/JB.183.4.1441-1451.2001 - 24.
dos Santos MM, Gombert AK, Christensen B, Olsson L, Nielsen J: Identification of in vivo enzyme activities in the cometabolism of glucose and acetate by Saccharomyces cerevisiae by using 13C-labeled substrates.

*Eukaryot Cell*2003, 2(3):599–608. 10.1128/EC.2.3.599-608.2003 - 25.
Papin JA, Palsson BO: The JAK-STAT signaling network in the human B-cell: an extreme signaling pathway analysis.

*Biophys J*2004, 87(1):37–46. 10.1529/biophysj.103.029884 - 26.
Varma A, Palsson BO: Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild-type Escherichia coli W3110.

*Appl Environ Microbiol*1994, 60(10):3724–3731. - 27.
Edwards JS, Palsson BO: The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities.

*Proc Natl Acad Sci USA*2000, 97(10):5528–5533. 10.1073/pnas.97.10.5528 - 28.
Fong SS, Burgard AP, Herring CD, Knight EM, Blattner FR, Maranas CD, Pals-son BO: In silico design and adaptive evolution of Escherichia coli for production of lactic acid.

*Biotechnol Bioeng*2005, 91(5):643–648. 10.1002/bit.20542 - 29.
Trawick JD, Schilling CH: Use of constraint-based modeling for the prediction and validation of antimicrobial targets.

*Biochem Pharmacol*2006, 71(7):1026–1035. 10.1016/j.bcp.2005.10.049 - 30.
Edwards JS, Ibarra RU, Palsson BO: In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data.

*Nat Biotechnol*2001, 19(2):125–130. 10.1038/84379 - 31.
Edwards JS, Palsson BO: Robustness analysis of the Escherichia coli metabolic network.

*Biotechnol Prog*2000, 16(6):927–939. 10.1021/bp0000712 - 32.
Klamt S, Stelling J, Ginkel M, Gilles ED: FluxAnalyzer: exploring structure, pathways, and flux distributions in metabolic networks on interactive flux maps.

*Bioinformatics*2003, 19(2):261–269. 10.1093/bioinformatics/19.2.261 - 33.
Henry CS, Jankowski MD, Broadbelt LJ, Hatzimanikatis V: Genome-scale thermodynamic analysis of Escherichia coli metabolism.

*Biophys J*2006, 90(4):1453–1461. 10.1529/biophysj.105.071720 - 34.
Almaas E, Kovacs B, Vicsek T, Oltvai ZN, Barabasi AL: Global organization of metabolic fluxes in the bacterium Escherichia coli.

*Nature*2004, 427(6977):839–843. 10.1038/nature02289 - 35.
Bonarius H, Schmid G, Tramper J: Flux analysis of underdetermined metabolic networks: The quest for the missing constraints.

*TRENDS BIOTECHNOL*1997, 15(8):308–314. 10.1016/S0167-7799(97)01067-6 - 36.
Blank LM, Kuepfer L, Sauer U: Large-scale 13C-flux analysis reveals mechanistic principles of metabolic network robustness to null mutations in yeast.

*Genome Biol*2005, 6(6):R49. 10.1186/gb-2005-6-6-r49 - 37.
Gupta S: Parasite immune escape: new views into host-parasite interactions.

*Curr Opin Microbiol*2005, 8(4):428–433. 10.1016/j.mib.2005.06.011 - 38.
Pfeiffer T, Schuster S: Game-theoretical approaches to studying the evolution of biochemical systems.

*Trends Biochem Sci*2005, 30(1):20–25. 10.1016/j.tibs.2004.11.006 - 39.
Stelling J, Klamt S, Bettenbrock K, Schuster S, Gilles ED: Metabolic network structure determines key aspects of functionality and regulation.

*Nature*2002, 420(6912):190–193. 10.1038/nature01166 - 40.
Intriligator MD:

*Mathematical Optimization and Economic Theory.*1st edition. Philadelphia: Society for Industrial and Applied Mathematics; 2002. - 41.
Herrgard MJ, Lee BS, Portnoy V, Palsson BO: Integrated analysis of regulatory and metabolic networks reveals novel regulatory mechanisms in Saccharomyces cerevisiae.

*Genome Res*2006, 16(5):627–635. 10.1101/gr.4083206 - 42.
Duarte NC, Herrgard MJ, Palsson BO: Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model.

*Genome Res*2004, 14(7):1298–1309. 10.1101/gr.2250904 - 43.
Ibarra RU, Edwards JS, Palsson BO: Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth.

*Nature*2002, 420(6912):186–189. 10.1038/nature01149 - 44.
Emmerling M, Dauner M, Ponti A, Fiaux J, Hochuli M, Szyperski T, Wuthrich K, Bailey JE, Sauer U: Metabolic flux responses to pyruvate kinase knockout in Escherichia coli.

*J Bacteriol*2002, 184(1):152–164. 10.1128/JB.184.1.152-164.2002 - 45.
Li M, Ho PY, Yao S, Shimizu K: Effect of lpdA gene knockout on the metabolism in Escherichia coli based on enzyme activities, intracellular metabolite concentrations and metabolic flux analysis by 13C-labeling experiments.

*J Biotechnol*2006, 122(2):254–266. 10.1016/j.jbiotec.2005.09.016 - 46.
Zhao J, Baba T, Mori H, Shimizu K: Effect of zwf gene knockout on the metabolism of Escherichia coli grown on glucose or acetate.

*Metab Eng*2004, 6(2):164–174. 10.1016/j.ymben.2004.02.004 - 47.
Iwatani S, Van Dien S, Shimbo K, Kubota K, Kageyama N, Iwahata D, Miyano H, Hirayama K, Usuda Y, Shimizu K, Matsui K: Determination of metabolic flux changes during fed-batch cultivation from measurements of intracellular amino acids by LC-MS/MS.

*J Biotechnol*2007, 128(1):93–111. 10.1016/j.jbiotec.2006.09.004 - 48.
Reed JHQ, Palsson B: Sensitivity of 13C Isotopomer Based Flux Calculations to Metabolic Network Scope.

*In Review*2007. - 49.
Klamt S, Saez-Rodriguez J, Lindquist JA, Simeoni L, Gilles ED: A methodology for the structural and functional analysis of signaling and regulatory networks.

*BMC Bioinformatics*2006, 7: 56. 10.1186/1471-2105-7-56

## Acknowledgements

We thank Dr. Kyongbum Lee and Ryan P. Nolan for kindly sharing code with us, as well as Dr. Jong Min Lee, Dr. Kenneth Crowther, Arvind Chavali, Drake Guenther, and Michael Ellis for insightful comments and feedback. Additionally, we thank the National Institutes of Health (NIH) (GM08715/NIH Biotechnology Training Grant to EPG and MAO), the National Science Foundation (CAREER Award to JP), and the Whitaker Foundation for support.

## Author information

### Affiliations

### Corresponding author

## Additional information

### Authors' contributions

EG and MO conceived of the study and participated in its design and implementation. JP conceived of the study and participated in its design and coordination. AB and CM helped with key analyses of the data. All authors read and approved of the final manuscript.

Erwin P Gianchandani, Matthew A Oberhardt contributed equally to this work.

## Electronic supplementary material

### 12859_2007_2028_MOESM1_ESM.pdf

*Saccharomyces cerevisiae*metabolism. A description of the

*S. cerevisiae*central metabolic network from [22], including reaction and metabolite listing, is provided. (PDF 311 KB)

**BOSS**

Additional File 2: Clustering in . The clustering component of the **BOSS** framework, as implemented in MATLAB, is described. (PDF 730 KB)

**BOSS**

Additional File 3: Evaluating data integration in . **BOSS** is applied to a prototypic system designed to illustrate the ability of the framework to infer an objective function by synthesizing information from multiple experimental conditions. (PDF 396 KB)

## Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

## Rights and permissions

## About this article

### Cite this article

Gianchandani, E.P., Oberhardt, M.A., Burgard, A.P. *et al.* Predicting biological system objectives de novo from internal state measurements.
*BMC Bioinformatics* **9, **43 (2008). https://0-doi-org.brum.beds.ac.uk/10.1186/1471-2105-9-43

Received:

Accepted:

Published:

DOI: https://0-doi-org.brum.beds.ac.uk/10.1186/1471-2105-9-43

### Keywords

- Objective Function
- Metabolic Network
- Flux Distribution
- Flux Balance Analysis
- Flux Data