Archived Comments for: The MetabolomeExpress Project: enabling web-based processing, analysis and transparent dissemination of GC/MS metabolomics datasets

  1. Metabolomics Repository needed

    Oliver Fiehn, UC Davis

    26 July 2010

    This is an interesting article pointing to a lack of standardized repositories for metabolome data. I concur with the authors that future databases will have to be extended considerably in order to enable cross-study comparisons. However, I think it would be better to construct platform-independent databases to accommodate the many technical platforms used today, in the same manner as databases have been set up for gene or protein expression studies. In addition, I do not concur with the authors' view that only biology-focused, peer-reviewed publications should be published. Instead, any data set that has valid descriptions of the biological study designs would be of interest for comparison purposes, e.g. for mutant KO studies deposited in plantmetabolomics.org or SetupX/BinBase. Several of these studies have indeed been published in peer-reviewed journals, and data sets are publicly available. I would welcome open workshops in data processing and assignment of data sets, and I would be grateful if readers and authors of this paper could join efforts of the Metabolomics Society to organize such workshops!

    Competing interests

    none

  2. Response to comment by Oliver Fiehn

    Adam James Carroll, The Australian National University and the ARC CoE in Plant Energy Biology

    4 August 2010

    Dear Oliver,

    I’m not sure where you got the impression we disagree on these matters, but if you read the published version of the manuscript carefully you will find that we actually agree on all points you have raised. Our views on dissemination of unpublished data were described very clearly in the text and we also clearly indicated that we fully intend to introduce support for technology platforms other than GC/MS. The rationale behind our initial focus on GC/MS will be outlined below.

    Firstly, regarding transparent dissemination of unpublished datasets:

    Although we originally planned to include only published datasets (as a type of quality control to filter out fundamentally flawed experiments), we ultimately changed our minds, deciding that the advantages of allowing unpublished datasets to be included (as long as they were adequately and systematically described) far outweighed the disadvantages. After all, it is trivial to add an optional filter for publication status to database queries. Many very useful datasets are out there that, for one reason or another, will never be published in peer-reviewed articles, and to exclude these from transparent dissemination and mining would be senseless.
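
    The optional filter for publication status mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the `published` field and the function name `query_datasets` are hypothetical and are not taken from the MetabolomeExpress code base.

```python
def query_datasets(datasets, published=None):
    """Return dataset records, optionally restricted by publication status.

    published=None  -> no filtering (published and unpublished together)
    published=True  -> only datasets linked to a peer-reviewed publication
    published=False -> only unpublished datasets
    """
    if published is None:
        return list(datasets)
    return [d for d in datasets if d["published"] is published]


# Hypothetical records, for illustration only.
EXAMPLE_DATASETS = [
    {"name": "dataset A", "published": True},
    {"name": "dataset B", "published": False},
    {"name": "dataset C", "published": True},
]

all_names = [d["name"] for d in query_datasets(EXAMPLE_DATASETS)]
published_names = [d["name"] for d in query_datasets(EXAMPLE_DATASETS, published=True)]
print(all_names)
print(published_names)
```

    Leaving the filter off by default keeps unpublished datasets visible unless the person querying chooses to exclude them.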

    If you read the section of our paper under the heading “Quality control of datasets submitted to the MetabolomeExpress database of metabolite response statistics”, you will find our policy statement is essentially what you have suggested would be the ideal: ie. no requirement for publication as long as the dataset is properly described. However, it seems our views might diverge slightly when it comes to appropriate definitions of ‘valid descriptions of the biological study designs’. We require all metadata associated with a particular experiment to be provided in a single text file with a systematic, extensible, standardised format with proper use of predefined, recognised ontologies etc. We developed this metadata exchange format (and its field requirements) largely on the basis of your own paper, “Quality control for plant metabolomics: reporting MSI-compliant studies” (Fiehn et al. (2008) The Plant Journal Volume 53 Issue 4, Pages 691 – 704). Hence, we were somewhat surprised to find that none of the metadata (including the MSI XML schema metadata file) for the example FatB induction dataset associated with the above article (and indeed other datasets) in SetupX contained information related to Chemical Analysis, Data Processing, Statistics or Quality Control. I tried to find the extraction SOP used in this study at SetupX but it was not available as a ‘public’ SOP. Moreover, the use of the ontologies / controlled vocabularies recommended by the MSI in one of your other papers (Fiehn et al. (2007) Minimum reporting standards for plant biology context Metabolomics 3:195-201) seems to be inconsistent in SetupX. For example, in the FatB MSI XML metadata, Organ Name is correctly specified using the Plant Ontology (PO) term ‘rosette leaf’. However, this file does not link samples to raw data files. To determine these links, one must use the SetupX experimental design file, in which Organ Name is specified as ‘aerial portion, rosette leaves’. This is not strictly a PO term, in that it would fail to match a PO term by direct string search. This complicates re-use of the datasets disseminated via SetupX. The situation is even more challenging at plantmetabolomics.org, where no systematic metadata schemas seem to be currently supported.
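
    The direct string search problem described above can be illustrated with a minimal sketch. It assumes a tiny stand-in vocabulary rather than the real Plant Ontology, and the function name `is_controlled_term` is hypothetical; it is not code from SetupX or MetabolomeExpress.

```python
# Illustrative stand-in for a controlled vocabulary. A real check would load
# the term names from an ontology release instead of hard-coding them.
CONTROLLED_TERMS = {"rosette leaf", "cauline leaf", "root"}


def is_controlled_term(value, terms=CONTROLLED_TERMS):
    """Return True only if the value matches a vocabulary term exactly,
    ignoring letter case and surrounding whitespace."""
    return value.strip().lower() in terms


msi_xml_value = "rosette leaf"                        # value from the MSI XML metadata
design_file_value = "aerial portion, rosette leaves"  # value from the design file

print(is_controlled_term(msi_xml_value))      # True
print(is_controlled_term(design_file_value))  # False
```

    The second value describes the same organ to a human reader, but an automated comparison across datasets cannot recognise it without manual curation.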

    In contrast, at MetabolomeExpress, essentially all the metadata recommended by the MSI in Fiehn et al. (2007) and Fiehn et al. (2008) (and all the other MSI reporting standards papers in Metabolomics for areas such as in vivo, in vitro, and environmental metabolomics) are available in a single, downloadable, human- and computer-readable (object-oriented) file that allows the origin of any raw data file to be traced right back through the experimental workflow (ie. Admin -> BioSource -> BioSample -> Extract -> AnalyticalSample -> AnalyticalRun) with all the methodological information required to precisely reproduce each step of the experiment being linked to each object produced during the experiment. In my view, this is how datasets should be disseminated: as self-contained, internally-consistent objects that can be plugged straight into other standardised database systems with the simple copy-and-paste of a file system folder. People should not have to dig through journal articles to find the information required to reproduce an experiment. By the way, I am sorry we could not point to the peer-reviewed, published datasets in plantmetabolomics.org or SetupX in our counts in Table 2 of our paper. We could not find this information anywhere in either website. Given the importance of these links, I would kindly suggest allowing datasets in those databases to be searched or browsed by publication (as you can using the ‘Database Statistics’ module of Database Explorer at MetabolomeExpress).
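
    To make the idea of tracing a raw data file back through the workflow concrete, here is a minimal sketch. Only the six object types in the chain come from the description above; the class `WorkflowObject`, its fields and the identifiers are assumptions made for illustration and do not reproduce the actual MetabolomeExpress metadata format.

```python
from dataclasses import dataclass
from typing import List, Optional

# Object types in the order given in the text above.
STAGES = ["Admin", "BioSource", "BioSample", "Extract",
          "AnalyticalSample", "AnalyticalRun"]


@dataclass
class WorkflowObject:
    kind: str        # one of STAGES
    identifier: str  # hypothetical identifier
    method: str      # placeholder for the methodological details of this step
    parent: Optional["WorkflowObject"] = None  # object this one was derived from


def trace_origin(obj: Optional[WorkflowObject]) -> List[WorkflowObject]:
    """Follow parent links from any object back to the start of the chain
    and return the objects in workflow order (earliest first)."""
    chain = []
    while obj is not None:
        chain.append(obj)
        obj = obj.parent
    chain.reverse()
    return chain


def build_example_chain() -> WorkflowObject:
    """Build one hypothetical chain and return its final AnalyticalRun."""
    parent = None
    for number, kind in enumerate(STAGES, start=1):
        parent = WorkflowObject(kind, f"{kind}-{number}",
                                f"placeholder method for {kind}", parent)
    return parent


run = build_example_chain()
print(" -> ".join(o.kind for o in trace_origin(run)))
```

    Because every object carries its own method description and a link to its parent, the full history of a run can be recovered from the metadata alone.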

    See the quality control section from the manuscript, below:

    "Quality control of datasets submitted to the MetabolomeExpress database of metabolite response statistics

    Any user with a complete dataset stored in a MetabolomeExpress FTP repository may submit this dataset to be imported into the main statistical database. The quality control model used by MetabolomeExpress follows essentially the same principles as the major microarray data repositories. There is no requirement for data to be processed with the MetabolomeExpress data processing pipeline, as long as all the required data is provided in the correct formats. The MetabolomeExpress team does not make any subjective assessment of the quality of data or the scientific merit of a submitted experiment, nor does the data need to be published in a peer-reviewed journal. Rather, quality control is totally objective (carried out automatically by a computer script) and serves only to ensure that the dataset provided is complete (ie. it includes: a correctly completed metadata file, all raw data files, peak lists, a library match report, a normalised data matrix and a statistical results file – all formatted correctly). The validation script uses human-readable ‘validation template’ files defining reporting requirements and controlled vocabularies for major metabolomics research areas (eg. plant, animal, bacterial, fungal and environmental) and model systems with highly-developed bioinformatics resources (eg. Arabidopsis thaliana, rice, human, mouse, Escherichia coli and Saccharomyces cerevisiae). These templates have been designed to facilitate reporting according to recommendations of the Metabolomics Standards Initiative (MSI; http://msiworkgroups.sourceforge.net/). Currently implemented templates are available from the MetabolomeExpress website while instructions for their interpretation are provided in the MetabolomeExpress User’s Guide (additional file 1). If a submitted dataset is cleared by the validation script, a final security check is made by the MetabolomeExpress curator before results are imported into the database."
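
    The completeness check described in the quoted section could look roughly like the sketch below. The six required components are those listed in the quotation; the key names, the dictionary layout and the function `missing_components` are hypothetical, and the sketch checks only for presence, not for correct formatting or controlled vocabularies as the real validation script is described as doing.

```python
# Components a submission must include, as listed in the quoted section.
# The key names are assumptions made for this sketch.
REQUIRED_COMPONENTS = (
    "metadata_file",
    "raw_data_files",
    "peak_lists",
    "library_match_report",
    "normalised_data_matrix",
    "statistical_results_file",
)


def missing_components(dataset):
    """Return the required components that are absent or empty.

    `dataset` maps a component name to a list of file names.
    An empty result means the submission is complete.
    """
    return [name for name in REQUIRED_COMPONENTS if not dataset.get(name)]


# Hypothetical submission with no raw data files and no peak lists.
incomplete_submission = {
    "metadata_file": ["metadata.txt"],
    "raw_data_files": [],
    "library_match_report": ["matches.txt"],
    "normalised_data_matrix": ["matrix.txt"],
    "statistical_results_file": ["stats.txt"],
}

print(missing_components(incomplete_submission))
```

    A check of this kind is objective in the sense used above: it reports what is missing without judging the scientific merit of the experiment.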

    Regarding development of a technology-platform-independent database:

    Our views could have been described more clearly in this article but we did mention in the "Future developments" section that we are working on "enhanced capability for processing LC/MS and CE/MS metabolomic and quantitative proteomic data (including accurate-mass signals)". This should have read "LC/MS, CE/MS and NMR" which, in addition to GC/MS, will cover the vast majority of available metabolite profiling datasets. We put a *very* high value on transparency at the level of raw data and processing and therefore chose to focus initially on getting this right for GC/MS (the MAJOR source of metabolite profiling data pertaining to central metabolism) before tackling other platforms for which less true metabolite profiling data (ie. data with unambiguous, reproducible metabolite identifications and relative quantifications) is available.

    Given the huge amount of GC/MS metabolomics data that is currently going to waste due to inadequate annotation and dissemination, or is of uncertain reliability due to a general lack of transparency in GC/MS processing, we decided it was crucial to address these issues now by building a raw data-enabled, transparent GC/MS database rather than wait until after we had built a more opaque, technology-independent database (which already existed anyway).

    That said, we acknowledge that raw data associated with many metabolite response data will often be lost or otherwise inaccessible (particularly for older datasets that may have used early techniques such as paper and thin-layer chromatography) and we need a system to archive and mine these results regardless of the analytical technique used or level of opaqueness. We are currently in the process of building (into MetabolomeExpress) a platform-independent database of metabolite response data from the literature (but will welcome submissions of unpublished datasets). This database will not only be platform-independent, it will be organism-independent as well, allowing comparisons of responses to similar perturbations in different systems. Raw data incorporation will be optional and dependent on supply of raw data (and detailed processing information) from authors (who will be invited to do so). Datasets will be annotated with regard to their level of transparency and with various objective quality metrics which will allow optional filtering of database queries.

    Regarding cooperation and involvement in the efforts of the MSI / Metabolomics Society, we mention in “Future developments” that we plan “support for data exchange formats developed or endorsed by the Metabolomics Standards Initiative and similar initiatives”. For the reasons described above, we were not satisfied with the current MSI XML format used at SetupX and needed to develop our own. We felt that developing these formats via public workshop discussions would unnecessarily slow down development. We therefore decided to give it a go ourselves first and join the public discussion later by asking for feedback. If the MSI would like to officially endorse our existing exchange format and validation schemas, this would be fantastic (although we may still make a few improvements ourselves before settling on a single format – mainly to enable platform independence and use of parallel technologies in a single experiment object). Otherwise, we are open to negotiations with or recommendations from the MSI to become officially endorsed.

    Regards,

    Adam Carroll

    Competing interests

    None
