Pathway data model. A biological pathway as represented by PathVisio has three main classes of objects: DataNodes, Lines and Shapes. The most important are the DataNodes, drawn as boxes, which can represent genes, proteins (A) or metabolites (B). DataNodes can be linked to an online database; in this example, MDH2 is linked to Entrez Gene ID 4191, and Malate is linked to HMDB identifier HMDB03256. DataNodes can be grouped to represent biological relationships; here, IDH3A, IDH3B and IDH3G are grouped to indicate that they are the three subunits of a protein complex. The second class of objects consists of lines, t-bars and arrows, which represent interactions between DataNodes (C). Finally, shapes and text labels can be used to annotate the pathway; in this example, shapes distinguish the cytosol from the mitochondrion (D). Pathways are stored in the GPML file format [see additional file 1]. The PathVisio source code includes an XML Schema definition that can be used to check the validity of GPML files.
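To make the data model concrete, the sketch below reads a small GPML fragment with Python's standard `xml.etree.ElementTree` and recovers the objects described above: DataNodes with their database links, a group of DataNodes, an interaction line, and shapes/labels. It is a minimal illustration under stated assumptions, not PathVisio's own reader. The element and attribute names (`DataNode`, `Xref`, `GroupRef`, `Group`, `Interaction`, `Point`/`GraphRef`, `Shape`, `Label`) follow GPML as we understand it. The namespace URI, the `GraphId` values and the `summarize` helper are hypothetical. The fragment is heavily trimmed (no layout `Graphics` on most elements) and would not pass validation against the XML Schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, trimmed GPML fragment modelled on the figure. The namespace URI
# is an assumption (it differs between GPML versions), so the parser ignores it.
SAMPLE_GPML = """<Pathway xmlns="http://pathvisio.org/GPML/2013a" Name="Excerpt">
  <DataNode TextLabel="MDH2" GraphId="n1" Type="GeneProduct">
    <Xref Database="Entrez Gene" ID="4191"/>
  </DataNode>
  <DataNode TextLabel="Malate" GraphId="n2" Type="Metabolite">
    <Xref Database="HMDB" ID="HMDB03256"/>
  </DataNode>
  <DataNode TextLabel="IDH3A" GraphId="n3" Type="GeneProduct" GroupRef="g1">
    <Xref Database="" ID=""/>
  </DataNode>
  <DataNode TextLabel="IDH3B" GraphId="n4" Type="GeneProduct" GroupRef="g1">
    <Xref Database="" ID=""/>
  </DataNode>
  <DataNode TextLabel="IDH3G" GraphId="n5" Type="GeneProduct" GroupRef="g1">
    <Xref Database="" ID=""/>
  </DataNode>
  <Interaction GraphId="i1">
    <Graphics>
      <Point GraphRef="n1"/>
      <Point GraphRef="n2" ArrowHead="Arrow"/>
    </Graphics>
  </Interaction>
  <Label GraphId="l1" TextLabel="Cytosol"/>
  <Shape GraphId="s1" TextLabel="Mitochondrion"/>
  <Group GroupId="g1" Style="Complex"/>
</Pathway>"""


def local(tag):
    """Strip a '{namespace}' prefix so lookups work for any GPML version."""
    return tag.rsplit("}", 1)[-1]


def summarize(gpml_text):
    """Return (nodes, groups, interactions, shape_count) for a GPML string."""
    root = ET.fromstring(gpml_text)
    nodes = {}                 # GraphId -> (label, type, database, identifier)
    groups = defaultdict(list)  # GroupRef -> member labels, in document order
    interactions = []          # list of GraphRef lists, one per Interaction
    shape_count = 0            # Shapes and text Labels used for annotation
    for el in root.iter():
        kind = local(el.tag)
        if kind == "DataNode":
            # The Xref child holds the link to an online database.
            xref = next((c for c in el if local(c.tag) == "Xref"), None)
            if xref is not None:
                db, ident = xref.get("Database", ""), xref.get("ID", "")
            else:
                db, ident = "", ""
            nodes[el.get("GraphId")] = (el.get("TextLabel"), el.get("Type"), db, ident)
            if el.get("GroupRef"):
                groups[el.get("GroupRef")].append(el.get("TextLabel"))
        elif kind == "Interaction":
            # Line end points refer to the DataNodes they connect via GraphRef.
            refs = [p.get("GraphRef") for p in el.iter() if local(p.tag) == "Point"]
            interactions.append(refs)
        elif kind in ("Shape", "Label"):
            shape_count += 1
    return nodes, dict(groups), interactions, shape_count


if __name__ == "__main__":
    nodes, groups, interactions, shape_count = summarize(SAMPLE_GPML)
    for gid, (label, kind, db, ident) in nodes.items():
        print(gid, label, kind, db, ident)
    print("groups:", groups)
    print("interactions:", interactions)
    print("shapes/labels:", shape_count)
```

Matching on local element names keeps the sketch independent of the GPML version. Checking a real file against PathVisio's XML Schema would need a schema-aware parser, which the standard library does not provide.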