MAVEN: compound mechanism of action analysis and visualisation using transcriptomics and compound structure data in R/Shiny
BMC Bioinformatics volume 24, Article number: 344 (2023)
Understanding the Mechanism of Action (MoA) of a compound is an often challenging but equally crucial aspect of drug discovery that can help improve both its efficacy and safety. Computational methods to aid MoA elucidation usually either aim to predict direct drug targets, or attempt to understand modulated downstream pathways or signalling proteins. Such methods usually require extensive coding experience and results are often optimised for further computational processing, making them difficult for wet-lab scientists to perform, interpret and draw hypotheses from.
To address this issue, we present MAVEN (Mechanism of Action Visualisation and Enrichment), an R/Shiny app which allows for GUI-based prediction of drug targets based on chemical structure, combined with causal reasoning based on causal protein–protein interactions and transcriptomic perturbation signatures. The app computes a systems-level view of the mechanism of action of the input compound. This is visualised as a sub-network linking predicted or known targets to modulated transcription factors via inferred signalling proteins. The tool includes a selection of MSigDB gene set collections to perform pathway enrichment on the resulting network, and also allows custom gene sets to be uploaded by the researcher. MAVEN is hence a user-friendly, flexible tool for researchers without extensive bioinformatics or cheminformatics knowledge to generate interpretable hypotheses of compound Mechanism of Action.
MAVEN is available as a fully open-source tool at https://github.com/laylagerami/MAVEN with options to install in a Docker or Singularity container. Full documentation, including a tutorial on example data, is available at https://laylagerami.github.io/MAVEN.
The discovery of the Mechanism of Action (MoA) of a small molecule, which describes the biochemical interactions a molecule makes to produce a pharmacological effect, is an important aspect of drug discovery for a wide range of reasons, from repurposing for a new indication to anticipating potential side effects and rationalising phenotypic findings. Advances in machine learning techniques, combined with large publicly available bioactivity databases such as ChEMBL and PubChem, as well as high-throughput biological assays such as LINCS L1000 and DRUG-Seq, have contributed to the development of computational methods for generating hypotheses of compound MoA. Two popular approaches are target-based and network-based methods. Target-based methods aim to predict the direct biological target of the compound, and have shown high performance using chemical structure fingerprints as descriptors [3,4,5]. Network-based methods such as causal reasoning use transcriptomics data along with prior knowledge networks to infer upstream drivers of transcriptional changes, and have been shown to capture biological pathways modulated by drug compounds [6,7,8,9].
However, such approaches often require proficiency in programming languages such as R and Python as well as the command line, and they output computer-readable data that can be difficult to convey to non-specialists, hindering scientific communication in multidisciplinary groups. R/Shiny apps allow R code to be run and its results visualised in an interactive GUI, and have been widely used, for example to integrate multi-omics data (e.g., transcriptomics, phosphoproteomics, metabolomics) with bioinformatics tools such as COSMOS and CARNIVAL to gain insights into compounds or other perturbations. Hence, here we introduce MAVEN (Mechanism of Action Visualisation and ENrichment), an R/Shiny app which allows users to integrate compound structure-based target prediction with gene expression-based causal reasoning without prior coding experience, and allows for the visualisation and pathway enrichment of the results to obtain a systems-level, easily interpretable view of the mechanism of action of a compound.
Development and installation
MAVEN was written in the R programming language (v4.2) using the Shiny application framework, and the source code is available for local installation at https://github.com/laylagerami/MAVEN. To run direct target prediction (which is optional for the app to function), the app also invokes PIDGINv4 (https://github.com/BenderGroup/PIDGINv4) models and scripts implemented in Python, using a Bash script called from within R. For causal reasoning over biological prior knowledge networks with CARNIVAL (https://github.com/saezlab/CARNIVAL), an ILP (Integer Linear Programming) solver must be installed: either the free, open-source Cbc solver or IBM ILOG CPLEX, which is free for academic use. Installation and configuration instructions for the solvers, along with troubleshooting steps, are described in the documentation (https://laylagerami.github.io/MAVEN/). We also provide an R script to install all packages with the required versions, and a conda.yml file with the packages required for running the PIDGINv4 Python scripts. In case a container solution is preferred for ease of installation and security purposes, build files for Docker and Singularity containers with all required software and environments (including solvers, with the exception of CPLEX, which requires a separate license) are provided. The size of the PIDGINv4 models prevents publication of the app on the Shiny web server, but the same tools (except compound structure-based target prediction) are available via the FUNKI Shiny web app (https://saezlab.shinyapps.io/funki/). Installation and deployment with the open-source Cbc solver have been tested on the HPC systems at Eli Lilly and Company and on AWS to ensure compatibility with corporate computational environments.
The Omnipath signed and directed protein–protein interaction network is included with the app, as well as gene expression and compound structure data for lapatinib, which is used as an example in the documentation and is discussed here in the case study. For pathway enrichment on the predicted signalling network, MSigDB [16,17,18] (v2022.1) gene sets in the hallmark (H), curated (C2) and ontology (C5) collections are provided, as well as an option to use custom user-uploaded gene set files.
Workflow and use
The overall workflow for MAVEN is depicted in Fig. 1. Three inputs are taken: known or hypothesised targets, which can be predicted from the compound’s chemical structure with PIDGINv4 or defined a priori (optional) (Fig. 1A); a signed and directed (i.e., A activates/inhibits B) prior knowledge network for causal reasoning (Fig. 1B); and compound-induced gene expression data in the form of a summary statistic such as t-values or log2 fold changes (Fig. 1C). A signed and directed prior knowledge network of causal protein–protein interactions is required to infer causality and function (activation or inhibition), and can be obtained from open-source databases, e.g., Omnipath (provided), SignaLink or SIGNOR. Gene expression data in the form of differential expression signatures (e.g., Z-score, log2FC, t-statistic) can come from any platform, e.g., microarray or RNA-Seq, and public data exist for many perturbations in databases such as GEO (https://www.ncbi.nlm.nih.gov/geo/) (provided for the compound lapatinib) and LINCS L1000 (https://clue.io/releases/data-dashboard, Level 5). The differential expression signature is then used to infer transcription factor (TF) activities with DoRothEA and pathway activities with PROGENy, which are then used along with the prior knowledge network by CARNIVAL to optimise a subnetwork that captures signalling proteins upstream of TF activity changes and, if targets are predicted or provided, links them to the targets (Fig. 1D). The outputs from DoRothEA, PROGENy and CARNIVAL are processed and formatted using helper scripts from https://github.com/saezlab/transcriptutorial.
Finally, the subnetwork can be viewed and exported for use in other software such as Cytoscape. We also provide a collection of MSigDB gene sets (or allow the upload of a custom gene set) for pathway enrichment with over-representation analysis (ORA), the results of which can also be visualised on the network (Fig. 1E).
MAVEN is designed to be scalable and flexible to the needs of the user. It takes advantage of the parallel processing available in PIDGINv4 and CARNIVAL for the two bottleneck steps (target prediction and network optimisation) and, depending on the available resources (e.g., RAM, number of processors), can handle large networks and gene expression signatures. This is possible because gene expression data is reduced to a smaller (user-defined) number of transcription factor activities using DoRothEA, which makes network optimisation more efficient by reducing the input space from tens of thousands of data points to typically 50–100. Furthermore, a time limit can be set for the CARNIVAL optimisation step, stopping the process if no optimal solution is found within that time. For large networks, the IBM ILOG CPLEX solver is recommended, as prior benchmarking has found that it outperforms Cbc in such cases. As MAVEN is a graphical user interface (GUI), it adds a small computational overhead compared with running the analyses purely programmatically; in practice, however, this does not translate into a noticeable decrease in performance for the user.
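To make the notion of a signed and directed prior knowledge network concrete, the following Python sketch assumes interactions stored as whitespace-separated triples (source, sign, target), with 1 for activation and -1 for inhibition. This toy layout and the helper names `parse_signed_network` and `inconsistent_edges` are hypothetical; they are not MAVEN's or CARNIVAL's input reader. The sketch also checks whether a proposed activity assignment (+1 or -1 per node) agrees with each edge sign, a simplified two-state version of the consistency that causal reasoning enforces (CARNIVAL's formulation also allows nodes to be inactive).

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, int, str]  # (source, sign, target); sign is +1 (activates) or -1 (inhibits)


def parse_signed_network(text: str) -> List[Edge]:
    """Parse whitespace-separated 'source sign target' lines (hypothetical toy format)."""
    edges: List[Edge] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, sign, target = line.split()
        sign_value = int(sign)
        if sign_value not in (1, -1):
            raise ValueError(f"edge sign must be 1 or -1, got {sign!r}")
        edges.append((source, sign_value, target))
    return edges


def inconsistent_edges(edges: List[Edge], activity: Dict[str, int]) -> List[Edge]:
    """Return edges whose sign contradicts the given node activities.

    An edge A -(s)-> B is consistent when activity[B] == s * activity[A].
    Edges touching nodes without an assigned activity are ignored.
    """
    bad: List[Edge] = []
    for source, sign, target in edges:
        if source in activity and target in activity:
            if activity[target] != sign * activity[source]:
                bad.append((source, sign, target))
    return bad


pkn = parse_signed_network("""
# toy network with abstract node names, not real biology
A 1 B
B -1 C
A -1 D
""")
print(inconsistent_edges(pkn, {"A": 1, "B": 1, "C": -1, "D": 1}))  # [('A', -1, 'D')]
```

Real networks such as Omnipath contain thousands of edges, and finding a consistent, parsimonious assignment is the problem CARNIVAL formulates as an ILP; this snippet only verifies a given assignment.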
Throughout the workflow, all chosen parameters and command-line options are saved in log files for reproducibility purposes and so that analyses can be re-run programmatically (for information on how to run the tools used in MAVEN, please refer to https://pidginv4.readthedocs.io/en/latest, https://github.com/saezlab/shinyfunki and https://github.com/saezlab/transcriptutorial). Additionally, there are help buttons throughout the GUI with more information to aid the user in choosing algorithm parameters, and guidelines on the formatting of data. The functionalities implemented in MAVEN will now be discussed in more detail:
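As an illustration of the reproducibility logging described above, the sketch below writes the parameters of one analysis step to a timestamped JSON file and reads them back. The JSON layout, file naming and the helpers `write_run_log` and `read_run_log` are hypothetical and do not reproduce the format of MAVEN's own log files; the parameter values are the DoRothEA defaults described in this article.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_run_log(log_dir: Path, step: str, params: dict) -> Path:
    """Write one step's parameters to a timestamped JSON file (hypothetical format)."""
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = log_dir / f"{step}_{stamp}.json"
    record = {"step": step, "timestamp_utc": stamp, "parameters": params}
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path


def read_run_log(path: Path) -> dict:
    """Load a previously written log so the same parameters can be reused in a re-run."""
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    log_file = write_run_log(
        Path(tmp) / "logs",
        "dorothea",
        {"confidence_levels": ["A", "B", "C"], "top_n_tfs": 50, "minsize": 5},
    )
    restored = read_run_log(log_file)

print(restored["parameters"])
```

Storing one self-describing file per step keeps each run inspectable on its own and makes it straightforward to replay an analysis programmatically.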
Target prediction with PIDGINv4
The first data analysis step in MAVEN is target prediction based on compound chemical structure, though this is optional: targets can be entered manually or left out entirely. These targets are used as input to CARNIVAL in a later step, to connect them to inferred signalling proteins. Target prediction is implemented in MAVEN by invoking the PIDGINv4 software. PIDGINv4 is an open-source target prediction tool trained on ChEMBL (v29) and PubChem data using the scikit-learn Python package, available on GitHub (v4.2). The tool consists of a collection of Random Forest models, trained on the chemical structures (ECFP4 fingerprints calculated with the RDKit Python package) of active and inactive compounds against 2000+ human targets, and Python scripts to generate predictions for query compounds and to search for structurally similar compounds in the model training sets. For target prediction, the user is required to upload a .smi file; a ChemDoodle widget is embedded in the app GUI to sketch the structure and generate a SMILES file in case the SMILES string is not known. The user can select various parameters for the target prediction, including the activity threshold (0.1, 1, 10 or 100 µM; default 10 µM), the number of cores (default 10), and the applicability domain filter (default 50 out of 100) to remove low-confidence predictions. Once the user chooses to run the target prediction, a Bash script is invoked which runs the predict.py and sim_to_train.py PIDGINv4 scripts. The predict.py script processes the input SMILES, calculates ECFP4 fingerprints, applies the pre-trained models, and outputs the Platt-scaled Random Forest probability values. The sim_to_train.py script retrieves the most structurally similar compound in the ChEMBL29 database (nearest neighbour), based on the Tanimoto similarity of their ECFP4 fingerprints. The results from both scripts are saved to disk, and then formatted and displayed in the GUI.
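The nearest-neighbour lookup rests on the Tanimoto coefficient between binary fingerprints, |A ∩ B| / |A ∪ B| over the sets of on-bits. The sketch below computes it on plain Python sets of bit indices rather than RDKit ECFP4 fingerprints, so that it runs without dependencies; the compound IDs and bit sets are made up for illustration, and this is not the code of sim_to_train.py. In practice, RDKit Morgan fingerprints with radius 2 (the ECFP4 analogue) and `DataStructs.TanimotoSimilarity` would take the place of the toy sets.

```python
from typing import Dict, FrozenSet, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two binary fingerprints given as on-bit sets."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention for two empty fingerprints (an assumption of this sketch)
    return len(a & b) / union


def nearest_neighbour(query: Fingerprint, training: Dict[str, Fingerprint]) -> Tuple[str, float]:
    """Return the training compound most similar to the query, with its similarity."""
    best_id = max(training, key=lambda cid: tanimoto(query, training[cid]))
    return best_id, tanimoto(query, training[best_id])


# Hypothetical toy training set; real fingerprints have thousands of possible bits.
training_set = {
    "cpd_1": frozenset({1, 2, 3, 4}),
    "cpd_2": frozenset({2, 3, 5}),
    "cpd_3": frozenset({7, 8}),
}
query_fp = frozenset({1, 2, 3, 5})
print(nearest_neighbour(query_fp, training_set))  # ('cpd_2', 0.75)
```

A similarity of 1 means identical on-bits, which is why a query compound that is itself in the training data (as lapatinib is in the case study) reports a nearest-neighbour similarity of 1.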
Transcription factor enrichment with DoRothEA and VIPER
To perform causal reasoning on a protein–protein interaction network, the gene expression data must be converted from the “gene level” to the “protein level” by inferring the upstream TFs driving the expression changes. DoRothEA (Bioconductor dorothea v1.8.0) is a resource of curated TF regulons, i.e., known TF–gene interactions. Each interaction is given a confidence score reflecting the supporting evidence behind it, from A (highest confidence, manually curated) to E (lowest confidence, computational predictions). The package is coupled with the VIPER (v1.3) statistical method to infer TF activity from gene expression data, generating a normalised enrichment score for each TF. In the app, the user can select which confidence levels (A–E) are used to filter the interactions in the regulon, and use a slider to set the number of top TFs to report and use for causal reasoning analysis. By default, 50 TFs are reported and plotted as a bar chart of their normalised enrichment scores (NES, from -1 to 1). This number is a trade-off between coverage and noise, which can be examined by adjusting the slider and viewing the NES plot, which updates automatically upon re-calculation. Furthermore, only confidence levels A–C are included by default, but this criterion can be relaxed if more enriched TFs are required. The documentation and help buttons also provide guidance on choosing these parameters. Another parameter, which can be changed in the source code but not in the GUI, is the VIPER ‘minsize’ parameter, which sets the minimum number of genes per TF regulon (5 by default).
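The two regulon filters and the top-N selection described above can be sketched as follows, assuming the regulon is a list of (TF, target, confidence) records and that TF activity scores (e.g., VIPER NES) have already been computed. The record layout and the helpers `filter_regulons` and `top_tfs` are hypothetical, not dorothea's data frame or MAVEN's code; ranking by absolute score is an assumption made so that both up- and down-regulated TFs are kept.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (tf, target_gene, confidence level "A".."E"); a hypothetical record layout
Interaction = Tuple[str, str, str]


def filter_regulons(
    interactions: Iterable[Interaction],
    levels: Set[str],
    measured_genes: Set[str],
    minsize: int = 5,
) -> Dict[str, Set[str]]:
    """Keep interactions at the chosen confidence levels whose target gene was measured,
    then drop TFs left with fewer than `minsize` targets."""
    regulons: Dict[str, Set[str]] = defaultdict(set)
    for tf, target, level in interactions:
        if level in levels and target in measured_genes:
            regulons[tf].add(target)
    return {tf: targets for tf, targets in regulons.items() if len(targets) >= minsize}


def top_tfs(scores: Dict[str, float], n: int = 50) -> List[str]:
    """Return the n TFs with the largest absolute activity score (up- or down-regulated)."""
    return sorted(scores, key=lambda tf: abs(scores[tf]), reverse=True)[:n]


# Hypothetical toy data: TF2 only has level-D evidence, TF3 has too few targets.
measured = {f"G{i}" for i in range(10)}
toy_regulon = (
    [("TF1", f"G{i}", "A") for i in range(6)]
    + [("TF2", f"G{i}", "D") for i in range(6)]
    + [("TF3", f"G{i}", "B") for i in range(3)]
)
kept = filter_regulons(toy_regulon, {"A", "B", "C"}, measured)
print(sorted(kept))  # ['TF1']
print(top_tfs({"TF1": 2.1, "TF2": -3.4, "TF3": 0.5}, n=2))  # ['TF2', 'TF1']
```

Relaxing `levels` to include "D" would retain TF2 here, mirroring how relaxing the confidence criterion in the GUI can yield more enriched TFs.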
Pathway activity inference with PROGENy
Pre-weighting proteins in the prior knowledge network has been shown to improve the causal reasoning results of CARNIVAL. PROGENy (Bioconductor progeny v1.16.0) is a “footprint” method which infers pathway activities by leveraging a large compendium of publicly available perturbation experiments that yield a common core of Pathway RespOnsive GENes. Based on these pathway footprint genes, PROGENy produces scores for 14 major signalling pathways (Androgen, EGFR, Estrogen, Hypoxia, JAK-STAT, MAPK, NFkB, p53, PI3K, TGFb, TNFa, Trail, VEGF and WNT). The pathway scores are converted internally to protein weights to improve the CARNIVAL optimisation. The user can select the number of most dysregulated genes to include in the PROGENy pathway score calculations (100 by default; the appropriate number depends on the number of input genes, and for experiments with higher coverage such as RNA-Seq it can be increased to, e.g., 200–500).
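At its core, a footprint pathway score is a weighted sum of the gene-level statistics of each pathway's footprint genes. The sketch below illustrates that idea with a made-up model; it is a simplification, not the progeny implementation, which ships its own model matrix. Here the `top` footprint genes per pathway are taken by smallest model p-value, which is our understanding of the progeny `top` argument and should be confirmed against the package documentation. The final rescaling to [-1, 1] is an assumption chosen to match the score range shown in the app.

```python
from typing import Dict, List, Tuple

# (gene, weight, p_value) entries of one pathway's footprint model; hypothetical layout
ModelEntry = Tuple[str, float, float]


def pathway_scores(
    model: Dict[str, List[ModelEntry]],
    stats: Dict[str, float],
    top: int = 100,
) -> Dict[str, float]:
    """Score each pathway as a weighted sum of gene-level statistics (e.g. t-values)
    over its `top` most significant footprint genes, then rescale to [-1, 1]."""
    raw: Dict[str, float] = {}
    for pathway, entries in model.items():
        footprint = sorted(entries, key=lambda e: e[2])[:top]  # smallest p-values first
        # genes absent from the signature contribute nothing
        raw[pathway] = sum(w * stats.get(gene, 0.0) for gene, w, _ in footprint)
    largest = max((abs(v) for v in raw.values()), default=0.0)
    if largest == 0.0:
        return raw
    # divide by the largest absolute score (a display scaling assumed for this sketch)
    return {p: v / largest for p, v in raw.items()}


# Hypothetical toy model and signature, not PROGENy's real model matrix.
toy_model = {
    "PathA": [("g1", 2.0, 0.001), ("g2", 1.0, 0.01), ("g3", 5.0, 0.5)],
    "PathB": [("g2", -1.0, 0.001), ("g4", 0.5, 0.02)],
}
toy_stats = {"g1": 1.5, "g2": -2.0, "g3": 10.0, "g4": 4.0}
print(pathway_scores(toy_model, toy_stats, top=2))  # {'PathA': 0.25, 'PathB': 1.0}
```

Raising `top` to 3 lets the strongly induced gene g3 enter PathA's footprint and flips which pathway dominates, which illustrates why the number of footprint genes should be matched to the coverage of the experiment.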
Causal reasoning with CARNIVAL
The last analysis tool implemented in MAVEN is causal reasoning to infer dysregulated signalling networks. CARNIVAL (Bioconductor CARNIVAL v.2.6.2) is a causal reasoning algorithm based on integer linear programming (ILP) which aims to optimise a subnetwork of signalling proteins contextualising a perturbation of interest. CARNIVAL takes as input dysregulated transcription factors (from DoRothEA) and a prior knowledge network (signed and directed protein–protein interactions), with pre-computed node weights (based on pathway activity scores from PROGENy) to aid the network optimisation. CARNIVAL generates multiple solutions, which are then aggregated to form a consensus network that connects TFs to targets (pre-defined or predicted with PIDGINv4) via inferred dysregulated signalling proteins, including their sign (activated or inhibited). If no target is defined, the signalling proteins are connected to a proxy “perturbation” node. The user can choose the runtime limit (in seconds) and the number of cores. To solve the ILP problem, a separate solver must be installed: Cbc (v.2.9, free and open source) or IBM ILOG CPLEX Optimization Studio (v20.10, free for academic use, otherwise a license is required). The lpSolve (v5.7.16) ILP optimiser implemented in R is also available and is installed along with CARNIVAL, but it is strongly recommended only for toy examples or testing purposes.
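One simple way to picture how multiple solutions are combined into a consensus is to count, for each signed edge, the percentage of solutions in which it appears and keep the edges above a cut-off. The sketch below does exactly that on toy solutions; the helper names and the 50% threshold are assumptions for illustration and do not describe CARNIVAL's internal procedure.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

SignedEdge = Tuple[str, int, str]  # (source, sign, target)


def edge_frequencies(solutions: List[Set[SignedEdge]]) -> Dict[SignedEdge, float]:
    """Percentage of solutions in which each signed edge occurs (each solution counted once)."""
    counts = Counter(edge for solution in solutions for edge in solution)
    return {edge: 100.0 * n / len(solutions) for edge, n in counts.items()}


def consensus_network(solutions: List[Set[SignedEdge]], min_percent: float = 50.0) -> Set[SignedEdge]:
    """Keep edges present in at least `min_percent` percent of the solutions (assumed cut-off)."""
    return {edge for edge, pct in edge_frequencies(solutions).items() if pct >= min_percent}


# Hypothetical toy solutions linking a target T to TFs via signalling nodes A and B.
solutions = [
    {("T", 1, "A"), ("A", 1, "TF1")},
    {("T", 1, "A"), ("A", -1, "TF2")},
    {("T", 1, "B"), ("A", 1, "TF1")},
]
print(sorted(consensus_network(solutions)))  # [('A', 1, 'TF1'), ('T', 1, 'A')]
```

Because edges are keyed by their sign as well as their endpoints, an interaction that appears as activating in some solutions and inhibiting in others is counted separately for each sign.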
Visualisation and functional enrichment
Following causal reasoning with CARNIVAL, the consensus network is visualised in the GUI using the visNetwork package (CRAN visNetwork v.2.1.0). To put the inferred signalling network into biological context, functional enrichment can be performed. To this end, 11 MSigDB gene set collections are included with MAVEN (such as Hallmark, GO, Reactome and WikiPathways). Alternatively, a .gmt file can be uploaded by the user for custom enrichment analysis. Over-representation analysis of the signalling network nodes in the gene sets, using the prior knowledge network as background, is performed with the runGSAhyper function of piano (Bioconductor piano v2.10.1). Following enrichment and tabular display of the results, the user can select a pathway of interest and highlight the participating proteins on the network. The pathway results can also be downloaded. The network.sif file is also saved for further analysis and visualisation in Cytoscape or other software packages.
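Over-representation analysis of this kind reduces to a one-sided hypergeometric test. Given N background genes (here, the prior knowledge network), K of which belong to a gene set, and n network nodes of which k belong to the set, the p-value is P(X ≥ k). The sketch below computes this with `math.comb` and applies Benjamini–Hochberg adjustment. It illustrates the statistic rather than reproducing piano's runGSAhyper, whose defaults (including the adjustment method) should be checked in the piano documentation; the helper names and toy gene sets are hypothetical.

```python
from math import comb
from typing import Dict, List, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def ora(network_nodes: Set[str], background: Set[str], gene_sets: Dict[str, Set[str]]) -> Dict[str, float]:
    """Adjusted over-representation p-value per gene set, restricted to the background."""
    query = network_nodes & background
    names = list(gene_sets)
    raw = []
    for name in names:
        members = gene_sets[name] & background
        k = len(query & members)
        raw.append(hypergeom_sf(k, len(background), len(members), len(query)))
    return dict(zip(names, benjamini_hochberg(raw)))


# Hypothetical toy background, gene sets and network nodes.
background = {f"g{i}" for i in range(1, 11)}
gene_sets = {"SET1": {"g1", "g2", "g3", "g4"}, "SET2": {"g9", "g10", "g99"}}
network = {"g1", "g2", "g5"}
print(ora(network, background, gene_sets))  # SET1 ≈ 0.667, SET2 = 1.0
```

Restricting both the network nodes and the gene sets to the background matters: genes that could never appear in the network (here, g99) would otherwise distort the test.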
Demonstration and discussion
To demonstrate the app’s utility for generating hypotheses for compound mechanism of action in practice, and to give an overview of the UI and app functionalities, we will now present a case study using the EGFR and ERBB2 (HER2) inhibitor lapatinib (we also provide this as a tutorial included in the documentation).
The differential gene expression data used in this case study is derived from lapatinib-treated (1 µM, 6 h) HER2-positive BT474 breast cancer cells, from a publication by Sun et al. (GEO accession GSE129254). In HER2-positive breast cancer, lapatinib inhibits the activation of signalling pathways downstream of EGFR and HER2, including MAPK, PI3K-AKT and PLC-γ, leading to apoptosis, decreased cellular proliferation and cell cycle arrest. The aim of the MAVEN analysis in this case study is to infer a signalling network which captures the known cellular response of HER2-positive cells treated with lapatinib.
The MAVEN UI is split into five tabs: Index (landing page), Data, Targets, Analysis and Visualisation (Fig. 2). The landing page provides a summary of the MAVEN workflow, so the case study proceeds from the second tab (Data).
Here the gene expression data and prior knowledge are uploaded and stored in local memory for use in the Analysis tab. The user can browse for their files or use the toggle to load the Omnipath network and the lapatinib gene expression data used in this case study (Fig. 2). As well as the documentation, there are help buttons throughout the workflow to explain file formats, definitions of parameters, and so on.
After checking that the data is in the correct format (including checking for valid HGNC symbols and reporting any invalid symbols using HGNChelper v0.8.1), the GUI provides a summary of the uploaded data for the user to check, e.g., the number of nodes and edges in the network. The user is then prompted to move on to the Targets tab.
The Targets page is split into four sub-tabs (Fig. 3) and is an optional step in the MAVEN workflow. In the first tab (Fig. 3A), the user either uploads a SMILES file or sketches their compound to produce one. Following successful SMILES upload, the compound is displayed as an image for the user to check, as shown for the correctly rendered lapatinib structure in the case study. In the second tab, the user selects the options for running PIDGIN (Fig. 3B). Here, the bioactivity threshold was set to 1 µM to correspond with the concentration of lapatinib used to generate the gene expression data. The applicability domain (AD) filter was set to 30, and 20 cores of compute power were used to run the predictions. After choosing the parameters, the user is prompted to browse for the location of their PIDGINv4 installation directory, and a button then becomes available to run the target prediction analysis (which can be monitored via the R console output). Targets can also be defined manually by entering their HGNC symbols in the (D) User-defined targets tab.
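A rough sketch of the kind of symbol check described above: each symbol is mapped through an alias table (for example, outdated symbols or spreadsheet-mangled dates) and kept only if the result is an approved symbol. MAVEN itself uses HGNChelper for this step; the function `check_symbols`, the tiny approved set and the alias table below are hypothetical, illustrative stand-ins rather than HGNC data.

```python
from typing import Dict, Iterable, List, Set, Tuple


def check_symbols(
    symbols: Iterable[str],
    approved: Set[str],
    aliases: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Split input symbols into (corrected valid symbols, invalid inputs)."""
    corrected: List[str] = []
    invalid: List[str] = []
    for raw in symbols:
        key = raw.strip()  # case is preserved: some HGNC symbols contain lower-case letters
        symbol = aliases.get(key, key)  # map outdated or mangled forms if they are known
        if symbol in approved:
            corrected.append(symbol)
        else:
            invalid.append(raw)
    return corrected, invalid


# Illustrative stand-ins for HGNC data, not a real symbol table.
approved_symbols = {"EGFR", "ERBB2", "SEPTIN2"}
alias_table = {"SEPT2": "SEPTIN2", "2-Sep": "SEPTIN2"}
valid, bad = check_symbols(["EGFR", " ERBB2", "2-Sep", "SEPT2", "NOTAGENE"], approved_symbols, alias_table)
print(valid)  # ['EGFR', 'ERBB2', 'SEPTIN2', 'SEPTIN2']
print(bad)    # ['NOTAGENE']
```

Reporting invalid inputs verbatim, rather than silently dropping them, lets the user see which rows of their data would not map onto the prior knowledge network.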
Once the PIDGIN run is complete, the results are saved and also displayed in the third tab, (C) Results, as a data table (Fig. 4), with one row for each target model. The table contains the HGNC symbol, target name, predicted probability of activity, ChEMBL ID of the most structurally similar compound in the ChEMBL29 database (nearest neighbour), Tanimoto similarity of the nearest neighbour compared to the query compound computed from ECFP4 fingerprints, and the experimental measurements available for this compound. The target and nearest neighbour are hyperlinked to the UniProt and ChEMBL databases, respectively. It can be seen that many of the highest-predicted targets (ERBB4, EGFR, ERBB2, KCNH2 and PIK3C2B) are experimentally measured targets of lapatinib (CHEMBL554, Tanimoto Similarity = 1).
Targets can be chosen from the PIDGIN output (by selecting rows) based on the predicted probabilities as well as the Tanimoto similarities (higher is better in both cases; a predicted probability of 0.5 or above indicates that the compound is predicted to be active against the target, and a Tanimoto similarity of 0.3 or above is considered “similar” in the feature space used to build the models), or by consulting the literature references on protein function listed in the linked UniProt entries (e.g., https://www.uniprot.org/uniprot/P00533 for EGFR). Alternatively, the analysis can be run without targets and then re-run with selected targets based on these findings to investigate specific target hypotheses. For example, if the final network contains nodes from a particular signalling pathway, a highly predicted target upstream of this pathway can be used to refine the final network. The information provided in Fig. 4 is intended only for selecting targets of interest; only the target HGNC symbols themselves are passed to the CARNIVAL optimisation.
We took the three targets with the highest predicted probability, ERBB4 (0.790), EGFR (0.784) and ERBB2/HER2 (0.724), all known to be expressed in HER2+ breast cancer [44, 45], forward to the causal reasoning analysis stage. The rows are selected in the data table as shown in Fig. 4.
The Analysis page is split into three sub-tabs for the three bioinformatics analysis methods: DoRothEA, PROGENy and CARNIVAL. The settings for each can be set on the left-hand side of the page (Fig. 5).
For the case study, DoRothEA (Fig. 5A) was run with confidence levels A, B and C, and the top 50 enriched TFs were used for further analysis. For PROGENy (Fig. 5B), the top 100 most responsive lapatinib genes (based on the t-values input in the Data stage) were used for the calculations. For CARNIVAL (Fig. 5C), the targets selected in the previous step (Fig. 4) have populated the CARNIVAL options, and they can be further deselected if required; here, we kept EGFR, ERBB4 and ERBB2 as described above. We set a time limit of 3600 s, used 30 cores of compute power and used the IBM ILOG CPLEX solver to solve the ILP problem. This means that the solver generates as many optimal network solutions as possible within the given time and compute resources, and outputs the final consensus network. Increasing the time limit or number of cores hence allows the solver to generate more networks, which may be required if no optimal solutions are found.
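The selection heuristics above (predicted probability of at least 0.5 and nearest-neighbour Tanimoto similarity of at least 0.3, then ranking by probability) can be written as a small filter. The `Prediction` record and `select_targets` helper are assumptions about how the results table might be represented, not PIDGINv4's output format. The three probabilities are those reported for lapatinib in this case study, with similarities of 1 as reported for CHEMBL554; the `TARGET_X` row is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    symbol: str           # HGNC symbol of the target model
    probability: float    # predicted probability of activity
    nn_similarity: float  # Tanimoto similarity of the nearest training-set neighbour


def select_targets(
    preds: List[Prediction], min_prob: float = 0.5, min_sim: float = 0.3, top: int = 3
) -> List[str]:
    """Filter by probability and nearest-neighbour similarity, then return the `top` most probable symbols."""
    kept = [p for p in preds if p.probability >= min_prob and p.nn_similarity >= min_sim]
    kept.sort(key=lambda p: p.probability, reverse=True)
    return [p.symbol for p in kept[:top]]


results = [
    Prediction("ERBB2", 0.724, 1.0),
    Prediction("EGFR", 0.784, 1.0),
    Prediction("ERBB4", 0.790, 1.0),
    Prediction("TARGET_X", 0.42, 0.35),  # hypothetical row below the probability cut-off
]
print(select_targets(results))  # ['ERBB4', 'EGFR', 'ERBB2']
```

Only the returned symbols would then be passed on, consistent with CARNIVAL using nothing but the target HGNC symbols.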
Following DoRothEA analysis, the resulting normalised enrichment scores (NES) for each TF are displayed as a bar chart (Fig. 6) and a corresponding data table with TFs hyperlinked to their UniProt pages. The plot shows that the top enriched upregulated TF was FOXO3, which is known to be upregulated by lapatinib in HER2+ cells, and the top enriched downregulated TF was ESRRA, which is known to be degraded in response to lapatinib-mediated inhibition of growth factor-induced signalling in HER2+ tumours. Hence, MAVEN is able to generate an easy-to-interpret overview of TFs known to be dysregulated by lapatinib in the specific cellular context under investigation. If the slider is adjusted to select a different number of top-scoring TFs, the plot and table of results update automatically. The number chosen here is a trade-off between coverage (selecting more TFs may lead to additional findings) and noise (additional TFs do not necessarily contribute more information and instead increase computational time). To aid in this decision, the plot and the associated UniProt information for each TF can be consulted to select a number that provides good coverage of different protein functions (i.e., not solely a set of proteins from the same family, so that the CARNIVAL analysis can better exploit the prior knowledge network), coupled with prior knowledge or hypotheses on phenotypic findings. The interface help buttons (visible in Fig. 5) also provide guidance text for selecting these parameters, from the authors of DoRothEA.
Following PROGENy analysis, the results are visualised in the same way: a bar chart of predicted pathway activity scores (from -1 to 1, indicating inhibition to activation) (Fig. 7) and a corresponding data table (not shown). In agreement with these results, lapatinib is known to inhibit the EGFR, MAPK and PI3K pathways in HER2+ cells. The pathway scores are converted to weights on the protein–protein interaction network, which aids the optimisation of the signalling subnetwork by CARNIVAL. By default, the 100 most responsive genes are used, but this can be adjusted depending on the coverage of the gene expression experiment; in general, the more genes measured, the more top responsive genes can be used (e.g., 200–500 for RNA-Seq experiments). The bar chart again updates when the number of genes is adjusted, and can be interpreted with regard to the function of each pathway and what would be expected based on what is known about the compound.
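The conversion of pathway scores into node weights happens internally. One simple way to picture it, assumed here for illustration only, is to give each protein that represents a pathway the score of that pathway, averaging when a protein represents several pathways. The `node_weights` helper and the pathway-to-protein map below are hypothetical and are not the mapping MAVEN or the transcriptutorial helper scripts use.

```python
from collections import defaultdict
from typing import Dict, List


def node_weights(scores: Dict[str, float], members: Dict[str, List[str]]) -> Dict[str, float]:
    """Give each representative protein the mean score of the pathways it represents."""
    collected: Dict[str, List[float]] = defaultdict(list)
    for pathway, score in scores.items():
        for protein in members.get(pathway, []):
            collected[protein].append(score)
    return {protein: sum(vals) / len(vals) for protein, vals in collected.items()}


# Hypothetical pathway-to-protein map; protein Y represents two pathways.
toy_members = {"PathA": ["X", "Y"], "PathB": ["Y"]}
print(node_weights({"PathA": 0.5, "PathB": -0.25}, toy_members))  # {'X': 0.5, 'Y': 0.125}
```

Averaging is just one choice for proteins shared between pathways; other aggregation rules (e.g., keeping the score with the largest magnitude) would be equally plausible in a sketch like this.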
Visualisation and enrichment analysis
Following DoRothEA and PROGENy analysis, CARNIVAL can be run, taking as input the DoRothEA enriched TFs and pathway weights from PROGENy, as well as the prior knowledge network uploaded in the first Data step. Once complete, the resulting CARNIVAL consensus network is visualised on the visualisation page (Fig. 8). Files from previous analysis runs, which are automatically saved, can also be uploaded into the tool for visualisation.
It can be seen that the top layer of the network consists of the three selected targets (ERBB4, ERBB2 and EGFR), the bottom layer consists of the input TFs (e.g., FOXO1, FOXO3, ESR1), and these are connected by signalling proteins with inferred directionality (blue for up-regulation, red for down-regulation), which, along with their interactions, have been optimised from the input prior knowledge network. As well as visualising the resulting network and deriving hypotheses from individual nodes, it is possible to perform pathway enrichment on the network proteins with over-representation analysis. To illustrate this, we ran the enrichment analysis using the BioCarta gene set collection; the top enriched pathway, the HER2 signalling pathway (adjusted p-value = 2.26e−9), is visualised on the network with participating proteins highlighted in green (Fig. 8). Hence, CARNIVAL was able to construct a signalling network highly enriched for HER2 signalling, including the signalling proteins MAPKs, ESR1, ERBB2, EGFR, PIK3CA and EP300, which are known to be relevant to the primary mechanism of action of lapatinib in HER2+ cancers.
The enrichment results are also displayed in the GUI as a data table (Fig. 9), and if one of the included MSigDB collections was used for the analysis, each gene set can be clicked through to its entry on the MSigDB website. A .csv file with more information on the enrichment results (e.g., participating proteins, odds ratio, unadjusted p-value) can also be downloaded.
Case study summary
Through the case study, we have demonstrated the ability of the MAVEN R/Shiny app and its constituent tools to produce and report correct target prediction results (predicting the lapatinib targets EGFR and ERBB2), infer both down- and up-regulated transcription factors induced by lapatinib (including FOXO3 and ESRRA), infer pathways known to be modulated by lapatinib (EGFR, MAPK and PI3K), and finally construct and visualise a signalling network which is highly enriched for the HER2 signalling pathway known to be modulated by lapatinib. This demonstrates the detailed insights into compound MoA that can be obtained using MAVEN’s user-friendly interface, and without requiring extensive coding knowledge.
Future additions to the app will include a batch upload option to analyse multiple compounds at once and compare their results, options to use other causal reasoning algorithms, and the ability to upload and analyse other data types (e.g., phosphoproteomics and metabolomics data). Supplementary files such as the MSigDB gene sets will be continuously updated to reflect any major version changes. Suggestions for new features can also be requested on the GitHub page https://github.com/laylagerami/MAVEN.
We have developed MAVEN (Mechanism of Action Visualisation and ENrichment), a novel, feature-rich R/Shiny app integrating chemical structure-based target prediction with gene expression-based causal reasoning analysis, coupled with visualisation and pathway enrichment analysis. A case study, using the chemical structure of lapatinib and gene expression data derived from lapatinib-treated HER2-positive cells, has demonstrated how easily detailed insights into compound MoA can be inferred using MAVEN.
Availability and requirements
Project name: MAVEN. Project home page: https://github.com/laylagerami/MAVEN. Operating system(s): Platform independent. Programming language: R, Python. Other requirements: R v.4.1 or higher, Python v.2 or higher. License: GNU General Public License. Any restrictions to use by non-academics: A license is required to use the IBM solver which is only freely available for academics.
Availability of data and materials
The dataset used in the case study is available from GEO (accession number GSE129254). The IBM Cplex optimiser is optional to use, and requires a license which can be obtained at https://www.ibm.com/products/ilog-cplex-optimization-studio, and is not included with the application or containers. The CBC solver is free to use without a license.
Abbreviations
MoA: Mechanism of action
NES: Normalised enrichment score
Trapotsi M-A, Hosseini-Gerami L, Bender A. Computational analyses of mechanism of action (MoA): data, methods and integration. RSC Chem Biol. 2021. https://doi.org/10.1039/D1CB00069A.
Subramanian A, Narayan R, Corsello SM, Peck DD, Natoli TE, Lu X, Gould J, Davis JF, Tubelli AA, Asiedu JK, Lahr DL, Hirschman JE, Liu Z, Donahue M, Julian B, Khan M, Wadden D, Smith IC, Lam D, Liberzon A, Toder C, Bagul M, Orzechowski M, Enache OM, Piccioni F, Johnson SA, Lyons NJ, Berger AH, Shamji AF, Brooks AN, Vrcic A, Flynn C, Rosains J, Takeda DY, Hu R, Davison D, Lamb J, Ardlie K, Hogstrom L, Greenside P, Gray NS, Clemons PA, Silver S, Wu X, Zhao W-N, Read-Button W, Wu X, Haggarty SJ, Ronco LV, Boehm JS, Schreiber SL, Doench JG, Bittker JA, Root DE, Wong B, Golub TR. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell. 2017;171(6):1437-1452.e17. https://doi.org/10.1016/j.cell.2017.10.049.
Mayr A, Klambauer G, Unterthiner T, Steijaert M, Wegner JK, Ceulemans H, Clevert D-A, Hochreiter S. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem Sci. 2018;9(24):5441–51. https://doi.org/10.1039/C8SC00148K.
Mervin LH, Afzal AM, Drakakis G, Lewis R, Engkvist O, Bender A. Target prediction utilising negative bioactivity data covering large chemical space. J Cheminform. 2015;7(1):51. https://doi.org/10.1186/s13321-015-0098-y.
Carracedo-Reboredo P, Liñares-Blanco J, Rodríguez-Fernández N, Cedrón F, Novoa FJ, Carballal A, Maojo V, Pazos A, Fernandez-Lozano C. A review on machine learning approaches and trends in drug discovery. Comput Struct Biotechnol J. 2021;19:4538–58. https://doi.org/10.1016/j.csbj.2021.08.011.
Hosseini-Gerami L, Higgins IA, Collier DA, Laing E, Evans D, Broughton H, Bender A. Benchmarking causal reasoning algorithms for gene expression-based compound mechanism of action analysis. BMC Bioinform. 2023;24(1):1–28. https://doi.org/10.21203/rs.3.rs-1239049/v1.
Liu A, Trairatphisan P, Gjerga E, Didangelos A, Barratt J, Saez-Rodriguez J. From expression footprints to causal pathways: contextualizing large signaling networks with CARNIVAL. Npj Syst Biol Appl. 2019;5(1):1–10. https://doi.org/10.1038/s41540-019-0118-z.
Enayetallah AE, Ziemek D, Leininger MT, Randhawa R, Yang J, Manion TB, Mather DE, Zavadoski WJ, Kuhn M, Treadway JL, Etages SA. Modeling the mechanism of action of a dgat1 inhibitor using a causal reasoning platform. PLoS ONE. 2011;6(11):e27009. https://doi.org/10.1371/journal.pone.0027009.
Kumar R, Blakemore SJ, Ellis CE, Petricoin EF, Pratt D, Macoritto M, Matthews AL, Loureiro JJ, Elliston K. Causal reasoning identifies mechanisms of sensitivity for a novel AKT kinase inhibitor, GSK690693. BMC Genomics. 2010;11:419. https://doi.org/10.1186/1471-2164-11-419.
Dugourd A, Kuppe C, Sciacovelli M, Gjerga E, Gabor A, Emdal KB, Vieira V, Bekker-Jensen DB, Kranz J, Bindels EM, Costa AS. Causal integration of multi-omics data with prior knowledge to generate mechanistic hypotheses. Mol Syst Biol. 2021;17(1):e9730. https://doi.org/10.15252/msb.20209730.
Hernansaiz-Ballesteros R, Holland CH, Dugourd A, Saez-Rodriguez J. FUNKI: interactive functional footprint-based analysis of omics data. Bioinformatics. 2022;38(7):2075–6. https://0-doi-org.brum.beds.ac.uk/10.1093/bioinformatics/btac055.
Forrest J, Ralphs T, Vigerske S, LouHafer, Kristjansson B, jpfasano, EdwinStraver, Lubin M, Santos HG, rlougee, Saltzman M. Coin-or/Cbc: Version 2.9.9. 2018. https://doi.org/10.5281/zenodo.1317566.
IBM. ILOG CPLEX optimization studio. https://www.ibm.com/products/ilog-cplex-optimization-studio (Accessed 2019-06-17).
Türei D, Korcsmáros T, Saez-Rodriguez J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat Methods. 2016;13(12):966–7. https://doi.org/10.1038/nmeth.4077.
Sun B, Mason S, Wilson RC, Hazard SE, Wang Y, Fang R, Wang Q, Yeh ES, Yang M, Roberts TM, Zhao JJ, Wang Q. Inhibition of the transcriptional kinase CDK7 overcomes therapeutic resistance in HER2-positive breast cancers. Oncogene. 2020;39(1):50–63. https://doi.org/10.1038/s41388-019-0953-9.
Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25. https://doi.org/10.1016/j.cels.2015.12.004.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci. 2005;102(43):15545–50. https://doi.org/10.1073/pnas.0506580102.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9. https://doi.org/10.1038/75556.
BenderGroup/PIDGINv4. 2022. https://github.com/BenderGroup/PIDGINv4 (Accessed 2022-05-23).
Garcia-Alonso L, Ibrahim MM, Turei D, Saez-Rodriguez J. Benchmark and integration of resources for the estimation of human transcription factor activities. bioRxiv. 2018. https://doi.org/10.1101/337915.
Schubert M, Klinger B, Klünemann M, Sieber A, Uhlitz F, Sauer S, Garnett MJ, Blüthgen N, Saez-Rodriguez J. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nat Commun. 2018. https://doi.org/10.1038/s41467-017-02391-6.
Csabai L, Fazekas D, Kadlecsik T, Szalay-Bekő M, Bohár B, Madgwick M, Módos D, Ölbei M, Gul L, Sudhakar P, Kubisch J, Oyeyemi OJ, Liska O, Ari E, Hotzi B, Billes VA, Molnár E, Földvári-Nagy L, Csályi K, Demeter A, Pápai N, Koltai M, Varga M, Lenti K, Farkas IJ, Türei D, Csermely P, Vellai T, Korcsmáros T. SignaLink3: a multi-layered resource to uncover tissue-specific signaling networks. Nucleic Acids Res. 2022;50(D1):D701–9. https://doi.org/10.1093/nar/gkab909.
Licata L, Lo Surdo P, Iannuccelli M, Palma A, Micarelli E, Perfetto L, Peluso D, Calderone A, Castagnoli L, Cesareni G. SIGNOR 2.0, the signaling network open resource 2.0: 2019 update. Nucleic Acids Res. 2020;48(D1):D504–10. https://doi.org/10.1093/nar/gkz949.
Edgar R, Domrachev M, Lash AE. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30(1):207–10. https://doi.org/10.1093/nar/30.1.207.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–504. https://doi.org/10.1101/gr.1239303.
Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27(12):1739–40. https://doi.org/10.1093/bioinformatics/btr260.
Solver benchmarks. https://cran.r-project.org/web/packages/prioritizr/vignettes/solver_benchmarks.html (Accessed 2023-02-11).
Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40(D1):D1100–7. https://doi.org/10.1093/nar/gkr777.
Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J. PubChem substance and compound databases. Nucleic Acids Res. 2016;44(D1):D1202–13. https://doi.org/10.1093/nar/gkv951.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
RDKit: open-source cheminformatics software. https://www.rdkit.org/ (Accessed 2020-01-28).
Burger MC. ChemDoodle Web Components: HTML5 toolkit for chemical graphics, interfaces, and informatics. J Cheminformatics. 2015;7(1):35. https://doi.org/10.1186/s13321-015-0085-3.
Chemdoodle. https://github.com/zachcp/chemdoodle (Accessed 2022-05-09).
Aniceto N, Freitas AA, Bender A, Ghafourian T. A novel applicability domain technique for mapping predictive reliability across the chemical space of a QSAR: reliability-density neighbourhood. J Cheminformatics. 2016;8(1):69. https://doi.org/10.1186/s13321-016-0182-y.
Alvarez MJ, Shen Y, Giorgi FM, Lachmann A, Ding BB, Ye BH, Califano A. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat Genet. 2016;48(8):838–47. https://doi.org/10.1038/ng.3593.
Holland CH, Tanevski J, Perales-Patón J, Gleixner J, Kumar MP, Mereu E, Joughin BA, Stegle O, Lauffenburger DA, Heyn H, Szalai B, Saez-Rodriguez J. Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data. Genome Biol. 2020;21(1):36. https://doi.org/10.1186/s13059-020-1949-z.
Berkelaar M. lpSolve: Interface to “Lp_solve” v. 5.5 to solve linear/integer programs. 2022. https://CRAN.R-project.org/package=lpSolve (Accessed 2022-09-13).
Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, Haw R, Jassal B, Korninger F, May B, Milacic M, Roca CD, Rothfels K, Sevilla C, Shamovsky V, Shorser S, Varusai T, Viteri G, Weiser J, Wu G, Stein L, Hermjakob H, D’Eustachio P. The reactome pathway knowledgebase. Nucleic Acids Res. 2018;46(D1):D649–55. https://doi.org/10.1093/nar/gkx1132.
Slenter DN, Kutmon M, Hanspers K, Riutta A, Windsor J, Nunes N, Mélius J, Cirillo E, Coort SL, Digles D, Ehrhart F, Giesbertz P, Kalafati M, Martens M, Miller R, Nishida K, Rieswijk L, Waagmeester A, Eijssen LMT, Evelo CT, Pico AR, Willighagen EL. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res. 2018;46(D1):D661–7. https://doi.org/10.1093/nar/gkx1064.
Väremo L, Nielsen J, Nookaew I. Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods. Nucleic Acids Res. 2013;41(8):4378–91. https://doi.org/10.1093/nar/gkt111.
Segovia-Mendoza M, González-González ME, Barrera D, Díaz L, García-Becerra R. Efficacy and mechanism of action of the tyrosine kinase inhibitors Gefitinib, Lapatinib and Neratinib in the treatment of HER2-positive breast cancer: preclinical and clinical evidence. Am J Cancer Res. 2015;5(9):2531–61.
Waldron L, Riester M. HGNChelper: identify and correct invalid HGNC human gene symbols and MGI mouse gene symbols. 2019. https://CRAN.R-project.org/package=HGNChelper (Accessed 2022-09-08).
Jasial S, Hu Y, Vogt M, Bajorath J. Activity-relevant similarity values for fingerprints and implications for similarity searching. F1000Research. 2016. https://doi.org/10.12688/f1000research.8357.2.
Canfield K, Li J, Wilkins OM, Morrison MM, Ung M, Wells W, Williams CR, Liby KT, Vullhorst D, Buonanno A, Hu H, Schiff R, Cook RS, Kurokawa M. Receptor tyrosine kinase ERBB4 mediates acquired resistance to ERBB2 inhibitors in breast cancer cells. Cell Cycle. 2015;14(4):648–55. https://doi.org/10.4161/15384101.2014.994966.
Medina PJ, Goodin S. Lapatinib: a dual inhibitor of human epidermal growth factor receptor tyrosine kinases. Clin Ther. 2008;30(8):1426–47. https://doi.org/10.1016/j.clinthera.2008.08.008.
Matkar S, Sharma P, Gao S, Gurung B, Katona BW, Liao J, Muhammad AB, Kong X-C, Wang L, Jin G, Dang CV, Hua X. An epigenetic pathway regulates sensitivity of breast cancer cells to HER2 inhibition via FOXO/c-Myc axis. Cancer Cell. 2015;28(4):472–85. https://doi.org/10.1016/j.ccell.2015.09.005.
Deblois G, Smith HW, Tam IS, Gravel S-P, Caron M, Savage P, Labbé DP, Bégin LR, Tremblay ML, Park M, Bourque G, St-Pierre J, Muller WJ, Giguère V. ERRα mediates metabolic adaptations driving lapatinib resistance in breast cancer. Nat Commun. 2016;7(1):12156. https://doi.org/10.1038/ncomms12156.
Imami K, Sugiyama N, Imamura H, Wakabayashi M, Tomita M, Taniguchi M, Ueno T, Toi M, Ishihama Y. Temporal profiling of lapatinib-suppressed phosphorylation signals in EGFR/HER2 pathways. Mol Cell Proteomics MCP. 2012;11(12):1741–57. https://doi.org/10.1074/mcp.M112.019919.
Estévez LG, Suarez-Gauthier A, García E, Miró C, Calvo I, Fernández-Abad M, Herrero M, Marcos M, Márquez C, Lopez Ríos F, Perea S, Hidalgo M. Molecular effects of lapatinib in patients with HER2 positive ductal carcinoma in situ. Breast Cancer Res. 2014;16(4):R76. https://doi.org/10.1186/bcr3695.
Garrett JT, Olivares MG, Rinehart C, Granja-Ingram ND, Sánchez V, Chakrabarty A, Dave B, Cook RS, Pao W, McKinely E, Manning HC, Chang J, Arteaga CL. Transcriptional and posttranslational up-regulation of HER3 (ErbB3) compensates for inhibition of the HER2 tyrosine kinase. Proc Natl Acad Sci. 2011;108(12):5021–6. https://doi.org/10.1073/pnas.1016140108.
Nishimura D. BioCarta. Biotech Softw Internet Rep. 2001;2(3):117–20. https://doi.org/10.1089/152791601750294344.
Vogel C, Chan A, Gril B, Kim S-B, Kurebayashi J, Liu L, Lu Y-S, Moon H. Management of ErbB2-positive breast cancer: insights from preclinical and clinical studies with Lapatinib. Jpn J Clin Oncol. 2010;40(11):999–1013. https://doi.org/10.1093/jjco/hyq084.
Mahmud Z, Gomes AR, Lee HJ, Aimjongjun S, Jiramongkol Y, Yao S, Zona S, Alasiri G, Gong G, Yagüe E, Lam EW-F. EP300 and SIRT1/6 Co-regulate Lapatinib sensitivity via modulating FOXO3-acetylation and activity in breast cancer. Cancers. 2019;11(8):E1067. https://doi.org/10.3390/cancers11081067.
The authors would like to thank Julio Saez-Rodriguez’s research group for beta testing the app and giving feedback, and Aurelien Dugourd for providing helper scripts for the application. We would also like to thank Jeff Kriske at Eli Lilly and Company for building the Singularity container solution.
This work was supported by BBSRC and Eli Lilly and Company (Grant code BB/M011194/1) (L.H.G.) and by the European Union’s Horizon 2020 Research and Innovation Programme H2020-ICT-2018-2 project iPC (individualized Paediatric Cure) [Grant 826121] (R.H.B.). The funding bodies played no role in the design of the study, in the collection, analysis and interpretation of the data, or in writing the manuscript.
Ethics approval and consent to participate
Consent for publication
Competing interests
R.H.B. has received consultant fees from QuantBio. H.B. and D.A.C. are or were employees of Eli Lilly and Company. L.H.G. was partially funded by Eli Lilly and Company and is now an employee of Ignota Labs. A.L. was funded by GSK and is now an employee of Boehringer Ingelheim.
About this article
Cite this article
Hosseini-Gerami, L., Hernansaiz Ballesteros, R., Liu, A. et al. MAVEN: compound mechanism of action analysis and visualisation using transcriptomics and compound structure data in R/Shiny. BMC Bioinformatics 24, 344 (2023). https://doi.org/10.1186/s12859-023-05416-8
- Mechanism of action
- Causal reasoning
- Systems biology