Scan for Motifs: a webserver for the analysis of post-transcriptional regulatory elements in the 3′ untranslated regions (3′ UTRs) of mRNAs
BMC Bioinformatics volume 15, Article number: 174 (2014)
Gene expression in vertebrate cells may be controlled post-transcriptionally through regulatory elements in mRNAs. These are usually located in the untranslated regions (UTRs) of mRNA sequences, particularly the 3′UTRs.
Scan for Motifs (SFM) simplifies the process of identifying a wide range of regulatory elements on alignments of vertebrate 3′UTRs. SFM includes identification of both RNA Binding Protein (RBP) sites and targets of miRNAs. In addition to searching pre-computed alignments, the tool provides users the flexibility to search their own sequences or alignments. The regulatory elements may be filtered by expected value cutoffs and are cross-referenced back to their respective sources and literature. The output is an interactive graphical representation, highlighting potential regulatory elements and overlaps between them. The output also provides simple statistics and links to related resources for complementary analyses. The overall process is intuitive and fast. As SFM is a free web-application, the user does not need to install any software or databases.
Visualisation of the binding sites of different classes of effectors that bind to 3′UTRs will facilitate the study of regulatory elements in 3′ UTRs.
The untranslated regions of mRNA sequences (UTRs) include most of the experimentally determined regulatory elements (REs) [1, 2]. This post-transcriptional regulatory information can affect the site at which an mRNA is polyadenylated, and then how, when and where it is translated [3, 4]. A number of tools and methods have been developed to identify cis-regulatory elements (CREs), many of them focusing on individual types of CRE in single sequences [5, 6]. Such tools may overlook other types of CRE in neighboring regions [7, 8]. For example, although a large number of algorithms predict microRNA (miRNA) binding sites, reviewed in [9, 10], only one has specifically considered a nearby RNA binding protein (RBP) site . However, some miRNA targets are known to be affected by the presence of other elements or sequences nearby [1, 11–13]. Most regulatory elements are small (<12 bases), and many in silico predictions have high false positive rates. Visualisation of potential sites could improve the utility of such predictions.
Some complex RNA elements can both be miRNA target sites and be bound by proteins [3, 14, 15]. Recent publications have provided evidence that specific types of miRNAs and RBPs work in concert to influence transcript decay [11, 16, 17] or translation , and this synergy has been included in some computational analyses of proteins  and miRNAs .
In many studies, one specific gene of interest from a single species is analysed. Recently developed systems, including RegRNA 2.0 , AURA , AREsite  and UTRdb , have provided increasing support for this type of analysis. However, no single tool currently offers analysis of sequence alignments, a representation of overlapping elements, E-value cutoffs, and the ability to include custom sequence motifs. Scan for Motifs provides all of these for 3′UTR regions. It is primarily aimed at the analysis of human 3′UTRs, but can be used for sequences or alignments from any species, and for any part of the mRNA.
The RNA-Binding Protein DataBase (RBPDB) contains a collection of experimentally verified RNA binding sites, manually curated from the literature. It currently contains binding data on 272 RBPs, but only 69 of these have motifs in position frequency matrix (PFM) format, which is the most useful format for SFM analysis. These PFMs can be used to distinguish between good and poor matches to short motifs. The other individual binding site sequences from RBPDB (e.g. CAUY) can also be supplied by the user. Other user-specified sequences, regular expressions or matrices can also be used, in PatSearch format .
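To illustrate how a PFM can separate good from poor matches to a short motif, the Python sketch below converts a PFM into log2-odds scores and rescales each window's score from 0 (worst possible match) to 1 (best possible match). This is an assumed, simplified scoring scheme with a uniform background and a pseudocount; it is not MotifLocator's algorithm, and the function names, the toy PFM and the 0.8 threshold are hypothetical.

```python
import math

BASES = "ACGU"


def pfm_to_pwm(pfm, pseudocount=0.25):
    """Convert a PFM (list of {base: count} columns) to log2-odds scores
    against a uniform background of 0.25 per base (an assumption)."""
    pwm = []
    for column in pfm:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / 0.25)
                    for b in BASES})
    return pwm


def scan(seq, pwm, threshold=0.8):
    """Return (1-based start, relative score) for each window whose score,
    rescaled from 0 (worst possible) to 1 (best possible), meets threshold."""
    seq = seq.upper().replace("T", "U")
    width = len(pwm)
    best = sum(max(col.values()) for col in pwm)
    worst = sum(min(col.values()) for col in pwm)
    span = (best - worst) or 1.0  # guard against a completely flat PFM
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing gaps or ambiguous bases
        raw = sum(pwm[j][b] for j, b in enumerate(window))
        relative = (raw - worst) / span
        if relative >= threshold:
            hits.append((start + 1, round(relative, 3)))
    return hits


# Hypothetical 4-column PFM that strongly favours UAUU
toy_pfm = [{"U": 9, "A": 1}, {"A": 9, "G": 1}, {"U": 10}, {"U": 10}]
print(scan("GGGUAUUGGG", pfm_to_pwm(toy_pfm)))
```

The relative score makes one threshold usable across PFMs of different widths; a real scanner would also use a background model estimated from UTR sequence rather than a uniform one.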
Published miRNA sequences are from miRBase . The mature miRNA sequences were downloaded from the miRBase website (file: mature.fa) and processed (reverse complemented, and the 8 leading seed bases extracted) to give a list of 2042 named 8mer seeds, which is stored in a reference text file. The 6mer seed is the middle six bases of the 8mer, and both of the two overlapping 7mers are used (7mer-A1, denoted A1 in the output, and 7mer-M8) .
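The seed processing can be sketched in Python as follows. This is a minimal sketch assuming the conventional TargetScan site definitions, in which the 8mer site pairs with miRNA positions 2–8 and has an A opposite position 1; the two 7mers and the 6mer are then substrings of the 8mer, consistent with the description above. The function names are hypothetical, and the example sequence is only for illustration.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string (T is treated as U)."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def seed_sites(mature_mirna):
    """Target-site strings (5'->3' on the mRNA) for one mature miRNA."""
    pairs_2_8 = revcomp(mature_mirna[1:8])  # pairs with miRNA positions 2-8
    eight = pairs_2_8 + "A"                  # plus an A opposite position 1
    return {
        "8mer": eight,
        "7mer-m8": eight[:7],  # positions 2-8, no A1
        "7mer-A1": eight[1:],  # positions 2-7 plus the A
        "6mer": eight[1:7],    # the middle six bases
    }


print(seed_sites("UAGCAGCACGUAAAUAUUGGCG"))
```

Because the 7mers and the 6mer are substrings of the 8mer, storing only one 8mer per miRNA is enough to derive the shorter seeds on request.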
The 3′UTR alignments used were obtained from TargetScan (v.6.2), along with the miRNA binding site related files (miR Family, Predicted Conserved Targets Info, Conserved Family Info) . The ‘UTR_Sequences’ file holds multiple sequence alignments (MSAs) of 23 vertebrate genomes aligned to human, extracted by the TargetScan authors from the UCSC human genome (hg18) databases. The human sequences were extracted, and the positional information for the miRNA binding sites in the ‘Predicted Conserved Targets Info’ file was checked against, and updated where needed to, the latest release of the hg19 database (from UCSC). A MySQL database table in BED format was created to hold the positional information for each of these miRNA binding sites.
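A BED-style site table and a lookup by gene symbol might look like the sketch below. Python's built-in sqlite3 module stands in for MySQL here, and the table name, the extra gene_symbol column and the example rows are hypothetical. As in the BED format, chromStart is 0-based and chromEnd is exclusive.

```python
import sqlite3

# Hypothetical schema: the six standard BED columns plus a gene symbol
# used to look sites up (sqlite3 stands in for the MySQL table).
SCHEMA = """
CREATE TABLE mir_sites (
    chrom TEXT NOT NULL,
    chromStart INTEGER NOT NULL,   -- 0-based start
    chromEnd INTEGER NOT NULL,     -- exclusive end
    name TEXT NOT NULL,            -- e.g. the miRNA family of the site
    score INTEGER DEFAULT 0,
    strand TEXT CHECK (strand IN ('+', '-')),
    gene_symbol TEXT NOT NULL
)"""


def load_sites(conn, rows):
    """Create the table and bulk-insert (chrom, start, end, name, score,
    strand, gene_symbol) tuples."""
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO mir_sites VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def sites_for_gene(conn, symbol):
    """Return (name, chromStart, chromEnd, strand) for one gene, sorted by
    genomic start."""
    cur = conn.execute(
        "SELECT name, chromStart, chromEnd, strand FROM mir_sites "
        "WHERE gene_symbol = ? ORDER BY chromStart",
        (symbol,),
    )
    return cur.fetchall()
```

For example, `conn = sqlite3.connect(":memory:")` followed by `load_sites(conn, rows)` gives an in-memory table that `sites_for_gene(conn, "GENE1")` can query.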
A custom Perl script was written to check and update the positional information as described above. The program uses sequence similarity between the latest release of hg19 (from UCSC) and the UTR sequences from the TargetScan website. In most cases the sequences were 100% identical. For 27 genes the sequences differed in length; the TargetScan prediction data for these genes were discarded, as they could not be unambiguously assigned to the sequence.
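The check-and-update step could be sketched as follows, under several assumptions: TargetScan site positions are given relative to the UTR (1-based, inclusive), the hg19 UTR is described by exon blocks in 0-based half-open coordinates, and only the length rule stated above decides whether a gene is kept. All names are hypothetical, the original is a Perl script, and a site spanning an exon junction would be reported as a single interval that includes the intron.

```python
def utr_to_genome(utr_pos, exons, strand):
    """Map a 1-based UTR position to a 0-based genomic coordinate.

    exons: (start, end) blocks, 0-based half-open, in ascending genomic order.
    On the minus strand the UTR is read from the highest coordinate down.
    """
    blocks = exons if strand == "+" else list(reversed(exons))
    offset = utr_pos - 1
    for start, end in blocks:
        length = end - start
        if offset < length:
            return start + offset if strand == "+" else end - 1 - offset
        offset -= length
    raise ValueError(f"UTR position {utr_pos} lies beyond the UTR")


def remap_sites(ts_human_aligned, hg19_utr, sites, exons, strand):
    """Re-anchor TargetScan sites on hg19.

    sites: (name, utr_start, utr_end) tuples, 1-based inclusive UTR positions.
    Returns BED-style (name, chromStart, chromEnd) tuples, or None when the
    gene is discarded because the two UTR sequences differ in length.
    """
    ts_human = "".join(ch for ch in ts_human_aligned if ch not in "-.")
    if len(ts_human) != len(hg19_utr):
        return None  # cannot be assigned unambiguously
    remapped = []
    for name, utr_start, utr_end in sites:
        a = utr_to_genome(utr_start, exons, strand)
        b = utr_to_genome(utr_end, exons, strand)
        remapped.append((name, min(a, b), max(a, b) + 1))
    return remapped
```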
Accepting user input
The user input is of two types: i) query sequence(s) and ii) query element(s). Figure 2 shows the different input options available in the SFM web-server.
i) Query sequence. Option 1 in Figure 2 shows the different types of sequence input accepted by SFM. A standard human gene symbol (e.g. LIN28A) can be given as the source of the query sequence. In this case, the corresponding alignment of 23 vertebrate sequences (including human) is retrieved from the previously processed sequences using the gene symbol, and used as the query. Alternatively, users can input FASTA, multi-FASTA or ClustalW alignments, as well as tabular multiple sequence alignment (MSA) formatted sequences, as the query. When the query contains more than one sequence, SFM assigns a reference sequence. If a human gene symbol was used to obtain the input sequence, the human sequence is the reference; in all other cases, the first sequence is taken as the reference.
ii) Query elements. Options 2.A–E in Figure 2 show the range of query elements and expect-value controls available in SFM. All 77 TransTerm elements (option 2.A in Figure 2) are associated with a background Expect-value (E-value): their frequency of occurrence per thousand bases. These E-values were calculated by first creating a background set by dinucleotide shuffling a non-redundant set of 18,895 human 3′UTR sequences, and then searching this set with each of the elements. For example, an E-value of 0.175 (the default) means that each element is expected to appear by chance on average 0.175 times in the analysis of a typical 1000 nt human 3′UTR. Elements can be automatically selected or deselected by changing the E-value cutoff (shown in the red box in option 2.A in Figure 2). Additionally, users can give their own pattern or sequence motif (e.g. AUAGGGU), which will be searched for along with the other selected elements against the query sequence(s) using PatSearch.
Similarly, options 2.B–D (Figure 2) show the elements from RBPDB, TargetScan and miRBase respectively, along with the options to limit hits based on MotifLocator-calculated matches to the 69 RBPDB PFMs. The TargetScan elements are available only when a published human gene symbol is used.
Option 2.E (Figure 2): the default behaviour is to show elements in non-reference sequences only if they are also found in the reference sequence (e.g. human). This option disables that behaviour.
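The E-value arithmetic described above can be made concrete with a short sketch: an element's background E-value is its hit count in the shuffled set scaled to 1000 nt, the expected number of chance hits in a query scales with the query's length, and elements are selected when their E-value is at or below the cutoff. The hit counts and element names are hypothetical, and treating the cutoff as inclusive is an assumption.

```python
def evalue_per_kb(background_hits, background_length_nt):
    """Chance occurrences of an element per 1000 nt, estimated from its hit
    count in a (dinucleotide-shuffled) background set of known total length."""
    return background_hits / background_length_nt * 1000


def expected_chance_hits(evalue, query_length_nt):
    """Expected chance hits of one element in a query of the given length."""
    return evalue * query_length_nt / 1000


def select_elements(evalues, cutoff=0.175):
    """Names of elements whose background E-value is at or below the cutoff."""
    return sorted(name for name, e in evalues.items() if e <= cutoff)


# Hypothetical E-values for three elements
evalues = {"element_A": 0.06, "element_B": 0.9, "element_C": 0.12}
print(select_elements(evalues))           # ['element_A', 'element_C']
print(expected_chance_hits(0.175, 2000))  # about 0.35
```

Lowering the cutoff trades sensitivity for fewer chance hits, which matters most for short, degenerate elements.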
Upon receiving the input, SFM searches for the query elements using independent parallel processes, so that the output of one process is not affected by any other (Figure 1). Irrespective of the input sequence type, all sequences are converted to FASTA format. The patterns from the selected TransTerm elements and any user-given pattern(s) are used to search the input sequences using PatSearch . The 69 RNA binding protein PFMs from RBPDB are used to search the sequences with MotifLocator . The TargetScan miRNA binding sites and their positions are retrieved from the MySQL database table (described above) using the input human gene symbol, and mapped onto the query sequences by Perl scripts, labelled MotifMapper in Figure 1. Based on the user-given seed length (6, 7 or 8 nucleotides), a list of seed sequences is created from the 2042 seed sequences. As one seed sequence can be associated with multiple miRNAs in a family, a non-redundant list of seed sequences is made. These sequences are used to search the query sequence(s) using Perl regular expressions (RegEx). Once all the processes have finished, their results are combined and sent to the visualisation module.
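The non-redundant seed list and the regular-expression search could look like the Python sketch below (SFM itself uses Perl). It assumes the 8mer site strings are already available for each miRNA, searches each sequence with alignment gaps removed, and uses a zero-width lookahead so that overlapping matches are all reported. Function names and test data are hypothetical.

```python
import re
from collections import defaultdict


def build_seed_index(seed8_by_mirna, length):
    """Collapse named 8mer sites into a non-redundant {site: miRNA names} map.

    length 8 keeps the 8mer, 7 keeps both overlapping 7mers (7mer-m8 and
    7mer-A1), and 6 keeps the middle six bases.
    """
    if length not in (6, 7, 8):
        raise ValueError("seed length must be 6, 7 or 8")
    index = defaultdict(set)
    for name, s8 in seed8_by_mirna.items():
        if length == 8:
            sites = [s8]
        elif length == 7:
            sites = [s8[:7], s8[1:]]
        else:
            sites = [s8[1:7]]
        for site in sites:
            index[site].add(name)
    return {site: sorted(names) for site, names in index.items()}


def find_seed_matches(seq, index):
    """Return sorted (1-based start, site, miRNA names) tuples. The zero-width
    lookahead lets overlapping matches of the same site all be reported."""
    seq = seq.upper().replace("T", "U")
    hits = []
    for site, names in index.items():
        for match in re.finditer(f"(?=({re.escape(site)}))", seq):
            hits.append((match.start() + 1, site, names))
    return sorted(hits)
```

Collapsing identical sites first means each distinct site is searched once, however many family members share it.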
The output is shown on a scrollable alignment with links to further information and the ability to show or hide specific components of the complex results.
Results and discussion
The SFM web-server analyses either aligned vertebrate UTRs or user-supplied sequences or alignments (Figure 1). Five types of elements are searched for in these sequences.
Regulatory elements from the TransTerm database, which includes relevant UTRSite and ARED elements. This provides a curated collection of CREs that function as translational control elements in mRNAs. The computational models (elements) are selected by the user, and/or filtered on empirically determined background frequencies in a shuffled control set. Matches are identified using PatSearch .
MicroRNA target sites predicted by TargetScan 6.2 . TargetScan was chosen as it is widely used and predicts sites on vertebrate alignments.
Matches to the 6 to 8 base seed sequences of human miRNAs , identified using MotifMapper. This simple prediction is intended to allow visualisation of most of the potential miRNA binding sites, including likely false positives, if the user wishes.
User-defined patterns in PatSearch format . PatSearch allows searches for simple strings, optionally with mismatches, insertions and deletions (e.g. GNGNCC), but also for more complex elements (e.g. GCG 3…7 GCG: two GCG triplets separated by 3–7 bases) and RNA secondary structures (e.g. p1 = 10…10 4…7 ~ p1: a ten-base stem with a loop of 4–7 bases). A full description of the syntax is presented in the help on the SFM server.
RNA binding protein sites from RBPDB, identified by searching with the 69 RBPDB PFMs using MotifLocator.
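A rough Python sketch of the pattern types described above is shown below: an IUPAC motif, a spacer pattern such as GCG 3…7 GCG, and a fixed-length stem-loop such as p1 = 10…10 4…7 ~ p1. It is not PatSearch: it allows no mismatches, insertions or deletions, only Watson-Crick pairs close the stem (an assumption), and it expects upper-case RNA sequences. The function names are hypothetical.

```python
# IUPAC codes for RNA; N matches any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG", "Y": "CU",
         "S": "CG", "W": "AU", "K": "GU", "M": "AC", "B": "CGU",
         "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU"}
# Watson-Crick pairs only; G-U wobble pairs are deliberately left out here.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def matches_at(motif, seq, i):
    """True if the IUPAC motif matches seq exactly (no mismatches) at index i."""
    if i < 0 or i + len(motif) > len(seq):
        return False
    return all(s in IUPAC[m] for m, s in zip(motif, seq[i:i + len(motif)]))


def find_spaced(seq, left, min_gap, max_gap, right):
    """All 1-based (start, end) spans for 'left min...max right',
    e.g. GCG 3...7 GCG; every allowed spacer length is tried."""
    hits = []
    for i in range(len(seq)):
        if matches_at(left, seq, i):
            for gap in range(min_gap, max_gap + 1):
                j = i + len(left) + gap
                if matches_at(right, seq, j):
                    hits.append((i + 1, j + len(right)))
    return hits


def find_hairpins(seq, stem_len, min_loop, max_loop):
    """All 1-based (start, end) spans for 'p1=L...L a...b ~p1': a stem of
    stem_len bases, a loop of min_loop..max_loop bases, then a 3' stem
    that base-pairs with the first one."""
    hits = []
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop  # start of the 3' stem
            if j + stem_len > len(seq):
                break
            if all((seq[i + k], seq[j + stem_len - 1 - k]) in PAIRS
                   for k in range(stem_len)):
                hits.append((i + 1, j + stem_len))
    return hits
```

Trying every spacer and loop length explicitly, rather than relying on a greedy regular expression, reports all alternative matches that start at the same position.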
On completion of the individual processes, the results are compiled and presented as an interactive visualisation (Figure 3). As an example, we use the well-studied tumor necrosis factor alpha (TNF) 3′ UTR. TNF is a multifunctional cytokine that regulates the expression of other genes in inflammation and other processes, and its own expression is regulated at multiple steps . The TNF 3′ UTR has been shown to be targeted by both proteins and miRNAs [13, 27] and is a classic example of an mRNA containing an AU-rich element (ARE). MicroRNAs confirmed to target this UTR in mammals are miR-16 , miR-19a , miR-125b , miR-130 , miR-181a  and miR-301 . Unusually, an RNA-protein complex containing miR-369-3p binds to targets within the ARE and activates or represses translation depending on the cell cycle . This ARE may also be bound by the RBP tristetraprolin (TTP) to repress translation .
The SFM analysis using the settings in Figure 2 highlights several types of elements from the TransTerm database (Figure 3, yellow): the AU-rich element (ARE), represented by hits from three overlapping descriptions (background E-values per thousand bases of 0.06, 0.12 and 0.12 respectively, Figure 3) ; the TNF Alpha Stability and Efficiency Element (E-value 0.000008) ; and two descriptions of a Polyadenylation Element at the 3′ end (E-values 0.03 and 0.02). These are all present in a similar position in the alignment across vertebrates, and the 9–12 base core ARE  is repeated . The two predicted stability elements in the TNF 3′ UTR have been verified experimentally [27, 35], and the polyadenylation signal is a clear match to the consensus (AAUAAA). In addition, a 15-LOX-DICE element is predicted (E-value 0.01) at the same location in only 5 of 17 species. The TransTerm entry linked from the small ‘i’ shows that the 15-LOX-DICE is known to have a role in regulating the stability of mRNAs in early erythropoiesis . This may be a false positive, or a novel finding requiring further investigation.
Three overlapping predicted miRNA binding sites are shown (Figure 3, red). Interestingly, they flank the ARE. Each site links to the family of miRNAs that could bind this seed (e.g. miR181abcd/462); these data are inherited from the TargetScan families and predictions . Included in these predictions are miR-19a, miR-181a and miR-130/miR-301, which have been shown to target these regions in the TNF UTR.
Two sites for miR-369-3p within the ARE  are not predicted with the conservative default SFM parameters. These sites can be displayed by selecting 7mer miRBase seeds (miR-369-3p, UAUUAUU), and they are also conserved in the alignment. The TargetScan analysis with 153 ‘broadly conserved’ and ‘conserved’ miRNA families did not predict this site, as miR-369 is poorly conserved , so it is not shown in the results of this analysis (Figure 3, red). In fact, the TargetScan webserver does not predict this known site at all, possibly owing to the weak AU base pairing within it.
Such short matches (6mer, 7mer) should be interpreted with caution, as there are over 4000 possible 7mer seeds from the 2043 mature human miRNA seeds in miRBase. This resulted in over 200 hits in the 17,000 nt TNF UTR alignment. However, most of these matches are not conserved (not present at similar locations in the alignment) and can therefore be identified as likely false positives by visual inspection of the SFM output.
SFM visually represents different types of element in one display (Figure 3). On the output page, the user can also choose to include or exclude any set of elements, and to show only elements that are also found in the reference sequence (e.g. human, when a gene symbol is used as input). Along with the graphical display, SFM also provides a text report listing the entire user input (selections and input sequence) as well as the output of each individual search process.
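Conservation of a hit across the alignment, and the reference filter, can be sketched as follows. The sketch assumes that two hits are "the same" when they share an element name and overlap in alignment columns; this is one plausible reading, not necessarily SFM's rule. Hits are given in ungapped 1-based coordinates per sequence, and all names and data are hypothetical.

```python
def column_spans(hits, alignment):
    """Convert ungapped 1-based (element, start, end) hits for each sequence
    into alignment-column spans."""
    spans = {}
    for name, seq_hits in hits.items():
        # columns[p - 1] is the alignment column of ungapped position p
        columns = [c for c, ch in enumerate(alignment[name], start=1) if ch not in "-."]
        spans[name] = [(el, columns[s - 1], columns[e - 1]) for el, s, e in seq_hits]
    return spans


def same_site(a, b):
    """Same element and overlapping column spans."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def conservation_counts(spans, reference):
    """For each reference hit, the number of sequences (reference included)
    with the same element at an overlapping position in the alignment."""
    return {
        ref_hit: sum(any(same_site(ref_hit, h) for h in seq_hits)
                     for seq_hits in spans.values())
        for ref_hit in spans[reference]
    }


def reference_filter(spans, reference):
    """Keep hits in non-reference sequences only if they match a reference hit."""
    ref_hits = spans[reference]
    return {
        name: [h for h in seq_hits
               if name == reference or any(same_site(h, r) for r in ref_hits)]
        for name, seq_hits in spans.items()
    }
```

A low count flags a hit that appears in the reference but at few aligned positions elsewhere, which is the pattern the text associates with likely false positives.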
SFM is a free web-application that allows researchers to use a single tool to identify and investigate a range of CREs on both alignments and single sequences. Notably, these include both protein binding sites (TransTerm, UTRSite, ARED) and miRNA binding sites (TargetScan, miRBase seed matches). These elements come from well-documented databases and are cross-referenced to them. We believe that SFM will be particularly useful for researchers seeking to uncover relationships among different classes of post-transcriptional regulatory elements.
Availability and requirements
Project name: Scan for motifs.
Project home page: http://bioanalysis.otago.ac.nz/sfm/.
Operating system: Platform independent.
Programming language: Perl, MySQL.
Other requirements: none.
License: Free to use.
Any restrictions to use by non-academics: None.
Jacobs GH, Chen A, Stevens SG, Stockwell PA, Black MA, Tate WP, Brown CM: Transterm: a database to aid the analysis of regulatory sequences in mRNAs. Nucleic Acids Res. 2009, 37 (Database issue): D72-D76.
Chang TH, Huang HY, Hsu JB, Weng SL, Horng JT, Huang HD: An enhanced computational platform for investigating the roles of regulatory RNA and for identifying functional RNA motifs. BMC Bioinforma. 2013, 14 (Suppl 2): S4.
Szostak E, Gebauer F: Translational control by 3′-UTR-binding proteins. Brief Funct Genomics. 2013, 12 (1): 58-65. 10.1093/bfgp/els056.
Michalova E, Vojtesek B, Hrstka R: Impaired pre-mRNA processing and altered architecture of 3′ untranslated regions contribute to the development of human disorders. Int J Mol Sci. 2013, 14 (8): 15681-15694. 10.3390/ijms140815681.
Stevens S, Brown C: In silico estimation of translation efficiency in human cell lines: potential evidence for widespread translational control. PLoS One. 2013, 8 (2): e57625. 10.1371/journal.pone.0057625.
Gruber AR, Fallmann J, Kratochvill F, Kovarik P, Hofacker IL: AREsite: a database for the comprehensive investigation of AU-rich elements. Nucleic Acids Res. 2011, 39 (Database issue): D66-D69.
Stevens S, Brown C: Bioinformatic methods to discover cis-regulatory elements in mRNAs. Springer Handbook of Bio-/Neuro-informatics. Edited by: Kasabov N. 2014, Heidelberg: Springer, 151-169.
Garcia DM, Baek D, Shin C, Bell GW, Grimson A, Bartel DP: Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nat Struct Mol Biol. 2011, 18 (10): 1139-1146. 10.1038/nsmb.2115.
Dweep H, Sticht C, Gretz N: In-silico algorithms for the screening of possible microRNA binding sites and their interactions. Curr Genomics. 2013, 14 (2): 127-136. 10.2174/1389202911314020005.
Naifang S, Minping Q, Minghua D: Integrative approaches for microRNA target prediction: combining sequence information and the paired mRNA and miRNA expression profiles. Curr Bioinform. 2013, 8 (1): 37-45.
Incarnato D, Neri F, Diamanti D, Oliviero S: MREdictor: a two-step dynamic interaction model that accounts for mRNA accessibility and Pumilio binding accurately predicts microRNA targets. Nucleic Acids Res. 2013, 41 (18): 8421-8433. 10.1093/nar/gkt629.
Ciafre SA, Galardi S: microRNAs and RNA-binding proteins: a complex network of interactions and reciprocal regulations in cancer. RNA Biol. 2013, 10 (6): 935-942. 10.4161/rna.24641.
Vasudevan S, Tong Y, Steitz JA: Switching from repression to activation: microRNAs can up-regulate translation. Science. 2007, 318 (5858): 1931-1934. 10.1126/science.1149460.
Dethoff EA, Chugh J, Mustoe AM, Al-Hashimi HM: Functional complexity and regulation through RNA dynamics. Nature. 2012, 482 (7385): 322-330. 10.1038/nature10885.
Kedde M, van Kouwenhove M, Zwart W, Oude Vrielink JA, Elkon R, Agami R: A Pumilio-induced RNA structure switch in p27-3′ UTR controls miR-221 and miR-222 accessibility. Nat Cell Biol. 2010, 12 (10): 1014-1020. 10.1038/ncb2105.
Wu X, Chesoni S, Rondeau G, Tempesta C, Patel R, Charles S, Daginawala N, Zucconi BE, Kishor A, Xu G, Shi Y, Li ML, Irizarry-Barreto P, Welsh J, Wilson GM, Brewer G: Combinatorial mRNA binding by AUF1 and Argonaute 2 controls decay of selected target mRNAs. Nucleic Acids Res. 2013, 41 (4): 2644-2658. 10.1093/nar/gks1453.
Jiang P, Singh M, Coller HA: Computational assessment of the cooperativity between RNA binding proteins and MicroRNAs in transcript decay. PLoS Comput Biol. 2013, 9 (5): e1003075. 10.1371/journal.pcbi.1003075.
Zhang C, Lee KY, Swanson MS, Darnell RB: Prediction of clustered RNA-binding protein motif sites in the mammalian genome. Nucleic Acids Res. 2013, 41 (14): 6793-6807. 10.1093/nar/gkt421.
Bryan K, Terrile M, Bray IM, Domingo-Fernandez R, Watters KM, Koster J, Versteeg R, Stallings RL: Discovery and visualization of miRNA-mRNA functional modules within integrated data using bicluster analysis. Nucleic Acids Res. 2014, 42 (3): e17. 10.1093/nar/gkt1318.
Dassi E, Malossini A, Re A, Mazza T, Tebaldi T, Caputi L, Quattrone A: AURA: atlas of UTR regulatory activity. Bioinformatics. 2012, 28 (1): 142-144. 10.1093/bioinformatics/btr608.
Grillo G, Turi A, Licciulli F, Mignone F, Liuni S, Banfi S, Gennarino VA, Horner DS, Pavesi G, Picardi E, Pesole G: UTRdb and UTRsite (RELEASE 2010): a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic Acids Res. 2010, 38 (Database issue): D75-D80.
Grillo G, Licciulli F, Liuni S, Sbisa E, Pesole G: PatSearch: a program for the detection of patterns and structural motifs in nucleotide sequences. Nucleic Acids Res. 2003, 31 (13): 3608-3612. 10.1093/nar/gkg548.
Kozomara A, Griffiths-Jones S: miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011, 39 (Database issue): D152-D157.
Claeys M, Storms V, Sun H, Michoel T, Marchal K: MotifSuite: workflow for probabilistic motif detection and assessment. Bioinformatics. 2012, 28 (14): 1931-1932. 10.1093/bioinformatics/bts293.
Cook KB, Kazan H, Zuberi K, Morris Q, Hughes TR: RBPDB: a database of RNA-binding specificities. Nucleic Acids Res. 2011, 39 (Database issue): D301-D308.
Giambelluca M, Rollet-Labelle E, Bertheau-Mailhot G, Laflamme C: Post-transcriptional regulation of tumour necrosis factor alpha biosynthesis: Relevance to the pathophysiology of rheumatoid arthritis. OA Inflammation. 2013, 1 (1): 3.
Shi JX, Su X, Xu J, Zhang WY, Shi Y: HuR post-transcriptionally regulates TNF-alpha-induced IL-6 expression in human pulmonary microvascular endothelial cells mainly via tristetraprolin. Respir Physiol Neurobiol. 2012, 181 (2): 154-161. 10.1016/j.resp.2012.02.011.
Jing Q, Huang S, Guth S, Zarubin T, Motoyama A, Chen J, Di Padova F, Lin SC, Gram H, Han J: Involvement of microRNA in AU-rich element-mediated mRNA instability. Cell. 2005, 120 (5): 623-634. 10.1016/j.cell.2004.12.038.
Liu M, Wang Z, Yang S, Zhang W, He S, Hu C, Zhu H, Quan L, Bai J, Xu N: TNF-alpha is a novel target of miR-19a. Int J Oncol. 2011, 38 (4): 1013-1022.
Tili E, Michaille JJ, Cimino A, Costinean S, Dumitru CD, Adair B, Fabbri M, Alder H, Liu CG, Calin GA, Croce CM: Modulation of miR-155 and miR-125b levels following lipopolysaccharide/TNF-alpha stimulation and their possible roles in regulating the response to endotoxin shock. J Immunol. 2007, 179 (8): 5082-5089. 10.4049/jimmunol.179.8.5082.
Bak RO, Mikkelsen JG: Regulation of cytokines by small RNAs during skin inflammation. J Biomed Sci. 2010, 17: 53. 10.1186/1423-0127-17-53.
Li H, Chen X, Guan L, Qi Q, Shu G, Jiang Q, Yuan L, Xi Q, Zhang Y: MiRNA-181a regulates adipogenesis by targeting tumor necrosis factor-alpha (TNF-alpha) in the porcine model. PLoS One. 2013, 8 (10): e71568. 10.1371/journal.pone.0071568.
Qi MY, Wang ZZ, Zhang Z, Shao Q, Zeng A, Li XQ, Li WQ, Wang C, Tian FJ, Li Q, Zou J, Qin YW, Brewer G, Huang S, Jing Q: AU-rich-element-dependent translation repression requires the cooperation of tristetraprolin and RCK/P54. Mol Cell Biol. 2012, 32 (5): 913-928. 10.1128/MCB.05340-11.
Halees AS, El-Badrawi R, Khabar KS: ARED Organism: expansion of ARED reveals AU-rich element cluster variations between human and mouse. Nucleic Acids Res. 2008, 36 (Database issue): D137-D140.
Hel Z, Di Marco S, Radzioch D: Characterization of the RNA binding proteins forming complexes with a novel putative regulatory region in the 3′-UTR of TNF-alpha mRNA. Nucleic Acids Res. 1998, 26 (11): 2803-2812. 10.1093/nar/26.11.2803.
Thiele BJ, Berger M, Huth A, Reimann I, Schwarz K, Thiele H: Tissue-specific translational regulation of alternative rabbit 15-lipoxygenase mRNAs differing in their 3′-untranslated regions. Nucleic Acids Res. 1999, 27 (8): 1828-1836. 10.1093/nar/27.8.1828.
This work was partially funded by a Human Frontier Science Foundation Research Grant [RGP0031/2009 to Ian Macara, Anne Spang and C.M.B.]; A.B. was a recipient of a University of Otago Postgraduate Scholarship and Publishing Bursary.
The authors declare that they have no competing interests.
AB designed and developed the software. CMB conceived of the application, supervised it, and tested it. Both authors wrote, read and approved the final manuscript.
Cite this article
Biswas, A., Brown, C.M. Scan for Motifs: a webserver for the analysis of post-transcriptional regulatory elements in the 3′ untranslated regions (3′ UTRs) of mRNAs. BMC Bioinformatics 15, 174 (2014). https://doi.org/10.1186/1471-2105-15-174
- Untranslated region
- RNA binding protein
- Translational control