  • Research article
  • Open access

Analysis of secondary structural elements in human microRNA hairpin precursors

Abstract

Background

MicroRNAs (miRNAs) regulate gene expression by targeting complementary mRNAs for destruction or translational repression. Aberrant expression of miRNAs has been associated with various diseases including cancer, thus making them interesting therapeutic targets. The composite of secondary structural elements that comprise miRNAs could aid the design of small molecules that modulate their function.

Results

We analyzed the secondary structural elements, or motifs, present in all human miRNA hairpin precursors and compared them to highly expressed human RNAs with known structures and other RNAs from various organisms. Amongst human miRNAs, there are 3808 unique motifs, many residing in processing sites. Further, we identified motifs in miRNAs that are not present in other highly expressed human RNAs; such motifs are desirable targets for small molecules. MiRNA motifs were incorporated into a searchable database that is freely available.

We also analyzed the most frequently occurring bulges and internal loops for each RNA class and found that the smallest loops possible prevail. However, the distribution of loops and the preferred closing base pairs were unique to each class.

Conclusions

Collectively, we have completed a broad survey of motifs found in human miRNA precursors, highly expressed human RNAs, and RNAs from other organisms. Interestingly, unique motifs were identified in human miRNA processing sites, binding to which could inhibit miRNA maturation and hence function.

Background

MicroRNAs (miRNAs) regulate gene expression via targeting mRNAs for destruction or translational repression [1–4]. Aberrant miRNA expression is associated with diseases [5, 6] including cancers [7], cardiovascular diseases [8], and HIV [9, 10]. In addition to being employed to explore mRNA and protein function in vivo [5, 11], miRNAs are also being explored as therapeutic targets [12, 13], in particular because overexpression of oncogenic miRNAs aids initiation and progression of various tumors [14–16]. Different strategies have been used to inhibit oncogenic miRNAs, including antisense or sponge oligonucleotides that bind mature miRNAs [17, 18] and inhibiting miRNA processing with small molecules [19–21]. A major liability of oligonucleotide-based therapeutics is poor tissue-specific delivery and cellular uptake [17]. Small molecules have been neglected for targeting RNA in general because it was speculated that RNA structural flexibility leads to lack of binding specificity. However, recent successful examples of using small molecules to target different RNAs [22, 23] have stimulated increasing interest in using small molecules to target miRNAs.

Usually, small molecules bind to non-canonically paired regions of RNA [22], such as bulges, internal loops, and hairpin loops (Fig. 1), as they provide enlarged major grooves for small molecule entry and partially exposed bases that can be exploited to increase specificity [13, 24]. Thus, miRNA hairpin precursors, which fold into stem loop structures that display various types of loops (Fig. 1) [25], are ideal candidates for small molecule binding. MiRNA processing occurs in both the nucleus (via Drosha) and the cytoplasm (via Dicer/transactivating response RNA-binding protein (TRBP)) [26]. Therefore, small molecules that localize to either compartment could inhibit miRNA maturation.

Fig. 1

Schematic of the stem-loop structure of hsa-miR-20a. Red letters indicate the mature miRNA; blue letters indicate the mature miRNA*. Possible motifs in an RNA include internal loops, 5’ bulges, 3’ bulges, hairpins, and multibranch loops (not shown). The loops are named by the identity of unpaired nucleotides and base pairs (indicated by parentheses). The two sides of a bulge or an internal loop are indicated with a “/”

The number of known miRNA sequences has expanded tremendously [27, 28] because of the development of deep-sequencing technology. To develop specific small molecules that inhibit the processing of a single or few miRNAs, it is essential to identify unique secondary structural elements, or motifs. That is, it is important to know which motifs occur and their frequencies. In this study, we built a database of motifs found in human miRNA secondary structures. We examined the frequency of these motifs and which motifs are preferred at processing sites. It is still a mystery how the Dicer/TRBP complex achieves accuracy in processing pre-miRNAs with such huge diversity (more than a thousand different sequences in human). MiRNA processing sites (where the miRNA strands are cleaved) are presumed to be important. This analysis was then completed for RNAs with known structures, including highly expressed human RNAs. We hope that this analysis will eventually help our understanding of miRNA processing and improve identification of potential target sites for small molecules.

Methods

MiRNA hairpin precursor sequences and structures

All Homo sapiens miRNA and mature miRNA sequences were obtained from miRBase v.17 [27] (http://www.mirbase.org/). The secondary structures of miRNA hairpin precursors were predicted by RNAstructure [29], which uses a free energy minimization algorithm [30]. Please note that miRNA hairpin precursor structure determination via free energy minimization is the standard in the field [25].

Other RNA sequences and structures

A previously constructed database of other RNA structures was also analyzed in order to make comparisons to miRNAs [31]. The database contains 1349 RNAs including 123 small subunit rRNAs [32], 223 large subunit rRNAs [32, 33], 309 5S rRNAs [34], 484 tRNAs [35], 91 signal recognition particles [36], 16 RNase P RNAs [37], 100 group I introns [38, 39], and three group II introns [40]. We also analyzed highly expressed human RNAs with known structures including 5S rRNA, 16S rRNA, 23S rRNA, 7SL (signal recognition particle), RNase P RNA, U4/U6 snRNA, and 465 non-redundant tRNAs [41].

Motif nomenclature

The motifs predicted in miRNA hairpin precursor secondary structures include bulges, internal loops, hairpins, and multibranch loops (Figs. 1 and 2). Bulges are divided into two categories: 5’ bulge loops and 3’ bulge loops. A bulge's designation as 5’ or 3’ is determined by the position of the unpaired nucleotide relative to the first hairpin loop in the miRNA's secondary structure (that is, whether it is 5’ or 3’ to the hairpin loop).

Fig. 2

Diagrams of bulge and internal loops that have the same motifs but different orientations in miRNA hairpin precursor secondary structure (a–d), and a multibranch loop motif (e). (a) 5’ bulge loop (AU)U/-(GC), (b) 3’ bulge loop (CG)-/U(UA), (c) internal loop (CG)C/A(UA), (d) internal loop (AU)A/C(GC), and (e) multibranch loop (CG)A(GU)U(GC)C(GC). The motifs can be characterized by the identity of unpaired nucleotides (red letters) or the identity of unpaired nucleotides and closing base pairs (red and black letters). Note: (AU)U/-(GC) and (AU)-/U(GC) are different motifs. The equivalent 3’ bulge for the 5’ bulge (AU)U/-(GC) is (CG)-/U(UA)

A motif includes closing base pair(s) and non-canonically paired nucleotides. Sequences are always written 5’ to 3’. Closing base pairs are indicated with parentheses (for example, (GC)), and both nucleotides are always designated due to the possibility of GU pairs. The nucleotide 5’ to the loop is always listed first. Base pairs are listed at the beginning and end of the motif sequence for bulges and internal loops, only at the beginning of hairpin loops, and between all unpaired regions of multibranch loops. A “/” separates the two sides of bulges and internal loops. Please see Figs. 1 and 2 and the Results and discussion for examples.
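The relationship between a motif and its opposite-orientation equivalent (for example, the 5’ bulge (AU)U/-(GC) and the 3’ bulge (CG)-/U(UA) in Fig. 2) can be sketched in a few lines of Python. This is only an illustration of the naming convention described above, not the software used in the study; the function name `flip_orientation` and the regular expression are assumptions introduced here, and the sketch covers bulges and internal loops only.

```python
import re

# Bulge / internal-loop motifs written as (XY)side5/side3(XY),
# where "-" marks the empty side of a bulge. Hypothetical helper pattern.
MOTIF_RE = re.compile(r"^\(([ACGU]{2})\)([ACGU]+|-)/([ACGU]+|-)\(([ACGU]{2})\)$")


def flip_orientation(motif: str) -> str:
    """Return the name of the same loop as read from the opposite strand."""
    match = MOTIF_RE.match(motif.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a bulge or internal loop motif: {motif!r}")
    first_pair, side5, side3, last_pair = match.groups()
    # Reading from the other strand swaps the two loop sides, swaps the
    # two closing pairs, and reverses the nucleotide order inside each pair.
    return f"({last_pair[::-1]}){side3}/{side5}({first_pair[::-1]})"


# Pairs of equivalent motifs taken from the Fig. 2 caption.
print(flip_orientation("(AU)U/-(GC)"))
print(flip_orientation("(CG)C/A(UA)"))
```

Applying the function twice returns the original name, which is a quick sanity check that two names describe the same loop in opposite orientations.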

Determination of statistical significance: are two motifs’ occurrence frequencies significantly different?

In order to determine if a particular motif is over- or under-represented, its statistical significance was calculated by a Z-score of type 1 error. That is, when Motif 1 occurred with probability p1 in a sample size n1, and Motif 2 occurred with probability p2 in a sample size n2, it is hypothesized that Motif 1 and Motif 2 occur with the same frequency. To test this hypothesis, we calculate a Z-score using Eqs. 1 and 2:

$$ \phi =\frac{n_1\times {p}_1+{n}_2\times {p}_2}{n_1+{n}_2} $$
(1)
$$ Z\text{-score}=\frac{p_1-{p}_2}{\sqrt{\phi \left(1-\phi \right)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} $$
(2)

If the Z-score >2, the hypothesis is rejected, and Motifs 1 and 2 have significantly different occurrence frequencies; if the Z-score <2, then no conclusion can be drawn.
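As a minimal sketch, Eqs. 1 and 2 can be written as a short Python function. The function name `pooled_z_score` and the example numbers are assumptions made for illustration; they are not taken from the data set analyzed here.

```python
from math import sqrt


def pooled_z_score(p1: float, n1: int, p2: float, n2: int) -> float:
    """Z-score for the difference of two proportions (Eqs. 1 and 2)."""
    # Eq. 1: pooled probability phi across both samples.
    phi = (n1 * p1 + n2 * p2) / (n1 + n2)
    # Denominator of Eq. 2: standard error under the pooled probability.
    std_err = sqrt(phi * (1 - phi) * (1 / n1 + 1 / n2))
    if std_err == 0:
        raise ValueError("pooled probability is 0 or 1; Z-score undefined")
    # Eq. 2.
    return (p1 - p2) / std_err


# Hypothetical example: motif 1 seen in 60 % of 100 sites,
# motif 2 seen in 40 % of 100 sites.
z = pooled_z_score(0.6, 100, 0.4, 100)
print(f"Z-score = {z:.2f}; significantly different: {z > 2}")
```

The sign of the result depends on which motif is labeled Motif 1, so the comparison with 2 assumes that the more frequent motif is passed first.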

Determination of miRNA processing sites

The processing sites of a miRNA are defined as the first and last nucleotides in the mature miRNA. The mature miRNA was mapped onto miRNA hairpin precursors, and the motifs or paired regions containing the two end nucleotides were selected. If the site contains unpaired or non-canonically paired nucleotides, the processing site could be the unpaired nucleotides or the closing base pair. If the processing site is in a paired region, the base pairs next to the processing site are also included.
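The mapping step can be illustrated with a small Python sketch. It assumes that the precursor's predicted structure is available in dot-bracket notation and that the mature sequence occurs exactly once in the precursor; the function name `processing_sites` and the toy sequences are hypothetical and serve only as an example.

```python
def processing_sites(precursor: str, structure: str, mature: str) -> dict:
    """Locate the start and end processing sites of a mature miRNA.

    Returns, for each site, its 0-based position in the precursor and
    whether that nucleotide is base paired in the dot-bracket structure.
    """
    if len(precursor) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    start = precursor.find(mature)
    if start < 0:
        raise ValueError("mature sequence not found in precursor")
    end = start + len(mature) - 1
    return {
        "start": (start, structure[start] != "."),
        "end": (end, structure[end] != "."),
    }


# Toy hairpin (not a real miRNA): three base pairs closing a 4-nt loop.
sites = processing_sites("GGGAAAUCCC", "(((....)))", "GGAA")
print(sites)
```

In this sketch a nucleotide that forms a loop closing base pair counts as paired, which matches how paired processing sites are tallied in the Results.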

Results and discussion

A database of human miRNA hairpin precursor motifs

The number of human (Homo sapiens) miRNA sequences deposited in miRBase [27] has doubled in the past few years. As of August 2014, there were 1881 human miRNA sequences in miRBase. Although the secondary structures of most miRNAs have not been determined experimentally, a uniform system for miRNA annotation has been developed that employs secondary structure determination via free energy minimization [25, 29]. That is, the structures of miRNA hairpin precursors are accurately predicted from sequence. Therefore, RNAstructure [29], a free energy minimization algorithm that employs experimentally determined thermodynamic values, was used to predict the secondary structures of miRNA hairpin precursors. Only the lowest free energy structure was considered in our analysis. All non-canonically paired regions except the dangling ends for each hairpin precursor secondary structure were extracted and listed in the motif database. The database contains the following information for each motif: the miRNA ID/accession number, motif type (bulge, internal loop, hairpin, etc.), unpaired motif (single stranded nucleotides only), motif (unpaired nucleotides and the closing base pair(s)), and motif with closing base pairs and first non-nearest neighbor.
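One way the extraction of non-canonically paired regions could be carried out is sketched below in Python. This is an assumption-laden illustration rather than the procedure actually used to build the database: it takes a sequence and a dot-bracket structure as input, names hairpins, bulges, and internal loops following the nomenclature in the Methods, and only flags multibranch loops without naming them. The function names and the toy hairpin are hypothetical.

```python
def pair_table(structure: str) -> dict:
    """Map each paired position to its partner in a dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def extract_motifs(seq: str, structure: str) -> list:
    """List (motif type, motif name) for every loop closed by a base pair."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    partner = pair_table(structure)
    motifs = []
    for i in sorted(partner):
        j = partner[i]
        if j < i:
            continue  # visit each pair once, from its 5' nucleotide
        # Collect the base pairs directly enclosed by the pair (i, j).
        inner, k = [], i + 1
        while k < j:
            if k in partner:
                inner.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        closing = f"({seq[i]}{seq[j]})"
        if not inner:
            motifs.append(("hairpin", closing + seq[i + 1:j]))
        elif len(inner) == 1:
            k, l = inner[0]
            side5, side3 = seq[i + 1:k], seq[l + 1:j]
            if not side5 and not side3:
                continue  # two stacked pairs, no loop between them
            if side5 and side3:
                kind = "internal loop"
            else:
                kind = "5' bulge" if side5 else "3' bulge"
            name = f"{closing}{side5 or '-'}/{side3 or '-'}({seq[k]}{seq[l]})"
            motifs.append((kind, name))
        else:
            motifs.append(("multibranch loop", None))  # naming omitted here
    return motifs


# Toy hairpin (not a real miRNA) with a single U bulge on the 5' strand.
print(extract_motifs("GCUGCGAAAGCGC", "((.((....))))"))
```

Because a single stem-loop has only one hairpin, the 5’ strand side of the stem is treated as 5’ to the hairpin loop; dangling ends are ignored, as in the database.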

Motif nomenclature

A motif includes unpaired or non-canonically paired regions (denoted in red) and its closing base pair(s) (denoted in black). Bulges and internal loops have two closing base pairs, hairpins have one closing pair, and multibranch loops have three or more. Examples of the nomenclature used are provided in Figs. 1 and 2. For example, the 5’ bulge loop in Fig. 1 is indicated as (GC)G/-(UA) while the 3’ bulge loop is named (GC)-/U(UA). Likewise, Internal Loop 1 is named (GC)C/A(AU); Internal Loop 2 is (UA)A/AA(AU); and the hairpin is named (GU)UUUAGU. For multibranch loops, the base pairs and the unpaired strands are written in order from 5’ to 3’ end. Since the 5’ closing base pair is also the 3’ closing base pair, it is repeated but in the opposite orientation. Thus, the multibranch loop in Fig. 2e is named (CG)A(GU)U(GC)C(GC) (5’ and 3’ closing base pairs denoted in bold). This nomenclature was developed such that the same unpaired regions with different closing base pairs can be distinguished from each other, for example (AU)U/-(GC) and (CG)-/U(UA); or (CG)C/A(UA) and (AU)A/C(GC) (Fig. 2).

General survey of motifs in precursor miRNAs

(A searchable database of motifs found in human miRNA hairpin precursors based on our analysis is available at: http://www.scripps.edu/disney/software.html.) The motifs present in miRNA hairpin precursor secondary structures are quite diverse. Of all miRNAs, only 32 (2.2 %) have fully paired stems (absence of non-canonically paired regions). The remaining 97.8 % have 1–14 motifs in the stem. There are a total of 7436 non-canonically paired motifs including 3862 internal loops, 1546 hairpin loops, 1089 5’ bulge loops, 922 3’ bulge loops, and 17 multibranch loops (Fig. 3a).

Fig. 3

Comparison of motif types found in human miRNA precursors, highly expressed human RNAs, and RNAs with known structures from various organisms. (a) plot of the number of each secondary structural motif in human miRNA precursors including 3’ bulges (n = 924), 5’ bulges (n = 1089), internal loops (n = 3860), hairpins (n = 1546), and multibranch loops (n = 17). (b) plot of the percentage of each motif within its motif type (for example, the percentage of 1-nucleotide bulges of total bulges). (c) plot of the percentage of each motif. Total motifs: human miRNA precursor, n = 7436; highly expressed human RNAs, n = 2712; all other RNAs, n = 26,213; *, p < 0.05; **, p < 0.01; ***, p < 0.001

There are 2334 unique motifs (occur only once) if the base pairs and their orientations are not considered (31.4 % of total). If closing pairs and their orientations are considered, then there are 3808 unique motifs (51.2 % of total). Previous studies have shown that loop closing pairs can dramatically affect loop structure [42, 43]. Not surprisingly, changing a loop’s closing pairs can affect small molecule affinity [44, 45]. Many motifs appeared only once, providing a potential specific target site for small molecules. Further analysis was only completed on bulges and internal loops since the diversity of the hairpin loops was too large (see bar labeled “others” in Fig. 3a) and the sample size of multibranch loops is too small (17 motifs) for meaningful analysis (Fig. 3a).
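Counting motifs that occur only once, with and without closing base pairs, might look like the following Python sketch. The helper names and the four-motif toy list are assumptions for illustration; only the totals in the last two lines (2334, 3808, and 7436) come from the text above.

```python
import re
from collections import Counter


def strip_closing_pairs(motif: str) -> str:
    """Drop the parenthesized closing base pairs, keeping loop nucleotides."""
    return re.sub(r"\([ACGU]{2}\)", "", motif)


def singleton_count(motifs) -> int:
    """Number of distinct motifs that occur exactly once."""
    counts = Counter(motifs)
    return sum(1 for n in counts.values() if n == 1)


# Toy motif list (hypothetical, not database content).
toy = ["(AU)U/-(GC)", "(CG)U/-(GC)", "(AU)U/-(GC)", "(GC)C/A(AU)"]
print(singleton_count(toy))                                    # with closing pairs
print(singleton_count(strip_closing_pairs(m) for m in toy))    # loop nucleotides only

# Percentages reported in the text, recomputed from the stated totals.
print(round(100 * 2334 / 7436, 1))
print(round(100 * 3808 / 7436, 1))
```

Including the closing pairs splits motifs that share loop nucleotides into separate names, which is why the number of single-occurrence motifs rises when closing pairs are considered.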

General survey of motifs in other types of RNAs

The motifs present in other RNAs are also diverse. There are a total of 26213 non-canonically paired motifs: 6937 bulges, 8457 internal loops, and 10819 hairpins. For highly expressed human RNAs with known structures, there are 2712 total motifs including 157 5’ bulges, 123 3’ bulges, 378 internal loops, 1521 hairpins, and 534 multibranch loops. Differences were observed in the distribution of motifs between other types of RNAs and human miRNAs. For example, the percentage of large hairpins is significantly less in other RNAs as compared to miRNAs (Fig. 3b). In contrast, the percentage of 4-nucleotide hairpins and 2-nucleotide bulges is much greater (Fig. 3b).

Small loops prevail in bulges and internal loops

As listed in Table 1 and shown in Fig. 3, the most highly represented bulges and internal loops for precursor miRNAs are the smallest possible size: 1-nucleotide bulges and 1 × 1 nucleotide internal loops. Specifically, 69.3 % of 5’ bulge loops and 71.4 % of 3’ bulge loops are one-nucleotide bulges. Not surprisingly, the four possible 1-nucleotide bulges are the four most prevalent bulge loops. Two-nucleotide bulges are next most prevalent (15.0 % for 5’ bulge and 12.6 % for 3’ bulge). Likewise, small bulges and internal loops prevail in other types of RNAs and highly expressed human RNAs. For example, 1- and 2-nucleotide bulges account for ~92 % of all bulges of other RNAs and 85 % of human RNAs.

Table 1 The 20 most frequent 5’ bulges, 3’ bulges, and internal loops

For internal loops in precursor miRNAs, 55.4 % of the 3860 internal loops are 1 × 1 nucleotide internal loops. The second most prevalent internal loop size is 2 × 2 (11.2 %) followed by 1 × 2 and 2 × 1 internal loops (8.9 %) (Fig. 3a). This overall trend is similar for other RNAs: 1 × 1 loops account for 39.8 % of all loops while 2 × 2 and 1 × 2 / 2 × 1 nucleotide loops account for 11.8 % and 15.1 %, respectively. In highly expressed human RNAs, 1 × 1 loops account for 49.7 % of all loops while 2 × 2 and 1 × 2 / 2 × 1 nucleotide loops account for 6.9 % and 7.7 %, respectively. Since smaller bulges and internal loops are thermodynamically more stable than their larger counterparts [46–51], it is not surprising that they are more highly represented.

Nucleotide preferences in single nucleotide 5’ bulge and 3’ bulge loops in precursor miRNAs

From thermodynamic studies, 1-nucleotide pyrimidine bulges (C or U) are more stable than 1-nucleotide purine bulges (A or G) independent of bulge position (5’ or 3’) [51]. Thus, one might expect that pyrimidine bulges would occur more frequently than purine bulges and that the position of the bulge (5’ or 3’) would not influence the order of frequency. In order to investigate if miRNA hairpin precursors have a preference for certain nucleotides and if this preference is position-dependent, we employed a pooled population comparison, a statistical approach that affords a confidence interval that the preference is not random (see Methods). For example, when “Motif 1” occurs with a certain probability within a given sample size, a random distribution assumes that “Motif 2” occurs with a similar probability. To test this hypothesis, a Z-score is calculated, which represents the confidence that an increased or decreased frequency of a motif did not occur randomly and thus is truly enriched or depleted.

As shown in Fig. 4 and listed in Table 1, the order of single nucleotide occurrence in 5’ bulges is U > A > C > G while in 3’ bulges the order is A ≈ U > C > G. (Please note that “>” indicates the two frequencies of occurrence are significantly different with Z-score >2 while “≈” indicates Z-score <2). These orders are not correlated to the order of 1-nucleotide bulge thermodynamic stabilities (C ≈ U > A ≈ G). Furthermore, the occurrences of U in 5’ bulges and 3’ bulges are similar (0.236 and 0.233, respectively) as are the occurrences of C or G in 5’ bulges and 3’ bulges. However, A occurs more frequently as a 3’ bulge than a 5’ bulge with Z-score = 2.08. For highly expressed human RNAs, the trends are: 5’ bulge nucleotide: A ≈ C ≈ U > G; 3’ bulge nucleotide: A ≈ C > U ≈ G, although none of these differences is statistically significant.

Fig. 4

Plot of the number of the most frequently occurring 5’ bulges, 3’ bulges, and internal loops without considering closing base pairs in miRNA precursors and all other RNAs. (a) distribution in miRNAs. In each group, different colors indicate that the difference in the rate of occurrence is statistically significant. (b) comparison of miRNAs to highly expressed human RNAs and RNAs from other organisms. *, p < 0.05; **, p < 0.01; ***, p < 0.001

The distribution of nucleotides in 1-nucleotide bulges is similar for human miRNAs and other highly expressed human RNAs; indeed, there are no statistically significant differences between them. In contrast, 1-nucleotide A bulges appear more often in RNAs from other organisms while 1-nucleotide C bulges appear less often (Fig. 4b).

The structure of an RNA in general and bulges in particular [52] can be dynamic, resulting in multiple folds. Thus, the thermodynamically optimal state of an unbound RNA target may not be the same as the three dimensional structure of a protein- or small molecule-bound state. This may be advantageous for targeting RNA as the RNA's structure may remodel to accommodate ligand binding in a conformational selection mechanism.

Bulges prefer different closing base pairs

For each frequently occurring bulge, there are diverse combinations of closing base pairs, and their frequencies are dependent upon the bulged nucleotide. For example, there are 25 different closing base pair combinations for 5’ bulge U, and the occurrences of these closing pair combinations are different, ranging from 1 to 39 (Fig. 5a).

Fig. 5

Analysis of bulges found in human miRNAs. (a) the occurrences of 5’ bulge U’s with different closing base pair combinations. (b) the number of 1-nucleotide 5’ bulges with different closing base pair combinations found in human miRNA hairpin precursors. The most frequently occurring closing base pair combination was determined for each 1-nucleotide bulge, and then calculated for all others. Each closing base pair combination has a unique color, which is applied to each type of bulge. (c) the directionality of the motif (5’ bulge vs. 3’ bulge) influences preference of closing base pairs.

We analyzed all 5’ 1-nucleotide bulges to determine if there is a preference for the most frequently occurring closing base pair combinations. Figure 5b shows that each 5’ bulge prefers different closing base pair combinations. In some cases, the position of the bulge also influences the preferred closing base pairs; that is, whether it is a 5’ or 3’ bulge (Fig. 5c). For example, 5’ bulge (UA)U/-(GC) occurs 39 times (2nd most prevalent) while 3’ bulge (UA)-/U(GC) occurs only 11 times (7th-most prevalent).

As shown schematically in Fig. 2, the same motif (including closing base pairs) could be placed in different orientations in the miRNA's structure. Since their thermodynamic stabilities are the same, we inquired if the direction affects the frequency of occurrence. For example, 5’ bulge (UA)U/-(GC) is the same as 3’ bulge (CG)-/U(AU). The 5’ bulge (UA)U/-(GC) was observed 39 times in human miRNAs (the most frequent base pair combination). However, the 3’ bulge (CG)-/U(AU) was not observed.

There are examples in which the directionality of a motif does not affect occurrence. For example, 5’ bulge (GC)U/-(GC) occurs 24 times; the corresponding 3’ bulge, (CG)-/U(CG), also occurs 24 times. A more sophisticated analysis will be required in order to determine why directionality matters for some motifs but not others.

As observed for miRNA precursors, each 5’ and 3’ 1-nucleotide bulge in highly expressed human RNAs has a different distribution of observed closing base pairs (Additional file 1: Figure S1). Because of the small sample size (n = 88 for 3’ bulges and n = 121 for 5’ bulges), statistically significant differences were not observed. The most frequently occurring 5’ bulge was (UA)A/-(GC) (n = 11) while the most frequently occurring 3’ bulge was (UG)-/G(UA) (n = 7). Interestingly, the 5’ bulge (UA)A/-(GC) was not observed as a 3’ bulge (CG)-/A(AU). Another frequently occurring 5’ bulge, (GC)U/-(GC) (n = 7), was also not observed as a 3’ bulge. The most frequently occurring 3’ bulge, (UG)-/G(UA), was only observed once as the corresponding 5’ bulge.

Nucleotide preferences for 1 × 1 nucleotide internal loops

The ten possible 1 × 1 nucleotide internal loops are the ten most frequently occurring internal loops in miRNA hairpin precursors (Table 1). They can be divided into three groups based on their frequencies of occurrence. In order for an internal loop to be placed in a particular group, its Z-score must exceed 2 when compared to the loops in the other groups (Table 2 and Fig. 4). Group 1 contains the most frequently occurring loops including G/G, A/C, C/A, and U/U; Group 2 (second most frequently occurring) includes U/C and C/U; and Group 3 (least frequently occurring) includes A/A, C/C, G/A, and A/G. It is important to point out that A/C and C/A are the same motifs but in different orientations, as are U/C and C/U, and G/A and A/G. Evidently, the direction of the unpaired nucleotides does not matter. For 1 × 1 nucleotide loops in which both nucleotides are the same, the order of occurrence is G/G ≈ U/U > A/A ≈ C/C, which is different from the order observed for bulge loops.

Table 2 Relative Z-scores for the occurrences of different 1 × 1 nucleotide internal loops (no consideration of closing base pairs)

Differences in frequency are observed when comparing 1 × 1 nucleotide internal loops in highly expressed human RNAs and other RNAs. For example, G/G loops appear more frequently in highly expressed human RNAs and less frequently in RNAs from other organisms as compared to miRNAs. A/C, A/G, and U/U loops appear more frequently in other RNAs than in miRNA precursors.

1 × 1 nucleotide internal loops also have preferences for closing base pairs

Previous studies have shown that loop closing base pairs affect loop thermodynamic stability and structure [46, 48, 49]. We therefore investigated if the five most frequently occurring 1 × 1 nucleotide loops (G/G, A/C, C/A, U/U, and U/C) in miRNAs have closing base pair preferences. In this analysis, AU and UA, GC and CG, and GU and UG closing base pairs were grouped together. (Thus, AU indicates AU and UA closing pairs; GC indicates GC and CG closing pairs; and GU indicates GU and UG closing pairs.) The results are summarized in Fig. 6. Interestingly, G/G, A/C, and U/C have the same order of preference for 5’ closing base pairs: AU > GC > GU. C/A and U/U prefer GC > AU > GU for the 5’ closing pair. In contrast, A/C, U/U, and U/C have the same 3’ closing base pair preferences: GC > AU > GU. Unique trends are observed for G/G (AU > GC ≈ GU) and C/A (AU ≈ GC > GU).
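The grouping of closing base pairs described above can be sketched as a simple tally in Python. The sketch assumes that the first pair written in a motif name is the 5’ closing pair and the last is the 3’ closing pair; the function name `closing_pair_tally` and the toy motif list are hypothetical and do not reproduce the counts behind Fig. 6.

```python
import re
from collections import Counter, defaultdict

# AU/UA, GC/CG, and GU/UG are grouped together, as in the analysis above.
PAIR_CLASS = {"AU": "AU", "UA": "AU", "GC": "GC", "CG": "GC", "GU": "GU", "UG": "GU"}
ONE_BY_ONE = re.compile(r"^\(([ACGU]{2})\)([ACGU])/([ACGU])\(([ACGU]{2})\)$")


def closing_pair_tally(motifs):
    """Count grouped 5' and 3' closing pairs for each 1 x 1 internal loop."""
    tally = defaultdict(lambda: {"5'": Counter(), "3'": Counter()})
    for motif in motifs:
        match = ONE_BY_ONE.match(motif)
        if match is None:
            continue  # not a 1 x 1 internal loop (e.g. a bulge)
        pair5, nt5, nt3, pair3 = match.groups()
        if pair5 not in PAIR_CLASS or pair3 not in PAIR_CLASS:
            continue  # skip anything that is not an AU, GC, or GU pair
        loop = f"{nt5}/{nt3}"
        tally[loop]["5'"][PAIR_CLASS[pair5]] += 1
        tally[loop]["3'"][PAIR_CLASS[pair3]] += 1
    return tally


# Toy motif list (hypothetical, not database content).
toy = ["(UA)C/A(GC)", "(CG)C/A(AU)", "(GC)C/A(UG)", "(AU)G/G(GC)", "(AU)U/-(GC)"]
for loop, sides in closing_pair_tally(toy).items():
    print(loop, dict(sides["5'"]), dict(sides["3'"]))
```

The resulting counts per loop and side could then be compared pairwise with the Z-score from the Methods to decide whether one closing pair class is preferred over another.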

Fig. 6

The five most frequently occurring 1 × 1 nucleotide internal loops in human miRNA precursors have different preferences for 5’ and 3’ closing base pairs. Please note that “5’AU” indicates a 5’AU or 5’UA closing base pair. Likewise, “5’GC” indicates a 5’GC or 5’CG closing base pair, and “5’GU” indicates a 5’GU or 5’UG closing base pair. Interestingly, changing the orientation of the loop nucleotides changes closing base pair preferences. For example, in miRNAs, C/A prefers 5’ GC > AU > GU and 3’ AU > GC > GU while A/C prefers 5’ AU > GC > GU and 3’ AU ≈ GC > GU. The distribution of closing pairs is different for highly expressed human RNAs and RNAs from other organisms. Statistically significant differences were observed for RNAs from other organisms as indicated by *, p < 0.05; **, p < 0.01; ***, p < 0.001

As was observed with bulges, directionality affects frequency in some cases. For example, C/A and A/C internal loops have different preferences for the 5’ closing base pair. Similarly, internal loop (UA)C/A(GC) and (CG)A/C(AU) are the same loop. However, (UA)C/A(GC) occurs 29 times while internal loop (CG)A/C(AU) occurs 14 times. The difference in the frequency of occurrence is statistically significant (Z-score = 2.32).

Since the most frequently occurring 1 × 1 nucleotide loops were similar in highly expressed human RNAs and RNAs from other organisms with known structures, we also studied closing base pair preferences for those RNAs. Unlike miRNA precursors, the five loops each have unique preferences for 5’ and 3’ closing base pairs (Fig. 6). For highly expressed human RNAs, an analysis of the closing base pairs of all 1 × 1 nucleotide loops reveals that GU closing pairs are discriminated against as both 5’ and 3’ closing pairs as compared to GC pairs (Z-score = 2.92 and 2.77, respectively). There is no statistically significant difference between GC and AU closing pairs or between AU and GU closing pairs. There are statistically significant differences in the closing base pairs for the five loops when comparing human miRNA precursors to RNAs from other organisms (n = 12; Fig. 6). The most statistically significant difference is the preference for 3’GC closing pairs for A/C internal loops (p < 0.0001).

MiRNA processing sites

Presumably, the functionally important sites in miRNA hairpin precursors are the processing sites, where precursors are cleaved by Dicer and Drosha to form the mature miRNA. How do Dicer and Drosha determine the exact sites to cleave? Are they chosen by a specific sequence, motif, or proximity to up/downstream elements? We therefore analyzed the secondary structures of Dicer and Drosha processing sites.

The site corresponding to the 5’ end of the mature RNA is referred to as the start processing site while the 3’ end of the mature RNA is referred to as the end processing site (Fig. 1). The processing site nucleotide can be paired (including loop closing base pairs), a bulged nucleotide, an internal loop nucleotide, a hairpin nucleotide, or at the terminal ends. Of all start processing site nucleotides, 57.7 % are paired (including loop closing pairs) while 49.0 % of end processing site nucleotides are paired. This difference is statistically significant; that is, it can be stated that start processing site nucleotides occur more frequently as paired than end processing site nucleotides do (Z-score = 4.68). There are also a small number of processing sites in terminal ends: 17 start processing sites and 28 end processing sites.

We next determined the number of unique motifs that reside in Dicer and Drosha processing sites. If considering only loop nucleotides, there are 507 unique Dicer (n = 334) and Drosha (n = 173) processing sites. This corresponds to 17.8 % of all processing sites, 21.7 % of all unique miRNA motifs, and 6.8 % of all miRNA motifs. Of the 507 unique Dicer and Drosha processing sites, 39 are present in highly expressed human RNAs. If closing base pairs also confer uniqueness, then there are 752 unique Dicer (n = 451) and Drosha (n = 301) sites, corresponding to 26.4 % of all processing sites, 19.7 % of all unique miRNA motifs, and 10.1 % of all miRNA motifs. The majority of unique Dicer processing sites reside in internal loops (38.4 % when considering closing base pairs) or hairpins (44.3 % when considering closing base pairs), while the majority of unique Drosha sites reside in internal loops (85.4 % when considering closing base pairs). Of these sites, 742 are unique to human miRNAs as compared to highly expressed human RNAs.

Conclusions

In this study, we constructed a database of the secondary structural elements (motifs) found in human miRNA hairpin precursor secondary structures. Analysis of this database reveals that small loops prevail in bulges and internal loops. Interestingly, loops and bulges have significantly different preferences for loop nucleotides, which also dictate preferences for closing base pairs and closing base pair combinations. The origins of these preferences are not clear, but they likely affect the binding of proteins and small molecules. We also examined the motifs present at miRNA processing sites. More than half of the 5’ (start) processing sites and about half of the 3’ (end) processing sites are in paired regions. Hopefully, the database and its analysis will facilitate the development of small molecules that specifically bind and modulate miRNA function, in particular, those that are associated with cancer or other diseases.

References

  1. Krol J, Loedige I, Filipowicz W. The widespread regulation of microRNA biogenesis, function and decay. Nat Rev Genet. 2010;11(9):597–610.

  2. Kim VN, Han J, Siomi MC. Biogenesis of small RNAs in animals. Nat Rev Mol Cell Bio. 2009;10(2):126–39.

  3. Jones-Rhoades MW, Bartel DP, Bartel B. MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol. 2006;57:19–53.

  4. He L, Hannon GJ. MicroRNAs: Small RNAs with a big role in gene regulation. Nat Rev Genet. 2004;5(7):522–31.

  5. Cui QH, Lu M, Zhang QP, Deng M, Miao J, Guo YH, et al. An analysis of human microRNA and disease associations. PLoS One. 2008;3(10):e3420.

  6. Sander C, Betel D, Wilson M, Gabow A, Marks DS. The microRNA.org resource: targets and expression. Nucleic Acids Res. 2008;36:D149–53.

  7. Calin GA, Croce CM. MicroRNA signatures in human cancers. Nat Rev Cancer. 2006;6(11):857–66.

  8. Olson EN, Small EM. Pervasive roles of microRNAs in cardiovascular biology. Nature. 2011;469(7330):336–42.

  9. Benkirane M, Triboulet R, Mari B, Lin YL, Chable-Bessia C, Bennasser Y, et al. Suppression of microRNA-silencing pathway by HIV-1 during virus replication. Science. 2007;315(5818):1579–82.

  10. Huang J, Wang F, Argyris E, Chen K, Liang Z, Tian H, et al. Cellular microRNAs contribute to HIV-1 latency in resting primary CD4+ T lymphocytes. Nat Med. 2007;13(10):1241–7.

  11. John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS. Human microRNA targets (vol 2, pg 1862, 2005). PLoS Biol. 2005;3(7):1328–8.

  12. Garzon R, Marcucci G, Croce CM. Targeting microRNAs in cancer: rationale, strategies and challenges. Nat Rev Drug Discov. 2010;9(10):775–89.

  13. Calin GA, Zhang S, Chen L, Jung EJ. Targeting microRNAs with small molecules: from dream to reality. Clin Pharmacol Ther. 2010;87(6):754–8.

  14. Petrocca F, Visone R, Onelli MR, Shah MH, Nicoloso MS, de Martino I, et al. E2F1-regulated microRNAs impair TGFbeta-dependent cell-cycle arrest and apoptosis in gastric cancer. Cancer Cell. 2008;13(3):272–86.

  15. Frankel LB, Christoffersen NR, Jacobsen A, Lindow M, Krogh A, Lund AH. Programmed cell death 4 (PDCD4) is an important functional target of the microRNA miR-21 in breast cancer cells. J Biol Chem. 2008;283(2):1026–33.

  16. Meng F, Henson R, Wehbe-Janek H, Ghoshal K, Jacob ST, Patel T. MicroRNA-21 regulates expression of the PTEN tumor suppressor gene in human hepatocellular cancer. Gastroenterology. 2007;133(2):647–58.

  17. Aagaard L, Rossi JJ. RNAi therapeutics: principles, prospects and challenges. Adv Drug Deliv Rev. 2007;59(2–3):75–86.

  18. Loya CM, Lu CS, Van Vactor D, Fulga TA. Transgenic microRNA inhibition with spatiotemporal specificity in intact organisms. Nat Methods. 2009;6(12):897–903.

  19. Bose D, Jayaraj G, Suryawanshi H, Agarwala P, Pore SK, Banerjee R, et al. The tuberculosis drug streptomycin as a potential cancer therapeutic: inhibition of miR-21 function by directly targeting its precursor. Angew Chem Int Ed Engl. 2012;51(4):1019–23.

  20. Velagapudi SP, Disney MD. Two-dimensional combinatorial screening enables the bottom-up design of a microRNA-10b inhibitor. Chem Commun (Camb). 2014;50(23):3027–9.

  21. Velagapudi SP, Gallo SM, Disney MD. Sequence-based design of bioactive small molecules that target precursor microRNAs. Nat Chem Biol. 2014;10(4):291–7.

  22. Thomas JR, Hergenrother PJ. Targeting RNA with small molecules. Chem Rev. 2008;108(4):1171–224.

  23. Guan L, Disney MD. Recent advances in developing small molecules targeting RNA. ACS Chem Biol. 2012;7(1):73–86.

  24. Tran T, Disney MD. Two-dimensional combinatorial screening of a bacterial rRNA A-site-like motif library: defining privileged asymmetric internal loops that bind aminoglycosides. Biochemistry. 2010;49(9):1833–42.

  25. Ambros V, Bartel B, Bartel DP, Burge CB, Carrington JC, Chen X, et al. A uniform system for microRNA annotation. RNA. 2003;9(3):277–9.

  26. Lee Y, Jeon K, Lee JT, Kim S, Kim VN. MicroRNA maturation: stepwise processing and subcellular localization. EMBO J. 2002;21(17):4663–70.

  27. Griffiths-Jones S, Kozomara A. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39:D152–7.

  28. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008;36:D154–8.

  29. Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, Turner DH. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci U S A. 2004;101(19):7287–92.

  30. Mathews DH, Turner DH. Prediction of RNA secondary structure by free energy minimization. Curr Opin Struct Biol. 2006;16(3):270–8.

  31. Mathews DH, Sabina J, Zuker M, Turner DH. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol. 1999;288(5):911–40.

  32. Gutell RR. Collection of small subunit (16S- and 16S-like) ribosomal RNA structures: 1994. Nucleic Acids Res. 1994;22(17):3502–7.

  33. Schnare MN, Damberger SH, Gray MW, Gutell RR. Comprehensive comparison of structural characteristics in eukaryotic cytoplasmic large subunit (23 S-like) ribosomal RNA. J Mol Biol. 1996;256(4):701–19.

  34. Szymanski M, Barciszewska MZ, Erdmann VA, Barciszewski J. 5S ribosomal RNA database. Nucleic Acids Res. 2002;30(1):176–8.

  35. Sprinzl M, Horn C, Brown M, Ioudovitch A, Steinberg S. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 1998;26(1):148–53.

  36. Larsen N, Samuelsson T, Zwieb C. The signal recognition particle database (SRPDB). Nucleic Acids Res. 1998;26(1):177–8.

  37. Brown JW. The ribonuclease P database. Nucleic Acids Res. 1998;26(1):351–2.

  38. Damberger SH, Gutell RR. A comparative database of group I intron structures. Nucleic Acids Res. 1994;22(17):3508–10.

  39. Waring RB, Davies RW. Assessment of a model for intron RNA secondary structure relevant to RNA self-splicing--a review. Gene. 1984;28(3):277–91.

  40. Michel F, Umesono K, Ozeki H. Comparative and functional anatomy of group II catalytic introns--a review. Gene. 1989;82(1):5–30.

  41. Juhling F, Morl M, Hartmann RK, Sprinzl M, Stadler PF, Putz J. tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res. 2009;37(Database issue):D159–62.

  42. SantaLucia Jr J, Turner DH. Structure of (rGGCGAGCC)2 in solution from NMR and restrained molecular dynamics. Biochemistry. 1993;32(47):12612–23.

  43. Wu M, Turner DH. Solution structure of (rGCGGACGC)2 by two-dimensional NMR and the iterative relaxation matrix approach. Biochemistry. 1996;35(30):9677–89.

  44. Pushechnikov A, Lee MM, Childs-Disney JL, Sobczak K, French JM, Thornton CA, et al. Rational design of ligands targeting triplet repeating transcripts that cause RNA dominant disease: application to myotonic muscular dystrophy type 1 and spinocerebellar ataxia type 3. J Am Chem Soc. 2009;131(28):9767–79.

  45. Tran T, Disney MD. Molecular recognition of 6'-N-5-hexynoate kanamycin A and RNA 1x1 internal loops containing CA mismatches. Biochemistry. 2011;50(6):962–9.

  46. Chen G, Znosko BM, Jiao X, Turner DH. Factors affecting thermodynamic stabilities of RNA 3 x 3 internal loops. Biochemistry. 2004;43(40):12865–76.

  47. Freier SM, Kierzek R, Jaeger JA, Sugimoto N, Caruthers MH, Neilson T, et al. Improved free-energy parameters for predictions of RNA duplex stability. Proc Natl Acad Sci U S A. 1986;83(24):9373–7.

  48. Schroeder SJ, Burkard ME, Turner DH. The energetics of small internal loops in RNA. Biopolymers. 1999;52(4):157–67.

  49. Schroeder SJ, Turner DH. Thermodynamic stabilities of internal loops with GU closing pairs in RNA. Biochemistry. 2001;40(38):11509–17.

  50. Zhu J, Wartell RM. The effect of base sequence on the stability of RNA and DNA single base bulges. Biochemistry. 1999;38(48):15986–93.

  51. Znosko BM, Silvestri SB, Volkman H, Boswell B, Serra MJ. Thermodynamic parameters for an expanded nearest-neighbor model for the formation of RNA duplexes with single nucleotide bulges. Biochemistry. 2002;41(33):10406–17.

  52. Stelzer AC, Frank AT, Kratz JD, Swanson MD, Gonzalez-Hernandez MJ, Lee J, et al. Discovery of selective bioactive small molecules by targeting an RNA dynamic ensemble. Nat Chem Biol. 2011;7(8):553–9.

Acknowledgments

This work was funded by the National Institutes of Health (R01-GM097455 to MDD and R15-GM085699 to BMZ) and The Scripps Research Institute.

Author information

Corresponding author

Correspondence to Matthew D. Disney.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

BL completed data analysis on miRNAs and drafted the manuscript; JLC completed data analysis on internal loop closing base pairs and highly expressed human RNAs; BMZ completed data analysis on other RNAs; DW and SMG wrote scripts to parse and analyze miRNA motifs; MF constructed the searchable database and web server; MDD conceived of the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.

Additional file

Additional file 1: Figure S1.

Analysis of the closing base pairs for 1-nucleotide bulges, both 5' and 3', in highly expressed human RNAs with known structures. As observed for 5' and 3' bulges in miRNA precursors, each bulge has preferred 5' and 3' closing base pairs. Further, the distribution of closing base pairs is different for miRNA precursors and other human RNAs (Fig. 5). (PDF 319 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Liu, B., Childs-Disney, J.L., Znosko, B.M. et al. Analysis of secondary structural elements in human microRNA hairpin precursors. BMC Bioinformatics 17, 112 (2016). https://0-doi-org.brum.beds.ac.uk/10.1186/s12859-016-0960-6

  • DOI: https://0-doi-org.brum.beds.ac.uk/10.1186/s12859-016-0960-6

Keywords