Table 1 Features for machine learning used in this study

From: Improved identification of conserved cassette exons using Bayesian networks

| Feature subset | Number of features | Motivation | First use |
| --- | --- | --- | --- |
| Exon: length, symmetry, and identity with mouse ortholog | 3 | Alternative exons tend to be shorter, frame-preserving, and more conserved than constitutive exons | [7] |
| Conservation of intronic flanks: length and identity of the best local alignment, and identity of the global alignment | 2 × 3 | Alternative exons tend to have higher conservation in their intronic flanks | [7, 10] |
| Conservation in a 12-nucleotide region spanning the 3' and 5'ss | 2 | As alternative exons and their intronic flanks are more conserved, this may in particular concern the exon/intron boundaries | This work |
| PPT intensity | 1 | Alternative exons tend to have weaker PPTs | [8] |
| Nucleotides at seven positions flanking the 5'ss | 4 × 7 | Alternative exons tend to have specific nucleotide preferences near the 5'ss | [8] |
| Frequency of di- and trimers in the exon and flanking introns | 3 × 16 (dimers), 3 × 64 (trimers) | Motifs that are part of splice regulatory motifs might differ in abundance between alternative and constitutive exons | [8] (trimers), this work (dimers) |
| Splice site strength of 3' and 5'ss | 2 | Alternative exons tend to have weak splice sites | [10] |
| Length of flanking introns | 2 | Alternative exons tend to be flanked by long introns | [10] |
| GC content of exon and intronic flanks | 3 | GC-poor regions tend to promote alternative splicing | This work |
| Features based on NI scores | 24 | Alternative exons tend to have fewer ESEs and more ESSs | This work |
| Features based on PU values | 15 | Single-stranded motifs are likelier to bind to regulators | This work |
| PTB-binding sites | 6 | PTB is a regulator of alternative splicing | This work |
| Features based on ISREs | 8 | Alternative exons tend to have more ISREs in their intronic flanks | This work |
| Density of various motifs | 22 | Several motifs are known to be associated with alternative splicing | This work |
| Combination features | 7 | Combining features can capture more information | This work |

  1. Note that the total number of features used is 365 whereas the sum of the entries here is 378, because some features have been counted in more than one category (for example, in PU value and NI score related features).
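
To make the simplest sequence-derived rows of the table concrete, the following is a minimal sketch of how exon length, symmetry, GC content, and di-/trimer frequencies could be computed. It is not the study's implementation. The function names, the definition of symmetry as "length divisible by 3", the normalisation of k-mer counts by the number of unambiguous windows, and the choice of flank sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequency of each of the 4**k DNA k-mers (overlapping windows).

    Assumption: windows containing non-ACGT characters are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def gc_content(seq):
    """Fraction of G and C nucleotides in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def is_symmetric(exon):
    """Assumed definition: an exon is symmetric (frame-preserving)
    when its length is a multiple of 3."""
    return len(exon) % 3 == 0


def sequence_features(upstream, exon, downstream):
    """Collect illustrative features for an exon and its two intronic flanks."""
    features = {
        "exon_length": len(exon),
        "exon_symmetric": is_symmetric(exon),
    }
    for name, seq in (("up", upstream), ("exon", exon), ("down", downstream)):
        features[f"gc_{name}"] = gc_content(seq)
        for k in (2, 3):  # 16 dimers and 64 trimers per region
            for kmer, freq in kmer_frequencies(seq, k).items():
                features[f"{name}_{kmer}"] = freq
    return features
```

Calling `sequence_features(upstream, exon, downstream)` on three sequence strings returns a dictionary with 3 × 16 dimer and 3 × 64 trimer frequencies, matching the counts in the corresponding table row, plus three GC-content values and the two exon-level features.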