Genomic Identification of Structural RNAs using phylo-SCFGs

Tuesday, October 30, 2007 - 5:35pm - 5:50pm
EE/CS 3-180
Jakob Pedersen (University of Copenhagen)
Structural RNAs often evolve with characteristic substitution patterns
that preserve base pairs despite changes in the primary sequence. With
the availability of full-length genomes of closely related species, it
has become possible to exploit this comparative signal for the genomic
identification of structural RNAs (1).
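To make the comparative signal concrete, the sketch below checks one candidate base pair, that is, a pair of alignment columns (i, j), across a set of aligned sequences. It assumes that canonical pairs are Watson-Crick plus G-U wobble and that gaps count as non-pairing; the function name `pair_covariation` and the toy alignment are hypothetical illustrations, not part of EvoFold. A column pair in which every sequence pairs, yet several distinct pair types occur, shows the pattern described above: the primary sequence changes while the base pair is kept.

```python
# Canonical RNA base pairs: Watson-Crick plus G-U wobble (assumption for this sketch).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_covariation(alignment, i, j):
    """Summarize the evidence that alignment columns i and j form a base pair.

    Returns a tuple:
      - fraction of sequences with a canonical pair at (i, j)
        (gaps and mismatches count as non-pairing),
      - number of distinct canonical pair types observed.
    More than one pair type means the primary sequence changed
    while base pairing was preserved (compensatory/consistent changes).
    """
    pairs = [(seq[i], seq[j]) for seq in alignment]
    canonical = [p for p in pairs if p in CANONICAL]
    return len(canonical) / len(pairs), len(set(canonical))


# Hypothetical toy alignment of a short hairpin.
alignment = ["GGGAAACCC", "GAGAAACUC", "GCGAAACGC"]
print(pair_covariation(alignment, 1, 7))  # (1.0, 3): G-C, A-U, C-G, pairing kept
print(pair_covariation(alignment, 0, 8))  # (1.0, 1): fully conserved G-C pair
```

A fully conserved pair, as in the second call, is equally consistent with plain sequence conservation, which is why compensatory changes like those in the first call carry the more informative structural signal.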

Phylo-SCFGs (2) are attractive models for this problem because they
describe both RNA structure, using stochastic context-free grammars
(SCFGs), and sequence evolution, using phylogenetic models. Using
variations of classical algorithms, they can efficiently handle
multiple alignments with any number of sequences.
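As an illustration of the phylogenetic half of such a model, the sketch below computes the likelihood of a single alignment column on a fixed tree with Felsenstein's pruning algorithm, a classical dynamic-programming method whose cost grows linearly with the number of sequences. It assumes a Jukes-Cantor substitution model and a hand-written three-species tree with made-up branch lengths; the tree encoding and the function names are hypothetical. In a phylo-SCFG, column likelihoods of this kind (with a 16-state dinucleotide model for paired columns) take the place of single-sequence emission probabilities in the grammar's parsing algorithms; models used in practice are estimated from data rather than fixed as here.

```python
import math

BASES = "ACGU"


def jc_transition(t):
    """Jukes-Cantor transition matrix P(t)[a][b] for branch length t."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if a == b else diff for b in range(4)] for a in range(4)]


def partial_likelihood(node, column):
    """Felsenstein pruning: L[a] = P(leaf bases below node | node has base a).

    A node is either a leaf name (str) or a tuple of (child, branch_length)
    pairs. `column` maps leaf names to bases; gaps or unknown symbols are
    treated as missing data (all four bases allowed).
    """
    if isinstance(node, str):
        base = column[node]
        if base not in BASES:
            return [1.0] * 4
        return [1.0 if b == base else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partial_likelihood(child, column)
        p = jc_transition(t)
        for a in range(4):
            # Sum over the unobserved base b at the child.
            result[a] *= sum(p[a][b] * child_l[b] for b in range(4))
    return result


def column_likelihood(tree, column):
    """P(column | tree), assuming a uniform distribution at the root."""
    return sum(0.25 * x for x in partial_likelihood(tree, column))


# Hypothetical tree: ((human:0.1, chimp:0.1):0.2, mouse:0.3)
tree = (((("human", 0.1), ("chimp", 0.1)), 0.2), ("mouse", 0.3))
conserved = {"human": "G", "chimp": "G", "mouse": "G"}
variable = {"human": "G", "chimp": "A", "mouse": "U"}
print(column_likelihood(tree, conserved) > column_likelihood(tree, variable))  # True
```

Each node is visited once per column, so adding species adds work linearly rather than exponentially; the SCFG side of the model then combines these per-column (or per-column-pair) likelihoods during parsing.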

EvoFold implements this approach and has been used to screen
genomic multiple-sequence alignments of both vertebrates and
drosophilids for structural RNAs (1,3). This has resulted in hundreds of
high-confidence novel candidates, both ncRNAs and cis-regulatory
elements.

1) Identification and Classification of Conserved RNA Secondary
Structures in the Human Genome. Pedersen JS, Bejerano G, Siepel A,
Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, and
Haussler D. PLoS Comput Biol. 2006 Apr;2(4):e33.

2) Using stochastic context free grammars and molecular evolution to
predict RNA secondary structure. Knudsen B and Hein J.
Bioinformatics. 1999;15(6):446-454.

3) Discovery of functional elements in 12 Drosophila genomes using
evolutionary signatures. Stark A, Lin MF, Kheradpour P,
Pedersen JS, et al. 2007 (in press).