<span class=strong>Reception and Poster Session</span>
Monday, October 29, 2007 - 5:00pm - 6:30pm
- A Continuous Probabilistic Model of Local RNA 3-D Structure
Jes Frellsen (University of Copenhagen)
Joint work with Ida Moltke, Martin Thiim and Thomas Hamelryck (The Bioinformatics Center, University of Copenhagen)
So far, the most common approach to modeling local RNA 3-D structure has been to describe the local conformational space as discrete in a non-probabilistic framework. We present an original approach to modeling local RNA 3-D structure, namely a probabilistic model that treats the conformational space as continuous. In our model the backbone dihedral angles and the base dihedral angles are modeled with a Dynamic Bayesian Network using directional statistics. The model assigns a probability distribution to the conformational space and therefore it has numerous applications. It allows for fast probabilistic sampling of locally RNA-like structures and it can therefore be used in RNA 3-D structure prediction, where one of the problems is how to efficiently search through the space of plausible RNA structures. Today, the state-of-the-art method for suggesting plausible RNA structures is based on assembling fragments from libraries. Further, the model can also be used for deriving probabilities of seeing different local structures and it can therefore be used for quality validation of experimentally determined structures.
- Efficient Algorithms for Probing the RNA Mutation Landscape and Prediction of Deleterious Mutations
Jérôme Waldispühl (Massachusetts Institute of Technology)
We develop an efficient algorithm to compute, for a given RNA sequence and
simultaneously for each k, the minimum free energy structure MFE_k and the
Boltzmann partition function Z_k over all secondary structures of all k-point
mutants of the given sequence. Using the partition function, we rigorously
sample from the ensemble of low energy k-point mutants in order to explore the
mutation landscape. Our algorithm, named RNAmutants, allows us to investigate
deleterious mutations (mutations that radically modify secondary structure) in
the Hepatitis C virus cis-acting replication (HCV CAR) element and the hairpin
of human immunodeficiency virus trans-activation response (HIV-1 TAR) element.
More generally, using RNAmutants, we study the resilience of an RNA molecule to
pointwise mutations. By computing the mutation profile of a sequence, a novel
graphical representation of the mutational tendency of nucleotide positions, we
analyze the deleterious nature of mutating specific nucleotide positions or
groups of positions. In particular, we show qualitative agreement between
published HIV experimental mutagenesis studies and our analysis of deleterious
mutations using RNAmutants. Our work predicts other deleterious mutations,
which could be verified experimentally.
Work in collaboration with P. Clote, B. Berger and S. Devadas.
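As a rough illustration of the Boltzmann partition function that the abstract relies on, the sketch below computes Z and the resulting structure probabilities for a tiny, explicitly enumerated ensemble. The structure names and energies are hypothetical values chosen for illustration. RNAmutants itself computes Z_k by dynamic programming over all structures and all k-point mutants, which this enumeration does not attempt.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)


def boltzmann_probabilities(energies, temperature_kelvin=310.15):
    """Return (Z, probabilities) for a dict of structure -> free energy in kcal/mol."""
    rt = GAS_CONSTANT * temperature_kelvin
    # Boltzmann weight of each structure: exp(-E / RT)
    weights = {name: math.exp(-energy / rt) for name, energy in energies.items()}
    partition_function = sum(weights.values())
    probabilities = {name: w / partition_function for name, w in weights.items()}
    return partition_function, probabilities


if __name__ == "__main__":
    # Hypothetical toy ensemble: energies are illustrative, not measured values.
    toy_ensemble = {"hairpin": -3.0, "bulged_hairpin": -1.5, "open_chain": 0.0}
    z, probs = boltzmann_probabilities(toy_ensemble)
    for name, p in sorted(probs.items(), key=lambda item: -item[1]):
        print(f"{name}: {p:.3f}")
```

Sampling structures in proportion to these probabilities is what makes the exploration of the ensemble statistically rigorous rather than limited to the single minimum free energy structure.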
- Determining Functional Conformations of Two HDV III Strains
Wojciech (Voytek) Kasprzak (National Cancer Institute)
Joint work with Sarah D. Linnstaedt2, John L.
and Bruce A. Shapiro3.
1Basic Research Program, SAIC-Frederick, Inc., NCI
Frederick, Frederick, MD
2Department of Microbiology and Immunology, Georgetown
Center, Washington, DC
3Center for Cancer Research Nanobiology Program, National
Hepatitis Delta virus (HDV) is a sub-viral human pathogen
B virus (HBV) liver infections. The short HDV genome (~1680
nt) is a single
stranded, circular RNA encoding only one protein, the
hepatitis delta antigen
(HDAg). The host enzyme ADAR1 edits the HDV stop codon
(UAG) into a tryptophan
(W) codon (UGG) enabling expression of the two forms of the
protein, short and
long, from the same open reading frame. HDAg-S is required
while HDAg-L enables viral particle formation and inhibits
balance between the two forms is crucial and editing must
We have applied our programs, MPGAfold and StructureLab, to
predict and examine
the folding conformations/states of an HDV III construct.
This construct includes
the editing site (amber/W) and has the editing capabilities
of the full HDV III.
The predicted secondary structure folding dynamics
indicates that the HDV III
RNA forms a meta-stable branched structure and a stable rod
were observed in vitro, and the branched structure was
identified as the one
enabling editing. Computational predictions and the
experimental data also
indicate that an Ecuadorian strain folds into the
more readily than a Peruvian strain, and we indicate the
reasons for the
difference. Thus the folding dynamics of HDV III strains
appear to strongly
influence their RNA editing levels.
Funded in part by NCI Contract N01-CO-1240.
- MASTR: Simultaneous Multiple Alignment and Structure Prediction of Non-coding RNAs Using Simulated Annealing
Stinus Lindgreen (University of Copenhagen)
Joint work with Anders Krogh, University of Copenhagen and Paul Gardner, Wellcome Trust Sanger Institute.
With the growing interest in non-coding RNAs and their function, there is also a growing need for computational tools that can be used to analyze new sequences and predict their secondary structure. For a single RNA sequence, a common approach is to find the minimum free energy conformation. However, by looking at many related sequences, it is possible to incorporate evolutionary information and perform a comparative analysis which can improve the prediction. Preferably, one should perform the multiple alignment of the sequences and the prediction of their common secondary structure simultaneously. We present a novel heuristic method that uses simulated annealing to iteratively improve the sequence alignment and a common secondary structure. This is done in the context of a sampling approach using fairly simple moves that either change the sequence alignment by moving gaps around or update the structure on the base pair level. The prediction is evaluated using a cost function that combines the log-likelihood of the alignment, the base pair probabilities and a covariation term. The method, implemented in the C++ program MASTR (Multiple Alignment of STructural RNAs), is competitive with other current programs in terms of speed, alignment quality and structure quality.
Lindgreen S, Gardner PP, Krogh A (2007): MASTR: Multiple alignment and structure prediction of non-coding RNAs using simulated annealing, Bioinformatics (accepted)
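The simulated annealing loop described in the abstract can be sketched generically as below. This is not the MASTR implementation: the cost function (column mismatches against a reference), the single-gap move, the geometric cooling schedule, and all names are assumptions made for illustration only. MASTR's actual cost combines alignment log-likelihood, base pair probabilities and a covariation term.

```python
import math
import random


def simulated_annealing(initial, cost, propose, steps=2000,
                        t_start=1.0, t_end=0.01, seed=0):
    """Minimize cost(state) with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for step in range(steps):
        fraction = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction
        candidate = propose(current, rng)
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept worse states with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


# Hypothetical toy problem: place one gap in SHORTER so it best matches REFERENCE.
REFERENCE = "GGCAUCC"
SHORTER = "GGAUCC"


def mismatch_cost(gap_position):
    """Number of alignment columns that differ (a gap counts as a mismatch)."""
    row = SHORTER[:gap_position] + "-" + SHORTER[gap_position:]
    return sum(a != b for a, b in zip(REFERENCE, row))


def shift_gap(gap_position, rng):
    """Move the gap one column left or right, staying inside the alignment."""
    return min(max(gap_position + rng.choice((-1, 1)), 0), len(SHORTER))


if __name__ == "__main__":
    position, score = simulated_annealing(6, mismatch_cost, shift_gap)
    print("gap position:", position, "mismatches:", score)
```

Tracking the best state separately from the current state means the returned solution is never worse than the starting point, even when the walk accepts uphill moves at high temperature.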
- Designing Structured RNA Pools for In Vitro Selection of RNAs
Hin Gan (New York University)
In vitro selection of RNAs is a versatile experimental technology
for discovering novel RNA molecules from random sequence pools.
However, finding complex RNA molecules is difficult because simple
motifs dominate in random pools. Thus, engineering sequence pools
possessing complex structures could increase the probability of
discovering novel RNAs.
The mathematical problem of designing structured RNA pools is to
optimize the sequence/structure space to yield the structural
characteristics of the target pool. We represent experimental pool
generation as a nucleotide mixing (transition) matrix applied to a
starting sequence, and pool structures as RNA graphs. These tools
allow us to map regions of RNA sequence space using mixing matrices
and their structural distributions. The target structured pool
corresponds to an optimal combination of mixing matrices,
starting sequences, and associated pool fractions.
We show that our pool design approach allows generation of pools
with user-defined characteristics, such as proportions of specific
target motifs, starting functional sequences, and sequence length.
Our pool design method has been automated and made available through
the webserver RAGPOOLS (http://rubin2.biomath.nyu.edu) that offers a
theoretical companion tool for RNA in vitro selection and related
Thus, RAGPOOLS can serve as a guide to researchers who aim to synthesize
RNA pools with desired properties and/or perform in silico experiments.
Kim N, Shin JS, Elmetwaly S, Gan HH, and Schlick T, RAGPOOLS:
RNA-As-Graph-Pools, a web server for assisting the design of structured
RNA pools for in vitro selection. Bioinformatics 2007 (In Press).
Kim N, Gan HH, Schlick T, A computational proposal for designing
structured RNA pools for in vitro selection of RNAs. RNA 2007,
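One way to picture a mixing matrix applied to a starting sequence is sketched below. It assumes, as an interpretation rather than a statement of the RAGPOOLS method, that each row of the matrix gives the probabilities of synthesizing A, C, G or U at a position whose starting nucleotide labels that row. The function and variable names are hypothetical.

```python
import random

NUCLEOTIDES = "ACGU"


def sample_pool(start_sequence, mixing_matrix, pool_size, seed=0):
    """Draw pool_size sequences by mutating each position of start_sequence
    according to the row of mixing_matrix for the starting nucleotide."""
    rng = random.Random(seed)
    pool = []
    for _ in range(pool_size):
        sequence = "".join(
            rng.choices(NUCLEOTIDES, weights=mixing_matrix[base])[0]
            for base in start_sequence
        )
        pool.append(sequence)
    return pool


if __name__ == "__main__":
    # Hypothetical matrix: keep the starting base 70% of the time,
    # otherwise pick one of the other three bases uniformly.
    doped = {
        base: [0.7 if other == base else 0.1 for other in NUCLEOTIDES]
        for base in NUCLEOTIDES
    }
    for sequence in sample_pool("GGCAUCC", doped, pool_size=5):
        print(sequence)
```

An identity matrix reproduces the starting sequence exactly, while a uniform matrix yields a fully random pool; structured pools lie between these two extremes.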
- Functional Classification of all Non-coding Microbial Sequences through Phylogenetic Profiling
Daniel Gautheret (Université de Paris XI (Paris-Sud))
Joint work with Antonin Marchais and Magali Naville
(IGM. Bât 400 - Université Paris-Sud - 91405 Orsay cedex –
Although comparative genomics has been instrumental in the
identification of novel non-coding RNA (ncRNA) in model
genomes, this technique cannot, in the form it is currently
practised, keep up with the pace of genome sequencing. As a
result, hundreds of microbial genomes, including entire
families of important pathogens, have been left out of the
picture in terms of ncRNA function analysis. As ncRNAs play
major regulatory and adaptive roles in bacteria, there is an
urgent need for innovative computational methods that would
permit a quick and efficient detection of ncRNAs in any genome
of interest. Here we propose a protocol that exploits the depth
of phylogenetic information in all available genomes (with
virtually no limitation in the number of species) to produce a
functional classification of all ncRNA candidates and other
non-coding conserved elements in any target bacterial genome.
Our protocol involves a low-stringency screening for intergenic
conserved elements (ICEs) in the target genome, followed by the
construction of the presence/absence profile of each ICE across
the complete bacterial genome collection. All ICEs are then
clustered according to the distance of their phylogenetic
profiles, as done by Pellegrini et al.  for classifying
protein genes. A simultaneous clustering of ICEs and ORFs
produces a complete classification of coding and non-coding
elements in the target genome. We ran this pipeline on E.
and B. subtilis. In both species, known small RNAs and
riboswitches were significantly concentrated (P ~ 10^-6) in two or
three clusters that also contained many orthologous E.
B. subtilis ORFs and ~200 undefined ICEs in each species.
Phylogenetic profile clustering is independent of sequence
similarity and appears to predict functional ncRNAs with a much
higher specificity than comparative sequence analysis.
Furthermore, some clusters that lack known ncRNAs show very
interesting phylogenetic presence/absence patterns that
indicate either horizontal transfers or the emergence of common
adaptive non-coding elements in distant bacterial species.
Finally, the co-occurrence of ICEs and protein coding genes in
the same clusters may constitute an important source of
information on ICE/ORF functional relationships. A complete run
of our ICE classification pipeline on a bacterial genome only
requires a few hours.
 Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates
TO. (1999) Assigning protein functions by comparative genome
analysis: protein phylogenetic profiles. Proc. Natl. Acad. Sci.
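The profile-clustering step can be illustrated with a small sketch. The abstract does not specify the distance measure or the clustering algorithm, so the normalized Hamming distance and the single-linkage threshold clustering used here are assumptions, as are the element names and the toy profiles.

```python
from itertools import combinations


def hamming_distance(profile_a, profile_b):
    """Fraction of genomes in which two presence/absence profiles disagree."""
    return sum(a != b for a, b in zip(profile_a, profile_b)) / len(profile_a)


def cluster_profiles(profiles, threshold):
    """Single-linkage clustering: join elements whose distance is <= threshold."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if hamming_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Hypothetical profiles: 1 = element present in that genome, 0 = absent.
    toy_profiles = {
        "ICE_1": (1, 1, 0, 0, 1),
        "ICE_2": (1, 1, 0, 0, 0),
        "ICE_3": (0, 0, 1, 1, 0),
        "ORF_a": (1, 1, 0, 0, 1),
    }
    for cluster in cluster_profiles(toy_profiles, threshold=0.25):
        print(cluster)
```

Because ICEs and ORFs are clustered from the same kind of profile, elements of both types can land in one cluster, which is the basis for inferring ICE/ORF functional relationships.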
- RNA Matrices and RNA Secondary Structures
Asamoah Nkwanta (Morgan State University)
Two lower-triangular arrays with entries that count RNA secondary structures of a given length are mentioned in this short talk. The array entries also count specific lattice walks. There is a one-to-one correspondence between RNA structures and a subset of the walks. We will discuss various ways in which the walks can be used as a tool to help predict primary RNA sequences.
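For readers unfamiliar with counting RNA secondary structures, the sketch below uses the classical Waterman recurrence for the number S(n) of secondary structures on n bases with hairpin loops of at least one unpaired base: S(0) = S(1) = S(2) = 1 and S(n+1) = S(n) + sum over k = 1..n-1 of S(k-1) S(n-k). Whether the arrays in the talk use exactly this counting convention is an assumption; the function name is illustrative.

```python
def count_secondary_structures(max_length):
    """Return [S(0), ..., S(max_length)] using the Waterman recurrence."""
    counts = [1, 1, 1][: max_length + 1]
    for n in range(2, max_length):
        # Either base n+1 is unpaired (S(n) ways), or it pairs with base k,
        # splitting the sequence into two independent parts.
        paired = sum(counts[k - 1] * counts[n - k] for k in range(1, n))
        counts.append(counts[n] + paired)
    return counts


if __name__ == "__main__":
    print(count_secondary_structures(8))
```

The same kind of convolution recurrence is what gives rise to the correspondence with lattice walks, since both families are built by concatenating smaller independent pieces.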
- SwS: A Solvation Web Service for Nucleic Acids
Modeling accurately the solvation of nucleic acid systems is an important
issue since it has been shown that water, together with the surrounding
ionic atmosphere, is an essential component of RNA and DNA structure. A new
web service, called SwS (Solvation web Service for nucleic acids), will be
presented. This web service, based on the nucleic acid structures contained
in the NDB, is devoted to the statistical analysis of the first solvation
shell of important structural fragments and has been developed to allow
accurate comparisons between theoretical (molecular dynamics simulations)
and experimental (x-ray) data and to better understand molecular recognition
phenomena involving water and ions. Such data will also, at a more subtle
level, improve our views on assembly rules of tertiary structural motifs of
- On the Design of Oligos for Gene Synthesis
Chris Thachuk (University of British Columbia)
Methods for reliable synthesis of long genes offer great promise for
protein synthesis via expression of synthetic genes, with applications
to improved analysis of protein structure and function, as well as
engineering of novel proteins. Current technologies for gene
synthesis use computational methods for design of short oligos, which
can then be reliably synthesized and assembled into the desired target
gene. For collision-oblivious oligo design -- when mishybridizations
between oligos are ignored -- we give a simple and efficient dynamic
programming algorithm. We conjecture that the collision-aware oligo
design problem is NP-hard and provide evidence that mishybridizations
between oligos occur infrequently in the designs from the
collision-oblivious algorithm. We extend our dynamic programming
algorithm to achieve collision-aware oligo design, when the target
gene can be partitioned into independently-assembled short segments.
We evaluate our methods on a large biological gene set.
- RNA Pseudoknotted Secondary Structure Prediction using Hierarchical Folding
Hosna Jabbari (University of British Columbia)
Improving the accuracy and efficiency of computational RNA
structure prediction is an important challenge, particularly
pseudoknotted secondary structures. We propose a new approach
prediction of pseudoknotted structures, motivated by the
that RNA structures fold hierarchically, with pseudoknot free
forming initially, and pseudoknots forming later so as to
energy relative to the initial pseudoknot free structure. Our
(Hierarchical Fold) algorithm has O(n^3) running time, and
a wide range of biological structures, including nested
hairpins, which have previously required O(n^6) time using
minimum free energy approaches. We also report on an
evaluation of HFold.
- Binding of Aminoglycosidic Antibiotics to the Oligonucleotide A-site Model
Maciej Dlugosz (University of Warsaw)
Coauthors J. M. Antosiewicz and J. Trylska.
Aminoglycosidic antibiotics are anti-bacterial molecules which target
the A-site of the small ribosomal subunit. Using Brownian dynamics we
simulated the encounter of four different aminoglycosidic antibiotics
with their RNA binding site on the ribosome. The considered
antibiotics include neamine, neomycin, paromomycin and
ribostamycin. They are amine sugar derivatives, composed of 2 to 4
rings, with a positive total charge of +4 to +6e. The influence of
structural, electrostatic and hydrodynamic properties of antibiotics
on the kinetics of their association with the ribosomal A-site is
discussed. Diffusion limited rates of association are computed and
their dependence on ionic strength of the surrounding is examined. The
mechanism of diffusion towards the RNA and the formation of the
encounter complex is analyzed.
- Finding Additional Functional Elements in Essential RNA Sites: Not Conserved, but Not Unimportant
Rob Knight (University of Colorado)
Joint work with Vikas Malaiya, Jana Chocholousova, Matthew Iyer, Irene Majerfeld, and Michael Yarus.
Evolutionary conservation has often been used to recover the essential pieces of RNA sites, yet can only reveal elements that are necessary, rather than sufficient, for function. Biochemical studies in several systems, including the hammerhead ribozyme and the purine riboswitch, indicate additional regions, such as loop-loop interactions, that are required for function yet are not phylogenetically conserved. Here we use a minimal motif for binding the amino acid tryptophan to ask the ultimate question of an RNA motif: do we know the essential elements well enough to embed the motif in a random-sequence background and obtain functional molecules? We show the utility of this technique for discovering additional sequence requirements for the motif, in this case the requirement for an unpaired G in a specific range of locations and structures relative to the main loop identified by SELEX, and discuss its implications for calculating the probability of obtaining functional RNAs from random-sequence pools.
- Feynman Diagrams, RNA Folding, and the Transition Polynomial
Feynman diagrams were introduced by physicists. They arise naturally
in mathematics (from knots and singular knots), and in molecular
biology (from RNA folding). In particular, work of G. Vernizzi, H.
Orland, and A. Zee
has shown that the genus of Feynman diagrams plays an important role
in the prediction of RNA structures.
The transition polynomial for 4-regular graphs was defined by Jaeger to
unify polynomials given by vertex reconfigurations similar to the
skein relations of knots. It is closely related to the Kauffman bracket,
Tutte polynomial, and the Penrose polynomial.
We define a transition polynomial for Feynman diagrams and discuss its
properties. In particular, we show that the genus of a Feynman
diagram is encoded in the transition polynomial. This is joint work
with Kerry Luse.
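The genus of an RNA arc (chord) diagram can be computed directly from its base pairs, which may help make the quantity concrete. The sketch below does not compute the transition polynomial. It uses the standard Euler characteristic argument for a single-backbone diagram with n arcs and r boundary components, g = (n + 1 - r) / 2, where r is the number of cycles of the map "jump to the arc partner, then step to the next endpoint". The function name and input format are assumptions.

```python
def diagram_genus(pairs):
    """Genus of a single-backbone arc diagram given as a list of (i, j) base pairs."""
    if not pairs:
        return 0
    # Relabel arc endpoints 0..2n-1 in backbone order; unpaired bases do not matter.
    endpoints = sorted(position for pair in pairs for position in pair)
    index = {position: i for i, position in enumerate(endpoints)}
    size = len(endpoints)
    partner = [0] * size
    for a, b in pairs:
        partner[index[a]] = index[b]
        partner[index[b]] = index[a]

    # Count boundary components: cycles of x -> partner(x) + 1 (mod 2n).
    seen = [False] * size
    boundaries = 0
    for start in range(size):
        if seen[start]:
            continue
        boundaries += 1
        x = start
        while not seen[x]:
            seen[x] = True
            x = (partner[x] + 1) % size
    return (len(pairs) + 1 - boundaries) // 2


if __name__ == "__main__":
    print("nested helix:", diagram_genus([(0, 3), (1, 2)]))
    print("H-type pseudoknot:", diagram_genus([(0, 2), (1, 3)]))
```

Pseudoknot-free secondary structures always have genus 0, so the genus gives a coarse measure of how knotted a predicted structure is.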
- The Rfam Database: We Need You
Alex Bateman (Wellcome Trust Sanger Institute), Paul Gardner (Wellcome Trust Sanger Institute)
The Rfam database is a collection of multiple sequence alignments and
covariance models representing many common non-coding RNA gene (ncRNA)
Rfam aims to facilitate the identification and classification of new
members of known sequence families, and distributes annotation of
ncRNAs in over 200 complete genome sequences. Rfam release 8.0
contains 574 ncRNA families (including 427 bona fide RNA genes, and
145 regulatory elements).
For each family we provide predicted secondary structures, multiple
sequence alignments, species distribution, annotation and links to
other external specialised resources.
All our data is available and searchable online or for download and
- Genomic Identification of Structural RNAs using phylo-SCFGs
Jakob Pedersen (University of Copenhagen)
RNA structures often evolve with characteristic substitution patterns
that preserve base-pairs in spite of changes in primary sequence. With
the advent of closely related full-length genomes, it has become
possible to exploit this comparative signal for genomic identification
of structural RNAs (1).
Phylo-SCFGs (2) are attractive models for this problem since they can
describe both RNA structure, using stochastic context-free grammars
(SCFGs), and sequence evolution, using phylogenetic models. Using
variations of classical algorithms, multiple alignments with any
number of sequences can be handled efficiently.
EvoFold implements this approach and has been used to screen
multiple-sequence genomic-alignments of both vertebrates and
Drosophilids for structural RNAs (1,3). This has resulted in hundreds of
high-confidence novel candidates of both ncRNAs and cis-regulatory
1) Identification and Classification of Conserved RNA Secondary
Structures in the Human Genome. Pedersen JS, Bejerano G, Siepel G,
Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, and
Haussler D. PLoS Comput Biol. 2006 Apr;2(4):e33.
2) Using stochastic context free grammars and molecular evolution to
predict RNA secondary structure. Knudsen B and Hein JJ.
Bioinformatics. 1999; 15 (6): 446-454.
3) Discovery of functional elements in 12 Drosophila genomes using
evolutionary signatures. Stark A, Lin MF, Kheradpour P, and
Pedersen JS, et al. 2007 (in press).
- RNA Folding during Transcription Facilitated by Non-native Structures
Tao Pan (University of Chicago)
RNA folding in the cell occurs during transcription. Expedient RNA folding must avoid the formation of undesirable structures as the nascent RNA emerges from the RNA polymerase. We show that efficient folding during transcription of three conserved non-coding RNAs from E. coli (RNase P RNA, SRP RNA and tmRNA) is facilitated by their cognate polymerase pausing at specific locations. These pause sites are located between the upstream and the downstream portions of all the native long-range helices in these non-coding RNAs. In the paused complexes, the nascent RNAs form labile structures that sequester these upstream portions in such a manner as to guide folding. Both the pause sites and the secondary structure of the non-native portions of the paused complexes are phylogenetically conserved. Specific pausing-induced structural formation can be a general strategy to facilitate the folding of long-range helices. This polymerase-based mechanism may result in portions of non-coding RNA sequences being evolutionarily conserved for efficient folding during transcription.
- Comparative Genomics Beyond Sequence-based Alignments: RNA Structures in the ENCODE Regions
Elfar Torarinsson (University of Copenhagen)
Joint work with Z. Yao, E. D. Wiklund, J. B. Bramsen, C. Hansen, J. Kjems,
N. Tommerup, W. L. Ruzzo, and J. Gorodkin.
Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms
have relied on existing multiple sequence alignments. However, as sequence
similarity drops, a key signal of RNA structure - frequent compensating base
changes - is increasingly likely to cause sequence-based alignment methods to
misalign, or even refuse to align, homologous ncRNAs, consequently obscuring
that structural signal. We have used CMfinder [1], a structure-oriented local
alignment tool, to search vertebrate multiple alignments in the ENCODE
regions. In agreement with other studies, we find a large number of
potential RNA structures in the ENCODE regions. We report 6,587 candidates with
an estimated false positive rate of 50%. More intriguingly, many of these
candidates may be better represented by alignments taking the RNA secondary
structure into account than those based on primary sequence alone, often quite
dramatically. For example, approximately one quarter of these 6,587 candidates
show revisions in more than 50% of their aligned positions. Furthermore, our
results are strongly complementary to those discovered by
sequence-alignment-based approaches: 84% of our candidates are not covered by
Washietl et al., increasing the number of ncRNA candidates in the ENCODE
region by 32%. In a group of eleven ncRNA candidates that were tested by
RT-PCR, 10 were confirmed to be present as RNA transcripts in human tissue. Our
results broadly suggest caution in any analysis relying on multiple sequence
alignments in less well-conserved regions, clearly support growing appreciation
for the biological significance of ncRNAs, and strongly argue for considering
RNA structure directly in any searches for these elements.
1. Yao, Z., Weinberg, Z. and Ruzzo, W.L. 2006. CMfinder - A Covariance Model
Based RNA Motif Finding Algorithm. Bioinformatics 22: 445-452.
2. Washietl, S., Pedersen, J.S., Korbel, J.O., Gruber, A.R., Hackermuller, J.,
Hertel, J., Lindemeyer, M., Reiche, K., Stocsits, C., Tanzer, A., et
al. 2007. Structured RNAs in the ENCODE Selected Regions of the Human
Genome. Genome Research 17: 852-864.
- Utilizing the RNAJunction Database for the Design of RNA Nanostructures
Eckart Bindewald (SAIC-Frederick, Inc.)
Joint work with Wojciech Kasprzak1, Mary O’Connor2, Brett Boyle2 and Bruce A. Shapiro2.
1 Basic Research Program, SAIC-Frederick, Inc., NCI Frederick, Frederick, Maryland, USA
2 Center for Cancer Research Nanobiology Program, NCI Frederick, Frederick, Maryland, USA.
We are presenting RNAJunction, which is a database containing extracted and annotated 3D coordinate data of RNA junctions, kissing loops, internal loops and bulges. The database contains more than 12000 structural elements and allows web-based querying by sequence, type and PDB information. The database allows searching by geometric constraints (inter-helix angles); this is useful for the design of RNA nanostructures.
We show how these structural elements can be utilized to generate ring structures and other complexes using the NanoTiler software. We present five different approaches for assembling RNA complexes from building blocks. Several examples of automatically generated computational RNA models are presented.
Funded in part by DHHS #N01-CO-12400.
- Multi-scale Simulation of RNA Catalytic Activity
George Giambasu (University of Minnesota, Twin Cities)
Joint work with Taisung Lee and Darrin M. York (Department of Chemistry, University of Minnesota).
We present a series of multi-scale simulation studies
on RNA catalysis. The results of several series of molecular
dynamics (MD) and QM/MM simulations of the full-length
hammerhead ribozyme and the L1 Ligase ribozyme are presented.
For the hammerhead ribozyme we have used simulations to
investigate the role of metal ions and the possible solvent
structure in the crystal, and study/predict the mutation
effects at the C3 and G8 sites. For the L1 Ligase we have
studied the details of a major conformational change prior to
the reaction and possible conformations of the ligation site in
the reactant state. These simulations (each with a length of 50
to 100 ns, with a total of more than 1.5 ms) are at least one
to two orders of magnitude longer than any previously reported
simulations and have revealed a significant number of new insights.
- Efficient Algorithms for Probing the RNA Mutation Landscape (with Waldispuehl, Devadas, Berger)
Peter Clote (Boston College)
The diversity and importance of the role played by RNAs in the regulation and development of the cell
has now been demonstrated. This broad range of functions is achieved through specific structures which
have been (presumably) optimized through evolution. The existence of a well-founded energy function
for RNA has enabled accurate ab-initio secondary structure prediction. State-of-the-art methods such as
McCaskill's algorithm use a statistical mechanics framework based on the computation of the partition function over
the canonical ensemble of all possible secondary structures on a given sequence. Unfortunately, these
techniques do not permit any modification of the input sequence during their execution and thus cannot
investigate the mutation landscape of this sequence.
- Annotated Tertiary Interaction Motifs in RNA Structures
Christian Laing (New York University)
RNA tertiary motifs play an important role in RNA folding. To understand the
complex organization of RNA tertiary interactions, we compiled a dataset
containing 54 high-resolution RNA crystal structures. Seven RNA tertiary
motifs (coaxial helix, A-minor, ribose zipper, pseudoknot, kissing hairpin,
tRNA D-loop:T-loop and tetraloop-tetraloop receptor) were searched by
different computer programs. For the non-redundant RNA dataset, 605 RNA
tertiary interactions were found. Most of these 3D interactions occur in the
16S and 23S rRNAs. Exhaustive search of these motifs reveals diversity of
interaction. Correlation between motifs (e.g. pseudoknot or coaxial helix
with A-minor) shows that they can form composite motifs. These findings
may lead to tertiary structure constraints useful for RNA 3D prediction.
- Collective Properties of Evolving Populations of RNA Molecules
Michael Stich (Instituto Nacional de Tecnica Aeroespacial)
RNA molecules, through their dual appearance as sequence and
structure, represent a suitable model to study evolutionary properties
of quasispecies. The essential ingredient in this model is the
differentiation between genotype (molecular sequences which are
affected by mutation) and phenotype (molecular structure, affected by
selection). This framework allows a quantitative analysis of
organizational properties of quasispecies as they adapt to different
environments, such as their robustness, the effect of the degeneration
of the sequence space, or the adaptation under different mutation
rates and the associated error threshold.
- Computational Models for RNA Silencing Pathways under Time-dependent Transgene Transcription Rates
Roderick Melnik (Wilfrid Laurier University)
Joint work with Jack Yang and Roy Mahapatra.
The synthesis of dsRNA is analyzed using a pathway model with amplifications caused by the aberrant RNAs. The transgene influx rates are assumed time-decaying and Gaussian functions of time. The dynamics of the transgene induced RNA silencing is investigated with a system of coupled non-autonomous nonlinear differential equations describing the process phenomenologically. The silencing phenomena are detected after a period of transcription. Important contributions of several parameters, including those leading to bifurcation patterns, are discussed with a series of numerical examples.
- Locomotif: From Graphical Motif Description to RNA Motif Search
Jens Reeder (Universität Bielefeld)
Motivated by the recent rise of interest in small regulatory
RNAs, we present Locomotif - a new approach for locating RNA
motifs that goes beyond the previous ones in three ways: (1)
Motif search is based on efficient dynamic programming
algorithms, incorporating the established thermodynamic model
of RNA secondary structure formation. (2) Motifs are described
graphically, using a Java-based editor, and search algorithms
are derived from the graphics in a fully automatic way. The
editor allows us to draw secondary structures, annotated with
size and sequence information. They closely resemble the
established, but informal way in which RNA motifs are
communicated in the literature. Thus, the learning effort for
Locomotif users is minimal. (3) Locomotif employs a
client-server approach. Motifs are designed by the user
locally. Search programs are generated and compiled on a
bioinformatics server. They are made available both for
execution on the server, and for download as C source code plus
an appropriate make-file.
Availability: Locomotif is available at http://bibiserv.techfak.uni-bielefeld.de/locomotif.
- Local Pairwise Structural RNA Alignments by Pruning of the Dynamical Programming Matrix
Jan Gorodkin (University of Copenhagen)
Joint work with Jakob H. Havgaard and Elfar Torarinsson (Division of Genetics and Bioinformatics, IBHV, University of Copenhagen, Frederiksberg, Denmark).
The Sankoff algorithm for simultaneously folding and aligning
sequences is computationally very heavy. Recently a number of
groups have applied various constraints to lower the
computational requirements to reasonable levels. Whereas the
original Sankoff algorithm, as well as many of the
implementations, only conduct global alignments, the FOLDALIGN
implementation makes both local and global structural
The most recent version of FOLDALIGN introduces pruning of the
dynamical programming matrix as a simple and effective
which lowers the time and memory requirements significantly
without lowering the predictive performance. FOLDALIGN is
currently one of the few Sankoff algorithms capable of conducting
local alignments while being a practical tool. It has also been
used in a genome-wide screen for putative RNA structures in
corresponding, but unaligned regions between human and mouse.
In addition to the pairwise version of FOLDALIGN we have also made
a multiple alignment method which either takes the pairwise
alignments or McCaskill basepair probability matrices as input.
- Fast pairwise structural RNA alignments by pruning of the
dynamical programming matrix. J. H. Havgaard, E. Torarinsson
J. Gorodkin PLoS Computational Biology, in press
- Multiple structural alignment and clustering of RNA
E. Torarinsson, J. H. Havgaard and J. Gorodkin Bioinformatics,
- Thousands of corresponding human and mouse genomic regions
unalignable in primary sequence contain common RNA structure.
Torarinsson, M. Sawera, J. H. Havgaard, M. Fredholm and J.
Gorodkin Genome Research, 16:885-889, 2006.
- Efficient Parameter Estimation for RNA Secondary Structure Prediction
Mirela Andronescu (University of British Columbia)
Joint work with Anne Condon, Holger H. Hoos, David H. Mathews, and
Kevin P. Murphy.
Motivation: Accurate prediction of RNA secondary structure from the
base sequence is an unsolved computational challenge. The accuracy of
predictions made by free energy minimization is limited by the
quality of the energy parameters in the underlying free energy model.
The most widely used model, the Turner99 model, has hundreds of
parameters, and so a robust parameter estimation scheme should
efficiently handle large data sets with thousands of structures.
Moreover, the estimation scheme should also be trained using available
experimental free energy data in addition to structural data.
Results: In this work, we present constraint generation (CG), the
first computational approach to RNA free energy parameter estimation
that can be efficiently trained on large sets of structural as well as
thermodynamic data. Our constraint generation approach employs a novel
iterative scheme, whereby the energy values are first computed as the
solution to a constrained optimization problem. Then the
newly-computed energy parameters are used to update the constraints on
the optimization function, so as to better optimize the energy
parameters in the next iteration. Using our method on biologically
sound data, we obtain revised parameters for the Turner99 energy
model. We show that by using our new parameters, we obtain
significant improvements in prediction accuracy over current
state-of-the-art methods.
Mirela Andronescu, Anne Condon, Holger H. Hoos, David H. Mathews, and
Kevin P. Murphy, Efficient parameter estimation for RNA secondary
structure prediction, Bioinformatics. 2007 Jul 1;23(13):i19-28.
- Improved RNA Gene Predictions through Dinucleotide Controlled Randomization of Multiple Sequence Alignments
Stefan Washietl (Universität Wien)
Tanja Gesell (1) & Stefan Washietl (2,3)
1. Center for Integrative Bioinformatics, Max Perutz Laboratories, Vienna
2. EMBL-European Bioinformatics Institute, Hinxton, United Kingdom
3. Department of Theoretical Chemistry, University of Vienna, Austria
Most noncoding RNA gene prediction programs are based on the detection of conserved RNA secondary structures in multiple alignments. Although this approach seems to be the most promising, the main problem of current algorithms is the large number of false positive predictions, in particular in large vertebrate genomes.
As the available algorithms assume a mononucleotide background model, a major source of erroneously predicted RNA structures is the biased dinucleotide content found e.g. in vertebrate genomes. While there are well known algorithms for randomization of single sequences preserving dinucleotide content, no algorithms exist for multiple alignments. We present a novel algorithm addressing this problem.
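For readers unfamiliar with the single-sequence case mentioned above, the following is a minimal sketch of a dinucleotide-preserving shuffle in the style of the Altschul-Erickson Eulerian-walk approach. It is not the alignment algorithm presented in this poster; the function names and the example sequence are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides, e.g. 'ACG' -> {'AC': 1, 'CG': 1}."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def _leads_to(start, target, last_edge):
    """Follow the chosen 'last' edges from start; True if target is reached."""
    seen = set()
    node = start
    while node != target:
        if node in seen:  # ran into a cycle that avoids the target
            return False
        seen.add(node)
        node = last_edge[node]
    return True


def dinucleotide_shuffle(seq, rng=None):
    """Return a random sequence with the same dinucleotide counts as seq.

    Illustrative single-sequence sketch (hypothetical helper, not SISSIz).
    The first and last symbols of seq are preserved as a side effect.
    """
    if rng is None:
        rng = random.Random()
    if len(seq) < 3:
        return seq

    # Successor list per symbol: one entry for every dinucleotide in seq.
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)

    final = seq[-1]
    inner = [v for v in successors if v != final]

    # Pick one "last exit" per symbol until the exits form a tree into final.
    while True:
        last_edge = {v: rng.choice(successors[v]) for v in inner}
        if all(_leads_to(v, final, last_edge) for v in inner):
            break

    # Shuffle each successor list, keeping the chosen last exit at the end.
    order = {}
    for v, succ in successors.items():
        succ = list(succ)
        if v in last_edge:
            succ.remove(last_edge[v])
            rng.shuffle(succ)
            succ.append(last_edge[v])
        else:
            rng.shuffle(succ)
        order[v] = succ

    # Walk the graph from the first symbol, consuming successors in order.
    out = [seq[0]]
    used = defaultdict(int)
    while len(out) < len(seq):
        cur = out[-1]
        out.append(order[cur][used[cur]])
        used[cur] += 1
    return "".join(out)


if __name__ == "__main__":
    original = "ACGUACGGAUUCGAGCUAGCUAUCG"  # made-up example sequence
    shuffled = dinucleotide_shuffle(original, random.Random(0))
    print(shuffled)
    print(dinucleotide_counts(shuffled) == dinucleotide_counts(original))
```

A plain mononucleotide shuffle (random.shuffle on the symbols) would keep only the single-base composition, which is the background model the paragraph above identifies as a source of false positives.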
Our approach, which involves in silico evolution along a phylogenetic tree, has two key features: (i) We make use of a new evolutionary model that considers site-specific and overlapping dependencies. This enables us to simulate alignments with given dinucleotide (or higher-order nucleotide) content. (ii) The model includes site-specific rate factors that preserve critical conservation patterns of the original alignments. We developed a time-efficient, distance-based approximation method to estimate a tree under this complex model, which is used as a guide for simulating new alignments.
Based on this improved null model, we have implemented a noncoding RNA gene prediction algorithm called SISSIz that builds upon the RNAalifold and AlifoldZ programs. The new dinucleotide-based program shows significantly improved accuracy over its mononucleotide counterparts on a vertebrate test set.
Washietl S., Hofacker I.L., Lukasser M., Huettenhofer A., Stadler P.F. Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol. (2005), 23:1383-90.
Gesell T., von Haeseler A. In silico sequence evolution with site-specific interactions along phylogenetic trees. Bioinformatics. (2006), 22:716-22.
- Computational Comparative Genomics for Discovery of Cis-regulatory RNAs in Bacteria
Walter Ruzzo (University of Washington)
Discovery of novel functional noncoding RNA is a multifaceted problem. Motif representation, inference and search are all important, as is incorporation of relevant biological knowledge. With careful attention to all of these, we have developed a comparative genomics pipeline for discovery of cis-regulatory RNA elements in bacteria. We represent motifs using covariance models (CMs), as in the Rfam database. Motif inference in unaligned sequences with extraneous flanking regions (i.e., local alignment) relies on CMfinder. We apply it to intergenic regions upstream of homologous genes in different bacteria, since cis-regulatory elements often are found and conserved there. We use Ravenna for efficient, sensitive CM search to identify additional instances. This is critical since (i) a given RNA element often regulates multiple genes in a pathway, not just homologs, and (ii) more examples allow us to refine the model (also via CMfinder), in turn enabling further discoveries. This strategy recovers most known RNAs in Firmicutes. More importantly, we discovered 6 new likely riboswitch families, most experimentally verified, plus over 20 other elements in a wide variety of bacteria [3,4]. Collectively, these RNAs are involved in diverse but individually specific cellular processes, such as ribosome biogenesis, molybdenum cofactor biosynthesis and the citric acid cycle. One of the more surprising finds is a widespread riboswitch that apparently regulates such disparate processes as natural competence in Vibrio cholerae and the use of metal ions as electron acceptors in Geobacter sulfurreducens. These candidate RNAs add to the growing list of RNA motifs involved in multiple cellular processes, and suggest that many additional RNAs remain to be discovered. Many computational challenges also remain.
[1] Weinberg and Ruzzo. Sequence-based heuristics for faster annotation of non-coding RNA families. Bioinformatics, 2006, 22(1):35-39.
[2] Yao, Weinberg and Ruzzo. CMfinder--A Covariance Model Based RNA Motif Finding Algorithm. Bioinformatics, 2006, 22(4):445-452.
[3] Yao, Barrick, Weinberg, Neph, Breaker, Tompa and Ruzzo. A Computational Pipeline for High Throughput Discovery of cis-Regulatory Noncoding RNA in Prokaryotes. PLoS Computational Biology, 3(7):e126, July 6, 2007.
[4] Weinberg, Barrick, Yao, Roth, Kim, Gore, Wang, Lee, Block, Sudarsan, Neph, Tompa, Ruzzo and Breaker. Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline. Nucl. Acids Res., July 2007, 35:4809-4819.
- Riboswitches, RNA Conformational Switches and Prokaryotic Gene Regulation (with Eva Freyhult (a) and Vincent Moulton (b))
Peter Clote (Boston College)
(a) Linnaeus Centre for Bioinformatics, Uppsala University, 75124 Uppsala, Sweden
(b) School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
(c) Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
This work is funded in part by NSF DBI-0543506.
Metabolite-sensing 5′-UTRs (untranslated regions) of certain mRNAs, called
riboswitches, have been discovered to undergo a conformational change upon
ligand binding, which can thereby up- or down-regulate the corresponding
protein product. For instance, upon the binding of the nucleotide guanine, the
G-box riboswitch in the 5′-UTR of the XPT gene of Bacillus subtilis
undergoes a conformational change to create a terminator loop, thereby
prematurely terminating transcription of the XPT gene. Since XPT is involved
in guanine metabolism, this is an example of negative autoregulation by a
riboswitch. Although riboswitches have been postulated to be an ancient
genetic regulatory system, first developed in bacteria, the remarkable
discovery of Cheah et al. in Nature 2007 suggests that eukaryotes may have
co-opted riboswitches to control alternative splicing of genes.
Here we describe a new algorithm, RNAbor (Freyhult, Moulton and Clote,
Bioinformatics 2007), which gives information on possible conformational
switches by computing the Boltzmann probability of structural neighbors of
a given RNA secondary structure S. A secondary structure T of a given RNA
sequence s is called a δ-neighbor of S if T and S differ by exactly δ base pairs.
RNAbor computes the number (N_δ), the Boltzmann partition function (Z_δ)
and the minimum free energy (MFE_δ) and corresponding structure over the
collection of all δ-neighbors of S. This computation is done simultaneously
for all δ ≤ m, in run time O(m^2 n^3) and memory O(mn^2), where n is the
sequence length. We apply RNAbor to the detection of possible RNA
conformational switches, and compare RNAbor with an existing switch detection
method. We also provide examples of how RNAbor can at times improve the
accuracy of secondary structure prediction.
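The δ-neighbor definition above can be made concrete with a small sketch. It assumes that structures are written in dot-bracket notation and that "differ by exactly δ base pairs" means the base-pair distance, i.e. the number of pairs present in one structure but not the other. Both are assumptions made for illustration, and the function names are hypothetical rather than part of RNAbor.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs of a dot-bracket structure."""
    stack = []
    pairs = set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return frozenset(pairs)


def base_pair_distance(s, t):
    """Number of base pairs that occur in exactly one of the two structures."""
    if len(s) != len(t):
        raise ValueError("structures must be on the same sequence length")
    return len(base_pairs(s) ^ base_pairs(t))


def is_delta_neighbor(s, t, delta):
    """True if t is a delta-neighbor of s under the assumed distance."""
    return base_pair_distance(s, t) == delta


if __name__ == "__main__":
    S = "((..))"
    print(base_pair_distance(S, "(....)"))  # one pair removed
    print(base_pair_distance(S, "......"))  # both pairs removed
    print(is_delta_neighbor(S, "(....)", 1))
```

Grouping all structures of a sequence by this distance from S gives the classes over which N_δ, Z_δ and MFE_δ are defined; the sketch only classifies a given pair of structures and does not attempt the dynamic programming that RNAbor uses.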
- RNA Dinucleotide Step Parameters
Mauricio Esguerra (Rutgers, The State University of New Jersey)
We present a first view of the space of conformations adopted by RNA in the currently best-resolved structure of the large ribosomal subunit, using the dinucleotide ‘step’ parameters computed with the 3DNA software. We have explored how the base-step parameters for the 16 possible nucleotide steps of RNA vary in helical vs. non-helical regions.
- Estimating the Fraction of Non-coding RNAs in Mammalian Transcriptomes
Yurong Xin (New York University)
Recent studies of mammalian transcriptomes have identified numerous RNA
transcripts that do not code for proteins; their identity, however, is
largely unknown. Here we explore an approach based on sequence
randomness patterns to discern different RNA classes. The relative
z-score we use helps distinguish the known ncRNA class from the genomic,
intergenic, and intronic classes. This leads us to a fractional ncRNA
measure of putative ncRNA datasets which we model as a mixture of
genuine ncRNAs and other transcripts derived from genomic, intergenic
and intronic sequences. We use this model to analyze six representative
datasets, identified by the FANTOM3 project and two computational
approaches based on comparative analysis (RNAz and EvoFold). Our
analysis suggests fewer ncRNAs than estimated by DNA sequencing and
comparative analysis, but verifying our approach and its predictions
requires more extensive experimental RNA data.