October 29-November 02, 2007
Efficient parameter estimation for RNA secondary structure prediction
October 29, 2007 5:00 pm - 6:30 pm
Joint work with Anne Condon, Holger H. Hoos, David H. Mathews, and
Kevin P. Murphy.
Motivation: Accurate prediction of RNA secondary structure from the
base sequence is an unsolved computational challenge. The accuracy of
predictions made by free energy minimization is limited by the
quality of the energy parameters in the underlying free energy model.
The most widely used model, the Turner99 model, has hundreds of
parameters, and so a robust parameter estimation scheme should
efficiently handle large data sets with thousands of structures.
Moreover, the estimation scheme should also be trained using available
experimental free energy data in addition to structural data.
Results: In this work, we present constraint generation (CG), the
first computational approach to RNA free energy parameter estimation
that can be efficiently trained on large sets of structural as well as
thermodynamic data. Our constraint generation approach employs a novel
iterative scheme, whereby the energy values are first computed as the
solution to a constrained optimization problem. Then the
newly computed energy parameters are used to update the constraints of
the optimization problem, so as to better optimize the energy
parameters in the next iteration. Using our method on biologically
sound data, we obtain revised parameters for the Turner99 energy
model. We show that by using our new parameters, we obtain
significant improvements in prediction accuracy over current
state-of-the-art methods.
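The iterative scheme can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a linear energy model E(s) = w·f(s) over hypothetical feature-count vectors. MFE folding is replaced by a minimum over a small, explicitly listed set of candidate structures. Cyclic projection onto half-spaces stands in for a proper LP/QP solver: it finds parameters near the starting values `w0` that satisfy the constraints, though not necessarily the closest ones. All function names are illustrative.

```python
# Toy constraint generation for a linear energy model E(s) = w . f(s).
# Hypothetical setup: each example is (true_features, candidate_features_list);
# the candidate list stands in for structures an MFE folding routine could return.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_onto_halfspace(w, a, b):
    """Euclidean projection of w onto the half-space {x : a . x <= b}."""
    violation = dot(a, w) - b
    if violation <= 0:
        return w
    scale = violation / dot(a, a)
    return [wi - scale * ai for wi, ai in zip(w, a)]

def solve_constraints(w0, constraints, sweeps=200):
    """Stand-in solver: cyclic projections starting from w0 (not an exact QP solver)."""
    w = list(w0)
    for _ in range(sweeps):
        for a, b in constraints:
            w = project_onto_halfspace(w, a, b)
    return w

def constraint_generation(w0, examples, margin=1.0, max_rounds=50, tol=1e-9):
    w = list(w0)
    constraints = []
    for _ in range(max_rounds):
        new = []
        for true_f, candidates in examples:
            alternatives = [c for c in candidates if c != true_f]
            if not alternatives:
                continue
            # Best competing structure under the current parameters (the "prediction").
            alt = min(alternatives, key=lambda c: dot(w, c))
            if dot(w, true_f) > dot(w, alt) - margin + tol:
                # Require E(true) <= E(alt) - margin, i.e. (f_true - f_alt) . w <= -margin.
                a = tuple(t - s for t, s in zip(true_f, alt))
                c = (a, -margin)
                if any(a) and c not in constraints and c not in new:
                    new.append(c)
        if not new:
            break  # no newly violated constraints: stop iterating
        constraints.extend(new)
        w = solve_constraints(w0, constraints)  # re-solve with the enlarged set
    return w, constraints
```

Each round adds a constraint only where the current parameters let a competitor come too close to the true structure. The constraint set therefore grows with what the model currently gets wrong, not with the exponentially many possible structures.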
Reference:
Mirela Andronescu, Anne Condon, Holger H. Hoos, David H. Mathews, and
Kevin P. Murphy, Efficient parameter estimation for RNA secondary
structure prediction, Bioinformatics. 2007 Jul 1;23(13):i19-28.
The Rfam database: we need you
October 29, 2007 5:00 pm - 6:30 pm
The Rfam database is a collection of multiple sequence alignments and
covariance models representing many common non-coding RNA (ncRNA) gene
families.
Rfam aims to facilitate the identification and classification of new
members of known sequence families, and distributes annotation of
ncRNAs in over 200 complete genome sequences. Rfam release 8.0
contains 574 ncRNA families (including 427 bona fide RNA genes and
145 regulatory elements).
For each family we provide predicted secondary structures, multiple
sequence alignments, species distribution, annotation and links to
other external specialised resources.
All our data is available and searchable online or for download and
local installation.
Math Matters public lecture: U.S. premiere screening of the film
"Achieving the unachievable" with the film's writer/director
November 01, 2007 8:00 pm - 9:15 pm
M.C. Escher is among the most mathematical of artists. In 1956 he challenged the laws of perspective with his graphic Print Gallery, and found himself trapped by an impossible barrier. His unfinished masterpiece quickly became the most puzzling enigma of modern art, for both artists and scientists. Half a century later, mathematician Hendrik Lenstra took everyone by surprise by building a fantastic bridge between the artist's intuition and his own, and completed Escher's work mathematically. This story is presented in the 52-minute film Achieving the Unachievable by documentary filmmaker Jean Bergeron. After the screening, which is the film's U.S. premiere, Bergeron will be available to answer questions.
Utilizing the RNAJunction database for the design of RNA nanostructures
October 29, 2007 5:00 pm - 6:30 pm
Joint work with Wojciech Kasprzak¹, Mary O’Connor², Brett Boyle² and Bruce A. Shapiro².
¹ Basic Research Program, SAIC-Frederick, Inc., NCI Frederick, Frederick, Maryland, USA
² Center for Cancer Research Nanobiology Program, NCI Frederick, Frederick, Maryland, USA.
We present RNAJunction, a database containing extracted and annotated 3D coordinate data of RNA junctions, kissing loops, internal loops and bulges. The database contains more than 12,000 structural elements and allows web-based querying by sequence, type and PDB information. It also supports searching by geometric constraints (inter-helix angles), which is useful for the design of RNA nanostructures.
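Querying by inter-helix angle comes down to computing the angle between helix axis vectors. The sketch below assumes each junction record carries helix axis directions as 3-D vectors under a hypothetical `axes` key. It does not reflect RNAJunction's actual schema or web interface.

```python
import math

def interhelix_angle(axis1, axis2):
    """Angle in degrees between two 3-D helix axis vectors."""
    dot = sum(a * b for a, b in zip(axis1, axis2))
    n1 = math.sqrt(sum(a * a for a in axis1))
    n2 = math.sqrt(sum(b * b for b in axis2))
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp rounding error
    return math.degrees(math.acos(cos_theta))

def filter_junctions(junctions, lo, hi):
    """Ids of junctions whose first two helix axes meet at an angle in [lo, hi] degrees.
    Each record is a hypothetical dict: {"id": ..., "axes": [vec1, vec2, ...]}."""
    return [j["id"] for j in junctions if lo <= interhelix_angle(*j["axes"][:2]) <= hi]
```

For example, a junction with perpendicular axes passes an 80 to 100 degree filter, and one with parallel axes does not.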
We show how these structural elements can be utilized to generate ring structures and other complexes using the NanoTiler software. We present five different approaches for assembling RNA complexes from building blocks. Several examples of automatically generated computational RNA models are presented.
Funded in part by DHHS #N01-CO-12400.
Ultraconserved nonsense: Pervasive unproductive splicing of SR proteins
associated with exceptionally conserved DNA elements -- a bizarre
prevalent mode of gene regulation
October 31, 2007 1:45 pm - 2:00 pm
Nonsense-mediated mRNA decay (NMD) is a cellular RNA surveillance system
that recognizes transcripts with premature termination codons and degrades
them. We previously discovered large numbers of natural alternative splice
forms that appear to be targets for NMD, and we speculated that this might
be a mode of gene regulation which we termed RUST (regulated unproductive
splicing and translation). This seems to be confirmed by our finding that
all conserved members of the SR family of splice regulators have an
unproductive alternative mRNA isoform targeted for NMD. Strikingly, the
splice pattern for each is conserved in mouse and always associated with
an ultraconserved or highly-conserved region of ~100 or more nucleotides
of perfect identity between human and mouse. Remarkably, this seems to
have evolved independently in every one of the genes, suggesting that this
is a natural mode of regulation.
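In mammalian transcripts, NMD targets are commonly predicted with the "50-nucleotide rule": a stop codon lying more than about 50 nt upstream of the last exon-exon junction is treated as premature. The abstract does not state which criterion was used, so the sketch below simply assumes this rule. The function names and the input format (cDNA string, start-codon index, transcript coordinates of exon-exon junctions) are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_in_frame_stop(seq, start):
    """Index just past the first in-frame stop codon at or after `start`, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return None

def predicted_nmd_target(seq, start, junctions, threshold=50):
    """Assumed 50-nt rule: the stop codon ends more than `threshold` nt
    upstream of the last exon-exon junction (transcript coordinates)."""
    stop_end = first_in_frame_stop(seq, start)
    if stop_end is None or not junctions:
        return False
    return max(junctions) - stop_end > threshold
```

An alternative splice form that introduces a stop codon early, or adds a junction far downstream of the stop, is flagged. The same open reading frame with its last junction close to the stop codon is not.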
Riboswitches, RNA conformational switches and prokaryotic
gene regulation (with Eva Freyhultᵃ and Vincent Moultonᵇ)
October 29, 2007 5:00 pm - 6:30 pm
ᵃ Linnaeus Centre for Bioinformatics, Uppsala University, 75124 Uppsala,
Sweden, eva.freyhult@lcb.uu.se
ᵇ School of Computing Sciences, University of East Anglia, Norwich, NR4
7TJ, UK, vincent.moulton@cmp.uea.ac.uk
ᶜ Department of Biology, Boston College, Chestnut Hill, MA 02467, USA,
clote@bc.edu. This work is funded in part by NSF DBI-0543506.
Metabolite-sensing 5′-UTRs (untranslated regions) of certain mRNAs, called
riboswitches, have been discovered to undergo a conformational change upon
ligand binding, thereby up- or down-regulating the corresponding
protein product. For instance, upon binding guanine, the
G-box riboswitch in the 5′ UTR of the XPT gene of Bacillus subtilis
undergoes a conformational change to create a terminator loop, thereby
prematurely terminating transcription of the XPT gene. Since XPT is involved
in guanine metabolism, this is an example of negative autoregulation by a
riboswitch. Although riboswitches have been postulated to be an ancient
genetic regulatory system, first developed in bacteria, the remarkable
discovery of Cheah et al. (Nature 2007) suggests that eukaryotes may have
co-opted riboswitches to control alternative splicing of genes.
Here we describe a new algorithm RNAbor (Freyhult, Moulton, Clote
Bioinformatics 2007) which gives information on possible conformational
switches by computing the Boltzmann probability of structural neighbors of
a given RNA secondary structure. Given a secondary structure S of an RNA
sequence s, a secondary structure T of s is called a δ-neighbor of S if T
and S differ by exactly δ base pairs.
RNAbor computes the number (Nδ), the Boltzmann partition function (Zδ)
and the minimum free energy (MFEδ) and corresponding structure over the
collection of all δ-neighbors of S. This computation is done simultaneously
for all δ ≤ m, in run time O(m²n³) and memory O(mn²), where n is the
sequence length. We apply RNAbor to the detection of possible RNA
conformational switches, and compare RNAbor with an existing switch detection
method. We also provide examples of how RNAbor can at times improve the
accuracy of secondary structure prediction.
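To make the δ-neighbor definition concrete, here is a brute-force sketch. It is not RNAbor's dynamic programming. It enumerates every secondary structure of a short sequence, assuming Watson-Crick and GU pairs only and a minimum hairpin loop of 3, and counts how many structures lie at each base-pair distance δ from a given structure. Under these simplified rules the counts play the role of Nδ. The enumeration grows exponentially with sequence length, which is exactly what RNAbor's O(m²n³) recursion avoids.

```python
from collections import Counter
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def parse_dot_bracket(db):
    """Set of (i, j) base pairs from a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return frozenset(pairs)

def bp_distance(s, t):
    """Number of base pairs in exactly one of the two structures."""
    return len(s ^ t)

def all_structures(seq, min_loop=3):
    """Every secondary structure of seq (exponential; toy inputs only)."""
    @lru_cache(maxsize=None)
    def fold(i, j):
        if i >= j:
            return (frozenset(),)
        out = list(fold(i + 1, j))  # i left unpaired
        for k in range(i + min_loop + 1, j + 1):  # i pairs with k
            if (seq[i], seq[k]) in PAIRS:
                for a in fold(i + 1, k - 1):
                    for b in fold(k + 1, j):
                        out.append(a | b | {(i, k)})
        return tuple(out)
    return fold(0, len(seq) - 1)

def neighbor_counts(seq, db):
    """Counter mapping delta -> number of structures at base-pair distance delta."""
    s = parse_dot_bracket(db)
    return Counter(bp_distance(s, t) for t in all_structures(seq))
```

For "GGAAACC" with reference structure "((...))", the six possible structures split as one at δ = 0, two at δ = 1, one at δ = 2 and two at δ = 3.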
Efficient algorithms for probing the RNA mutation
landscape (with Waldispuehl, Devadas, Berger)
October 29, 2007 5:00 pm - 6:30 pm
The diversity and importance of the roles played by RNAs in the regulation and development of the cell
have now been demonstrated. This broad range of functions is achieved through specific structures which
have presumably been optimized through evolution. The existence of a well-founded energy function
for RNA has enabled accurate ab initio secondary structure prediction. State-of-the-art methods, such as
McCaskill's algorithm, use a statistical mechanics framework based on the computation of the partition function over
the canonical ensemble of all possible secondary structures of a given sequence. Unfortunately, these
techniques do not permit any modification of the input sequence during their execution and thus cannot
investigate the mutation landscape of this sequence.
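A McCaskill-style recursion can be written compactly for a toy energy model. This is a minimal sketch assuming that every canonical or GU base pair contributes a fixed `pair_energy`, which is far simpler than the loop-based energies real tools use, and a minimum hairpin loop of 3. `single_mutant_partition_functions` is a hypothetical helper showing the naive way to probe the mutation landscape: it refolds each of the 3n point mutants from scratch. That cost, which grows combinatorially for multiple mutations, is what motivates dedicated algorithms.

```python
import math
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def partition_function(seq, pair_energy=-1.0, kT=0.6163, min_loop=3):
    """Toy partition function; kT ~ 0.6163 kcal/mol at 37 C."""
    weight = math.exp(-pair_energy / kT)  # Boltzmann weight of one base pair

    @lru_cache(maxsize=None)
    def z(i, j):
        if i >= j:
            return 1.0  # empty structure
        total = z(i + 1, j)  # i unpaired
        for k in range(i + min_loop + 1, j + 1):  # i pairs with k
            if (seq[i], seq[k]) in PAIRS:
                total += weight * z(i + 1, k - 1) * z(k + 1, j)
        return total

    return z(0, len(seq) - 1)

def single_mutant_partition_functions(seq, **kw):
    """Naive landscape probe: recompute Z for every single-point mutant."""
    out = {}
    for pos, base in enumerate(seq):
        for alt in "ACGU":
            if alt != base:
                mutant = seq[:pos] + alt + seq[pos + 1:]
                out[(pos, alt)] = partition_function(mutant, **kw)
    return out
```

With `pair_energy=0.0` every structure has weight 1, so Z simply counts structures. For "GGAAACC" this gives 6.0.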
Binding of aminoglycosidic antibiotics to the oligonucleotide A-site model
October 29, 2007 5:00 pm - 6:30 pm
Joint work with J. M. Antosiewicz and J. Trylska.
Aminoglycosidic antibiotics are anti-bacterial molecules which target
the A-site of the small ribosomal subunit. Using Brownian dynamics we
simulated the encounter of four different aminoglycosidic antibiotics
with their RNA binding site on the ribosome. The antibiotics
considered are neamine, neomycin, paromomycin and
ribostamycin. They are amino sugar derivatives composed of 2 to 4
rings, with a total positive charge of +4e to +6e. The influence of
structural, electrostatic and hydrodynamic properties of the antibiotics
on the kinetics of their association with the ribosomal A-site is
discussed. Diffusion-limited association rates are computed and
their dependence on the ionic strength of the surrounding solution is
examined. The mechanism of diffusion towards the RNA and the formation
of the encounter complex is analyzed.
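As a point of reference, the diffusion-limited association rate of two uncharged spheres has the classical Smoluchowski form k = 4πDR, where D is the relative diffusion coefficient and R the encounter distance. Electrostatic steering of the highly charged antibiotics, which Brownian dynamics captures, can change this substantially. Salt screens that steering over roughly one Debye length, about 0.304/√I nm for a 1:1 electrolyte in water at 25 °C. The helpers below only evaluate these two textbook expressions. The input values are illustrative assumptions, not results from this work.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol

def smoluchowski_rate(d_rel, r_enc):
    """Diffusion-limited rate constant k = 4*pi*D*R for uncharged spheres, in M^-1 s^-1.
    d_rel: relative diffusion coefficient (m^2/s); r_enc: encounter distance (m)."""
    k_molecular = 4.0 * math.pi * d_rel * r_enc  # m^3 s^-1 per molecule pair
    return k_molecular * AVOGADRO * 1000.0       # per mole, m^3 -> L

def debye_length_nm(ionic_strength):
    """Approximate Debye screening length (nm) for a 1:1 electrolyte in water at 25 C;
    ionic_strength in mol/L."""
    return 0.304 / math.sqrt(ionic_strength)
```

With D = 10⁻⁹ m²/s and R = 1 nm, the Smoluchowski estimate is about 7.6 × 10⁹ M⁻¹ s⁻¹. At 0.1 M ionic strength the Debye length is just under 1 nm.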
RNA dinucleotide step parameters
October 29, 2007 5:00 pm - 6:30 pm
We present a first view of the space of conformations adopted by RNA in the currently best-resolved structure of the large ribosomal subunit, using the dinucleotide ‘step’ parameters computed with the 3DNA software. We have explored how the base-step parameters for the 16 possible nucleotide steps of RNA vary in helical vs. non-helical regions.
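The step parameters themselves come from 3DNA (rigid-body parameters such as shift, slide, rise, tilt, roll and twist) and are not computed here. As a hypothetical illustration of the bookkeeping only, the sketch below tallies the 16 step types separately for helical and non-helical positions. It labels a dinucleotide step helical when both nucleotides are base-paired in a dot-bracket structure, a deliberately crude criterion assumed for illustration.

```python
from collections import Counter

def paired_positions(db):
    """Indices of all paired nucleotides in a dot-bracket string."""
    stack, paired = [], set()
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            paired.update((stack.pop(), i))
    return paired

def tally_steps(seq, db):
    """Counts of dinucleotide steps (e.g. "GA") in helical vs. other positions."""
    paired = paired_positions(db)
    helical, other = Counter(), Counter()
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        target = helical if (i in paired and i + 1 in paired) else other
        target[step] += 1
    return helical, other
```

In a real analysis, each tallied step would carry its 3DNA parameter values, and the two groups would be compared per step type.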
A continuous probabilistic model of local RNA 3-D structure
October 29, 2007 5:00 pm - 6:30 pm
Joint work with Ida Moltke, Martin Thiim and Thomas Hamelryck (The Bioinformatics Center, University of Copenhagen)
So far, the most common approach to modeling local RNA 3-D structure has been to describe the local conformational space as discrete in a non-probabilistic framework. We present an original approach to modeling local RNA 3-D structure, namely a probabilistic model that treats the conformational space as continuous. In our model the backbone dihedral angles and the base dihedral angles are modeled with a Dynamic Bayesian Network using directional statistics. The model assigns a probability distribution to the conformational space and therefore it has numerous applications. It allows for fast probabilistic sampling of locally RNA-like structures and it can therefore be used in RNA 3-D structure prediction, where one of the problems is how to efficiently search through the space of plausible RNA structures. Today, the state-of-the-art method for suggesting plausible RNA structures is based on assembling fragments from libraries. Further, the model can also be used for deriving probabilities of seeing different local structures and it can therefore be used for quality validation of experimentally determined structures.
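To illustrate what probabilistic sampling with directional statistics means, here is a toy hidden-Markov-style model, a simple special case of a Dynamic Bayesian Network. A hidden state per nucleotide evolves by a Markov chain, and each state emits dihedral angles from von Mises distributions via Python's `random.vonmisesvariate`. The two states, the three angles and all numeric parameters are hypothetical placeholders, not the published model.

```python
import math
import random

# Hypothetical two-state toy model: state 0 ~ "helix-like", state 1 ~ "loop-like".
START = [0.5, 0.5]
TRANSITIONS = [[0.9, 0.1], [0.3, 0.7]]
# Per state: (mean, concentration) for three illustrative dihedral angles, in radians.
EMISSIONS = [
    [(math.radians(300), 20.0), (math.radians(180), 20.0), (math.radians(50), 20.0)],
    [(math.radians(150), 2.0), (math.radians(120), 2.0), (math.radians(200), 2.0)],
]

def sample_angles(n, seed=0):
    """Sample n (state, angles) pairs; angles are continuous values in [0, 2*pi)."""
    rng = random.Random(seed)
    state = rng.choices([0, 1], weights=START)[0]
    out = []
    for _ in range(n):
        angles = tuple(rng.vonmisesvariate(mu, kappa) for mu, kappa in EMISSIONS[state])
        out.append((state, angles))
        state = rng.choices([0, 1], weights=TRANSITIONS[state])[0]
    return out
```

High concentrations (`kappa`) give tightly clustered, helix-like angles, and low concentrations give broad, loop-like ones. Because the angles are sampled directly on the circle, no discretization into rotamer bins is needed.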
A continuous probabilistic model of local RNA 3-D structure
October 29, 2007 3:05 pm - 3:20 pm
Designing structured RNA pools for RNA in vitro selection
October 30, 2007 6:15 pm - 6:30 pm
In vitro selection is a versatile experimental tool for discovering novel
synthetic RNAs. However, most RNAs identified from random sequence pools
are small and simple folding motifs. To significantly increase the
probability of discovering novel RNAs, we develop an approach for
engineering sequence pools that links RNA sequence space regions with
corresponding structural distributions via a "mixing matrix" approach
combined with a graph theory analysis of RNA 2D structure space. Our pool
design method has been automated and made available through the web server
RAGPOOLS (http://rubin2.biomath.nyu.edu). RAGPOOLS serves as a guide
to researchers who aim to synthesize RNA pools with desired properties
and/or to experiment in silico with various designs using our approach.
Designing structured RNA pools for in vitro selection of RNAs
October 29, 2007 5:00 pm - 6:30 pm
In vitro selection of RNAs is a versatile experimental technology
for discovering novel RNA molecules from random sequence pools.
However, finding complex RNA molecules is difficult because simple
motifs dominate in random pools. Thus, engineering sequence pools
possessing complex structures could increase the probability of
discovering novel RNAs.
The mathematical problem of designing structured RNA pools is to
optimize the sequence/structure space to yield the structural
characteristics of the target pool. We represent experimental pool
generation as a nucleotide mixing (transition) matrix applied to a
starting sequence, and pool structures as RNA graphs. These tools
allow us to map regions of RNA sequence space, generated by mixing
matrices, to their structural distributions. The target structured pool
corresponds to an optimal combination of mixing matrices,
starting sequences, and associated pool fractions.
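The mixing-matrix representation of pool generation can be sketched as follows, under the assumption that each position of the starting sequence is synthesized independently. Base x becomes base y with probability `mixing[x][y]`. A structured pool is then a union of sub-pools, each with its own starting sequence, mixing matrix and pool fraction. The function names and data layout are hypothetical and not RAGPOOLS's interface.

```python
import random

BASES = "ACGU"

def sample_pool(start, mixing, size, seed=0):
    """Sample `size` sequences; each base b of `start` becomes y with probability mixing[b][y]."""
    rng = random.Random(seed)
    pool = []
    for _ in range(size):
        seq = "".join(
            rng.choices(BASES, weights=[mixing[b][y] for y in BASES])[0] for b in start
        )
        pool.append(seq)
    return pool

def mixed_pool(components, size, seed=0):
    """components: list of (start, mixing, fraction), fractions assumed to sum to 1."""
    pool = []
    for n, (start, mixing, fraction) in enumerate(components):
        pool += sample_pool(start, mixing, round(fraction * size), seed=seed + n)
    return pool
```

An identity matrix reproduces the starting sequence exactly, and a uniform matrix (all entries 0.25) gives a fully random pool. Intermediate matrices explore the sequence neighborhood of a starting sequence, whose structural distribution can then be assessed.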
We show that our pool design approach allows generation of pools
with user-defined characteristics, such as proportions of specific
target motifs, starting functional sequences, and sequence length.
Our pool design method has been automated and made available through
the web server RAGPOOLS (http://rubin2.biomath.nyu.edu), which offers a
theoretical companion tool for RNA in vitro selection and related
problems.
Thus, RAGPOOLS can serve as a guide to researchers who aim to synthesize
RNA pools with desired properties and/or perform in silico experiments.
References:
Kim N, Shin JS, Elmetwaly S, Gan HH, and Schlick T, RAGPOOLS:
RNA-As-Graph-Pools, a web server for assisting the design of structured
RNA pools for in vitro selection. Bioinformatics 2007 (in press).
Kim N, Gan HH, Schlick T, A computational proposal for designing
structured RNA pools for in vitro selection of RNAs. RNA 2007,
13(4):478-92.
Functional classification of all non-coding microbial sequences through phylogenetic profiling
October 29, 2007 5:00 pm - 6:30 pm
Joint work with Antonin Marchais and Magali Naville
(IGM, Bât. 400, Université Paris-Sud, 91405 Orsay cedex,
France).
Although comparative genomics has been instrumental in the
identification of novel non-coding RNA (ncRNA) in model
genomes, this technique cannot, in the form it is currently
practised, keep up with the pace of genome sequencing. As a
result, hundreds of microbial genomes, including entire
families of important pathogens, have been left out of the
picture in terms of ncRNA function analysis. As ncRNAs play
major regulatory and adaptive roles in bacteria, there is an
urgent need for innovative computational methods that would
permit a quick and efficient detection of ncRNAs in any genome
of interest. Here we propose a protocol that exploits the depth
of phylogenetic information in all available genomes (with
virtually no limitation in the number of species) to produce a
functional classification of all ncRNA candidates and other
non-coding conserved elements in any target bacterial genome.
Our protocol involves a low-stringency screening for intergenic
conserved elements (ICEs) in the target genome, followed by the
construction of the presence/absence profile of each ICE across
the complete bacterial genome collection. All ICEs are then
clustered according to the distance of their phylogenetic
profiles, as done by Pellegrini et al. [1] for classifying
protein genes. A simultaneous clustering of ICEs and ORFs
produces a complete classification of coding and non-coding
elements in the target genome. We ran this pipeline on E. coli
and B. subtilis. In both species, known small RNAs and
riboswitches were significantly concentrated (P ≈ 10⁻⁶) in two or
three clusters that also contain many orthologous E. coli and
B. subtilis ORFs and ~200 undefined ICEs in each species.
Phylogenetic profile clustering is independent of sequence
similarity and appears to predict functional ncRNAs with a much
higher specificity than comparative sequence analysis.
Furthermore, some clusters that lack known ncRNAs show very
interesting phylogenetic presence/absence patterns that
indicate either horizontal transfers or the emergence of common
adaptive non-coding elements in distant bacterial species.
Finally, the co-occurrence of ICEs and protein coding genes in
the same clusters may constitute an important source of
information on ICE/ORF functional relationships. A complete run
of our ICE classification pipeline on a bacterial genome only
requires a few hours.
[1] Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates
TO. (1999) Assigning protein functions by comparative genome
analysis: protein phylogenetic profiles. Proc. Natl. Acad. Sci.
U.S.A. 96:4285-4288
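The profile-clustering step of the protocol can be sketched as follows. The abstract does not specify the distance or the clustering algorithm. This sketch therefore assumes Hamming distance between binary presence/absence profiles and single-linkage clustering with a fixed distance cutoff (`max_dist`, a hypothetical parameter). Element names and profiles are invented.

```python
def hamming(p, q):
    """Number of genomes in which two presence/absence profiles disagree."""
    return sum(a != b for a, b in zip(p, q))

def single_linkage_clusters(profiles, max_dist):
    """profiles: dict name -> tuple of 0/1 (absence/presence across genomes)."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(x):  # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(profiles[a], profiles[b]) <= max_dist:
                parent[find(a)] = find(b)  # link the two clusters
    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())
```

ICEs and ORFs can be clustered together simply by putting both kinds of profile into the same dictionary. Mixed clusters are the ones that hint at ICE/ORF functional relationships.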
Multiscale simulation of RNA catalytic activity
October 29, 2007 5:00 pm - 6:30 pm
Joint work with Taisung Lee and Darrin M. York (Department of Chemistry, University of Minnesota).
We present a series of multiscale simulation studies
of RNA catalysis. The results of several series of molecular
dynamics (MD) and QM/MM simulations of the full-length
hammerhead ribozyme and the L1 ligase ribozyme are presented.
For the hammerhead ribozyme, we have used simulations to
investigate the role of metal ions and the possible solvent
structure in the crystal, and to study and predict the effects of
mutations at the C3 and G8 sites. For the L1 ligase, we have
studied the details of a major conformational change prior to
the reaction and possible conformations of the ligation site in
the reactant state. These simulations (each 50 to 100 ns long,
more than 1.5 μs in total) are at least one to two orders of
magnitude longer than any previously reported simulations, and
they have revealed significant new insights.
Local pairwise structural RNA alignments by pruning of the dynamical
programming matrix
October 29, 2007 5:00 pm - 6:30 pm
Joint work with Jakob H. Havgaard and Elfar Torarinsson (Division of Genetics and Bioinformatics, IBHV, University of Copenhagen, Frederiksberg, Denmark).
The Sankoff algorithm for simultaneously folding and aligning RNA
sequences is computationally very expensive. Recently a number of
groups have applied various constraints to lower the
computational requirements to reasonable levels. Whereas the
original Sankoff algorithm, like many of its implementations,
conducts only global alignments, the FOLDALIGN
implementation makes both local and global structural
alignments.
The most recent version of FOLDALIGN introduces pruning of the
dynamical programming matrix as a simple and effective heuristic
that significantly lowers the time and memory requirements
without lowering the predictive performance. FOLDALIGN is
currently one of the few Sankoff algorithms capable of conducting
local alignments while being a practical tool. It has also been
used in a genome-wide screen for putative RNA structures in
corresponding, but unaligned, regions between human and mouse.
In addition to the pairwise version of FOLDALIGN, we have also made a
multiple alignment method that takes either the pairwise
alignments or McCaskill base-pair probability matrices as input.
References
- Fast pairwise structural RNA alignments by pruning of the
dynamical programming matrix. J. H. Havgaard, E. Torarinsson
and J. Gorodkin. PLoS Computational Biology, in press.
- Multiple structural alignment and clustering of RNA sequences.
E. Torarinsson, J. H. Havgaard and J. Gorodkin. Bioinformatics,
23:926-932, 2007.
- Thousands of corresponding human and mouse genomic regions
unalignable in primary sequence contain common RNA structure.
E. Torarinsson, M. Sawera, J. H. Havgaard, M. Fredholm and
J. Gorodkin. Genome Research, 16:885-889, 2006.
Analysis, prediction, and design of viral RNA secondary structures
October 30, 2007 2:30 pm - 3:00 pm
Understanding how biological sequences encode structural and functional
information is a fundamental scientific challenge. For RNA viral genomes,
the information encoded in the sequence extends well beyond protein
coding to the role of intra-sequence base pairing in viral packaging,
replication, and gene expression. Working with the Pariacoto virus as a
model sequence, we investigate the compatibility of predicted base pairings
with the dodecahedral cage known from crystallographic studies.
To build a putative secondary structure, we first analyze different
possible configurations using a combinatorial model of RNA folding.
We give results on the trade-offs among types of loop structures,
the asymptotic degree of branching in typical configurations, and
the characteristics of stems in "well-determined" substructures.
These mathematical results yield insights into the interaction of
local and global constraints in RNA secondary structures, and suggest
new directions in understanding the folding of RNA viral genomes.
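A standard combinatorial model of this kind counts the secondary structures on a sequence of length n in which any two bases may pair, subject only to a minimum hairpin loop of m unpaired bases. The sketch assumes m = 3; the authors' model may differ. Splitting on whether the last base is unpaired or paired with base k gives the recursion S(n) = S(n−1) + Σ_{k=1}^{n−m−1} S(k−1)·S(n−k−1), with S(n) = 1 for n ≤ m+1.

```python
def count_structures(n, min_loop=3):
    """List s[0..n], where s[L] is the number of secondary structures on L bases
    when any two bases may pair and hairpin loops have >= min_loop unpaired bases."""
    s = [1] * (n + 1)  # lengths <= min_loop + 1 admit only the empty structure
    for length in range(min_loop + 2, n + 1):
        # Last base unpaired, plus last base paired with base k (1-based),
        # leaving k-1 bases outside and length-k-1 bases inside the pair.
        s[length] = s[length - 1] + sum(
            s[k - 1] * s[length - k - 1] for k in range(1, length - min_loop)
        )
    return s
```

`count_structures(7)` returns `[1, 1, 1, 1, 1, 2, 4, 8]`. Computed bottom-up, the recursion reaches genome-scale lengths without deep recursion, and the ratio of successive terms settles toward a constant, the exponential growth rate of the structure space.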
Prediction of RNA-RNA interactions
October 30, 2007 4:50 pm - 5:20 pm
Most noncoding RNAs exert their function by interacting with other RNA
molecules, as in the case of microRNAs and their mRNA targets. Recent
bioinformatic as well as experimental approaches produce thousands of
novel RNA transcripts, most of which cannot be annotated. Our best hope for
elucidating the function of these novel transcripts is through
identification of their interaction partners.
In this talk I will review several approaches to predict the structure of
two RNAs upon hybridization. I will present a fast method to compute
the probability that some region of an RNA is unpaired and thus accessible
for inter-molecular interactions (or, equivalently, the free energy needed to
open up the site), as well as a method to quickly search for possible
hybridization sites. The combination of these two methods yields a
promising approach to identify the RNA interaction partners of noncoding
RNAs. The site accessibility is also a potent predictor of siRNA
efficacy and can be used to improve microRNA target predictions.
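In a Boltzmann ensemble the two quantities are linked by ΔG_open = −RT ln P_unpaired, where P_unpaired is the probability that the whole site is unpaired. The sketch below computes P_unpaired exactly for a toy energy model, assuming a fixed energy per base pair and a minimum hairpin loop of 3. It takes the ratio of a constrained partition function, in which the site's positions may not pair, to the unconstrained one. It is not the fast method described in the talk, which works with realistic loop energies.

```python
import math
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
KT_37C = 0.6163  # RT in kcal/mol at 37 C (approximate)

def partition_function(seq, unpaired=frozenset(), pair_energy=-1.0, kT=KT_37C, min_loop=3):
    """Toy partition function; positions in `unpaired` are forbidden from pairing."""
    weight = math.exp(-pair_energy / kT)

    @lru_cache(maxsize=None)
    def z(i, j):
        if i >= j:
            return 1.0
        total = z(i + 1, j)  # i left unpaired
        if i in unpaired:
            return total
        for k in range(i + min_loop + 1, j + 1):
            if k not in unpaired and (seq[i], seq[k]) in PAIRS:
                total += weight * z(i + 1, k - 1) * z(k + 1, j)
        return total

    return z(0, len(seq) - 1)

def accessibility(seq, start, end, **kw):
    """P(positions start..end all unpaired) and dG_open = -kT ln P (kcal/mol)."""
    kT = kw.get("kT", KT_37C)
    site = frozenset(range(start, end + 1))
    p = partition_function(seq, site, **kw) / partition_function(seq, **kw)
    return p, -kT * math.log(p)
```

For example, with `pair_energy=0.0`, position 0 of "GGAAACC" is unpaired in 3 of the 6 structures, giving P = 0.5 and ΔG_open = kT ln 2.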
RNA / RNP synthetic biology
November 01, 2007 5:30 pm - 6:00 pm
In general, molecular design of RNA is difficult at the 3D level because of its highly complicated folding process. In the 1990s, biochemical and structural analyses revealed that many functional noncoding natural RNAs are organized into modules and fold into defined 3D structures. Moreover, several commonly used RNA–RNA binding motifs in these RNAs were identified by phylogenetic comparison and high-resolution structural analyses. Consequently, it has become possible to design self-folding RNAs precisely by employing such motifs and mimicking the modular organization of natural RNAs. As one such example, we have investigated the design and construction of a self-folding RNA scaffold consisting of standard double-stranded helices connected by two RNA–RNA binding motifs. The results indicated that the constructed RNA folds compactly into the designed 3D structure. We have also reported the synthesis and development of an artificial RNA enzyme made by installing a reaction site and a catalytic site into the designed RNA scaffold. For medical and biological applications, the goals of our current project are 1) to establish multifunctional RNP molecules with tumor-seeking sensors, imaging agents and toxins that kill target cells, and 2) to establish artificial signal transduction systems for regulating cellular functions by employing designed RNA and RNP molecules. The strategy may be applicable to the synthesis and development of a variety of nonnatural functional RNAs with defined 3D structures.
RNA pseudoknotted secondary structure prediction using
hierarchical folding
October 29, 2007 5:00 pm - 6:30 pm
Improving the accuracy and efficiency of computational RNA
secondary structure prediction is an important challenge, particularly
for pseudoknotted secondary structures. We propose a new approach
for prediction of pseudoknotted structures, motivated by the hypothesis
that RNA structures fold hierarchically, with pseudoknot-free pairs
forming initially, and pseudoknots forming later so as to minimize
energy relative to the initial pseudoknot-free structure. Our HFold
(Hierarchical Fold) algorithm has O(n³) running time, and can handle
a wide range of biological structures, including nested kissing
hairpins, which have previously required O(n⁶) time using traditional
minimum free energy approaches. We also report on an experimental
evaluation of HFold.
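Formally, two base pairs (i, j) and (k, l) with i < k cross, forming a pseudoknot, when i < k < j < l. The small helpers below are a sketch, not part of HFold. They check that an initial structure G is pseudoknot-free and list which later-added pairs cross it, mirroring the hierarchical picture in which G forms first and the crossing pairs form afterwards. The function names are illustrative.

```python
def crosses(p, q):
    """True if base pairs p and q (each an (i, j) tuple with i < j) cross."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l

def is_pseudoknot_free(pairs):
    """True if no two pairs in the structure cross."""
    pairs = sorted(pairs)
    return not any(crosses(a, b) for idx, a in enumerate(pairs) for b in pairs[idx + 1:])

def crossing_with(initial, added):
    """Pairs (g, h), with g from the initial structure and h from the added pairs, that cross."""
    return [(g, h) for g in sorted(initial) for h in sorted(added) if crosses(g, h)]
```

For G = {(0, 10), (2, 5)} and an added pair (6, 12), only (0, 10) and (6, 12) cross, so G ∪ {(6, 12)} is pseudoknotted even though G alone is not.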
RNA tertiary structure as a proto-language for nano-construction
November 02, 2007 11:35 am - 12:05 pm
The common occurrence of many small structural motifs in natural RNA molecules suggests that nature utilizes a vocabulary of sequence patterns to compose structural molecules with sophisticated topologies such as the ribosome and large ribozymes. By careful analysis of the sequences and tertiary structures of natural RNAs, 3D RNA modules and their folding and assembly principles are currently being compiled to generate the syntax of a proto-language for the rational design and prediction of RNA 3D shapes. RNA architectonics refers to the creation of this proto-language and to its use to build new RNAs with self-assembly properties. Recently, RNA architectonics led to the reliable prediction and design of the tertiary structure of several artificial RNA building blocks able to form programmable filaments and 2D RNA arrays at the nanoscale. As a proof of concept, we also demonstrated that structurally complex RNAs based on a syntax involving a repertoire of several different RNA motifs can self-assemble into complex supramolecular 3D nanoparticles. These studies show that RNA architectonics can be used as a tool to explore and compare the biophysical properties of various RNA tertiary structure motifs that would otherwise be more difficult to investigate in isolation or within their natural context. They also demonstrate that RNA is an ideal medium for sculpting addressable and responsive self-assembling architectures of any desired shape in the 20 to 50 nm range. Moreover, they suggest that RNA supramolecular assembly can potentially lead to the development of highly sophisticated therapeutic nano-devices for biological and medical applications.
References
1. Jaeger, L. & Chworos, A. (2006) The Architectonics of Programmable RNA and DNA Nanostructures. Current Opinion in Structural Biology, 16, 531-543.
2. Chworos, A., Severcan, I., Koyfman, A. Y., Wienkam, P., Oroudjev, E., Hansma, H. G. & Jaeger, L. (2004). Building programmable jigsaw puzzles with RNA. Science 306, 2068-2072.
3. Nasalean, L., Baudrey, S., Leontis, N. B. & Jaeger, L. (2006) Controlling RNA self-assembly to form filaments. Nucleic Acids Res. 34, 1381-1392.
4. Bates, A. D., Callen, B. P., Cooper, J. M., Cosstick, R., Geary, C., Glidle, A., Jaeger, L., Pearson, J. L., Proupín-Pérez, M., Xu, C., & Cumming, D. R. S. (2006) Construction and characterization of a gold nanoparticle wire assembled using Mg2+-dependent RNA-RNA interactions. Nano Letters 6, 445-448.
Determining functional conformations of two HDV III strains
October 29, 2007 5:00 pm - 6:30 pm
Joint work with Sarah D. Linnstaedt², John L. Casey², and Bruce A. Shapiro³.
¹ Basic Research Program, SAIC-Frederick, Inc., NCI Frederick, Frederick, MD
² Department of Microbiology and Immunology, Georgetown University Medical Center, Washington, DC
³ Center for Cancer Research Nanobiology Program, National Cancer Institute, Frederick, MD
Hepatitis delta virus (HDV) is a subviral human pathogen that aggravates
hepatitis B virus (HBV) liver infections. The short HDV genome (~1680 nt)
is a single-stranded, circular RNA encoding only one protein, the hepatitis
delta antigen (HDAg). The host enzyme ADAR1 edits the HDV stop codon (UAG)
into a tryptophan (W) codon (UGG), enabling expression of two forms of the
protein, short (HDAg-S) and long (HDAg-L), from the same open reading frame.
HDAg-S is required for replication, while HDAg-L enables viral particle
formation and inhibits replication. The balance between the two forms is
crucial, so editing must be regulated.
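The effect of this single edit on the open reading frame can be shown with a toy translation. ADAR1 converts A to inosine, which the ribosome reads as G. The sequence used below is a made-up miniature, not HDV. It only illustrates how turning the UAG stop codon into UGG lets translation continue and produces a longer protein from the same frame. `edit_a_to_g` is a hypothetical helper.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; "*" marks stop codons.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(rna):
    """Translate from position 0 in frame until the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def edit_a_to_g(rna, pos):
    """A-to-I editing at `pos`; inosine is decoded as G."""
    if rna[pos] != "A":
        raise ValueError("editing site must be an A")
    return rna[:pos] + "G" + rna[pos + 1:]
```

For the toy message "AUGGCUUAGGCUUAA", `translate` gives "MA". After `edit_a_to_g(..., 7)` the amber codon becomes UGG, and the product extends to "MAWA".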
We have applied our programs, MPGAfold and StructureLab, to predict and
examine the folding conformations/states of an HDV III construct. This
construct includes the editing site (amber/W) and has the editing
capabilities of the full HDV III. The predicted secondary structure folding
dynamics indicate that the HDV III RNA forms a metastable branched structure
and a stable rod structure. Both were observed in vitro, and the branched
structure was identified as the one that enables editing. Computational
predictions and the experimental data also indicate that an Ecuadorian
strain folds into the editing-capable structures more readily than a
Peruvian strain, and we discuss the reasons for the difference. Thus the
folding dynamics of HDV III strains appear to strongly influence their RNA
editing levels.
Funded in part by NCI Contract N01-CO-1240.
Finding additional functional elements in essential RNA sites: not conserved, but not unimportant
October 29, 2007 5:00 pm - 6:30 pm
Joint work with Vikas Malaiya, Jana Chocholousova, Matthew Iyer, Irene Majerfeld, and Michael Yarus.
Evolutionary conservation has often been used to recover the essential pieces of RNA sites, yet can only reveal elements that are necessary, rather than sufficient, for function. Biochemical studies in several systems, including the hammerhead ribozyme and the purine riboswitch, indicate additional regions, such as loop-loop interactions, that are required for function yet are not phylogenetically conserved. Here we use a minimal motif for binding the amino acid tryptophan to ask the ultimate question of an RNA motif: do we know the essential elements well enough to embed the motif in a random-sequence background and obtain functional molecules? We show the utility of this technique for discovering additional sequence requirements for the motif, in this case the requirement for an unpaired G in a specific range of locations and structures relative to the main loop identified by SELEX, and discuss its implications for calculating the probability of obtaining functional RNAs from random-sequence pools.
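For a rough sense of the numbers, assume uniformly random bases and a motif that is a fixed k-nucleotide sequence. This is a simplification, since real motifs tolerate variation. A random molecule of length n then contains on average (n − k + 1)/4^k occurrences, ignoring overlap effects. Any additional requirement, such as a suitably placed unpaired G, multiplies the expected yield by its own probability, represented here by a hypothetical, independently applied `p_extra`.

```python
def expected_motif_hits(pool_size, length, motif_len, p_extra=1.0):
    """Expected number of motif occurrences in a random pool.
    Assumes uniform random bases, a fixed motif_len-nt motif, no overlap
    correction, and an independent extra requirement met with probability p_extra."""
    per_molecule = (length - motif_len + 1) / 4 ** motif_len
    return pool_size * per_molecule * p_extra
```

With a pool of 4096 molecules of length 50 and a 6-nt motif, the expectation is 45 occurrences. Halving `p_extra` halves the expected number of functional molecules, which is why overlooked requirements matter when estimating yields from random-sequence pools.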
Annotated tertiary interaction motifs in RNA structures
October 29, 2007 5:00 pm - 6:30 pm
RNA tertiary motifs play an important role in RNA folding. To understand the
complex organization of RNA tertiary interactions, we compiled a dataset
containing 54 high-resolution RNA crystal structures. Seven RNA tertiary
motifs (coaxial helix, A-minor, ribose zipper, pseudoknot, kissing hairpin,
tRNA D-loop:T-loop and tetraloop-tetraloop receptor) were searched for using
several computer programs. For the non-redundant RNA dataset, 605 RNA
tertiary interactions were found. Most of these 3D interactions occur in the
16S and 23S rRNAs. An exhaustive search for these motifs reveals the diversity
of their interactions. Correlations between motifs (e.g. pseudoknot or coaxial
helix with A-minor) show that they can form "composite" motifs. These findings
may lead to tertiary structure constraints useful for RNA 3D prediction.
Structure-neutral RNA substitutions from 3D structure alignments and 3D motif search
October 31, 2007 11:35 am - 12:05 pm
The function of structured RNA molecules depends on forming the correct 3D structure, so the most significant constraints on their sequences are structural. Structure-disrupting substitutions are selected against during evolution while structurally neutral substitutions can accumulate as populations evolve. The relevant interactions include basepairs, base-stacking, and base-phosphate interactions, all of which can be disrupted by certain substitutions. Further constraints are imposed by interactions with other molecules. The general question we address is how to determine which substitutions are structure-neutral in RNA molecules, at the level of individual bases and base-pairs and at the level of 3D motifs and molecular architectures. The availability of two or more 3D structures of large RNA molecules such as the 16S and 23S rRNAs presents opportunities for exploring this question empirically, once the two structures are appropriately aligned. Detailed examination and comparison of nucleotide-nucleotide interaction geometries provides another avenue for addressing the same question. Finally, some RNA motifs occur multiple times in single structures and in non-homologous positions in other molecules, giving another way to study the neutrality of base substitutions. These three approaches will be described and their results compared. Along the way, we will briefly describe FR3D (“Find RNA 3D”), a set of Matlab programs we have developed to annotate RNA structures and to carry out searches for recurrent RNA motifs.
Structure, dynamics and catalytic mechanisms of two ribozymes
November 01, 2007 3:05 pm - 3:35 pm
CR-UK Nucleic Acid Structure Group, MSI/WTB complex, University of Dundee, Dundee DD1 5EH, UK d.m.j.lilley@dundee.ac.uk
The nucleolytic ribozymes are catalytic RNA molecules that generate site-specific cleavage by means of a transesterification reaction involving the 2’ and 5’ O atoms. We have made a study of two of these, the hairpin and VS ribozymes.
The hairpin ribozyme folds to generate an intimate loop-loop interaction that creates the local environment in which catalysis can proceed. By means of FRET we can observe individual hairpin ribozyme molecules as they undergo multiple cycles of cleavage and ligation, and measure the rates of the internal reactions. On average, the cleaved ribozyme undergoes several docking-undocking events before a ligation reaction occurs. On the basis of these experiments, we have explored the role of the nucleobases G8 and A38 in catalysis. Both cleavage and ligation reactions are pH dependent, corresponding to the titration of a group with pKa = 6.2. We have used a novel ribonucleoside in which these bases are replaced by imidazole to investigate the role of acid-base catalysis in this ribozyme. We observe significant rates of cleavage and ligation, and a bell-shaped pH dependence for both.
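A bell-shaped pH dependence is what a simple two-group model of general acid-base catalysis predicts: the observed rate is proportional to the fraction of molecules in which one group is deprotonated (acting as a general base) and the other is protonated (acting as a general acid). The sketch below assumes the two groups titrate independently; `k_max` and any pKa values passed in are illustrative assumptions, not fitted values from these experiments.

```python
def bell_shaped_rate(ph, k_max, pka_base, pka_acid):
    """Observed rate for independent general base (pka_base) and general acid (pka_acid) groups."""
    frac_base_active = 1.0 / (1.0 + 10.0 ** (pka_base - ph))  # base must be deprotonated
    frac_acid_active = 1.0 / (1.0 + 10.0 ** (ph - pka_acid))  # acid must be protonated
    return k_max * frac_base_active * frac_acid_active
```

On a log scale the rate rises with slope +1 well below `pka_base` and falls with slope -1 well above `pka_acid`, giving the bell shape; when both pKa values are close (for instance both near 6.2), the two limbs merge into a single peak.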
The VS ribozyme is the largest of the nucleolytic ribozymes, and the only one for which there is no crystal structure. The ribozyme consists of five helical sections organised by two three-way junctions, each of which undergoes metal ion-induced folding. Using a ‘divide and conquer’ approach based principally on the analysis of component junctions by FRET, we deduced the global structure of the ribozyme. We have now solved the structure of the complete ribozyme at low resolution using small-angle X-ray scattering in solution.
The binding of the substrate stem-loop generates a catalytically-productive interaction with the A730 loop active site. We have identified two critical nucleotides in the catalytic process; A756 within the A730 loop, and G638 in the substrate internal loop. Mutation or functional group substitution of either nucleobase leads to > 1,000-fold impairment of catalytic activity, while leaving the structure and binding to the ribozyme unaltered. The pH dependencies of the rate of cleavage of substrate with guanine, adenine, 2,6-diaminopurine or inosine at position 638 are fully consistent with a mechanism in which G638 and A756 act in concert in general acid-base catalysis.
The proposed mechanisms of the VS and hairpin ribozymes, together with the manner in which their active sites are generated and their topology, are strikingly similar to each other. This has probably arisen by convergent evolution.
M.K. Nahas, T.J. Wilson, S. Hohng, K. Jarvie, D.M.J. Lilley and T. Ha. Observation of internal cleavage and ligation reactions of a ribozyme. Nature Struct. Molec. Biol. 11, 1107-1113 (2004).
Z. Zhao, A. McLeod, S. Harusawa, L. Araki, M. Yamaguchi, T. Kurihara and D. M. J. Lilley. Nucleobase participation in ribozyme catalysis. J. Amer. Chem. Soc. 127, 5026-5027 (2005).
T. J. Wilson, J. Ouellet, Z. Zhao, S. Harusawa, L. Araki, T. Kurihara and D. M. J. Lilley. Nucleobase catalysis in the hairpin ribozyme. RNA 12, 980-987 (2006).
T. J. Wilson, A. C. McLeod and D. M. J. Lilley. A guanine nucleobase important for catalysis by the VS ribozyme. EMBO J. 26, 2489-2500 (2007).
MASTR: Simultaneous multiple alignment and structure prediction of non-coding RNAs using simulated annealing
October 29, 2007 5:00 pm - 6:30 pm
Joint work with Anders Krogh, University of Copenhagen and Paul Gardner, Wellcome Trust Sanger Institute.
With the growing interest in non-coding RNAs and their function, there is also a growing need for computational tools that can be used to analyze new sequences and predict their secondary structure. For a single RNA sequence, a common approach is to find the minimum free energy conformation. However, by looking at many related sequences, it is possible to incorporate evolutionary information and perform a comparative analysis, which can improve the prediction. Preferably, one should perform the multiple alignment of the sequences and the prediction of their common secondary structure simultaneously. We present a novel heuristic method that uses simulated annealing to iteratively improve the sequence alignment and a common secondary structure. This is done in the context of a sampling approach using fairly simple moves that either change the sequence alignment by moving gaps around or update the structure at the base-pair level. The prediction is evaluated using a cost function that combines the log-likelihood of the alignment, the base pair probabilities and a covariation term. The method, implemented in the C++ program MASTR (Multiple Alignment of STructural RNAs), is competitive with other current programs in terms of speed, alignment quality and structure quality.
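The iterative improvement described above follows the general simulated annealing pattern. The sketch below is a generic Metropolis-style annealer with geometric cooling; the schedule parameters (`t_start`, `alpha`, `steps`) are assumptions, and it is not MASTR's implementation: MASTR's gap-shifting and base-pair moves and its combined cost function would be supplied through the hypothetical `propose` and `cost` callables.

```python
import math
import random


def simulated_annealing(state, cost, propose, t_start=10.0, alpha=0.999, steps=5000, rng=None):
    """Generic simulated annealing; returns the best state seen and its cost."""
    rng = rng or random.Random(0)
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    t = t_start
    for _ in range(steps):
        candidate = propose(current, rng)
        delta = cost(candidate) - current_cost
        # Metropolis rule: always accept improvements, accept worse moves with prob exp(-delta/t)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= alpha  # geometric cooling schedule
    return best, best_cost
```

As a toy usage, minimizing `(x - 7) ** 2` over integers with moves `x + rng.choice((-1, 1))` finds `x = 7`. For an alignment-plus-structure search, `state` would be an (alignment, structure) pair, and early high temperatures let the search escape poor local alignments before cooling makes it greedy.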
References:
Lindgreen S, Gardner PP, Krogh A (2007): "MASTR: Multiple alignment and structure prediction of non-coding RNAs using simulated annealing", Bioinformatics (accepted)
Theory and application of a novel RNA folding approach based on nucleotide cyclic motifs
October 31, 2007 11:00 am - 11:30 am
Joint work with Marc Parisien (Institute for Research in Immunology and Cancer,
Department of Computer Science,
University of Montreal).
We change the classical rationale underlying RNA structure prediction by
incorporating the contributions of the non-Watson-Crick base pairs. To do
so, we define a new first-order object for representing nucleotide
relationships in structured RNAs, which we call the nucleotide cyclic motif
(NCM) (1). Compared with the classical stacks of Watson-Crick base pairs,
NCMs are appealing for structure determination because: i) the same
algorithm can be employed for predicting secondary, tertiary, and 3-D
structures; ii) RNA structural motifs are made of one or more NCMs (2);
iii) NCMs encompass both canonical and non-canonical base pairs without
distinction; and iv) NCMs specify precisely how any nucleotide in a
sequence relates to the others. We have developed a structure generator and
scoring function, MC-Fold. We show how MC-Fold, combined with MC-Sym (3),
builds RNA 3-D structures from sequence data and, combined with MC-Cons,
clusters and aligns RNA family sequences. We show how low-resolution data
can be incorporated in the modeling to reach conformational states that are
difficult to access from sequence data alone.
1. Lemieux, S. and Major, F. (2006) Automated extraction and classification
of RNA tertiary structure cyclic motifs. Nucleic Acids Res., 34, 2340-2346.
2. St-Onge, K., Thibault, P., Hamel, S. and Major, F. (2007) Modeling RNA
tertiary structure motifs by graph-grammars. Nucleic Acids Res, 35,
1726-1736.
3. Major, F. (2003) Building Three-Dimensional Ribonucleic Acid Structures.
IEEE Comp Science Eng 5:44-53.
Prediction of the secondary structure common to two sequences: Free energy minimization and comparative analysis
October 29, 2007 12:05 pm - 12:35 pm
In this talk, I will discuss dynamic programming methods for
simultaneously predicting the secondary structure and alignment of two
sequences. This approach was first suggested by Sankoff in 1985.
Recently, it has been implemented by several groups, using one or more
heuristics to reduce the computational cost. Our implementation,
Dynalign, finds the lowest free energy common secondary structure. It
uses single-sequence secondary structure prediction and sequence
alignment data as input to reduce the search space, making the
calculation tractable for long sequences.
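Sankoff's full algorithm needs O(N^6) time and O(N^4) memory for two sequences of length N, which is why restricting the search space matters. The sketch below shows two generic restrictions of the kind described: a band that lets nucleotide i of one sequence align only with nucleotides k of the other when |i - k| <= `max_sep`, and a filter that keeps only base pairs favored by single-sequence prediction. The helper names, `max_sep` and `threshold` are hypothetical and are not Dynalign's exact constraints.

```python
def banded_cells(n1, n2, max_sep):
    """Alignment cells (i, k) kept when aligned nucleotides may be at most max_sep apart."""
    return [(i, k) for i in range(n1) for k in range(n2) if abs(i - k) <= max_sep]


def prune_pairs(pair_probs, threshold):
    """Keep base pairs (i, j) whose single-sequence pairing probability is >= threshold."""
    return {pair for pair, p in pair_probs.items() if p >= threshold}
```

For two 100-nt sequences, a band of `max_sep = 10` keeps 1,990 of the 10,000 alignment cells; because the cost of the simultaneous recursion grows with both the number of alignment cells and the number of candidate pairs, the two filters compound.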
Computational models for RNA silencing pathways under time-dependent transgene transcription rates
October 29, 2007 5:00 pm - 6:30 pm
Joint work with Jack Yang and Roy Mahapatra.
The synthesis of dsRNA is analyzed using a pathway model with amplification caused by aberrant RNAs. The transgene influx rates are assumed to be time-decaying or Gaussian functions of time. The dynamics of transgene-induced RNA silencing are investigated with a system of coupled non-autonomous nonlinear differential equations that describe the process phenomenologically. Silencing is detected after a period of transcription. The contributions of several parameters, including those leading to bifurcation patterns, are discussed through a series of numerical examples.
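The abstract does not give the equations, so the following is only a toy sketch. It shows the two influx shapes named above (an exponentially decaying rate and a Gaussian pulse) driving a single linear, non-autonomous equation dm/dt = k(t) - delta*m, integrated with first-order forward Euler. The variable `m`, the decay rate `delta` and all default parameter values are hypothetical; the authors' model couples several nonlinear equations with amplification.

```python
import math


def gaussian_influx(t, k0=1.0, t0=5.0, width=1.0):
    """Hypothetical Gaussian transcription pulse centred at t0."""
    return k0 * math.exp(-((t - t0) ** 2) / (2.0 * width ** 2))


def decaying_influx(t, k0=1.0, lam=0.5):
    """Hypothetical exponentially decaying transcription rate."""
    return k0 * math.exp(-lam * t)


def integrate(influx, delta=1.0, t_end=10.0, dt=1e-3, m0=0.0):
    """Forward-Euler integration of the toy equation dm/dt = influx(t) - delta * m."""
    m, t = m0, 0.0
    for _ in range(int(round(t_end / dt))):
        m += dt * (influx(t) - delta * m)
        t += dt
    return m
```

For the decaying influx the toy equation has the closed form m(t) = k0 (e^(-lam t) - e^(-delta t)) / (delta - lam) when m(0) = 0, which is a convenient check on the integrator before moving to coupled nonlinear systems where no closed form exists.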
RNA matrices and RNA secondary structures
October 29, 2007 3:25 pm - 3:40 pm
Two lower-triangular arrays with entries that count RNA secondary structures of a given length are mentioned in this short talk. The array entries also count specific lattice walks. There is a one-to-one correspondence between RNA structures and a subset of the walks. We will discuss various ways in which the walks can be used as a tool to help predict primary RNA sequences.
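One concrete array of this kind is sketched below: entry `s[n][k]` counts secondary structures on n nucleotides with exactly k base pairs, under the common combinatorial convention that a pair (i, j) needs at least `min_loop` unpaired bases between i and j (here 1). Whether this is exactly one of the arrays discussed in the talk is an assumption. The recursion conditions on whether the last base is unpaired or paired with some earlier base j; the entries vanish once k is too large for n, which gives the triangular shape, and the row sums are 1, 1, 1, 2, 4, 8, 17, 37, ...

```python
def rna_structure_array(n_max, min_loop=1):
    """s[n][k] = number of secondary structures on n bases with exactly k base pairs."""
    cols = n_max // 2 + 1
    s = [[0] * cols for _ in range(n_max + 1)]
    s[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(cols):
            total = s[n - 1][k]  # last base unpaired
            if k >= 1:
                # last base paired with base j (1-based); at least min_loop bases lie inside the pair
                for j in range(1, n - min_loop):
                    left, inner = j - 1, n - j - 1
                    for a in range(k):  # a pairs to the left, k - 1 - a pairs inside
                        total += s[left][a] * s[inner][k - 1 - a]
            s[n][k] = total
    return s
```

For instance, with 4 nucleotides there is one empty structure and three structures with a single pair, (1,3), (1,4) and (2,4), so row 4 sums to 4.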
ARTS and DARTS: A method and database for exploring RNA tertiary structures
October 30, 2007 3:05 pm - 3:35 pm
Joint work with Oranit Dror, Mira Avraham, Haim Wolfson (Tel Aviv University and SAIC, NCI-Frederick).
An increasing number of non-coding RNAs have recently been discovered
as key players in a variety of cellular pathways and pathological
processes. Much like proteins, the function of these active RNAs can be
inferred from their tertiary (3D) structures. However, in contrast to
proteins, the number of tools and databases for 3D structural analysis of
RNA is still limited. With the aim of filling this void, we have developed a
computational method, named ARTS, for aligning RNA tertiary structures.
Given a pair of RNA structures, the method searches for a priori unknown
common substructures. The search is truly three-dimensional and
independent of the order of the nucleotide chain. The detected common
substructures are either large global folds or small local tertiary
motifs. The method is highly efficient and was used in a fully automatic
framework for clustering all the currently available RNA structures. The
result is a database, named DARTS, which reveals the current fold
repertoire of solved RNA structures and provides a hierarchical
classification for them. Both the method and the database should be useful
for structural and functional analysis of RNA. They may shed new light on
the evolutionary relationship between RNAs and reveal possible building
blocks and functional properties.
This publication has been funded in whole or in part with Federal funds
from the National Cancer Institute, National Institutes of Health, under
contract # NO1-CO-12400.
A topological classification of RNA folds
November 01, 2007 4:20 pm - 4:50 pm
After reviewing some elementary properties of RNA, we show how the RNA
folding problem can be formulated exactly in terms of an NxN matrix
field theory. This formulation introduces a classification of RNA structures
according to their topological genus.
The large N limit of this theory generates the secondary structures of RNA
(planar graphs), whereas 1/N corrections are identified as pseudoknots. We
show how RNA structures can be analyzed in terms of primitive pseudoknots
of low genus and how this concept can be included in Monte Carlo
calculations to actually predict RNA folds.
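The genus of a given structure can be computed directly from its base pairs. In the standard construction, the backbone is collapsed to a single vertex carrying P arcs (the base pairs); if L is the number of boundary components of the resulting ribbon graph, Euler's formula 1 - P + L = 2 - 2g gives g = (P - L + 1)/2. Secondary structures then have genus 0 and an H-type pseudoknot has genus 1. The sketch below counts boundary components as cycles of a permutation; it is an illustrative implementation, not the Monte Carlo code from the talk, and `rna_genus` is a hypothetical name.

```python
def rna_genus(pairs):
    """Topological genus of an RNA structure given as (i, j) base pairs with distinct positions."""
    ends = sorted(p for pair in pairs for p in pair)
    m = len(ends)
    if m == 0:
        return 0
    index = {pos: n for n, pos in enumerate(ends)}  # relabel paired positions 0..m-1
    sigma = [0] * m                                 # involution: partner of each endpoint
    for i, j in pairs:
        sigma[index[i]], sigma[index[j]] = index[j], index[i]
    # Boundary components are the cycles of x -> gamma(sigma(x)), with gamma(x) = x + 1 mod m
    seen = [False] * m
    boundaries = 0
    for start in range(m):
        if not seen[start]:
            boundaries += 1
            x = start
            while not seen[x]:
                seen[x] = True
                x = (sigma[x] + 1) % m
    return (len(pairs) - boundaries + 1) // 2
```

Unpaired bases do not affect the genus, which is why only paired positions are kept. Two H-type pseudoknots placed side by side give genus 2, illustrating how complex folds decompose into primitive low-genus pieces.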
RNA folding during transcription facilitated by non-native structures
October 29, 2007 5:00 pm - 6:30 pm
RNA folding in the cell occurs during transcription. Expedient RNA folding must avoid the formation of undesirable structures as the nascent RNA emerges from the RNA polymerase. We show that efficient folding during transcription of three conserved non-coding RNAs from E. coli (RNase P RNA, SRP RNA and tmRNA) is facilitated by their cognate polymerase pausing at specific locations. These pause sites are located between the upstream and the downstream portions of all the native long-range helices in these non-coding RNAs. In the paused complexes, the nascent RNAs form labile structures that sequester these upstream portions in a manner that guides folding. Both the pause sites and the secondary structures of the non-native portions of the paused complexes are phylogenetically conserved. Specific pausing-induced structure formation can be a general strategy for facilitating the folding of long-range helices. This polymerase-based mechanism may cause portions of non-coding RNA sequences to be evolutionarily conserved for efficient folding during transcription.
RNA folding during transcription facilitated by non-native structures
October 29, 2007 3:45 pm - 4:00 pm
Genomic identification of structural RNAs using phylo-SCFGs
October 30, 2007 5:35 pm - 5:50 pm
RNA structures often evolve with characteristic substitution patterns
that preserve base-pairs in spite of changes in primary sequence. With
the advent of closely related full-length genomes, it has become
possible to exploit this comparative signal for genomic identification
of structural RNAs (1).
Phylo-SCFGs (2) are attractive models for this problem since they can
describe both RNA structure, using stochastic context-free grammars
(SCFGs), and sequence evolution, using phylogenetic models. Using
variations of classical algorithms, multiple alignments with any
number of sequences can be handled efficiently.
EvoFold implements this approach and has been used to screen
genomic multiple-sequence alignments of both vertebrates and
drosophilids for structural RNAs (1,3). This has resulted in hundreds of
high-confidence novel candidates of both ncRNAs and cis-regulatory
structures.
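The comparative signal described above can be illustrated by counting compensating changes: pairs of aligned sequences that both form a canonical base pair (Watson-Crick or G-U) at two alignment columns yet differ at both positions. This is a simple heuristic count, not the phylo-SCFG likelihood computed by EvoFold; `compensating_changes` and the pair set are assumptions for illustration, and the alignment is assumed to use the RNA alphabet with `-` for gaps.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def compensating_changes(alignment, i, j):
    """Count sequence pairs that both pair columns i and j but differ at both positions."""
    count = 0
    for a in range(len(alignment)):
        for b in range(a + 1, len(alignment)):
            s, t = alignment[a], alignment[b]
            # Both sequences must keep a canonical pair (gaps never count as pairing)
            if s[i] + s[j] in CANONICAL and t[i] + t[j] in CANONICAL:
                if s[i] != t[i] and s[j] != t[j]:
                    count += 1
    return count
```

A phylo-SCFG goes further by scoring such changes under an evolutionary model along the tree, so that changes on short branches are not over-counted, but the underlying signal is the same double substitution that preserves pairing.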
1) Identification and Classification of Conserved RNA Secondary
Structures in the Human Genome. Pedersen JS, Bejerano G, Siepel A,
Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, and
Haussler D. PLoS Comput Biol. 2006 Apr;2(4):e33.
2) Using stochastic context free grammars and molecular evolution to
predict RNA secondary structure. Knudsen B and Hein JJ.
Bioinformatics. 1999; 15 (6): 446-454.
3) Discovery of functional elements in 12 Drosophila genomes using
evolutionary signatures. Stark A, Lin MF, Kheradpour P, and
Pedersen JS, et al. 2007 (in press).
Analysis and design of nucleic acid devices
October 29, 2007 2:30 pm - 3:00 pm
DNA and RNA are versatile construction materials.
By appropriately designing the sequence of bases in each strand, synthetic nucleic acid
systems can be programmed to self-assemble into complex structures that implement dynamic mechanical
tasks. Motivated by the challenge of encoding arbitrary mechanical function into
nucleic acid sequences, we are developing a suite of computational algorithms for
analyzing the underlying free energy landscapes that control the
behavior of a system. This talk will focus on new algorithms for predicting the
equilibrium properties of an entire test tube of interacting nucleic acid strands.
The utility of the approach will be demonstrated by elucidating the empirical
behavior of hybridization chain reaction mechanisms that are under development
with application to biosensing, transport, and therapeutics.
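The simplest case of a test-tube equilibrium calculation involves two strands A and B that can form a duplex AB. Given the duplex's standard free energy of formation (1 M reference state), mass action gives a quadratic for [AB], solved below in a numerically stable form. This is a two-strand illustration only; a full test-tube analysis must consider every complex the strands can form. The function name, the default temperature of 37 °C and the kcal/mol units are assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def duplex_concentration(a0, b0, dg_kcal, temp_k=310.15):
    """Equilibrium [AB] (M) for A + B <=> AB, given total strand concentrations a0, b0 (M)."""
    k = math.exp(-dg_kcal / (R_KCAL * temp_k))  # association constant in 1/M
    # Mass action: k*(a0 - x)*(b0 - x) = x, i.e. k*x^2 - b*x + k*a0*b0 = 0
    b = k * (a0 + b0) + 1.0
    disc = max(b * b - 4.0 * k * k * a0 * b0, 0.0)
    # Smaller root written to avoid cancellation when k is large
    return 2.0 * k * a0 * b0 / (b + math.sqrt(disc))
```

At 1 µM of each strand, a duplex with a standard free energy of -15 kcal/mol is more than 99% formed, while one with 0 kcal/mol is essentially absent, showing how strongly equilibrium behavior depends on sequence-determined free energies.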
Locomotif: from graphical motif description to RNA motif search
October 30, 2007 5:55 pm - 6:10 pm
Motivated by the recent rise of interest in small regulatory
RNAs, we present Locomotif - a new approach for locating RNA
motifs that goes beyond the previous ones in three ways: (1)
Motif search is based on efficient dynamic programming
algorithms, incorporating the established thermodynamic model
of RNA secondary structure formation. (2) Motifs are described
graphically, using a Java-based editor, and search algorithms
are derived from the graphics in a fully automatic way. The
editor allows us to draw secondary structures, annotated with
size and sequence information. They closely resemble the
established, but informal way in which RNA motifs are
communicated in the literature. Thus, the learning effort for
Locomotif users is minimal. (3) Locomotif employs a
client-server approach. Motifs are designed by the user
locally. Search programs are generated and compiled on a
bioinformatics server. They are made available both for
execution on the server, and for download as C source code plus
an appropriate make-file.
Availability: Locomotif is available at
http://bibiserv.techfak.uni-bielefeld.de/locomotif.
Feynman diagrams, RNA folding, and the transition polynomial
October 31, 2007 1:25 pm - 1:40 pm
Feynman diagrams were introduced by physicists. They arise naturally
in mathematics (from knots and singular knots), and in molecular
biology (from RNA folding). In particular, work of G. Vernizzi, H.
Orland, and A. Zee
has shown that the "genus" of Feynman diagrams plays an important role
in the prediction of RNA structures.
The transition polynomial for 4-regular graphs was defined by Jaeger to
unify polynomials given by vertex reconfigurations similar to the
skein relations of knots. It is closely related to the Kauffman bracket,
Tutte polynomial, and the Penrose polynomial.
We define a transition polynomial for Feynman diagrams and discuss its
properties. In particular, we show that the genus of a Feynman
diagram is encoded in the transition polynomial. This is joint work
with Kerry Luse.
Computational comparative genomics for discovery of cis-regulatory RNAs in bacteria
October 29, 2007 5:00 pm - 6:30 pm
Discovery of novel functional noncoding RNA is a multifaceted problem. Motif representation, inference and search are all important, as is incorporation of relevant biological knowledge. With careful attention to all of these, we have developed a comparative genomics "pipeline" for discovery of cis-regulatory RNA elements in bacteria. We represent motifs using covariance models (CMs), as in the Rfam database. Motif inference in unaligned sequences with extraneous flanking regions (i.e., local alignment) relies on CMfinder [2]. We apply it to intergenic regions upstream of homologous genes in different bacteria, since cis-regulatory elements are often found and conserved there [3]. We use Ravenna [1] for efficient, sensitive CM search to identify additional instances. This is critical since (i) a given RNA element often regulates multiple genes in a pathway, not just homologs, and (ii) more examples allow us to refine the model (also via CMfinder), in turn enabling further discoveries. This strategy recovers most known RNAs in Firmicutes [3]. More importantly, we discovered 6 new likely riboswitch families, most experimentally verified, plus over 20 other elements in a wide variety of bacteria [3,4]. Collectively, these RNAs are involved in diverse but individually specific cellular processes, such as ribosome biogenesis, molybdenum cofactor biosynthesis and the citric acid cycle. One of the more surprising finds is a widespread riboswitch that apparently regulates such disparate processes as natural competence in Vibrio cholerae and the use of metal ions as electron acceptors in Geobacter sulfurreducens. These candidate RNAs add to the growing list of RNA motifs involved in multiple cellular processes, and suggest that many additional RNAs remain to be discovered. Many computational challenges also remain.
[1] Weinberg and Ruzzo. Sequence-based heuristics for faster annotation of non-coding RNA families. Bioinformatics, 2006, 22(1):35-39
[2] Yao, Weinberg and Ruzzo. CMfinder--A Covariance Model Based RNA Motif Finding Algorithm. Bioinformatics, 2006, 22(4): 445-452.
[3] Yao, Barrick, Weinberg, Neph, Breaker, Tompa and Ruzzo. A Computational Pipeline for High Throughput Discovery of cis-Regulatory Noncoding RNA in Prokaryotes. PLoS Computational Biology. 3(7): e126, July 6, 2007.
[4] Weinberg, Barrick, Yao, Roth, Kim, Gore, Wang, Lee, Block, Sudarsan, Neph, Tompa, Ruzzo and Breaker. Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline. Nucl. Acids Res., July 2007 35: 4809-4819.
Introduction
October 29, 2007 11:15 am - 11:30 am
Computational approaches to RNA nanodesign
November 02, 2007 11:00 am - 11:30 am
We have developed a number of computational tools that permit a user to design RNA-based nanoparticles with various functionalities. One of these tools is a newly developed relational database, RNAJunction, which contains structural and sequence information for all known RNA n-way junctions and kissing loop interactions. The database also contains the results of applying molecular mechanics and structural clustering techniques to the motifs. The database of motifs can be searched in a variety of ways and provides a source for further analysis and RNA nano building blocks. Another computational tool, NanoTiler, permits a user to interactively and automatically construct user-specified RNA-based nanoscale shapes. The combination of the RNAJunction database, NanoTiler and other computational tools allows the rapid prototyping of designed RNA shapes. We discuss some of the principles involved in these design tools and show how the RNA nanodesign process can be accomplished with these methodologies.
Imaging RNA structures and folding intermediates using electron cryo-microscopy
October 30, 2007 3:50 pm - 4:20 pm
We investigate the applicability of electron cryomicroscopy (cryo-EM) with single-particle reconstruction to structural studies of RNAs as small as 154 residues (~50 kDa). This size is at least two-fold smaller than the generally conceived limit for single-particle image reconstruction of macromolecules by cryo-EM. For the specificity and catalytic domains of bacterial RNase P RNA, single-particle reconstructions of the native structures exhibit good agreement with their respective crystal structures. For the major thermodynamic folding intermediate of the B. subtilis specificity domain, the single-particle reconstruction has considerable similarity to the previously proposed structural models of this intermediate. These results indicate that cryo-EM can directly image conformations of relatively small RNA molecules in different structural and functional states.
Collective properties of evolving populations of RNA molecules
October 29, 2007 5:00 pm - 6:30 pm
RNA molecules, through their dual appearance as sequence and
structure, represent a suitable model to study evolutionary properties
of quasispecies. The essential ingredient in this model is the
differentiation between genotype (molecular sequences which are
affected by mutation) and phenotype (molecular structure, affected by
selection). This framework allows a quantitative analysis of
organizational properties of quasispecies as they adapt to different
environments, such as their robustness, the effect of the degeneracy of
the sequence space, or adaptation under different mutation rates and the
associated error threshold.
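The error threshold has a classic back-of-the-envelope form in Eigen's quasispecies model: if a master sequence of length L has a selective superiority sigma over its mutants and each site is copied correctly with probability 1 - mu, the master persists only while sigma(1 - mu)^L > 1. This gives mu_max = 1 - sigma^(-1/L), approximately ln(sigma)/L. The sketch below evaluates both forms; it neglects back mutations, is not taken from the talk, and the function name is hypothetical.

```python
import math


def error_threshold(superiority, length):
    """Maximum per-site error rate for which the master sequence persists (no back mutations)."""
    exact = 1.0 - superiority ** (-1.0 / length)  # from superiority * (1 - mu)**length = 1
    approx = math.log(superiority) / length       # small-mu approximation
    return exact, approx
```

The inverse dependence on length is the key point: doubling the sequence length roughly halves the tolerable per-site error rate.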
On the design of oligos for gene synthesis
October 29, 2007 5:00 pm - 6:30 pm
Methods for reliable synthesis of long genes offer great promise for
protein synthesis via expression of synthetic genes, with applications
to improved analysis of protein structure and function, as well as
engineering of novel proteins. Current technologies for gene
synthesis use computational methods for design of short oligos, which
can then be reliably synthesized and assembled into the desired target
gene. For collision-oblivious oligo design -- when mishybridizations
between oligos are ignored -- we give a simple and efficient dynamic
programming algorithm. We conjecture that the collision-aware oligo
design problem is NP-hard and provide evidence that mishybridizations
between oligos occur infrequently in the designs from the
collision-oblivious algorithm. We extend our dynamic programming
algorithm to achieve collision-aware oligo design, when the target
gene can be partitioned into independently-assembled short segments.
We evaluate our methods on a large biological gene set.
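A dynamic program of this general kind can be sketched as a segmentation problem: cut the target gene into consecutive oligos whose lengths lie in [lmin, lmax], minimizing a sum of per-oligo costs via the recurrence best[end] = min over start of best[start] + cost(seq[start:end]). This simplified sketch ignores the overlaps between adjacent oligos that real assembly designs need and uses a hypothetical GC-content cost; it is not the authors' algorithm, and all names are illustrative.

```python
def gc_cost(oligo, target=0.5):
    """Hypothetical cost: squared deviation of the oligo's GC fraction from a target."""
    gc = sum(base in "GC" for base in oligo) / len(oligo)
    return (gc - target) ** 2


def partition_oligos(seq, lmin, lmax, cost):
    """Split seq into consecutive pieces of length lmin..lmax with minimum total cost."""
    n = len(seq)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[e] = minimum cost to cover seq[:e]
    back = [-1] * (n + 1)    # start of the last piece in that optimum
    best[0] = 0.0
    for end in range(lmin, n + 1):
        for length in range(lmin, min(lmax, end) + 1):
            start = end - length
            if best[start] == inf:
                continue
            c = best[start] + cost(seq[start:end])
            if c < best[end]:
                best[end], back[end] = c, start
    if best[n] == inf:
        return None  # no tiling with the allowed lengths exists
    pieces, end = [], n
    while end > 0:
        pieces.append((back[end], end))
        end = back[end]
    return pieces[::-1]
```

Because each prefix optimum is computed once, the running time grows linearly with gene length for fixed length bounds, which is what makes the collision-oblivious version simple and fast.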
Exploring the energy landscape of RNA
November 01, 2007 4:55 pm - 5:25 pm
Recent single-molecule experiments and high-resolution temperature-jump experiments show that the energy landscape of RNA is rugged. As a result, even the formation of a hairpin exhibits all the signatures of folding (multiple pathways and complex kinetics) usually associated with the self-assembly of ribozymes. I will describe computations of the kinetics of hairpin formation, initiated by both temperature quench and force quench. The profound differences between the two methods will be illustrated in terms of the pathways to the native state. Analogies to the folding of ribozymes will also be given.
Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions
October 29, 2007 5:00 pm - 6:30 pm
Joint work with Z. Yao, E. D. Wiklund, J. B. Bramsen, C. Hansen, J. Kjems,
N. Tommerup, W. L. Ruzzo, and J. Gorodkin.
Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms
have relied on existing multiple sequence alignments. However, as sequence
similarity drops, a key signal of RNA structure - frequent compensating base
changes - is increasingly likely to cause sequence-based alignment methods to
misalign, or even refuse to align, homologous ncRNAs, consequently obscuring
that structural signal. We have used CMfinder [1], a structure-oriented local
alignment tool, to search vertebrate multiple alignments in the ENCODE
regions. In agreement with other studies [2], we find a large number of
potential RNA structures in the ENCODE regions. We report 6,587 candidates with
an estimated false positive rate of 50%. More intriguingly, many of these
candidates may be better represented by alignments taking the RNA secondary
structure into account than those based on primary sequence alone, often quite
dramatically. For example, approximately one quarter of these 6,587 candidates
show revisions in more than 50% of their aligned positions. Furthermore, our
results are strongly complementary to those discovered by
sequence-alignment-based approaches: 84% of our candidates are not covered by
Washietl et al. [2], increasing the number of ncRNA candidates in the ENCODE
regions by 32%. In a group of eleven ncRNA candidates that were tested by
RT-PCR, 10 were confirmed to be present as RNA transcripts in human tissue. Our
results broadly suggest caution in any analysis relying on multiple sequence
alignments in less well-conserved regions, clearly support growing appreciation
for the biological significance of ncRNAs, and strongly argue for considering
RNA structure directly in any searches for these elements.
1. Yao, Z., Weinberg, Z. and Ruzzo, W.L. 2006. CMfinder - A Covariance Model
Based RNA Motif Finding Algorithm. Bioinformatics 22: 445-452.
2. Washietl, S., Pedersen, J.S., Korbel, J.O., Gruber, A.R., Hackermuller, J.,
Hertel, J., Lindemeyer, M., Reiche, K., Stocsits, C., Tanzer, A., et
al. 2007. Structured RNAs in the ENCODE Selected Regions of the Human
Genome. Genome Research 17: 852-864.
Efficient algorithms for probing the RNA mutation landscape and prediction of deleterious mutations
October 29, 2007 5:00 pm - 6:30 pm
We develop an efficient algorithm to compute, for a given RNA sequence and
simultaneously for each k, the minimum free energy structure MFE_k and the
Boltzmann partition function Z_k over all secondary structures of all k-point
mutants of the given sequence. Using the partition function, we rigorously
sample from the ensemble of low energy k-point mutants in order to explore the
mutation landscape. Our algorithm, named RNAmutants, allows us to investigate
deleterious mutations (mutations that radically modify secondary structure) in
the Hepatitis C virus cis-acting replication (HCV CAR) element and the hairpin
of human immunodeficiency virus trans-activation response (HIV-1 TAR) element.
More generally, using RNAmutants, we study the resilience of an RNA molecule to
pointwise mutations. By computing the mutation profile of a sequence, a novel
graphical representation of the mutational tendency of nucleotide positions, we
analyze the deleterious nature of mutating specific nucleotide positions or
groups of positions. In particular, we show qualitative agreement between
published HIV experimental mutagenesis studies and our analysis of deleterious
mutations using RNAmutants. Our work predicts other deleterious mutations,
which could be verified experimentally.
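The need for an efficient algorithm is clear from the size of the mutational space: a sequence of length n has C(n, k)·3^k distinct k-point mutants, since the k mutated positions can be chosen in C(n, k) ways and each can change to one of three other nucleotides. The hypothetical helper below simply evaluates this worked formula; it is not part of RNAmutants.

```python
from math import comb


def num_k_point_mutants(n, k):
    """Number of distinct sequences differing from a length-n sequence at exactly k positions."""
    return comb(n, k) * 3 ** k
```

For a 100-nt sequence there are already 18,294,867,360 (about 1.8 × 10^10) five-point mutants, each with its own ensemble of secondary structures, so enumerating mutants one at a time quickly becomes infeasible.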
Work in collaboration with P. Clote, B. Berger and S. Devadas.
Improved RNA gene predictions through dinucleotide controlled
randomization of multiple sequence alignments
October 29, 2007 5:00 pm - 6:30 pm
Tanja Gesell (1) & Stefan Washietl (2,3)
1. Center for Integrative Bioinformatics, Max Perutz Laboratories, Vienna
2. EMBL-European Bioinformatics Institute, Hinxton, United Kingdom
3. Department of Theoretical Chemistry, University of Vienna, Austria
Most noncoding RNA gene prediction programs are based on the detection of conserved RNA secondary structures in multiple alignments [1]. Although this approach seems the most promising, current algorithms suffer from a large number of false positive predictions, particularly in large vertebrate genomes.
Because the available algorithms assume a mononucleotide background model, a major source of erroneously predicted RNA structures is the biased dinucleotide content found, e.g., in vertebrate genomes. Well-known algorithms can randomize single sequences while preserving dinucleotide content, but no such algorithms exist for multiple alignments. We present a novel algorithm addressing this problem.
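For contrast with the alignment case, the single-sequence problem can be sketched with the classic Euler-path idea (in the spirit of Altschul and Erickson, and Kandel et al.): treat each dinucleotide as an edge of a multigraph, pick a random "last exit" edge per vertex so that these edges form a tree rooted at the final symbol, shuffle the remaining edges, and walk the resulting Eulerian path. This is a minimal sketch for one sequence only; it is not the phylogenetic simulation described here, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def _last_edges_form_tree(last_edge, root):
    """True if following last-exit edges from every vertex reaches root without a cycle."""
    for start in last_edge:
        seen = set()
        v = start
        while v != root:
            if v in seen:
                return False
            seen.add(v)
            v = last_edge[v]
    return True


def dinucleotide_shuffle(seq, rng=random):
    """Random permutation of seq with identical dinucleotide counts (Euler-path method)."""
    if len(seq) <= 2:
        return seq
    # Dinucleotide multigraph: one edge a -> b for every adjacent pair "ab".
    edges = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    first, last = seq[0], seq[-1]
    # Choose a random last exit edge for each vertex except the end vertex,
    # rejecting choices that do not form a tree rooted at the end vertex.
    while True:
        last_edge = {v: rng.choice(out) for v, out in edges.items() if v != last}
        if _last_edges_form_tree(last_edge, last):
            break
    # Shuffle the remaining edges at each vertex; the chosen last edge goes last.
    order = {}
    for v, out in edges.items():
        out = list(out)
        if v in last_edge:
            out.remove(last_edge[v])
            rng.shuffle(out)
            out.append(last_edge[v])
        else:
            rng.shuffle(out)
        order[v] = out
    # Walk the Eulerian path starting from the original first symbol.
    result, used, v = [first], defaultdict(int), first
    for _ in range(len(seq) - 1):
        v_next = order[v][used[v]]
        used[v] += 1
        result.append(v_next)
        v = v_next
    return "".join(result)
```

The tree condition on the last exit edges is what guarantees the walk never gets stuck before every dinucleotide has been used. Applying such a shuffle independently to each row of an alignment would destroy the column structure, which is the gap the alignment-level method addresses.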
Our approach, which involves in silico evolution along a phylogenetic tree, has two key features: (i) it uses a new evolutionary model that considers site-specific and overlapping dependencies [2], which enables us to simulate alignments with a given dinucleotide (or higher-order nucleotide) content; (ii) the model includes site-specific rate factors that preserve critical conservation patterns of the original alignments. We developed a time-efficient, distance-based approximation method to estimate a tree under this complex model, which is then used as a guide for simulating new alignments.
Based on this improved null model, we have implemented a noncoding RNA gene prediction algorithm called SISSIz, which builds upon the RNAalifold and AlifoldZ programs. The new dinucleotide-based program shows significantly improved accuracy over its mononucleotide counterparts on a vertebrate test set.
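AlifoldZ-style scoring compares the consensus minimum free energy of the native alignment with the energies of randomized alignments through a z-score; a better null model changes only where the background energies come from. The sketch below shows just the z-score step, assuming the MFE values have already been computed elsewhere (e.g. by RNAalifold on the native and randomized alignments); the function name and the example numbers are hypothetical.

```python
import statistics


def mfe_z_score(native_mfe, background_mfes):
    """z-score of a native consensus MFE against MFEs of randomized alignments.

    Strongly negative values mean the native alignment folds more stably than
    its randomized counterparts, i.e. evidence for a conserved structure.
    """
    if len(background_mfes) < 2:
        raise ValueError("need at least two background samples")
    mu = statistics.mean(background_mfes)
    sd = statistics.stdev(background_mfes)
    if sd == 0:
        raise ValueError("background MFEs have zero variance")
    return (native_mfe - mu) / sd


# Hypothetical energies in kcal/mol: native alignment vs. four randomized ones.
z = mfe_z_score(-30.0, [-20.0, -22.0, -18.0, -20.0])
```

If the randomization ignores dinucleotide bias, the background energies are typically less negative than they should be, which inflates |z| and produces the false positives described above.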
[1] Washietl S., Hofacker I.L., Lukasser M., Huettenhofer A., Stadler P.F. Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol. (2005), 23:1383-90.
[2] Gesell T., von Haeseler A. In silico sequence evolution with site-specific interactions along phylogenetic trees. Bioinformatics. (2006), 22:716-22.
Architecture and reactivity of RNA
October 31, 2007 2:05 pm - 2:20 pm
RNA architecture results from the hierarchical assembly of preformed double-stranded helices defined by Watson-Crick base pairs and RNA modules maintained by non-Watson-Crick base pairs. Surprisingly, the most common RNA-RNA interaction motif, the A-minor motif, is also the least specific in its local requirements. A-minor motifs are mediated by adenines binding into the shallow/minor groove of any combination of stacked and helical Watson-Crick base pairs. Thus, A-minor motifs are mutationally robust and can accommodate many combinations of neutral mutations. This complicates the search for functional RNAs in genomes and dilutes the links between RNA structure and evolution.
The bacterial ribosomal decoding A site exploits this lack of local atomic specificity. There, the adenines A1492 and A1493 of the A site are seen either tucked in within the internal loop or bulging out and poised for interaction. This dynamic equilibrium contributes to the decoding process during recognition of the codon:anticodon Watson-Crick base pairings.
In contrast, for RNA folding, where specificity is a requirement, global (positional and orientational) constraints on the native fold must occur upstream in the folding process. Critical parameters are the lengths of the helices, the co-axiality of the helical stacks, and the structure adopted at the junctions of helices. The molecular neutrality present in the local interactions is thus partially compensated by these global topological criteria, which are much less accessible to sequence analysis because they are tied to the three-dimensional architecture. This dilution of the direct links between sequences and structures further complicates the search for functional RNAs in genomes. The simultaneous treatment of 3D structures, structural alignments, and annotations of the interactions should, we hope, allow us to derive some rules of molecular evolution in structured RNAs.
Lescoute, A. and Westhof, E. (2006) The interaction networks of structured RNAs. Nucleic Acids Res 34, 6587.
Hammann, C. and Westhof, E. (2007) Searching genomes for ribozymes and riboswitches. Genome Biology 8, 210.
How RNA tells right from wrong: base pairs, tertiary interactions, and counterions in RNA folding
November 01, 2007 2:30 pm - 3:00 pm
RNAs must self-assemble into unique three-dimensional structures in the cell, yet how RNA molecules find their native structure in a short time is not well understood. This is a challenging problem, because RNA secondary structures are thermodynamically stable but not uniquely specified by the sequence, while tertiary interactions are specific but not very stable. Consequently, many RNAs become trapped in metastable, non-native intermediates. Recent footprinting and SAXS experiments on a bacterial ribozyme show that tertiary interactions make helix assembly more specific during the initial collapse transition. Specific collapse increases the flux through folding pathways that lead directly to the native structure. The stability of the folded RNA also depends on the charge density of the counterions; small multivalent counterions stabilize the RNA more than large monovalent ions. In the presence of counterions with low charge density, the transition state ensemble becomes broader, accelerating the search for the native structure.
Estimating the fraction of non-coding RNAs in mammalian
transcriptomes
October 29, 2007 5:00 pm - 6:30 pm
Recent studies of mammalian transcriptomes have identified numerous RNA
transcripts that do not code for proteins; their identity, however, is
largely unknown. Here we explore an approach based on sequence
randomness patterns to discern different RNA classes. The relative
z-score we use helps distinguish the known ncRNA class from the genomic,
intergenic, and intronic classes. This leads us to a fractional ncRNA
measure of putative ncRNA datasets which we model as a mixture of
genuine ncRNAs and other transcripts derived from genomic, intergenic
and intronic sequences. We use this model to analyze six representative
datasets, identified by the FANTOM3 project and two computational
approaches based on comparative analysis (RNAz and EvoFold). Our
analysis suggests fewer ncRNAs than estimated by DNA sequencing and
comparative analysis, but verifying our approach and its predictions
will require more extensive experimental RNA data.
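One generic way to turn a mixture model like this into a fractional ncRNA estimate is expectation-maximization over the mixing weights while holding each class's score distribution fixed. The sketch below assumes, purely for illustration, that each class's relative z-scores are Gaussian with a mean and standard deviation estimated beforehand from reference sets; the abstract does not specify the authors' estimator, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def estimate_class_fractions(scores, components, max_iter=500, tol=1e-10):
    """EM estimate of mixing weights when each class density is fixed.

    scores: relative z-scores of the transcripts in one putative ncRNA dataset.
    components: list of (mean, sd) per class, e.g. ncRNA, genomic, intergenic,
        intronic; assumed Gaussian and estimated beforehand (assumption).
    """
    k = len(components)
    weights = [1.0 / k] * k
    for _ in range(max_iter):
        totals = [0.0] * k
        for x in scores:
            # E-step: posterior responsibility of each class for this score.
            joint = [w * normal_pdf(x, mu, sd) for w, (mu, sd) in zip(weights, components)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # score lies far outside every component; skip it
            for c in range(k):
                totals[c] += joint[c] / norm
        grand = sum(totals)
        if grand == 0.0:
            raise ValueError("no score has non-zero density under the components")
        # M-step: new weights are the average responsibilities.
        new_weights = [t / grand for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_weights, weights)) < tol
        weights = new_weights
        if converged:
            break
    return weights
```

With the ncRNA class listed first, `weights[0]` is the estimated fraction of genuine ncRNAs in the dataset; the estimate is only as good as the assumed class distributions.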
Using RNA 3D structure data in SCFG/MRF models to do sequence alignment and motif inference
October 31, 2007 12:50 pm - 1:20 pm
RNA 3D structure files contain essentially complete information about the interactions that form the 3D structure of an RNA molecule for a given organism. Homologous molecules in other organisms will have very similar 3D structures, but we expect to see sequence variability due to structurally neutral base substitutions, insertions, and deletions, among other things. RNA databases have far more RNA sequences than RNA 3D structures, and this will always be the case. We wish to use the 3D structure data to make inferences about the 3D structure of homologous molecules on the basis of their sequences.
We think of homologous RNA sequences as being random variants of the molecule for which we have a 3D structure. If the probabilistic model of this variation is simple enough, we can use it to align the sequences to the 3D structure, and thus infer the structural role of each base in the sequence. Stochastic context-free grammars (SCFGs) can account for the nested Watson-Crick basepairs prevalent in RNA, and by choosing appropriate basepair substitution probabilities, they can be used to model structurally neutral basepair substitutions for non-Watson-Crick basepairs as well. We use an SCFG formalism enhanced by production rules based on Markov Random Fields (MRF). This allows us to model base triples such as those found in sarcin motifs, and local crossing interactions such as those found in kink turn internal loops. However, as is usual with SCFGs, it does not allow us to model longer-range pseudoknots.
The SCFG/MRF model can be used for two purposes: First, to make RNA multiple sequence alignments based on the 3D structure of one molecule, without reference to a hand-curated seed alignment. Second, to infer the 3D structure of small motifs such as internal loops from their sequences.
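The nested-basepair part of such a model can be illustrated with a toy SCFG and the inside algorithm, which sums the probability of a sequence over all nested structures. The grammar (S -> a S | a S b S | empty), its rule probabilities, and the emission tables below are hypothetical choices for illustration; they include no minimum loop length, and the sketch does not cover the MRF extension for base triples or local crossing interactions.

```python
P_END, P_UNPAIRED, P_PAIR = 0.2, 0.5, 0.3  # hypothetical rule probabilities (sum to 1)
SINGLE = {b: 0.25 for b in "ACGU"}          # emission for an unpaired base
PAIR_EMIT = {x + y: 0.01 for x in "ACGU" for y in "ACGU"}  # non-canonical pairs: rare
PAIR_EMIT.update({p: 0.2 for p in ("AU", "UA", "GC", "CG")})  # Watson-Crick pairs
PAIR_EMIT.update({p: 0.05 for p in ("GU", "UG")})            # wobble pairs
# The 16 pair emissions sum to 10 * 0.01 + 4 * 0.2 + 2 * 0.05 = 1.


def inside_probability(seq):
    """P(seq | grammar) summed over all nested structures (inside algorithm).

    Grammar (toy, Nussinov-style):
        S -> a S        leftmost base unpaired
        S -> a S b S    leftmost base pairs with b
        S -> epsilon
    """
    n = len(seq)
    # inside[i][j] = probability that S derives seq[i:j]
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        inside[i][i] = P_END
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = P_UNPAIRED * SINGLE[seq[i]] * inside[i + 1][j]
            for k in range(i + 1, j):  # seq[i] pairs with seq[k]
                total += (P_PAIR * PAIR_EMIT[seq[i] + seq[k]]
                          * inside[i + 1][k] * inside[k + 1][j])
            inside[i][j] = total
    return inside[0][n]
```

Replacing the sum with a max (and keeping back-pointers) gives the most probable structure instead; aligning a new sequence to a fixed 3D-derived structure uses the same style of dynamic program over a structure-specific grammar.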
Computational methods for RNA secondary structure determination
October 29, 2007 11:30 am - 12:00 pm
The talk will begin with a definition of RNA secondary structure,
including three different ways to display these structures. Two distinct
approaches will be presented for determining secondary structure from
sequence data. The comparative method requires a multiple sequence
alignment of a collection of homologous RNA sequences. It uses phylogeny
to determine common, conserved base pairs that are more likely to be the
result of evolution than to exist by chance. On the other hand, recursive
algorithms may be used on single sequences to compute minimum free energy
structures, partition functions and other biophysical quantities. These
algorithms ignore evolution and use empirically derived energy parameters
based on physical chemistry. Examples will be given for both methods.
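A common way the comparative method detects evolutionarily supported base pairs is covariation between alignment columns, often measured by mutual information. The sketch below computes mutual information in bits for two columns, skipping rows with a gap in either; it is one standard measure chosen for illustration, not necessarily the one used in the talk, and the function name is hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(col_i, col_j, gap="-"):
    """Mutual information (bits) between two alignment columns, skipping gapped rows."""
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi
```

Two columns that change together (G-C in one sequence, U-A in another) score high, while a perfectly conserved pair scores zero: covariation needs sequence variation, so comparative analysis depends on a diverse set of homologs.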