Abstracts and Talk Materials
RNA in Biology, Bioengineering and Nanotechnology
October 29-November 2, 2007


Mirela Andronescu

Efficient parameter estimation for RNA secondary structure prediction

Joint work with Anne Condon, Holger H. Hoos, David H. Mathews, and Kevin P. Murphy.

Motivation: Accurate prediction of RNA secondary structure from the base sequence is an unsolved computational challenge. The accuracy of predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The most widely used model, the Turner99 model, has hundreds of parameters, and so a robust parameter estimation scheme should efficiently handle large data sets with thousands of structures. Moreover, the estimation scheme should also be trained using available experimental free energy data in addition to structural data.

Results: In this work, we present constraint generation (CG), the first computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. Our constraint generation approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. Then the newly-computed energy parameters are used to update the constraints on the optimization function, so as to better optimize the energy parameters in the next iteration. Using our method on biologically sound data, we obtain revised parameters for the Turner99 energy model. We show that by using our new parameters, we obtain significant improvements in prediction accuracy over current state-of-the-art methods.

Reference: Mirela Andronescu, Anne Condon, Holger H. Hoos, David H. Mathews, and Kevin P. Murphy, Efficient parameter estimation for RNA secondary structure prediction, Bioinformatics. 2007 Jul 1;23(13):i19-28.

Pascal Auffinger
http://www-ibmc.u-strasbg.fr/arn/Westhof/publ_West/Auffinger_pub.HTM

SwS: a solvation web service for nucleic acids

Accurately modeling the solvation of nucleic acid systems is an important issue, since it has been shown that water, together with the surrounding ionic atmosphere, is an essential component of RNA and DNA structure. A new web service, called SwS ("Solvation web Service" for nucleic acids), will be presented. This web service, based on the nucleic acid structures contained in the NDB, is devoted to the statistical analysis of the first solvation shell of important structural fragments. It has been developed to allow accurate comparisons between theoretical (molecular dynamics simulations) and experimental (X-ray) data and to better understand molecular recognition phenomena involving water and ions. Such data will also, at a more subtle level, improve our views on the assembly rules of tertiary structural motifs of nucleic acids.

Alex Bateman
http://www.sanger.ac.uk/Users/agb/
Paul Gardner
http://www.binf.ku.dk/~pgardner/

The Rfam database: we need you

The Rfam database is a collection of multiple sequence alignments and covariance models representing many common non-coding RNA gene (ncRNA) families. Rfam aims to facilitate the identification and classification of new members of known sequence families, and distributes annotation of ncRNAs in over 200 complete genome sequences. Rfam release 8.0 contains 574 ncRNA families (including 427 bona fide RNA genes, and 145 regulatory elements). For each family we provide predicted secondary structures, multiple sequence alignments, species distribution, annotation and links to other external specialised resources. All our data is available and searchable online or for download and local installation.

Jean Bergeron

Math Matters public lecture: U.S. premiere screening of the film "Achieving the Unachievable" with the film's writer/director
November 1, 2007

M.C. Escher is among the most mathematical of artists. In 1956 he challenged the laws of perspective with his graphic Print Gallery, and found himself trapped by an impossible barrier. His uncompleted masterpiece quickly became the most puzzling enigma of modern art, for both artists and scientists. Half a century later, mathematician Hendrik Lenstra took everyone by surprise by drawing a fantastic bridge between the intuition of the artist and his own, and completed Escher's work mathematically. This story is presented in the 52-minute film Achieving the Unachievable by documentary filmmaker Jean Bergeron. After the screening, the film's U.S. premiere, Bergeron will be available to answer questions.


Eckart Bindewald
http://www-lecb.ncifcrf.gov/~bindewae/

Utilizing the RNAJunction database for the design of RNA nanostructures

Joint work with Wojciech Kasprzak (1), Mary O’Connor (2), Brett Boyle (2) and Bruce A. Shapiro (2).

(1) Basic Research Program, SAIC-Frederick, Inc., NCI Frederick, Frederick, Maryland, USA; (2) Center for Cancer Research Nanobiology Program, NCI Frederick, Frederick, Maryland, USA.

We present RNAJunction, a database of extracted and annotated 3D coordinate data for RNA junctions, kissing loops, internal loops and bulges. The database contains more than 12,000 structural elements and supports web-based querying by sequence, type and PDB information. It also supports searching by geometric constraints (inter-helix angles), which is useful for the design of RNA nanostructures. We show how these structural elements can be utilized to generate ring structures and other complexes using the NanoTiler software. We present five different approaches for assembling RNA complexes from building blocks. Several examples of automatically generated computational RNA models are presented. Funded in part by DHHS #N01-CO-12400.

Steven Brenner
http://bioeng.berkeley.edu/graduate/cvs/Brenner.html

Ultraconserved nonsense: Pervasive unproductive splicing of SR proteins associated with exceptionally conserved DNA elements -- a bizarre prevalent mode of gene regulation
October 31, 2007

Nonsense-mediated mRNA decay (NMD) is a cellular RNA surveillance system that recognizes transcripts with premature termination codons and degrades them. We previously discovered large numbers of natural alternative splice forms that appear to be targets for NMD, and we speculated that this might be a mode of gene regulation, which we termed RUST (regulated unproductive splicing and translation). This seems to be confirmed by our finding that all conserved members of the SR family of splice regulators have an unproductive alternative mRNA isoform targeted for NMD. Strikingly, the splice pattern for each is conserved in mouse and always associated with an ultraconserved or highly conserved region of ~100 or more nucleotides of perfect identity between human and mouse. Remarkably, this seems to have evolved independently in every one of the genes, suggesting that this is a natural mode of regulation.

Peter Clote
http://clavius.bc.edu/~clote/

Efficient algorithms for probing the RNA mutation landscape (with Waldispuehl, Devadas, Berger)

The diversity and importance of the roles played by RNAs in the regulation and development of the cell have now been demonstrated. This broad range of functions is achieved through specific structures which have (presumably) been optimized through evolution. The existence of a well-founded energy function for RNA has enabled accurate ab initio secondary structure prediction. State-of-the-art methods, such as McCaskill's algorithm, use a statistical mechanics framework based on the computation of the partition function over the canonical ensemble of all possible secondary structures of a given sequence. Unfortunately, these techniques do not permit any modification of the input sequence during their execution and thus cannot investigate the mutation landscape of this sequence.
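
To make the partition-function framework concrete, here is a minimal sketch of a McCaskill-style recursion under a deliberately simplified energy model. The simplifications are assumptions made for illustration and are not taken from the abstract: every base pair contributes a fixed energy `pair_energy`, loops contribute nothing, and hairpins need at least `min_loop` unpaired bases. The function name `partition_function` is hypothetical. Real implementations use the full nearest-neighbor loop energy model.

```python
import math

# Watson-Crick and G-U wobble pairs allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_energy=-1.0, rt=0.6163, min_loop=3):
    """Sum of Boltzmann weights over all pseudoknot-free structures of seq.

    Toy energy model (assumption): each base pair adds pair_energy kcal/mol,
    and rt is the thermal energy RT in kcal/mol (about 0.6163 at 37 C).
    """
    n = len(seq)
    if n == 0:
        return 1.0
    weight = math.exp(-pair_energy / rt)
    # z[i][j] is the partition function of seq[i..j] (inclusive bounds).
    z = [[1.0] * n for _ in range(n)]

    def sub(i, j):
        # Empty intervals admit exactly one structure (no pairs).
        return z[i][j] if i <= j else 1.0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            total = sub(i, j - 1)  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    total += sub(i, k - 1) * weight * sub(k + 1, j - 1)
            z[i][j] = total
    return z[0][n - 1]
```

With `pair_energy=0.0` every structure has weight 1, so the function simply counts the admissible structures. Note that the sequence is fixed throughout the recursion, which is the limitation the abstract points out.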

Peter Clote
http://clavius.bc.edu/~clote/

Riboswitches, RNA conformational switches and prokaryotic gene regulation (with Eva Freyhult (a) and Vincent Moulton (b))

(a) Linnaeus Centre for Bioinformatics, Uppsala University, 75124 Uppsala, Sweden, eva.freyhult@lcb.uu.se; (b) School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK, vincent.moulton@cmp.uea.ac.uk; (c) Department of Biology, Boston College, Chestnut Hill, MA 02467, USA, clote@bc.edu. This work is funded in part by NSF DBI-0543506.

Metabolite-sensing 5′-UTRs (untranslated regions) of certain mRNAs, called riboswitches, have been discovered to undergo a conformational change upon ligand binding, which can thereby up- or down-regulate the corresponding protein product. For instance, upon binding guanine, the G-box riboswitch in the 5′-UTR of the XPT gene of Bacillus subtilis undergoes a conformational change to create a terminator loop, thereby prematurely terminating transcription of the XPT gene. Since XPT is involved in guanine metabolism, this is an example of negative autoregulation by a riboswitch. Although riboswitches have been postulated to be an ancient genetic regulatory system, first developed in bacteria, the remarkable discovery of Cheah et al. in Nature 2007 suggests that eukaryotes may have co-opted riboswitches to control alternative splicing of genes.

Here we describe a new algorithm, RNAbor (Freyhult, Moulton, Clote, Bioinformatics 2007), which gives information on possible conformational switches by computing the Boltzmann probability of structural neighbors of a given RNA secondary structure. Given a secondary structure S of an RNA sequence s, a secondary structure T of s is called a δ-neighbor of S if T and S differ by exactly δ base pairs. RNAbor computes the number (N_δ), the Boltzmann partition function (Z_δ), and the minimum free energy (MFE_δ) with its corresponding structure over the collection of all δ-neighbors of S. This computation is done simultaneously for all δ ≤ m, in run time O(m^2 n^3) and memory O(m n^2), where n is the sequence length.

We apply RNAbor to the detection of possible RNA conformational switches, and compare RNAbor with an existing switch detection method. We also provide examples of how RNAbor can at times improve the accuracy of secondary structure prediction.
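
The δ-neighbor definition can be illustrated with a short sketch. It assumes structures are written in dot-bracket notation and reads "differ by exactly δ base pairs" as the size of the symmetric difference of the two base-pair sets (the usual base-pair distance). The helper names `base_pairs`, `base_pair_distance` and `is_delta_neighbor` are hypothetical and are not part of RNAbor.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def base_pair_distance(s, t):
    """Number of base pairs present in exactly one of the two structures."""
    if len(s) != len(t):
        raise ValueError("structures must have the same length")
    return len(base_pairs(s) ^ base_pairs(t))


def is_delta_neighbor(s, t, delta):
    """True if structure t is a delta-neighbor of structure s."""
    return base_pair_distance(s, t) == delta
```

For example, `(.....)` is a 1-neighbor of `((...))`, because removing the inner pair is the only difference between them.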

Maciej Dlugosz

Binding of aminoglycosidic antibiotics to the oligonucleotide A-site model

Coauthors J. M. Antosiewicz and J. Trylska.

Aminoglycosidic antibiotics are anti-bacterial molecules which target the A-site of the small ribosomal subunit. Using Brownian dynamics, we simulated the encounter of four different aminoglycosidic antibiotics with their RNA binding site on the ribosome. The antibiotics considered are neamine, neomycin, paromomycin and ribostamycin. They are amine sugar derivatives, composed of 2 to 4 rings, with a positive total charge of +4 to +6e. The influence of structural, electrostatic and hydrodynamic properties of antibiotics on the kinetics of their association with the ribosomal A-site is discussed. Diffusion-limited rates of association are computed and their dependence on the ionic strength of the surrounding solution is examined. The mechanism of diffusion towards the RNA and the formation of the encounter complex are analyzed.

Mauricio Esguerra
http://www.eden.rutgers.edu/~esguerra

RNA dinucleotide step parameters

We present a first view of the space of conformations adopted by RNA in the currently best-resolved structure of the large ribosomal subunit using the dinucleotide ‘step’ parameters computed with the 3DNA software. We have explored how the base-step parameters for the 16 possible nucleotide steps of RNA vary in helical vs. non-helical regions.

Jes Frellsen
http://www.binf.ku.dk/~frellsen/

A continuous probabilistic model of local RNA 3-D structure
October 29, 2007

Joint work with Ida Moltke, Martin Thiim and Thomas Hamelryck (The Bioinformatics Center, University of Copenhagen)

So far, the most common approach to modeling local RNA 3-D structure has been to describe the local conformational space as discrete in a non-probabilistic framework. We present an original approach to modeling local RNA 3-D structure, namely a probabilistic model that treats the conformational space as continuous. In our model the backbone dihedral angles and the base dihedral angles are modeled with a Dynamic Bayesian Network using directional statistics. The model assigns a probability distribution to the conformational space and therefore it has numerous applications. It allows for fast probabilistic sampling of locally RNA-like structures and it can therefore be used in RNA 3-D structure prediction, where one of the problems is how to efficiently search through the space of plausible RNA structures. Today, the state-of-the-art method for suggesting plausible RNA structures is based on assembling fragments from libraries. Further, the model can also be used for deriving probabilities of seeing different local structures and it can therefore be used for quality validation of experimentally determined structures.
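
One reason dihedral angles call for directional statistics is that ordinary averages break down on a circle. The sketch below is only an illustration of that point, not the authors' model: it computes a circular mean by averaging unit vectors. The function name `circular_mean` is hypothetical.

```python
import math


def circular_mean(angles_deg):
    """Mean direction of a list of angles in degrees, returned in [0, 360)."""
    # Average the unit vectors rather than the raw angle values.
    x = sum(math.cos(math.radians(a)) for a in angles_deg)
    y = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(y, x)) % 360.0
```

For the two angles 350° and 10°, the arithmetic mean is 180°, which points in the opposite direction, whereas the circular mean lies at the 0°/360° boundary between them (up to floating-point rounding).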

Hin Hark Gan
http://monod.biomath.nyu.edu/~hgan/
Namhee Kim
http://monod.biomath.nyu.edu/index/people/namhee_kim.html
Tamar Schlick
http://www.math.nyu.edu/faculty/schlick/

Designing structured RNA Pools for RNA in vitro selection
October 30, 2007

In vitro selection is a versatile experimental tool for discovering novel synthetic RNAs. However, most RNAs identified from random sequence pools are small and have simple folding motifs. To significantly increase the probability of discovering novel RNAs, we develop an approach for engineering sequence pools that links regions of RNA sequence space with corresponding structural distributions via a "mixing matrix" approach combined with a graph theory analysis of RNA 2D structure space. Our pool design method has been automated and made available through the web server RAGPOOLS (http://rubin2.biomath.nyu.edu). RAGPOOLS serves as a guide to researchers who aim to synthesize RNA pools with desired properties and/or experiment in silico with various designs produced by our approach.

Hin Hark Gan
http://monod.biomath.nyu.edu/~hgan/

Designing structured RNA pools for in vitro selection of RNAs

In vitro selection of RNAs is a versatile experimental technology for discovering novel RNA molecules from random sequence pools. However, finding complex RNA molecules is difficult because simple motifs dominate in random pools. Thus, engineering sequence pools possessing complex structures could increase the probability of discovering novel RNAs.

The mathematical problem of designing structured RNA pools is to optimize the sequence/structure space to yield the structural characteristics of the target pool. We represent experimental pool generation as a nucleotide mixing (transition) matrix applied to a starting sequence, and pool structures as RNA graphs. These tools allow us to map regions of RNA sequence space, defined by mixing matrices, to their structural distributions. The target structured pool corresponds to an optimal combination of mixing matrices, starting sequences, and associated pool fractions.
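
The idea of applying a mixing matrix to a starting sequence can be sketched as follows. This is an illustrative reading, not the RAGPOOLS implementation: it assumes that the matrix row for each starting nucleotide gives the probabilities of the nucleotide synthesized at that position, and that positions are sampled independently. The names `mutate` and `generate_pool` and the dictionary layout of the matrix are hypothetical.

```python
import random

NUCLEOTIDES = "ACGU"


def mutate(start_seq, mixing_matrix, rng):
    """Draw one pool member by resampling every position from its matrix row."""
    out = []
    for base in start_seq:
        row = mixing_matrix[base]  # probabilities of A, C, G, U given `base`
        weights = [row[n] for n in NUCLEOTIDES]
        out.append(rng.choices(NUCLEOTIDES, weights=weights)[0])
    return "".join(out)


def generate_pool(start_seq, mixing_matrix, size, seed=0):
    """Generate `size` sequences from one starting sequence and one matrix."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [mutate(start_seq, mixing_matrix, rng) for _ in range(size)]


# Hypothetical matrix: keep the starting base with probability 0.7,
# otherwise switch to one of the three other bases with probability 0.1 each.
BIASED = {
    b: {n: (0.7 if n == b else 0.1) for n in NUCLEOTIDES} for b in NUCLEOTIDES
}
```

Calling `generate_pool("GGGAAACCC", BIASED, 5)` returns five variants of the starting sequence. A designed pool in the sense of the abstract would combine several such matrices and starting sequences in chosen proportions.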

We show that our pool design approach allows generation of pools with user-defined characteristics, such as proportions of specific target motifs, starting functional sequences, and sequence length. Our pool design method has been automated and made available through the webserver RAGPOOLS (http://rubin2.biomath.nyu.edu) that offers a theoretical companion tool for RNA in vitro selection and related problems. Thus, RAGPOOLS can serve as a guide to researchers who aim to synthesize RNA pools with desired properties and/or perform in silico experiments.

References:

Kim N, Shin JS, Elmetwaly S, Gan HH, and Schlick T, RAGPOOLS: RNA-As-Graph-Pools, a web server for assisting the design of structured RNA pools for in vitro selection. Bioinformatics 2007 (In Press).

Kim N, Gan HH, and Schlick T, A computational proposal for designing structured RNA pools for in vitro selection of RNAs. RNA 2007, 13(4):478-92.

Daniel Gautheret

Functional classification of all non-coding microbial sequences through phylogenetic profiling

Joint work with Antonin Marchais and Magali Naville (IGM. Bât 400 - Université Paris-Sud - 91405 Orsay cedex – France).

Although comparative genomics has been instrumental in the identification of novel non-coding RNA (ncRNA) in model genomes, this technique cannot, in the form it is currently practised, keep up with the pace of genome sequencing. As a result, hundreds of microbial genomes, including entire families of important pathogens, have been left out of the picture in terms of ncRNA function analysis. As ncRNAs play major regulatory and adaptive roles in bacteria, there is an urgent need for innovative computational methods that would permit a quick and efficient detection of ncRNAs in any genome of interest. Here we propose a protocol that exploits the depth of phylogenetic information in all available genomes (with virtually no limitation in the number of species) to produce a functional classification of all ncRNA candidates and other non-coding conserved elements in any target bacterial genome.

Our protocol involves a low-stringency screening for intergenic conserved elements (ICEs) in the target genome, followed by the construction of the presence/absence profile of each ICE across the complete bacterial genome collection. All ICEs are then clustered according to the distance between their phylogenetic profiles, as done by Pellegrini et al. [1] for classifying protein genes. A simultaneous clustering of ICEs and ORFs produces a complete classification of coding and non-coding elements in the target genome. We ran this pipeline on E. coli and B. subtilis. In both species, known small RNAs and riboswitches were significantly concentrated (P ~ 10^-6) in two or three clusters that also contained many orthologous E. coli and B. subtilis ORFs and ~200 undefined ICEs in each species. Phylogenetic profile clustering is independent of sequence similarity and appears to predict functional ncRNAs with a much higher specificity than comparative sequence analysis. Furthermore, some clusters that lack known ncRNAs show very interesting phylogenetic presence/absence patterns that indicate either horizontal transfers or the emergence of common adaptive non-coding elements in distant bacterial species. Finally, the co-occurrence of ICEs and protein-coding genes in the same clusters may constitute an important source of information on ICE/ORF functional relationships. A complete run of our ICE classification pipeline on a bacterial genome requires only a few hours.
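
The profile-clustering step can be sketched in a few lines. The abstract does not state which distance or clustering method is used, so the choices here are assumptions: profiles are tuples of 0/1 flags (one per genome), distance is the Hamming distance, and elements are grouped by single linkage with a threshold. The names `hamming` and `cluster_profiles` and the example element labels are hypothetical.

```python
def hamming(p, q):
    """Number of genomes in which two presence/absence profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def cluster_profiles(profiles, max_dist):
    """Single-linkage grouping of elements whose profiles are within max_dist."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            if hamming(profiles[a], profiles[b]) <= max_dist:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# ICEs and ORFs share one profile space, so they can be clustered together.
example = {
    "ICE1": (1, 1, 0, 0),
    "ICE2": (1, 1, 0, 1),
    "ORF1": (0, 0, 1, 1),
}
```

With `max_dist=1`, `ICE1` and `ICE2` fall into one cluster and `ORF1` stays alone, since its profile differs from theirs in three or four genomes.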

[1] Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl. Acad. Sci. U.S.A. 96:4285-4288

George Madalin Giambasu

Multi scale simulation of RNA catalytic activity

Joint work with Taisung Lee and Darrin M. York (Department of Chemistry, University of Minnesota).

We present a series of multi-scale simulation studies on RNA catalysis. The results of several series of molecular dynamics (MD) and QM/MM simulations of the full-length hammerhead ribozyme and the L1 Ligase ribozyme are presented. For the hammerhead ribozyme, we have used simulations to investigate the role of metal ions and the possible solvent structure in the crystal, and to study and predict the effects of mutations at the C3 and G8 sites. For the L1 Ligase, we have studied the details of a major conformational change prior to the reaction and possible conformations of the ligation site in the reactant state. These simulations (each with a length of 50 to 100 ns, with a total of more than 1.5 μs) are at least one to two orders of magnitude longer than any previously reported simulations, and they have yielded a significant number of new insights.

Jan Gorodkin
http://genome.ku.dk/~gorodkin

Local pairwise structural RNA alignments by pruning of the dynamical programming matrix

Joint work with Jakob H. Havgaard and Elfar Torarinsson (Division of Genetics and Bioinformatics, IBHV, University of Copenhagen, Frederiksberg, Denmark).

The Sankoff algorithm for simultaneously folding and aligning RNA sequences is computationally very heavy. Recently, a number of groups have applied various constraints to lower the computational requirements to reasonable levels. Whereas the original Sankoff algorithm, as well as many of its implementations, conducts only global alignments, the FOLDALIGN implementation makes both local and global structural alignments. The most recent version of FOLDALIGN introduces pruning of the dynamical programming matrix as a simple and effective heuristic which lowers the time and memory requirements significantly without lowering the predictive performance. FOLDALIGN is currently one of the few Sankoff-style algorithms capable of conducting local alignments while remaining a practical tool. It has also been used in a genome-wide screen for putative RNA structures in corresponding, but unaligned, regions between human and mouse. In addition to the pairwise version of FOLDALIGN, we have also made a multiple alignment method which takes either the pairwise alignments or McCaskill base pair probability matrices as input.

References

- Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix. J. H. Havgaard, E. Torarinsson and J. Gorodkin PLoS Computational Biology, in press

- Multiple structural alignment and clustering of RNA sequences. E. Torarinsson, J. H. Havgaard and J. Gorodkin Bioinformatics, 23:926-932, 2007.

- Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure. E. Torarinsson, M. Sawera, J. H. Havgaard, M. Fredholm and J. Gorodkin Genome Research, 16:885-889, 2006.

Christine E. Heitsch
http://www.math.gatech.edu/~heitsch

Analysis, prediction, and design of viral RNA secondary structures
October 30, 2007

Understanding how biological sequences encode structural and functional information is a fundamental scientific challenge. For RNA viral genomes, the information encoded in the sequence extends well beyond their protein coding role to the role of intra-sequence base pairing in viral packaging, replication, and gene expression. Working with the Pariacoto virus as a model sequence, we investigate the compatibility of predicted base pairings with the dodecahedral cage known from crystallographic studies. To build a putative secondary structure, we first analyze different possible configurations using a combinatorial model of RNA folding. We give results on the trade-offs among types of loop structures, the asymptotic degree of branching in typical configurations, and the characteristics of stems in "well-determined" substructures. These mathematical results yield insights into the interaction of local and global constraints in RNA secondary structures, and suggest new directions in understanding the folding of RNA viral genomes.

Ivo L. Hofacker
http://www.tbi.univie.ac.at/~ivo/

Prediction of RNA-RNA interactions
October 30, 2007

Most noncoding RNAs exert their function by interacting with other RNA molecules, as in the case of microRNA and their mRNA targets. Recent bioinformatical as well as experimental approaches produce thousands of novel RNA transcripts most of which cannot be annotated. Our best hope for elucidating the function of these novel transcripts is through identification of their interaction partners.

In this talk I will review several approaches to predict the structure of two RNAs upon hybridization. I will present a fast method to compute the probability that some region of an RNA is unpaired and thus accessible for inter-molecular interactions (or, equivalently, the free energy needed to open up the site), as well as a method to quickly search for possible hybridization sites. The combination of these two approaches yields a promising approach to identify the RNA interaction partners of noncoding RNAs. The site accessibility is also a potent predictor of siRNA efficacy and can be used to improve microRNA target predictions.
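
The equivalence between the probability that a site is unpaired and the free energy needed to open it is the standard Boltzmann relation ΔG_open = -RT ln(P_unpaired). The sketch below applies only this conversion and assumes that the unpaired probability has already been computed elsewhere; the function name `opening_energy` is hypothetical.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)


def opening_energy(p_unpaired, temperature_c=37.0):
    """Free energy (kcal/mol) needed to make a site fully unpaired.

    p_unpaired is the equilibrium probability that the whole site is
    unpaired, assumed to come from a partition-function calculation.
    """
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("p_unpaired must be in (0, 1]")
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    return -rt * math.log(p_unpaired)
```

A site that is unpaired 1% of the time costs roughly 2.8 kcal/mol to open at 37 °C, while a site that is always unpaired costs nothing. This opening cost can then be weighed against the energy gained from hybridization.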

Tan Inoue
http://kuchem.kyoto-u.ac.jp/seika/

RNA / RNP synthetic biology
November 1, 2007

In general, molecular design of RNA is difficult at the 3D level because of its highly complicated folding process. In the 1990s, biochemical and structural analyses revealed that many functional noncoding natural RNAs are organized into modules and fold into defined 3D structures. Moreover, several commonly used RNA–RNA binding motifs in these RNAs were identified by phylogenetic comparison and high-resolution structural analyses. Consequently, it has become possible to design self-folding RNAs precisely by employing such motifs and mimicking the modular organization of natural RNAs. As one such example, we have investigated the design and construction of a self-folding RNA scaffold consisting of standard double-stranded helices connected by two RNA–RNA binding motifs. Results indicated that the constructed RNA folds compactly into the designed 3D structure. We have also reported the synthesis and development of an artificial RNA enzyme by installing a reaction site and a catalytic site into the designed RNA scaffold. For medical and biological applications, the goals of our current project are 1) to establish multifunctional RNP molecules with tumor-seeking sensors, imaging agents and toxins that kill target cells, and 2) to establish artificial signal transduction systems for regulating cell function by employing designed RNA and RNP molecules. The strategy may be applicable to the synthesis and development of a variety of non-natural functional RNAs with defined 3D structures.

Hosna Jabbari

RNA pseudoknotted secondary structure prediction using hierarchical folding

Improving the accuracy and efficiency of computational RNA secondary structure prediction is an important challenge, particularly for pseudoknotted secondary structures. We propose a new approach for prediction of pseudoknotted structures, motivated by the hypothesis that RNA structures fold hierarchically, with pseudoknot-free pairs forming initially, and pseudoknots forming later so as to minimize energy relative to the initial pseudoknot-free structure. Our HFold (Hierarchical Fold) algorithm has O(n^3) running time, and can handle a wide range of biological structures, including nested kissing hairpins, which have previously required O(n^6) time using traditional minimum free energy approaches. We also report on an experimental evaluation of HFold.
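
The distinction between pseudoknot-free and pseudoknotted structures comes down to whether any two base pairs cross. The sketch below illustrates only that definition and is not the HFold algorithm: two pairs (i, j) and (k, l) cross when i < k < j < l. The function name `is_pseudoknot_free` is hypothetical.

```python
def is_pseudoknot_free(pairs):
    """Return True if no two base pairs (i, j) and (k, l) satisfy i < k < j < l."""
    # Normalize each pair to (smaller, larger) and sort by the opening position.
    ordered = sorted((min(p), max(p)) for p in pairs)
    for idx, (i, j) in enumerate(ordered):
        for k, l in ordered[idx + 1:]:
            if i < k < j < l:
                return False  # the two pairs interleave: a pseudoknot
    return True
```

Nested pairs such as (0, 10) and (1, 9) pass the check, whereas interleaved pairs such as (0, 5) and (3, 8), the pattern of a simple H-type pseudoknot, do not. This quadratic check is meant for clarity, not speed.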

Luc Jaeger

RNA tertiary structure as a proto-language for nano-construction
November 2, 2007

The common occurrence of many small structural motifs in natural RNA molecules suggests that nature utilizes a vocabulary of sequence patterns to compose structural molecules with sophisticated topologies, such as the ribosome and large ribozymes. By careful analysis of sequences and tertiary structures of natural RNAs, 3D RNA modules and their folding and assembly principles are presently being gathered to generate the syntax of a proto-language for rational design and prediction of RNA 3D shapes. RNA architectonics refers to the creation of this proto-language and to its use to build new RNAs with self-assembly properties. Recently, RNA architectonics led to the reliable prediction and design of the tertiary structure of several artificial RNA building blocks able to form programmable filaments and 2D RNA arrays at the nano-scale level. As a proof of concept, we also demonstrated that structurally complex RNAs based on a syntax involving a repertoire of several different RNA motifs can self-assemble into complex supra-molecular 3D nano-particles. These studies show that RNA architectonics can be used as a tool to explore and compare the biophysical properties of various RNA tertiary structure motifs that would otherwise be more difficult to investigate in isolation or within their natural context. They also demonstrate that RNA is an ideal medium for sculpting addressable and responsive self-assembling architectures of any desired shape at the 20 to 50 nm scale. Moreover, they suggest that RNA supra-molecular assembly can potentially lead to the development of highly sophisticated therapeutic nano-devices for biological and medical applications.

References

1. Jaeger, L. & Chworos, A. (2006) The Architectonics of Programmable RNA and DNA Nanostructures. Current Opinion in Structural Biology, 16, 531-543.

2. Chworos, A., Severcan, I., Koyfman, A. Y., Weinkam, P., Oroudjev, E., Hansma, H. G. & Jaeger, L. (2004). Building programmable jigsaw puzzles with RNA. Science 306, 2068-2072.

3. Nasalean, L., Baudrey, S., Leontis, N.B. & Jaeger, L. (2006) Controlling RNA self-assembly to form filaments. Nucleic Acids Res. 34, 1381-1392

4. Bates, A.D., Callen, B.P., Cooper, J.M., Cosstick, R., Geary, C., Glidle, A., Jaeger, L., Pearson, J.L., Proupín-Pérez, M., Xu, C., & Cumming, D.R.S. (2006) Construction and characterization of a gold nanoparticle wire assembled using Mg2+-dependent RNA-RNA interactions. Nano Letters 6, 445-448.

Wojciech (Voytek) Kasprzak
http://www-lecb.ncifcrf.gov/~kasprzak

Determining functional conformations of two HDV III strains

Joint work with Sarah D. Linnstaedt (2), John L. Casey (2), and Bruce A. Shapiro (3).

(1) Basic Research Program, SAIC-Frederick, Inc., NCI Frederick, Frederick, MD; (2) Department of Microbiology and Immunology, Georgetown University Medical Center, Washington, DC; (3) Center for Cancer Research Nanobiology Program, National Cancer Institute, Frederick, MD.

Hepatitis Delta virus (HDV) is a sub-viral human pathogen aggravating Hepatitis B virus (HBV) liver infections. The short HDV genome (~1680 nt) is a single stranded, circular RNA encoding only one protein, the hepatitis delta antigen (HDAg). The host enzyme ADAR1 edits the HDV stop codon (UAG) into a tryptophan (W) codon (UGG) enabling expression of the two forms of the protein, short and long, from the same open reading frame. HDAg-S is required for replication, while HDAg-L enables viral particle formation and inhibits replication. The balance between the two forms is crucial and editing must be regulated.

We have applied our programs, MPGAfold and StructureLab, to predict and examine the folding conformations/states of an HDV III construct. This construct includes the editing site (amber/W) and has the editing capabilities of the full HDV III. The predicted secondary structure folding dynamics indicate that the HDV III RNA forms a meta-stable branched structure and a stable rod structure. Both were observed in vitro, and the branched structure was identified as the one enabling editing. Computational predictions and the experimental data also indicate that an Ecuadorian strain folds into the editing-capable structures more readily than a Peruvian strain, and we indicate the reasons for the difference. Thus the folding dynamics of HDV III strains appear to strongly influence their RNA editing levels.

Funded in part by NCI Contract N01-CO-1240.

Rob Knight
http://www.colorado.edu/chem/people/knightr.html

Finding additional functional elements in essential RNA sites: not conserved, but not unimportant

Joint work with Vikas Malaiya, Jana Chocholousova, Matthew Iyer, Irene Majerfeld, and Michael Yarus.

Evolutionary conservation has often been used to recover the essential pieces of RNA sites, yet can only reveal elements that are necessary, rather than sufficient, for function. Biochemical studies in several systems, including the hammerhead ribozyme and the purine riboswitch, indicate additional regions, such as loop-loop interactions, that are required for function yet are not phylogenetically conserved. Here we use a minimal motif for binding the amino acid tryptophan to ask the ultimate question of an RNA motif: do we know the essential elements well enough to embed the motif in a random-sequence background and obtain functional molecules? We show the utility of this technique for discovering additional sequence requirements for the motif, in this case the requirement for an unpaired G in a specific range of locations and structures relative to the main loop identified by SELEX, and discuss its implications for calculating the probability of obtaining functional RNAs from random-sequence pools.

Christian E. Laing
http://www.mathcs.wilkes.edu/~laing

Annotated tertiary interaction motifs in RNA structures

RNA tertiary motifs play an important role in RNA folding. To understand the complex organization of RNA tertiary interactions, we compiled a dataset containing 54 high-resolution RNA crystal structures. Seven RNA tertiary motifs (coaxial helix, A-minor, ribose zipper, pseudoknot, kissing hairpin, tRNA D-loop:T-loop and tetraloop-tetraloop receptor) were searched for using different computer programs. In the non-redundant RNA dataset, 605 RNA tertiary interactions were found. Most of these 3D interactions occur in the 16S and 23S rRNAs. An exhaustive search for these motifs reveals a diversity of interactions. Correlations between motifs (e.g. pseudoknot or coaxial helix with A-minor) show that they can form "composite" motifs. These findings may lead to tertiary structure constraints useful for RNA 3D prediction.

Neocles B. Leontis

Structure-neutral RNA substitutions from 3D structure alignments and 3D motif search
October 31, 2007

The function of structured RNA molecules depends on forming the correct 3D structure, so the most significant constraints on their sequences are structural. Structure-disrupting substitutions are selected against during evolution while structurally neutral substitutions can accumulate as populations evolve. The relevant interactions include basepairs, base-stacking, and base-phosphate interactions, all of which can be disrupted by certain substitutions. Further constraints are imposed by interactions with other molecules. The general question we address is how to determine which substitutions are structure-neutral in RNA molecules, at the level of individual bases and base-pairs and at the level of 3D motifs and molecular architectures. The availability of two or more 3D structures of large RNA molecules such as the 16S and 23S rRNAs presents opportunities for exploring this question empirically, once the two structures are appropriately aligned. Detailed examination and comparison of nucleotide-nucleotide interaction geometries provides another avenue for addressing the same question. Finally, some RNA motifs occur multiple times in single structures and in non-homologous positions in other molecules, giving another way to study the neutrality of base substitutions. These three approaches will be described and their results compared. Along the way, we will briefly describe FR3D (“Find RNA 3D”), a set of Matlab programs we have developed to annotate RNA structures and to carry out searches for recurrent RNA motifs.

David M.J. Lilley
http://www.dundee.ac.uk/biocentre/nasg/index.php

Structure, dynamics and catalytic mechanisms of two ribozymes
November 1, 2007

CR-UK Nucleic Acid Structure Group, MSI/WTB complex, University of Dundee, Dundee DD1 5EH, UK
d.m.j.lilley@dundee.ac.uk

The nucleolytic ribozymes are catalytic RNA molecules that generate site-specific cleavage by means of a transesterification reaction involving the 2’ and 5’ O atoms. We have studied two of these, the hairpin and VS ribozymes. The hairpin ribozyme folds to generate an intimate loop-loop interaction that creates the local environment in which catalysis can proceed. By means of FRET we can observe individual hairpin ribozyme molecules as they undergo multiple cycles of cleavage and ligation, and measure the rates of the internal reactions. On average, the cleaved ribozyme undergoes several docking-undocking events before a ligation reaction occurs. On the basis of these experiments, we have explored the role of the nucleobases G8 and A38 in catalysis. Both cleavage and ligation reactions are pH dependent, corresponding to the titration of a group with pKa = 6.2. We have used a novel ribonucleoside in which these bases are replaced by imidazole to investigate the role of acid-base catalysis in this ribozyme. We observe significant rates of cleavage and ligation, and a bell-shaped pH dependence for both.

The VS ribozyme is the largest of the nucleolytic ribozymes, and the only one for which there is no crystal structure. The ribozyme consists of five helical sections organised by two three-way junctions, each of which undergoes metal ion-induced folding. Using a ‘divide and conquer’ approach based principally on the analysis of component junctions by FRET, we deduced the global structure of the ribozyme. We have now solved the structure of the complete ribozyme at low resolution using small-angle X-ray scattering in solution.

The binding of the substrate stem-loop generates a catalytically-productive interaction with the A730 loop active site. We have identified two critical nucleotides in the catalytic process; A756 within the A730 loop, and G638 in the substrate internal loop. Mutation or functional group substitution of either nucleobase leads to > 1,000-fold impairment of catalytic activity, while leaving the structure and binding to the ribozyme unaltered. The pH dependencies of the rate of cleavage of substrate with guanine, adenine, 2,6-diaminopurine or inosine at position 638 are fully consistent with a mechanism in which G638 and A756 act in concert in general acid-base catalysis.

The proposed mechanisms of the VS and hairpin ribozymes, together with the manner in which the active sites are generated and their topology, are strikingly similar to each other. This similarity has probably arisen by convergent evolution.

M.K. Nahas, T.J. Wilson, S. Hohng, K. Jarvie, D.M.J. Lilley and T. Ha. Observation of internal cleavage and ligation reactions of a ribozyme. Nature Struct. Molec. Biol. 11, 1107-1113 (2004).

Z. Zhao, A. McLeod, S. Harusawa, L. Araki, M. Yamaguchi, T. Kurihara and D.M.J. Lilley. Nucleobase participation in ribozyme catalysis. J. Amer. Chem. Soc. 127, 5026-5027 (2005).

T.J. Wilson, J. Ouellet, Z. Zhao, S. Harusawa, L. Araki, T. Kurihara and D.M.J. Lilley. Nucleobase catalysis in the hairpin ribozyme. RNA 12, 980-987 (2006).

T.J. Wilson, A.C. McLeod and D.M.J. Lilley. A guanine nucleobase important for catalysis by the VS ribozyme. EMBO J. 26, 2489-2500 (2007).

Stinus Lindgreen
http://www.binf.ku.dk/~stinus/

MASTR: Simultaneous multiple alignment and structure prediction of non-coding RNAs using simulated annealing

Joint work with Anders Krogh, University of Copenhagen and Paul Gardner, Wellcome Trust Sanger Institute.

With the growing interest in non-coding RNAs and their function, there is also a growing need for computational tools that can be used to analyze new sequences and predict their secondary structure. For a single RNA sequence, a common approach is to find the minimum free energy conformation. However, by looking at many related sequences, it is possible to incorporate evolutionary information and perform a comparative analysis, which can improve the prediction. Preferably, one should perform the multiple alignment of the sequences and the prediction of their common secondary structure simultaneously. We present a novel heuristic method that uses simulated annealing to iteratively improve the sequence alignment and a common secondary structure. This is done in the context of a sampling approach using fairly simple moves that either change the sequence alignment by moving gaps around or update the structure at the base pair level. The prediction is evaluated using a cost function that combines the log-likelihood of the alignment, the base pair probabilities and a covariation term. The method, implemented in the C++ program MASTR (Multiple Alignment of STructural RNAs), is competitive with other current programs in terms of speed, alignment quality and structure quality.
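The annealing loop described above can be illustrated with a minimal sketch. This is not the MASTR implementation: the function names (anneal, column_cost, move_gap), the geometric cooling schedule and the toy cost (the negated number of fully conserved columns) are assumptions made for illustration only. The real cost function combines alignment log-likelihood, base pair probabilities and a covariation term, and the real move set also updates the structure.

```python
import math
import random


def anneal(state, cost, propose, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Generic simulated annealing; returns the best state seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(current, rng)
        delta = cost(candidate) - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


def column_cost(rows):
    """Toy cost: minus the number of columns where all rows share one residue."""
    return -sum(1 for col in zip(*rows) if col[0] != "-" and len(set(col)) == 1)


def move_gap(rows, rng):
    """Toy move: swap one gap with a neighbouring character in a random row."""
    rows = list(rows)
    r = rng.randrange(len(rows))
    row = list(rows[r])
    gaps = [i for i, c in enumerate(row) if c == "-"]
    if gaps:
        i = rng.choice(gaps)
        j = i + rng.choice((-1, 1))
        if 0 <= j < len(row):
            row[i], row[j] = row[j], row[i]
    rows[r] = "".join(row)
    return tuple(rows)


start = ("GGCAUC-", "-GGCAUC")
best, best_cost = anneal(start, column_cost, move_gap)
```

Because a gap only ever swaps places with a neighbour, the residue order of every sequence is preserved; only the placement of gaps changes, which is the kind of alignment move the abstract describes.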

References: Lindgreen S, Gardner PP, Krogh A (2007): "MASTR: Multiple alignment and structure prediction of non-coding RNAs using simulated annealing", Bioinformatics (accepted)

Francois Major
http://www.iro.umontreal.ca/~major/

Theory and application of a novel RNA folding approach based on nucleotide cyclic motifs
October 31, 2007

Joint work with Marc Parisien (Institute for Research in Immunology and Cancer, Department of Computer Science, University of Montreal).

We change the classical rationale underlying RNA structure prediction by incorporating the contributions of the non-Watson-Crick base pairs. To do so, we define a new first-order object for representing nucleotide relationships in structured RNAs, which we call the nucleotide cyclic motif (NCM) (1). In comparison to the classical stacks of Watson‑Crick base pairs, the properties that make NCMs appealing for structure determination are that: i) the same algorithm can be employed for predicting secondary, tertiary, and 3-D structures; ii) the RNA structural motifs are made of one or more NCMs (2); iii) the NCMs embrace both canonical and non‑canonical base pairs without distinction; and iv) the NCMs precisely designate how any nucleotide in a sequence relates to the others. We have developed a structure generator and scoring function, MC-Fold. We show how MC-Fold, combined with MC-Sym (3), builds RNA 3-D structures from sequence data and, combined with MC-Cons, clusters and aligns RNA family sequences. We show how low-resolution data can be incorporated in the modeling to reach conformational states that are difficult to access from sequence data alone.

1. Lemieux, S. and Major, F. (2006) Automated extraction and classification of RNA tertiary structure cyclic motifs. Nucleic Acids Res., 34, 2340-2346.

2. St-Onge, K., Thibault, P., Hamel, S. and Major, F. (2007) Modeling RNA tertiary structure motifs by graph-grammars. Nucleic Acids Res, 35, 1726-1736.

3. Major, F. (2003) Building Three-Dimensional Ribonucleic Acid Structures. IEEE Comp Science Eng 5:44-53.

David H. Mathews
http://dbb.urmc.rochester.edu/bcbp/members/faculty/mathews_david.html

Prediction of the secondary structure common to two sequences: Free energy minimization and comparative analysis
October 29, 2007

In this talk, I will discuss the dynamic programming methods for simultaneously predicting secondary structure and alignment for two sequences. This approach was first suggested by Sankoff in 1985. Recently, it has been implemented by several groups, using one or more heuristics to reduce the computational cost. Our implementation, Dynalign, finds the lowest free energy common secondary structure. It uses single sequence secondary structure prediction and sequence alignment data as input to reduce the search space to make the calculation tractable for long sequences.

Roderick Melnik
http://www.wlu.ca/~wwwmath/faculty/rmelnik/

Computational models for RNA silencing pathways under time-dependent transgene transcription rates

Joint work with Jack Yang and Roy Mahapatra.

The synthesis of dsRNA is analyzed using a pathway model with amplifications caused by the aberrant RNAs. The transgene influx rates are assumed to be time-decaying and Gaussian functions of time. The dynamics of transgene-induced RNA silencing are investigated with a system of coupled non-autonomous nonlinear differential equations describing the process phenomenologically. The silencing phenomena are detected after a period of transcription. Important contributions of several parameters, including those leading to bifurcation patterns, are discussed with a series of numerical examples.

Asamoah Nkwanta

RNA matrices and RNA secondary structures
October 29, 2007

Two lower-triangular arrays with entries that count RNA secondary structures of a given length are mentioned in this short talk. The array entries also count specific lattice walks. There is a one-to-one correspondence between RNA structures and a subset of the walks. We will discuss various ways in which the walks can be used as a tool to help predict primary RNA sequences.
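As a small illustration of what "counting RNA secondary structures of a given length" means, the sketch below uses the classical recurrence for the total number of secondary structures on n bases. The abstract does not define its two arrays, so this is an assumption-laden example rather than the speaker's construction: it assumes that any two bases may pair, that pairs do not cross, and that a hairpin loop contains at least min_loop unpaired bases (1 by default). The function name is hypothetical.

```python
def count_secondary_structures(max_len, min_loop=1):
    """Return [s(0), ..., s(max_len)], where s(n) counts structures on n bases."""
    s = [1] * (max_len + 1)  # s(0) = 1: the empty structure
    for n in range(1, max_len + 1):
        total = s[n - 1]  # case 1: base n is unpaired
        # case 2: base n pairs with base j; bases 1..j-1 lie outside the pair
        # and the n-1-j enclosed bases (at least min_loop of them) lie inside
        for j in range(1, n - min_loop):
            total += s[j - 1] * s[n - 1 - j]
        s[n] = total
    return s


counts = count_secondary_structures(8)
```

With the default min_loop the first values are 1, 1, 1, 2, 4, 8, 17, 37, 82 for lengths 0 through 8.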

Ruth Nussinov
http://ccr.cancer.gov/Staff/Staff.asp?profileid=6892

ARTS and DARTS: A method and database for exploring RNA tertiary structures
October 30, 2007

Joint work with Oranit Dror, Mira Avraham, Haim Wolfson (Tel Aviv University and SAIC, NCI-Frederick).

An increasing number of non-coding RNAs have recently been discovered as key players in a variety of cellular pathways and pathological processes. Much like proteins, the function of these active RNAs can be inferred from their tertiary (3D) structures. However, in contrast to proteins, the number of tools and databases for 3D structural analysis of RNA is still limited. With the aim of filling this void, we have developed a computational method, named ARTS, for aligning RNA tertiary structures. Given a pair of RNA structures, the method searches for a priori unknown common substructures. The search is truly three-dimensional and independent of the order of the nucleotide chain. The detected common substructures are either large global folds or small local tertiary motifs. The method is highly efficient and was used in a fully automatic framework for clustering all the currently available RNA structures. The result is a database, named DARTS, which reveals the current fold repertoire of solved RNA structures and provides a hierarchical classification for them. Both the method and the database should be useful for structural and functional analysis of RNA. They may shed new light on the evolutionary relationship between RNAs and reveal possible building blocks and functional properties.

This publication has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract # NO1-CO-12400.

Henri Orland

A topological classification of RNA folds
November 1, 2007

After reviewing some elementary properties of RNA, we show how the RNA folding problem can be formulated exactly in terms of an NxN matrix field theory. This formulation introduces a classification of RNA structures according to their topological genus. The large N limit of this theory generates the secondary structures of RNA (planar graphs), whereas 1/N corrections are identified as pseudo-knots. We show how the RNA structures can be analyzed in terms of primitive pseudo-knots of low genus and how this concept can be included in Monte Carlo calculations to actually predict RNA folds.
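To make the genus classification concrete, the following minimal sketch computes the topological genus of a structure given as a list of base pairs. It is an illustration under stated assumptions, not the speaker's code: the backbone is closed into a circle, the boundary components of the resulting diagram are counted by alternately jumping along an arc and stepping along the backbone, and the genus g follows from the Euler characteristic relation 1 - P + B = 2 - 2g, with P pairs and B boundary components. The function name and the 0-based position convention are hypothetical.

```python
def genus(length, pairs):
    """Genus of a structure with base pairs (i, j) on positions 0..length-1."""
    partner = list(range(length))  # unpaired positions map to themselves
    for i, j in pairs:
        partner[i], partner[j] = j, i

    seen = [False] * length
    boundaries = 0
    for start in range(length):
        if seen[start]:
            continue
        boundaries += 1
        pos = start
        while not seen[pos]:
            seen[pos] = True
            # jump along the arc (if any), then step along the closed backbone
            pos = (partner[pos] + 1) % length

    # one vertex (the closed backbone), len(pairs) edges, `boundaries` faces
    return (len(pairs) + 1 - boundaries) // 2


nested = genus(4, [(0, 3), (1, 2)])      # planar secondary structure
h_type = genus(4, [(0, 2), (1, 3)])      # simplest pseudo-knot
```

Nested (planar) pairings give genus 0, consistent with the statement that secondary structures are planar graphs, while two crossing pairs give genus 1.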

Tao Pan
http://biomed.uchicago.edu/common/faculty/pan.html

RNA folding during transcription facilitated by non-native structures
October 29, 2007

RNA folding in the cell occurs during transcription. Expedient RNA folding must avoid the formation of undesirable structures as the nascent RNA emerges from the RNA polymerase. We show that efficient folding during transcription of three conserved non-coding RNAs from E. coli, RNase P RNA, SRP RNA and tmRNA, is facilitated by their cognate polymerase pausing at specific locations. These pause sites are located between the upstream and the downstream portions of all the native long-range helices in these non-coding RNAs. In the paused complexes, the nascent RNAs form labile structures that sequester these upstream portions in a manner that guides folding. Both the pause sites and the secondary structure of the non-native portions of the paused complexes are phylogenetically conserved. Specific pausing-induced structure formation can be a general strategy to facilitate the folding of long-range helices. This polymerase-based mechanism may cause portions of non-coding RNA sequences to be evolutionarily conserved for efficient folding during transcription.

Jakob Skou Pedersen
http://www.soe.ucsc.edu/~jsp/

Genomic identification of structural RNAs using phylo-SCFGs
October 30, 2007

RNA structures often evolve with characteristic substitution patterns that preserve base-pairs in spite of changes in primary sequence. With the advent of closely related full-length genomes, it has become possible to exploit this comparative signal for genomic identification of structural RNAs (1).

Phylo-SCFGs (2) are attractive models for this problem since they can describe both RNA structure, using stochastic context-free grammars (SCFGs), and sequence evolution, using phylogenetic models. Using variations of classical algorithms, multiple alignments with any number of sequences can be handled efficiently.

EvoFold implements this approach and has been used to screen multiple-sequence genomic alignments of both vertebrates and Drosophilids for structural RNAs (1,3). This has resulted in hundreds of high-confidence novel candidates of both ncRNAs and cis-regulatory structures.

1) Identification and Classification of Conserved RNA Secondary Structures in the Human Genome. Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, and Haussler D. PLoS Comput Biol. 2006 Apr;2(4):e33.

2) Using stochastic context free grammars and molecular evolution to predict RNA secondary structure. Knudsen B and Hein JJ. Bioinformatics. 1999; 15 (6): 446-454.

3) Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Stark A, Lin MF, Kheradpour P, and Pedersen JS, et al. 2007 (in press).

Niles A. Pierce
http://www.piercelab.caltech.edu/

Analysis and design of nucleic acid devices
October 29, 2007

DNA and RNA are versatile construction materials. By appropriately designing the sequence of bases in each strand, synthetic nucleic acid systems can be programmed to self-assemble into complex structures that implement dynamic mechanical tasks. Motivated by the challenge of encoding arbitrary mechanical function into nucleic acid sequences, we are developing a suite of computational algorithms for analyzing the underlying free energy landscapes that control the behavior of a system. This talk will focus on new algorithms for predicting the equilibrium properties of an entire test tube of interacting nucleic acid strands. The utility of the approach will be demonstrated by elucidating the empirical behavior of hybridization chain reaction mechanisms that are under development with application to biosensing, transport, and therapeutics.

Jens Reeder
http://www.techfak.uni-bielefeld.de/ags/pi/pages/jreeder.htm

Locomotif: from graphical motif description to RNA motif search
October 30, 2007

Motivated by the recent rise of interest in small regulatory RNAs, we present Locomotif - a new approach for locating RNA motifs that goes beyond the previous ones in three ways: (1) Motif search is based on efficient dynamic programming algorithms, incorporating the established thermodynamic model of RNA secondary structure formation. (2) Motifs are described graphically, using a Java-based editor, and search algorithms are derived from the graphics in a fully automatic way. The editor allows us to draw secondary structures, annotated with size and sequence information. They closely resemble the established, but informal way in which RNA motifs are communicated in the literature. Thus, the learning effort for Locomotif users is minimal. (3) Locomotif employs a client-server approach. Motifs are designed by the user locally. Search programs are generated and compiled on a bioinformatics server. They are made available both for execution on the server, and for download as C source code plus an appropriate make-file.

Availability: Locomotif is available at http://bibiserv.techfak.uni-bielefeld.de/locomotif.

Yongwu Rong
http://home.gwu.edu/~rong/

Feynman diagrams, RNA folding, and the transition polynomial
October 31, 2007

Feynman diagrams were introduced by physicists. They arise naturally in mathematics (from knots and singular knots), and in molecular biology (from RNA folding). In particular, work of G. Vernizzi, H. Orland, and A. Zee has shown that the "genus" of Feynman diagrams plays an important role in the prediction of RNA structures.

The transition polynomial for 4-regular graphs was defined by Jaeger to unify polynomials given by vertex reconfigurations similar to the skein relations of knots. It is closely related to the Kauffman bracket, Tutte polynomial, and the Penrose polynomial.

We define a transition polynomial for Feynman diagrams and discuss its properties. In particular, we show that the genus of a Feynman diagram is encoded in the transition polynomial. This is joint work with Kerry Luse.

Walter Larry Ruzzo
http://www.cs.washington.edu/homes/ruzzo/

Computational comparative genomics for discovery of cis-regulatory RNAs in bacteria

Discovery of novel functional noncoding RNA is a multifaceted problem. Motif representation, inference and search are all important, as is incorporation of relevant biological knowledge. With careful attention to all of these, we have developed a comparative genomics "pipeline" for discovery of cis-regulatory RNA elements in bacteria. We represent motifs using covariance models (CMs), as in the Rfam database. Motif inference in unaligned sequences with extraneous flanking regions (i.e., local alignment) relies on CMfinder [2]. We apply it to intergenic regions upstream of homologous genes in different bacteria, since cis-regulatory elements often are found and conserved there [3]. We use Ravenna [1] for efficient, sensitive CM search to identify additional instances. This is critical since (i) a given RNA element often regulates multiple genes in a pathway, not just homologs, and (ii) more examples allow us to refine the model (also via CMfinder), in turn enabling further discoveries. This strategy recovers most known RNAs in Firmicutes [3]. More importantly, we discovered 6 new likely riboswitch families, most experimentally verified, plus over 20 other elements in a wide variety of bacteria [3,4]. Collectively, these RNAs are involved in diverse but individually specific cellular processes, such as ribosome biogenesis, molybdenum cofactor biosynthesis and the citric acid cycle. One of the more surprising finds is a widespread riboswitch that apparently regulates such disparate processes as natural competence in Vibrio cholerae and the use of metal ions as electron acceptors in Geobacter sulfurreducens. These candidate RNAs add to the growing list of RNA motifs involved in multiple cellular processes, and suggest that many additional RNAs remain to be discovered. Many computational challenges also remain.

[1] Weinberg and Ruzzo. Sequence-based heuristics for faster annotation of non-coding RNA families. Bioinformatics, 2006, 22(1):35-39

[2] Yao, Weinberg and Ruzzo. CMfinder--A Covariance Model Based RNA Motif Finding Algorithm. Bioinformatics, 2006, 22(4): 445-452.

[3] Yao, Barrick, Weinberg, Neph, Breaker, Tompa and Ruzzo. A Computational Pipeline for High Throughput Discovery of cis-Regulatory Noncoding RNA in Prokaryotes. PLoS Computational Biology. 3(7): e126, July 6, 2007.

[4] Weinberg, Barrick, Yao, Roth, Kim, Gore, Wang, Lee, Block, Sudarsan, Neph, Tompa, Ruzzo and Breaker. Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline. Nucl. Acids Res., July 2007 35: 4809-4819.

Tamar Schlick
http://www.math.nyu.edu/faculty/schlick/
Eric Westhof
http://www-ibmc.u-strasbg.fr/upr9002/westhof/

Introduction
October 29, 2007


Bruce A. Shapiro
http://www-lecb.ncifcrf.gov/~bshapiro/myself.html

Computational approaches to RNA nanodesign
November 2, 2007

We have developed a number of computational tools that permit a user to design RNA-based nanoparticles with various functionalities. One of these tools is a newly developed relational database, RNAJunction, which contains structural and sequence information for all known RNA n-way junctions and kissing loop interactions. The database also contains the results from applying molecular mechanics and structural clustering techniques to the motifs. The database of motifs can be searched in a variety of ways and provides a source of RNA nano building blocks and material for further analysis. Another computational tool, NanoTiler, permits a user to interactively and automatically construct user-specified RNA-based nano-scale shapes. The combination of the RNAJunction database, NanoTiler and other computational tools allows the rapid prototyping of designed RNA shapes. We discuss some of the principles involved in these design tools and show how the RNA nanodesign process can be accomplished with the use of these methodologies.

Tobin R. Sosnick
http://sosnick.uchicago.edu/

Imaging RNA structures and folding intermediates using electron cryo-microscopy
October 30, 2007

We investigate the applicability of electron cryomicroscopy (cryo-EM) with single-particle reconstruction to structural studies of RNAs as small as 154 residues (~50 kD). This size is at least two-fold smaller than the generally accepted limit for single-particle image reconstruction of macromolecules by cryo-EM. For the specificity and catalytic domains of bacterial RNase P RNA, single-particle reconstructions of the native structures exhibit good agreement with their respective crystal structures. For the major thermodynamic folding intermediate of the B. subtilis specificity domain, the single-particle reconstruction has considerable similarity to the previously proposed structural models of this intermediate. These results indicate that cryo-EM can directly image conformations of relatively small RNA molecules in different structural and functional states.

Michael Stich

Collective properties of evolving populations of RNA molecules

RNA molecules, through their dual appearance as sequence and structure, represent a suitable model to study evolutionary properties of quasispecies. The essential ingredient in this model is the differentiation between genotype (molecular sequences, which are affected by mutation) and phenotype (molecular structure, which is affected by selection). This framework allows a quantitative analysis of organizational properties of quasispecies as they adapt to different environments, such as their robustness, the effect of the degeneration of the sequence space, or the adaptation under different mutation rates and the associated error threshold.

Chris Thachuk

On the design of oligos for gene synthesis

Methods for reliable synthesis of long genes offer great promise for protein synthesis via expression of synthetic genes, with applications to improved analysis of protein structure and function, as well as engineering of novel proteins. Current technologies for gene synthesis use computational methods for design of short oligos, which can then be reliably synthesized and assembled into the desired target gene. For collision-oblivious oligo design -- when mishybridizations between oligos are ignored -- we give a simple and efficient dynamic programming algorithm. We conjecture that the collision-aware oligo design problem is NP-hard and provide evidence that mishybridizations between oligos occur infrequently in the designs from the collision-oblivious algorithm. We extend our dynamic programming algorithm to achieve collision-aware oligo design when the target gene can be partitioned into independently assembled short segments. We evaluate our methods on a large biological gene set.
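
Editorial illustration: the abstract does not spell out the dynamic program, so the following is only a minimal sketch of the general idea of choosing cut points by dynamic programming, not the authors' algorithm. It assumes a simplified setting in which the target is split into consecutive, non-overlapping oligos whose lengths lie in a given range, and it uses a hypothetical per-oligo cost (deviation of GC fraction from 0.5). The names partition_oligos and gc_deviation, the cost function, and the length bounds are all assumptions made for this example.

```python
def gc_deviation(segment):
    """Hypothetical cost: how far the GC fraction of a segment is from 0.5."""
    gc = sum(1 for base in segment if base in "GC")
    return abs(gc / len(segment) - 0.5)


def partition_oligos(target, min_len, max_len, cost=gc_deviation):
    """Split target into consecutive segments with min_len <= length <= max_len,
    minimizing the summed cost. Returns a list of segments, or None if no
    valid partition exists. Requires min_len >= 1."""
    n = len(target)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[e] = minimal cost of partitioning target[:e]
    back = [-1] * (n + 1)    # back[e] = start of the last segment in that optimum
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(min_len, max_len + 1):
            start = end - length
            if start < 0:
                break
            if best[start] == inf:
                continue  # prefix target[:start] cannot be partitioned
            candidate = best[start] + cost(target[start:end])
            if candidate < best[end]:
                best[end] = candidate
                back[end] = start
    if best[n] == inf:
        return None
    # Walk the back-pointers from the end of the target to recover the cuts.
    segments = []
    end = n
    while end > 0:
        segments.append(target[back[end]:end])
        end = back[end]
    segments.reverse()
    return segments
```

For instance, partition_oligos("GCATGCAT", 4, 4) returns the two segments "GCAT" and "GCAT", while a target whose length cannot be covered by the allowed oligo lengths yields None. The table has one entry per prefix and each entry inspects at most max_len - min_len + 1 predecessors, which is what makes this kind of formulation efficient.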

Devarajan Thirumalai
http://www.chem.umd.edu/Faculty_Directory/faculty.php?id=38

Exploring the energy landscape of RNA
November 1, 2007

Recent single-molecule experiments and high-resolution temperature jump experiments show that the energy landscape of RNA is rugged. As a result, even the formation of a hairpin exhibits all the signatures of folding (multiple pathways and complex kinetics) usually associated with self-assembly of ribozymes. I will describe the kinetics of hairpin formation, initiated by both temperature and force quench, using computations. The profound differences between the two methods will be illustrated in terms of the pathways to the native state. Analogies to folding of ribozymes will also be given.

Elfar Torarinsson
http://www.blezur.dk/pages/elfar.php

Comparative genomics beyond sequence based alignments: RNA structures in the ENCODE regions

Joint work with Z. Yao, E. D. Wiklund, J. B. Bramsen, C. Hansen, J. Kjems, N. Tommerup, W. L. Ruzzo, and J. Gorodkin.

Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure - frequent compensating base changes - is increasingly likely to cause sequence-based alignment methods to misalign, or even refuse to align, homologous ncRNAs, consequently obscuring that structural signal. We have used CMfinder [1], a structure-oriented local alignment tool, to search vertebrate multiple alignments in the ENCODE regions. In agreement with other studies [2], we find a large number of potential RNA structures in the ENCODE regions. We report 6,587 candidates with an estimated false positive rate of 50%. More intriguingly, many of these candidates may be better represented by alignments taking the RNA secondary structure into account than by those based on primary sequence alone, often quite dramatically. For example, approximately one quarter of these 6,587 candidates show revisions in more than 50% of their aligned positions. Furthermore, our results are strongly complementary to those discovered by sequence-alignment-based approaches: 84% of our candidates are not covered by Washietl et al. [2], increasing the number of ncRNA candidates in the ENCODE region by 32%. In a group of eleven ncRNA candidates that were tested by RT-PCR, 10 were confirmed to be present as RNA transcripts in human tissue. Our results broadly suggest caution in any analysis relying on multiple sequence alignments in less well-conserved regions, clearly support growing appreciation for the biological significance of ncRNAs, and strongly argue for considering RNA structure directly in any searches for these elements.

1. Yao, Z., Weinberg, Z. and Ruzzo, W.L. 2006. CMfinder - A Covariance Model Based RNA Motif Finding Algorithm. Bioinformatics 22: 445-452.

2. Washietl, S., Pedersen, J.S., Korbel, J.O., Gruber, A.R., Hackermuller, J., Hertel, J., Lindemeyer, M., Reiche, K., Stocsits, C., Tanzer, A., et al. 2007. Structured RNAs in the ENCODE Selected Regions of the Human Genome. Genome Research 17: 852-864.

Jérôme Waldispühl
http://www-math.mit.edu/~jeromew/

Efficient algorithms for probing the RNA mutation landscape and prediction of deleterious mutations

We develop an efficient algorithm to compute, for a given RNA sequence and simultaneously for each k, the minimum free energy structure MFE_k and the Boltzmann partition function Z_k over all secondary structures of all k-point mutants of the given sequence. Using the partition function, we rigorously sample from the ensemble of low energy k-point mutants in order to explore the mutation landscape. Our algorithm, named RNAmutants, allows us to investigate deleterious mutations (mutations that radically modify secondary structure) in the Hepatitis C virus cis-acting replication (HCV CAR) element and the hairpin of human immunodeficiency virus trans-activation response (HIV-1 TAR) element. More generally, using RNAmutants, we study the resilience of an RNA molecule to pointwise mutations. By computing the mutation profile of a sequence, a novel graphical representation of the mutational tendency of nucleotide positions, we analyze the deleterious nature of mutating specific nucleotide positions or groups of positions. In particular, we show qualitative agreement between published HIV experimental mutagenesis studies and our analysis of deleterious mutations using RNAmutants. Our work predicts other deleterious mutations, which could be verified experimentally. Work in collaboration with P. Clote, B. Berger and S. Devadas.
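
Editorial illustration: to give a sense of the search space behind MFE_k and Z_k, the sketch below enumerates all k-point mutants of a sequence by brute force and counts them. This is not RNAmutants and computes no energies; the function names k_point_mutants and count_k_point_mutants are assumptions made for this example. A sequence of length n has C(n, k) * 3^k k-point mutants over a four-letter alphabet, which grows far too quickly for exhaustive folding of each mutant and is the reason an efficient algorithm is needed.

```python
from itertools import combinations, product
from math import comb


def k_point_mutants(seq, k, alphabet="ACGU"):
    """Yield every sequence that differs from seq at exactly k positions."""
    for positions in combinations(range(len(seq)), k):
        # At each chosen position, any base other than the original is allowed.
        choices = [[b for b in alphabet if b != seq[p]] for p in positions]
        for substitution in product(*choices):
            mutant = list(seq)
            for p, base in zip(positions, substitution):
                mutant[p] = base
            yield "".join(mutant)


def count_k_point_mutants(n, k, alphabet_size=4):
    """Closed-form size of the k-point mutant set for a sequence of length n."""
    return comb(n, k) * (alphabet_size - 1) ** k
```

For a sequence of length 4 and k = 2 the enumeration produces 54 mutants, matching the closed form; for a 100-nucleotide sequence and k = 5, count_k_point_mutants(100, 5) already exceeds ten billion.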

Stefan Washietl
http://www.itc.univie.ac.at/telephon.html

Improved RNA gene predictions through dinucleotide controlled randomization of multiple sequence alignments

Tanja Gesell (1) & Stefan Washietl (2,3)

1. Center for Integrative Bioinformatics, Max Perutz Laboratories, Vienna
2. EMBL-European Bioinformatics Institute, Hinxton, United Kingdom
3. Department of Theoretical Chemistry, University of Vienna, Austria

Most noncoding RNA gene prediction programs are based on the detection of conserved RNA secondary structures in multiple alignments [1]. Although this approach seems to be the most promising, the main problem of current algorithms is the large number of false positive predictions, in particular in large vertebrate genomes.

As the available algorithms assume a mononucleotide background model, a major source of erroneously predicted RNA structures is the biased dinucleotide content found e.g. in vertebrate genomes. While there are well known algorithms for randomization of single sequences preserving dinucleotide content, no algorithms exist for multiple alignments. We present a novel algorithm addressing this problem.

Our approach, which involves in silico evolution along a phylogenetic tree, has two key features: (i) We make use of a new evolutionary model that considers site-specific and overlapping dependencies [2]. This enables us to simulate alignments with given dinucleotide (or higher order nucleotide) content. (ii) The model includes site-specific rate factors that preserve critical conservation patterns of the original alignments. We developed a time-efficient distance-based approximation method to estimate a tree under this complex model, which is used as a guide for simulating new alignments.

Based on this improved null model we have implemented a noncoding RNA gene prediction algorithm called SISSIz that builds upon the RNAalifold and AlifoldZ programs. The new dinucleotide-based program shows significantly improved accuracy over its mononucleotide counterparts on a vertebrate test set.

[1] Washietl S., Hofacker I.L., Lukasser M., Huettenhofer A., Stadler P.F. Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol. (2005), 23:1383-90.

[2] Gesell T., von Haeseler A. In silico sequence evolution with site-specific interactions along phylogenetic trees. Bioinformatics. (2006), 22:716-22

Eric Westhof
http://www-ibmc.u-strasbg.fr/upr9002/westhof/

Architecture and reactivity of RNA
October 31, 2007

RNA architecture results from the hierarchical assembly of preformed double-stranded helices defined by Watson-Crick base pairs and RNA modules maintained by non-Watson-Crick base pairs. Surprisingly, the most common RNA-RNA interaction motif, the A-minor motif, is also the least specific in its local requirements. A-minor motifs are mediated by adenines binding into the shallow/minor groove of any combination of stacked and helical Watson-Crick base pairs. Thus, A-minor motifs are mutationally robust and can accommodate many combinations of neutral mutations. This complicates the search of functional RNAs in genomes and dilutes the links between RNA structure and evolution.

The bacterial ribosomal decoding A site exploits this lack of local atomic specificity. There, the adenines A1492 and A1493 of the A site are seen either tucked in within the internal loop or bulging out and poised for interaction. This dynamic equilibrium contributes to the decoding process during recognition of the codon:anticodon Watson-Crick base pairings.

In contrast, for RNA folding, where specificity is a requirement, global positional and orientational constraints on the native fold must occur upstream in the folding process. Critical parameters are the lengths of the helices, the co-axiality of the helical stacks, and the structure adopted at the junctions of helices. The molecular neutrality present in the local interactions is thus partially compensated by these global topological criteria, which are much less accessible to sequence analysis since they are attached to the three-dimensional architecture. The search for functional RNAs in genomes is thereby complicated by this dilution of the direct links between sequences and structures. The simultaneous treatment of 3D structures, structural alignments, and annotations of the interactions should, we hope, make it possible to derive some rules of molecular evolution in structured RNAs.

Lescoute, A. and Westhof, E. (2006) The interaction networks of structured RNAs. Nucleic Acids Res 34, 6587.

Hammann, C. and Westhof, E. (2007) Searching genomes for ribozymes and riboswitches. Genome Biology 8, 210.

Sarah Woodson
http://www.jhu.edu/~pmb/faculty/woodson.html

How RNA tells right from wrong: base pairs, tertiary interactions, and counterions in RNA folding
November 1, 2007

RNAs must self-assemble into unique three-dimensional structures in the cell, yet how RNA molecules find their native structure in a short time is not well understood. This is a challenging problem, because RNA secondary structures are thermodynamically stable but not uniquely specified by the sequence, while tertiary interactions are specific but not very stable. Consequently, many RNAs become trapped in metastable, non-native intermediates. Recent footprinting and SAXS experiments on a bacterial ribozyme show that tertiary interactions make helix assembly more specific during the initial collapse transition. Specific collapse increases the flux through folding pathways that lead directly to the native structure. The stability of the folded RNA also depends on the charge density of the counterions; small multivalent counterions stabilize the RNA more than large monovalent ions. In low charge density counterions, the transition state ensemble becomes broader, accelerating the search for the native structure.

Yurong Xin

Estimating the fraction of non-coding RNAs in mammalian transcriptomes

Recent studies of mammalian transcriptomes have identified numerous RNA transcripts that do not code for proteins; their identity, however, is largely unknown. Here we explore an approach based on sequence randomness patterns to discern different RNA classes. The relative z-score we use helps distinguish the known ncRNA class from the genome, intergene, and intron classes. This leads us to a fractional ncRNA measure of putative ncRNA datasets, which we model as a mixture of genuine ncRNAs and other transcripts derived from genomic, intergenic and intronic sequences. We use this model to analyze six representative datasets, identified by the FANTOM3 project and two computational approaches based on comparative analysis (RNAz and EvoFold). Our analysis suggests fewer ncRNAs than estimated by DNA sequencing and comparative analysis, but validating our approach and its predictions requires more extensive experimental RNA data.

Craig L. Zirbel
http://www-math.bgsu.edu/~zirbel/

Using RNA 3D structure data in SCFG/MRF models to do sequence alignment and motif inference
October 31, 2007

RNA 3D structure files contain essentially complete information about the interactions that form the 3D structure of an RNA molecule for a given organism. Homologous molecules in other organisms will have very similar 3D structures, but we expect to see sequence variability due to structurally neutral base substitutions, insertions, and deletions, among other things. RNA databases have far more RNA sequences than RNA 3D structures, and this will always be the case. We wish to use the 3D structure data to make inferences about the 3D structure of homologous molecules on the basis of their sequences.

We think of homologous RNA sequences as being random variants of the molecule for which we have a 3D structure. If the probabilistic model of this variation is simple enough, we can use it to align the sequences to the 3D structure, and thus infer the structural role of each base in the sequence. Stochastic context free grammars (SCFGs) can account for the nested Watson-Crick basepairs prevalent in RNA, and by choosing appropriate basepair substitution probabilities, they can be used to model structurally neutral basepair substitutions for non-Watson-Crick basepairs as well. We use an SCFG formalism enhanced by production rules based on Markov Random Fields (MRF). This allows us to model base triples such as are found in sarcin motifs, and local crossing interactions such as are found in kink turn internal loops. However, as usual with SCFG, it does not allow us to model longer-range pseudoknots.

The SCFG/MRF model can be used for two purposes: First, to make RNA multiple sequence alignments based on the 3D structure of one molecule, without reference to a hand-curated seed alignment. Second, to infer the 3D structure of small motifs such as internal loops from their sequences.

Michael Zuker
http://www.rpi.edu/~zukerm/

Computational methods for RNA secondary structure determination
October 29, 2007

The talk will begin with a definition of RNA secondary structure, including three different ways to display these structures. Two distinct approaches will be presented for determining secondary structure from sequence data. The comparative method requires a multiple sequence alignment of a collection of homologous RNA sequences. It uses phylogeny to determine common, conserved base pairs that are more likely to be the result of evolution than to exist by chance. On the other hand, recursive algorithms may be used on single sequences to compute minimum free energy structures, partition functions and other biophysical quantities. These algorithms ignore evolution and use empirically derived energy parameters based on physical chemistry. Examples will be given for both methods.
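
Editorial illustration: the recursive, single-sequence approach can be sketched with the classic base-pair maximization recursion (the Nussinov algorithm). This is a deliberately simplified stand-in: it maximizes the number of nested base pairs instead of minimizing free energy with empirically derived parameters, so it is not the method presented in the talk. The function names and the minimum hairpin loop size of 3 are assumptions made for this example.

```python
# Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    return (a, b) in ALLOWED_PAIRS


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq, requiring at least
    min_loop unpaired bases between the two partners of any pair."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            # case 2: base j pairs with some k, splitting the interval in two
            for k in range(i, j - min_loop):
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

For the short hairpin "GGGAAAUCC" the recursion finds three pairs (G-C, G-C and G-U around an AAA loop). Minimum free energy algorithms follow the same interval-based recursion but replace the pair count with loop-dependent energy terms.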
