
Reception and Poster Session

Monday, October 29, 2007 - 5:00pm - 6:30pm
Lind 400
  • A Continuous Probabilistic Model of Local RNA 3-D Structure
    Jes Frellsen (University of Copenhagen)
    Joint work with Ida Moltke, Martin Thiim and Thomas Hamelryck (The Bioinformatics Center, University of Copenhagen)

    So far, the most common approach to modeling local RNA 3-D structure has been to describe the local conformational space as discrete in a non-probabilistic framework. We present an original approach to modeling local RNA 3-D structure, namely a probabilistic model that treats the conformational space as continuous. In our model the backbone dihedral angles and the base dihedral angles are modeled with a Dynamic Bayesian Network using directional statistics. The model assigns a probability distribution to the conformational space and therefore it has numerous applications. It allows for fast probabilistic sampling of locally RNA-like structures and it can therefore be used in RNA 3-D structure prediction, where one of the problems is how to efficiently search through the space of plausible RNA structures. Today, the state-of-the-art method for suggesting plausible RNA structures is based on assembling fragments from libraries. Further, the model can also be used for deriving probabilities of seeing different local structures and it can therefore be used for quality validation of experimentally determined structures.
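    The generative idea can be illustrated with a toy model: hidden states that each emit an angle from a circular distribution. The sketch below uses a two-state hidden Markov chain, a simple dynamic Bayesian network, whose states emit a single dihedral angle from a von Mises distribution. Everything in it (two states, the transition probabilities, the means and concentrations) is invented for illustration. The actual model covers several backbone and base dihedrals per nucleotide. Python's `random.vonmisesvariate` does the directional sampling.

```python
import math
import random

# Hypothetical parameters, invented for illustration only.
# Transition probabilities between two hidden states (each row sums to 1).
TRANSITIONS = {0: [0.9, 0.1], 1: [0.2, 0.8]}
# Von Mises emission per state: (mean angle in radians within [0, 2*pi), concentration).
EMISSIONS = {0: (math.radians(300.0), 8.0),
             1: (math.radians(180.0), 4.0)}


def sample_chain(length, seed=None):
    """Sample (hidden_state, dihedral_angle) pairs from a toy two-state HMM.

    Angles are drawn from a von Mises distribution and wrapped to [-pi, pi].
    """
    rng = random.Random(seed)
    state = rng.choice([0, 1])
    chain = []
    for _ in range(length):
        mu, kappa = EMISSIONS[state]
        angle = rng.vonmisesvariate(mu, kappa)                # value in [0, 2*pi)
        angle = math.atan2(math.sin(angle), math.cos(angle))  # wrap to [-pi, pi]
        chain.append((state, angle))
        # Move to the next hidden state according to the current transition row.
        state = rng.choices([0, 1], weights=TRANSITIONS[state])[0]
    return chain


for state, angle in sample_chain(5, seed=1):
    print(state, round(math.degrees(angle), 1))
```

    The emitted angles are treated as points on a circle rather than on a line, which is the point of using directional statistics for dihedrals.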
  • Efficient Algorithms for Probing the RNA Mutation Landscape and Prediction of Deleterious Mutations
    Jérôme Waldispühl (Massachusetts Institute of Technology)
    We develop an efficient algorithm to compute, for a given RNA sequence and
    simultaneously for each k, the minimum free energy structure MFE_k and the
    Boltzmann partition function Z_k over all secondary structures of all k-point
    mutants of the given sequence. Using the partition function, we rigorously
    sample from the ensemble of low energy k-point mutants in order to explore the
    mutation landscape. Our algorithm, named RNAmutants, allows us to investigate
    deleterious mutations (mutations that radically modify secondary structure) in
    the Hepatitis C virus cis-acting replication (HCV CAR) element and the hairpin
    of human immunodeficiency virus trans-activation response (HIV-1 TAR) element.
    More generally, using RNAmutants, we study the resilience of an RNA molecule to
    point mutations. By computing the mutation profile of a sequence, a novel
    graphical representation of the mutational tendency of nucleotide positions, we
    analyze the deleterious nature of mutating specific nucleotide positions or
    groups of positions. In particular, we show qualitative agreement between
    published HIV experimental mutagenesis studies and our analysis of deleterious
    mutations using RNAmutants. Our work predicts other deleterious mutations,
    which could be verified experimentally.
    Work in collaboration with P. Clote, B. Berger and S. Devadas.
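    The need for an efficient algorithm follows from a simple count. A sequence of length n has C(n, k)·3^k distinct k-point mutants: choose k positions, then one of three alternative bases at each. The sketch below assumes that a k-point mutant differs from the input at exactly k positions. It enumerates the mutants naively and checks the count against the closed form. This brute force is the kind of enumeration that RNAmutants avoids, and the sketch does not fold anything.

```python
from itertools import combinations, product
from math import comb

BASES = "ACGU"


def k_point_mutants(seq, k):
    """Yield every sequence that differs from seq at exactly k positions."""
    for positions in combinations(range(len(seq)), k):
        # At each chosen position, use one of the 3 bases that differ from the original.
        choices = [[b for b in BASES if b != seq[i]] for i in positions]
        for replacement in product(*choices):
            mutant = list(seq)
            for i, b in zip(positions, replacement):
                mutant[i] = b
            yield "".join(mutant)


def num_k_point_mutants(n, k):
    """Closed form: choose k positions, then 3 alternative bases at each."""
    return comb(n, k) * 3 ** k


print(len(list(k_point_mutants("ACGUA", 2))))  # equals num_k_point_mutants(5, 2)
print(num_k_point_mutants(100, 5))
```

    For n = 100 and k = 5, the closed form already gives 18,294,867,360 mutants. Folding each one separately is therefore impractical.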
  • Determining Functional Conformations of Two HDV III Strains
    Wojciech (Voytek) Kasprzak (National Cancer Institute)
    Joint work with Sarah D. Linnstaedt (2), John L. Casey (2), and Bruce A. Shapiro (3).

    (1) Basic Research Program, SAIC-Frederick, Inc., NCI Frederick, Frederick, MD

    (2) Department of Microbiology and Immunology, Georgetown University Medical Center, Washington, DC

    (3) Center for Cancer Research Nanobiology Program, National Cancer Institute, Frederick, MD

    Hepatitis Delta virus (HDV) is a sub-viral human pathogen that aggravates
    Hepatitis B virus (HBV) liver infections. The short HDV genome (~1680 nt)
    is a single-stranded, circular RNA encoding only one protein, the
    hepatitis delta antigen (HDAg). The host enzyme ADAR1 edits the HDV stop
    codon (UAG) into a tryptophan (W) codon (UGG). This enables expression of
    the two forms of the protein, short (HDAg-S) and long (HDAg-L), from the
    same open reading frame. HDAg-S is required for replication, while HDAg-L
    enables viral particle formation and inhibits replication. The balance
    between the two forms is crucial, and editing must be regulated.

    We have applied our programs, MPGAfold and StructureLab, to predict and
    examine the folding conformations/states of an HDV III construct. This
    construct includes the editing site (amber/W) and has the editing
    capabilities of the full HDV III. The predicted secondary structure
    folding dynamics indicate that the HDV III RNA forms a meta-stable
    branched structure and a stable rod structure. Both were observed in
    vitro, and the branched structure was identified as the one enabling
    editing. Computational predictions and the experimental data also
    indicate that an Ecuadorian strain folds into the editing-capable
    structures more readily than a Peruvian strain, and we identify the
    reasons for the difference. Thus the folding dynamics of HDV III strains
    appear to strongly influence their RNA editing levels.

    Funded in part by NCI Contract N01-CO-1240.
  • MASTR: Simultaneous Multiple Alignment and Structure Prediction of Non-coding RNAs Using Simulated Annealing
    Stinus Lindgreen (University of Copenhagen)
    Joint work with Anders Krogh, University of Copenhagen and Paul Gardner, Wellcome Trust Sanger Institute.

    With the growing interest in non-coding RNAs and their function, there is also a growing need for computational tools that can analyze new sequences and predict their secondary structure. For a single RNA sequence, a common approach is to find the minimum free energy conformation. However, by looking at many related sequences, it is possible to incorporate evolutionary information and perform a comparative analysis, which can improve the prediction. Ideally, the multiple alignment of the sequences and the prediction of their common secondary structure should be performed simultaneously. We present a novel heuristic method that uses simulated annealing to iteratively improve both the sequence alignment and a common secondary structure. This is done in a sampling framework using simple moves that either change the sequence alignment by moving gaps around or update the structure at the base-pair level. The prediction is evaluated using a cost function that combines the log-likelihood of the alignment, the base pair probabilities, and a covariation term. The method, implemented in the C++ program MASTR (Multiple Alignment of STructural RNAs), is competitive with other current programs in terms of speed, alignment quality, and structure quality.

    References:

    Lindgreen S, Gardner PP, Krogh A (2007): MASTR: Multiple alignment and structure prediction of non-coding RNAs using simulated annealing, Bioinformatics (accepted)
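    The annealing loop itself is generic. The minimal sketch below assumes a user-supplied cost function to minimize and a move generator. In MASTR these roles would be played by the combined alignment/base-pair/covariation cost and the gap-shifting and base-pair moves, none of which are reproduced here. The cooling schedule, the parameter values, and the toy cost in the usage line are all illustrative assumptions.

```python
import math
import random


def anneal(state, cost, propose, t_start=10.0, t_end=0.01, steps=5000, seed=None):
    """Generic simulated annealing with Metropolis acceptance and geometric cooling.

    cost(state) -> number to minimise; propose(state, rng) -> neighbouring state
    (in an alignment setting, e.g. a shifted gap or an added/removed base pair).
    Returns the best state seen and its cost.
    """
    rng = random.Random(seed)
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    alpha = (t_end / t_start) ** (1.0 / max(steps - 1, 1))  # per-step cooling factor
    t = t_start
    for _ in range(steps):
        candidate = propose(current, rng)
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= alpha
    return best, best_cost


# Toy stand-in for an alignment/structure cost: minimise (x - 7)^2 with +/-1 moves.
best, best_cost = anneal(0, lambda x: (x - 7) ** 2,
                         lambda x, rng: x + rng.choice((-1, 1)), seed=42)
print(best, best_cost)
```

    Accepting some uphill moves early (high temperature) lets the search escape poor local optima, while the shrinking temperature gradually turns it into a greedy refinement.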
  • Designing Structured RNA Pools for In Vitro Selection of RNAs
    Hin Gan (New York University)
    In vitro selection of RNAs is a versatile experimental technology
    for discovering novel RNA molecules from random sequence pools.
    However, finding complex RNA molecules is difficult because simple
    motifs dominate in random pools. Thus, engineering sequence pools
    that possess complex structures could increase the probability of
    discovering novel RNAs.

    The mathematical problem of designing structured RNA pools is to
    optimize the sequence/structure space to yield the structural
    characteristics of the target pool. We represent experimental pool
    generation as a nucleotide mixing (transition) matrix applied to a
    starting sequence, and pool structures as RNA graphs. These tools
    allow us to map regions of RNA sequence space using mixing matrices
    and their structural distributions. The target structured pool
    corresponds to an optimal combination of mixing matrices,
    starting sequences, and associated pool fractions.

    We show that our pool design approach allows generation of pools
    with user-defined characteristics, such as proportions of specific
    target motifs, starting functional sequences, and sequence length.
    Our pool design method has been automated and made available through
    the webserver RAGPOOLS (http://rubin2.biomath.nyu.edu) that offers a
    theoretical companion tool for RNA in vitro selection and related
    problems.
    Thus, RAGPOOLS can serve as a guide to researchers who aim to synthesize
    RNA pools with desired properties and/or perform in silico experiments.

    References:

    Kim N, Shin JS, Elmetwaly S, Gan HH, Schlick T. RAGPOOLS:
    RNA-As-Graph-Pools, a web server for assisting the design of structured
    RNA pools for in vitro selection. Bioinformatics 2007 (in press).

    Kim N, Gan HH, Schlick T. A computational proposal for designing
    structured RNA pools for in vitro selection of RNAs. RNA 2007,
    13(4):478-92.
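    The mixing-matrix representation of pool generation can be sketched as follows. The sketch assumes one 4x4 matrix applied independently at every position of the starting sequence. The matrix values (keep the original base with probability 0.7, otherwise pick another base uniformly) and the function names are invented for illustration. RAGPOOLS's own matrices and its RNA-graph analysis of the resulting structures are not reproduced here.

```python
import random

BASES = "ACGU"

# Hypothetical mixing matrix: MIX[a][b] = probability that a position holding
# base a in the starting sequence carries base b in the synthesized pool.
MIX = {a: {b: (0.7 if a == b else 0.1) for b in BASES} for a in BASES}


def position_profile(start, mix):
    """Per-position nucleotide distribution of the pool generated from `start`."""
    return [dict(mix[base]) for base in start]


def sample_pool(start, mix, size, seed=None):
    """Draw `size` pool sequences, mixing each position independently."""
    rng = random.Random(seed)
    pool = []
    for _ in range(size):
        seq = [rng.choices(BASES, weights=[mix[a][b] for b in BASES])[0] for a in start]
        pool.append("".join(seq))
    return pool


print(position_profile("GGAC", MIX)[0])
print(sample_pool("GGAC", MIX, 3, seed=3))
```

    Under this assumption, the fraction of pool molecules identical to an n-nucleotide starting sequence is 0.7^n. This is one way a mixing matrix controls how far the pool strays from a starting functional sequence.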
  • Functional Classification of all Non-coding Microbial Sequences through Phylogenetic Profiling
    Daniel Gautheret (Université de Paris XI (Paris-Sud))
    Joint work with Antonin Marchais and Magali Naville
    (IGM. Bât 400 - Université Paris-Sud - 91405 Orsay cedex –
    France).

    Although comparative genomics has been instrumental in the
    identification of novel non-coding RNA (ncRNA) in model
    genomes, this technique cannot, in the form it is currently
    practised, keep up with the pace of genome sequencing. As a
    result, hundreds of microbial genomes, including entire
    families of important pathogens, have been left out of the
    picture in terms of ncRNA function analysis. As ncRNAs play
    major regulatory and adaptive roles in bacteria, there is an
    urgent need for innovative computational methods that would
    permit a quick and efficient detection of ncRNAs in any genome
    of interest. Here we propose a protocol that exploits the depth
    of phylogenetic information in all available genomes (with
    virtually no limitation in the number of species) to produce a
    functional classification of all ncRNA candidates and other
    non-coding conserved elements in any target bacterial genome.

    Our protocol involves a low-stringency screening for intergenic
    conserved elements (ICEs) in the target genome, followed by the
    construction of the presence/absence profile of each ICE across
    the complete bacterial genome collection. All ICEs are then
    clustered according to the distance of their phylogenetic
    profiles, as done by Pellegrini et al. [1] for classifying
    protein genes. A simultaneous clustering of ICEs and ORFs
    produces a complete classification of coding and non-coding
    elements in the target genome. We ran this pipeline on E. coli
    and B. subtilis. In both species, known small RNAs and
    riboswitches were significantly concentrated (P ~ 10^-6) in two or
    three clusters that also contain many orthologous E. coli and
    B. subtilis ORFs and ~200 undefined ICEs in each species.
    Phylogenetic profile clustering is independent of sequence
    similarity and appears to predict functional ncRNAs with a much
    higher specificity than comparative sequence analysis.
    Furthermore, some clusters that lack known ncRNAs show very
    interesting phylogenetic presence/absence patterns that
    indicate either horizontal transfers or the emergence of common
    adaptive non-coding elements in distant bacterial species.
    Finally, the co-occurrence of ICEs and protein coding genes in
    the same clusters may constitute an important source of
    information on ICE/ORF functional relationships. A complete run
    of our ICE classification pipeline on a bacterial genome only
    requires a few hours.

    [1] Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates
    TO. (1999) Assigning protein functions by comparative genome
    analysis: protein phylogenetic profiles. Proc. Natl. Acad. Sci.
    U.S.A. 96:4285-4288
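    Profile-based clustering can be sketched as follows, assuming binary presence/absence profiles over the same ordered genome collection. The Hamming distance, the single-linkage rule, the threshold, and the element names are illustrative choices. They are not the distance or clustering method used in the protocol.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two presence/absence profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def single_linkage_clusters(profiles, max_dist):
    """Group elements whose profiles lie within max_dist (single linkage via union-find)."""
    parent = {name: name for name in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if hamming(profiles[a], profiles[b]) <= max_dist:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical profiles over six genomes (1 = element present, 0 = absent).
profiles = {
    "ICE_1": (1, 1, 0, 0, 1, 0),
    "ICE_2": (1, 1, 0, 0, 1, 1),
    "orfX":  (1, 1, 0, 0, 0, 0),
    "ICE_3": (0, 0, 1, 1, 0, 1),
}
print(single_linkage_clusters(profiles, max_dist=1))
```

    Because the distance uses only presence/absence patterns, an ICE can cluster with an ORF (here the hypothetical orfX) without any sequence similarity between them. This mirrors the independence from sequence similarity described above.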
  • RNA Matrices and RNA Secondary Structures
    Asamoah Nkwanta (Morgan State University)
    Two lower-triangular arrays whose entries count RNA secondary structures of a given length are presented in this short talk. The array entries also count specific lattice walks, and there is a one-to-one correspondence between RNA secondary structures and a subset of these walks. We will discuss various ways in which the walks can be used as a tool to help predict primary RNA sequences.
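    The abstract does not reproduce the arrays themselves. For orientation, the classical count of secondary structures on n positions follows a short recurrence, assuming non-crossing pairs, any two positions allowed to pair, and hairpin loops of at least one unpaired base. This sketch does not state whether the sequence coincides with a particular column of the arrays in the talk.

```python
def count_secondary_structures(n, min_loop=1):
    """Number of secondary structures on n positions (Waterman-style recurrence).

    A pair (k, m) is allowed only if m - k > min_loop, pairs may not cross,
    and sequence identity is ignored (any two positions may pair).
    """
    s = [1] * (n + 1)  # s[m] = number of structures on a segment of length m
    for m in range(2, n + 1):
        total = s[m - 1]  # last position unpaired
        # Last position m pairs with position k (1-based), requiring m - k > min_loop.
        for k in range(1, m - min_loop):
            total += s[k - 1] * s[m - k - 1]  # structures left of k times inside (k, m)
        s[m] = total
    return s[n]


print([count_secondary_structures(n) for n in range(11)])
```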
  • SwS: A Solvation Web Service for Nucleic Acids

    Accurately modeling the solvation of nucleic acid systems is an important
    issue, since water, together with the surrounding ionic atmosphere, has
    been shown to be an essential component of RNA and DNA structure. A new
    web service, called SwS (Solvation web Service for nucleic acids), will be
    presented. This web service, based on the nucleic acid structures contained
    in the Nucleic Acid Database (NDB), is devoted to the statistical analysis
    of the first solvation shell of important structural fragments. It has been
    developed to allow accurate comparisons between theoretical (molecular
    dynamics simulations) and experimental (X-ray) data, and to better
    understand molecular recognition phenomena involving water and ions. At a
    more subtle level, such data will also improve our views on the assembly
    rules of tertiary structural motifs of nucleic acids.
  • On the Design of Oligos for Gene Synthesis
    Chris Thachuk (University of British Columbia)
    Methods for reliable synthesis of long genes offer great promise for
    protein synthesis via expression of synthetic genes, with applications
    to improved analysis of protein structure and function, as well as
    engineering of novel proteins. Current technologies for gene
    synthesis use computational methods for design of short oligos, which
    can then be reliably synthesized and assembled into the desired target
    gene. For collision-oblivious oligo design -- when mishybridizations
    between oligos are ignored -- we give a simple and efficient dynamic
    programming algorithm. We conjecture that the collision-aware oligo
    design problem is NP-hard and provide evidence that mishybridizations
    between oligos occur infrequently in the designs from the
    collision-oblivious algorithm. We extend our dynamic programming
    algorithm to achieve collision-aware oligo design, when the target
    gene can be partitioned into independently-assembled short segments.
    We evaluate our methods on a large biological gene set.
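    Segmenting a gene into oligos fits dynamic programming naturally. The following is a loose illustration, not the authors' formulation. It cuts a gene into consecutive, non-overlapping oligos with lengths in [lmin, lmax] and chooses the cuts that make the oligos' melting temperatures as uniform as possible. The objective (squared deviation of a Wallace-rule Tm from a target), the length bounds, and the absence of overlaps between oligos are all simplifying assumptions. Real assembly designs use overlapping oligos and check mishybridization.

```python
def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def partition_gene(gene, lmin, lmax, target_tm):
    """Cut `gene` into consecutive pieces with lengths in [lmin, lmax],
    minimising the summed squared deviation of each piece's Tm from target_tm.

    Returns (cost, pieces), or None if no valid partition exists.
    Runs in O(n * (lmax - lmin + 1)) table updates.
    """
    n = len(gene)
    INF = float("inf")
    best = [INF] * (n + 1)  # best[i] = minimal cost of partitioning gene[:i]
    back = [-1] * (n + 1)   # back[i] = start index of the last piece in that partition
    best[0] = 0.0
    for i in range(1, n + 1):
        for length in range(lmin, min(lmax, i) + 1):
            j = i - length
            if best[j] == INF:
                continue
            c = best[j] + (wallace_tm(gene[j:i]) - target_tm) ** 2
            if c < best[i]:
                best[i], back[i] = c, j
    if best[n] == INF:
        return None
    pieces, i = [], n
    while i > 0:  # walk the back-pointers to recover the pieces
        pieces.append(gene[back[i]:i])
        i = back[i]
    return best[n], pieces[::-1]


print(partition_gene("ATGCGTACGTTAGCCGATCG", 5, 8, 20))
```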
  • RNA Pseudoknotted Secondary Structure Prediction using Hierarchical Folding
    Hosna Jabbari (University of British Columbia)
    Improving the accuracy and efficiency of computational RNA secondary
    structure prediction is an important challenge, particularly for
    pseudoknotted secondary structures. We propose a new approach for the
    prediction of pseudoknotted structures, motivated by the hypothesis that
    RNA structures fold hierarchically, with pseudoknot-free pairs forming
    initially, and pseudoknots forming later so as to minimize energy
    relative to the initial pseudoknot-free structure. Our HFold
    (Hierarchical Fold) algorithm has O(n^3) running time and can handle a
    wide range of biological structures, including nested kissing hairpins,
    which have previously required O(n^6) time using traditional minimum
    free energy approaches. We also report on an experimental evaluation of
    HFold.
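    The hierarchy that HFold assumes rests on the distinction between pseudoknot-free and pseudoknotted structures, which can be checked directly from a list of base pairs. Two pairs (i, j) and (k, l) cross, forming a pseudoknot, when i < k < j < l. This sketch only detects such crossings. It does not implement HFold's energy minimization.

```python
def crossing_pairs(pairs):
    """Return all pairs of base pairs that cross (i < k < j < l), i.e. pseudoknots."""
    norm = sorted((min(p), max(p)) for p in pairs)
    crossings = []
    for a in range(len(norm)):
        i, j = norm[a]
        for b in range(a + 1, len(norm)):
            k, l = norm[b]
            if k >= j:  # sorted by start, so no later pair can start inside (i, j)
                break
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


def is_pseudoknot_free(pairs):
    """True if no two base pairs cross."""
    return not crossing_pairs(pairs)


# Two hairpins (1,10) and (12,20) joined by a kissing pair (5,15).
print(crossing_pairs([(1, 10), (5, 15), (12, 20)]))
```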
  • Binding of Aminoglycosidic Antibiotics to the Oligonucleotide A-site Model
    Maciej Dlugosz (University of Warsaw)
    Joint work with J. M. Antosiewicz and J. Trylska.

    Aminoglycosidic antibiotics are antibacterial molecules that target
    the A-site of the small ribosomal subunit. Using Brownian dynamics, we
    simulated the encounter of four different aminoglycosidic antibiotics
    with their RNA binding site on the ribosome. The antibiotics
    considered are neamine, neomycin, paromomycin and ribostamycin. They
    are amino sugar derivatives composed of 2 to 4 rings, with a total
    positive charge of +4e to +6e. The influence of the structural,
    electrostatic and hydrodynamic properties of the antibiotics on the
    kinetics of their association with the ribosomal A-site is discussed.
    Diffusion-limited rates of association are computed, and their
    dependence on the ionic strength of the surrounding solution is
    examined. The mechanism of diffusion towards the RNA and the formation
    of the encounter complex is analyzed.
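    Two standard quantities help frame such results. The first is the uncharged Smoluchowski limit k = 4πDR, where D is the relative diffusion coefficient and R the encounter distance. The second is the Debye length, which sets how far electrostatic attraction reaches at a given ionic strength I. The sketch computes both. It is not a Brownian dynamics simulation and ignores the electrostatic steering the study examines. The formula κ⁻¹ ≈ 0.304/√I nm is the usual approximation for aqueous solution at 25 °C with I in mol/L, and the input values in the usage lines are arbitrary.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol


def smoluchowski_rate(d_rel, radius):
    """Diffusion-limited rate constant k = 4*pi*D*R for uncharged spheres.

    d_rel: relative diffusion coefficient D_A + D_B in m^2/s
    radius: encounter distance in m
    Returns k in M^-1 s^-1.
    """
    k_m3_per_s = 4.0 * math.pi * d_rel * radius  # per molecule pair, m^3/s
    return k_m3_per_s * AVOGADRO * 1000.0        # per mole, and m^3 -> L


def debye_length_nm(ionic_strength_molar):
    """Approximate Debye screening length in water at 25 C: 0.304 / sqrt(I) nm."""
    return 0.304 / math.sqrt(ionic_strength_molar)


print(f"{smoluchowski_rate(1e-9, 1e-9):.3g} M^-1 s^-1")
for ionic in (0.01, 0.1, 0.5):
    print(ionic, round(debye_length_nm(ionic), 3), "nm")
```

    Raising the ionic strength shortens the Debye length. This screens the attraction between the positively charged antibiotics and the negatively charged RNA, which is why association rates depend on salt concentration.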
  • Finding Additional Functional Elements in Essential RNA Sites: Not Conserved, but Not Unimportant
    Rob Knight (University of Colorado)
    Joint work with Vikas Malaiya, Jana Chocholousova, Matthew Iyer, Irene Majerfeld, and Michael Yarus.

    Evolutionary conservation has often been used to recover the essential pieces of RNA sites, yet can only reveal elements that are necessary, rather than sufficient, for function. Biochemical studies in several systems, including the hammerhead ribozyme and the purine riboswitch, indicate additional regions, such as loop-loop interactions, that are required for function yet are not phylogenetically conserved. Here we use a minimal motif for binding the amino acid tryptophan to ask the ultimate question of an RNA motif: do we know the essential elements well enough to embed the motif in a random-sequence background and obtain functional molecules? We show the utility of this technique for discovering additional sequence requirements for the motif, in this case the requirement for an unpaired G in a specific range of locations and structures relative to the main loop identified by SELEX, and discuss its implications for calculating the probability of obtaining functional RNAs from random-sequence pools.
  • Feynman Diagrams, RNA Folding, and the Transition Polynomial

    Feynman diagrams were introduced by physicists. They arise naturally
    in mathematics (from knots and singular knots) and in molecular
    biology (from RNA folding). In particular, work of G. Vernizzi,
    H. Orland, and A. Zee has shown that the genus of Feynman diagrams
    plays an important role in the prediction of RNA structures.

    The transition polynomial for 4-regular graphs was defined by Jaeger to
    unify polynomials given by vertex reconfigurations similar to the
    skein relations of knots. It is closely related to the Kauffman bracket,
    Tutte polynomial, and the Penrose polynomial.

    We define a transition polynomial for Feynman diagrams and discuss its
    properties. In particular, we show that the genus of a Feynman
    diagram is encoded in the transition polynomial. This is joint work
    with Kerry Luse.
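    The abstract does not define the transition polynomial concretely enough to sketch it, but the genus it encodes can be computed directly from a structure's base pairs. The sketch assumes a single backbone and the standard permutation description of the diagram. Sigma moves to the next paired position around the closed backbone, and alpha swaps the two ends of each arc. The cycles of their composition are the boundary components, and the genus is g = (arcs - boundaries + 1) / 2. The sketch computes the genus only, not the polynomial.

```python
def genus(pairs):
    """Genus of the chord diagram (fatgraph) of an RNA structure with one backbone.

    Unpaired bases do not affect the genus, so only paired positions are used.
    Boundary components are the cycles of gamma(i) = sigma(alpha(i)).
    """
    ends = sorted(x for p in pairs for x in p)
    index = {pos: i for i, pos in enumerate(ends)}
    m = len(ends)
    alpha = [0] * m
    for a, b in pairs:  # alpha swaps the two ends of each arc
        alpha[index[a]], alpha[index[b]] = index[b], index[a]

    def sigma(i):  # next paired position around the closed backbone
        return (i + 1) % m

    seen, boundaries = [False] * m, 0
    for start in range(m):
        if not seen[start]:
            boundaries += 1
            i = start
            while not seen[i]:
                seen[i] = True
                i = sigma(alpha[i])
    return (len(pairs) - boundaries + 1) // 2


print(genus([(1, 4), (2, 3)]))             # nested hairpin stem
print(genus([(1, 3), (2, 4)]))             # H-type pseudoknot
print(genus([(1, 10), (5, 15), (12, 20)])) # kissing hairpins
```

    Secondary structures without crossings always have genus 0, so a nonzero genus is a compact measure of pseudoknot complexity.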



  • The Rfam Database: We Need You
    Alex Bateman (Wellcome Trust Sanger Institute), Paul Gardner (Wellcome Trust Sanger Institute)
    The Rfam database is a collection of multiple sequence alignments and
    covariance models representing many common non-coding RNA (ncRNA) gene
    families. Rfam aims to facilitate the identification and classification
    of new members of known sequence families, and distributes annotation of
    ncRNAs in over 200 complete genome sequences. Rfam release 8.0 contains
    574 ncRNA families (including 427 bona fide RNA genes and 145 regulatory
    elements). For each family we provide predicted secondary structures,
    multiple sequence alignments, species distribution, annotation and links
    to other external specialised resources. All our data are available and
    searchable online, or for download and local installation.
  • Genomic Identification of Structural RNAs using phylo-SCFGs
    Jakob Pedersen (University of Copenhagen)
    RNA structures often evolve with characteristic substitution patterns
    that preserve base-pairs in spite of changes in primary sequence. With
    the advent of closely related full-length genomes, it has become
    possible to exploit this comparative signal for genomic identification
    of structural RNAs (1).

    Phylo-SCFGs (2) are attractive models for this problem since they can
    describe both RNA structure, using stochastic context-free grammars
    (SCFGs), and sequence evolution, using phylogenetic models. Using
    variations of classical algorithms, multiple alignments with any
    number of sequences can be handled efficiently.

    EvoFold implements this approach and has been used to screen genomic
    multiple-sequence alignments of both vertebrates and drosophilids for
    structural RNAs (1,3). This has resulted in hundreds of high-confidence
    novel candidates for both ncRNAs and cis-regulatory structures.

    1) Identification and Classification of Conserved RNA Secondary
    Structures in the Human Genome. Pedersen JS, Bejerano G, Siepel A,
    Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, and
    Haussler D. PLoS Comput Biol. 2006 Apr;2(4):e33.

    2) Using stochastic context free grammars and molecular evolution to
    predict RNA secondary structure. Knudsen B and Hein JJ.
    Bioinformatics. 1999; 15 (6): 446-454.

    3) Discovery of functional elements in 12 Drosophila genomes using
    evolutionary signatures. Stark A, Lin MF, Kheradpour P, and
    Pedersen JS, et al. 2007 (in press).
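    The compensating-change signal can be illustrated without any phylogenetic model by inspecting two alignment columns proposed to pair. This counting heuristic only illustrates the signal. EvoFold scores it with phylo-SCFGs, which this sketch does not implement. The gap symbol "-" and the allowed pair set (Watson-Crick plus G-U wobble) are assumptions.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_evidence(col_i, col_j):
    """Summarise base-pair support for two alignment columns (one base per sequence).

    Returns (fraction of ungapped sequences that pair canonically,
             number of distinct canonical pair types observed).
    A high fraction combined with several pair types suggests compensating
    (covarying) substitutions that preserve the base pair.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0
    ok = [p for p in pairs if p in CANONICAL]
    return len(ok) / len(pairs), len(set(ok))


print(pair_evidence("GGCA", "CCGU"))  # pairing kept through G-C, C-G and A-U
print(pair_evidence("GGGG", "CCCC"))  # conserved pair, no covariation signal
```

    A fully conserved pair (second example) pairs perfectly but carries no comparative evidence. Only the varied but still pairing column (first example) distinguishes structure from plain sequence conservation.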


  • RNA Folding during Transcription Facilitated by Non-native Structures
    Tao Pan (University of Chicago)
    RNA folding in the cell occurs during transcription. Expedient RNA folding must avoid the formation of undesirable structures as the nascent RNA emerges from the RNA polymerase. We show that efficient folding during transcription of three conserved non-coding RNAs from E. coli (RNase P RNA, SRP RNA and tmRNA) is facilitated by their cognate polymerase pausing at specific locations. These pause sites are located between the upstream and the downstream portions of all the native long-range helices in these non-coding RNAs. In the paused complexes, the nascent RNAs form labile structures that sequester these upstream portions in a way that guides folding. Both the pause sites and the secondary structure of the non-native portions of the paused complexes are phylogenetically conserved. Specific pausing-induced structure formation can be a general strategy to facilitate the folding of long-range helices. This polymerase-based mechanism may cause portions of non-coding RNA sequences to be evolutionarily conserved for efficient folding during transcription.
  • Comparative Genomics Beyond Sequence-Based Alignments: RNA Structures in the ENCODE Regions

    Elfar Torarinsson (University of Copenhagen)
    Joint work with Z. Yao, E. D. Wiklund, J. B. Bramsen, C. Hansen, J. Kjems,
    N. Tommerup, W. L. Ruzzo, and J. Gorodkin.

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms
    have relied on existing multiple sequence alignments. However, as sequence
    similarity drops, a key signal of RNA structure (frequent compensating base
    changes) is increasingly likely to cause sequence-based alignment methods to
    misalign, or even refuse to align, homologous ncRNAs, consequently obscuring
    that structural signal. We have used CMfinder [1], a structure-oriented local
    alignment tool, to search vertebrate multiple alignments in the ENCODE
    regions. In agreement with other studies [2], we find a large number of
    potential RNA structures in the ENCODE regions. We report 6,587 candidates with
    an estimated false positive rate of 50%. More intriguingly, many of these
    candidates may be better represented by alignments taking the RNA secondary
    structure into account than those based on primary sequence alone, often quite
    dramatically. For example, approximately one quarter of these 6,587 candidates
    show revisions in more than 50% of their aligned positions. Furthermore, our
    results are strongly complementary to those discovered by
    sequence-alignment-based approaches: 84% of our candidates are not covered by
    Washietl et al. [2], increasing the number of ncRNA candidates in the ENCODE
    regions by 32%. Of eleven ncRNA candidates tested by RT-PCR, ten were
    confirmed to be present as RNA transcripts in human tissue. Our
    results broadly suggest caution in any analysis relying on multiple sequence
    alignments in less well-conserved regions, clearly support growing appreciation
    for the biological significance of ncRNAs, and strongly argue for considering
    RNA structure directly in any searches for these elements.

    1. Yao, Z., Weinberg, Z. and Ruzzo, W.L. 2006. CMfinder - A Covariance Model
    Based RNA Motif Finding Algorithm. Bioinformatics 22: 445-452.

    2. Washietl, S., Pedersen, J.S., Korbel, J.O., Gruber, A.R., Hackermuller, J.,
    Hertel, J., Lindemeyer, M., Reiche, K., Stocsits, C., Tanzer, A., et
    al. 2007. Structured RNAs in the ENCODE Selected Regions of the Human
    Genome. Genome Research 17: 852-864.
  • Utilizing the RNAJunction Database for the Design of RNA Nanostructures
    Eckart Bindewald (SAIC-Frederick, Inc.)
    Joint work with Wojciech Kasprzak (1), Mary O’Connor (2), Brett Boyle (2) and Bruce A. Shapiro (2).

    (1) Basic Research Program, SAIC-Frederick, Inc., NCI Frederick, Frederick, Maryland, USA

    (2) Center for Cancer Research Nanobiology Program, NCI Frederick, Frederick, Maryland, USA

    We present RNAJunction, a database containing extracted and annotated 3D coordinate data of RNA junctions, kissing loops, internal loops and bulges. The database contains more than 12,000 structural elements and allows web-based querying by sequence, type and PDB information. It also allows searching by geometric constraints (inter-helix angles), which is useful for the design of RNA nanostructures.
    We show how these structural elements can be used to generate ring structures and other complexes with the NanoTiler software. We present five different approaches for assembling RNA complexes from building blocks, along with several examples of automatically generated computational RNA models.
    Funded in part by DHHS #N01-CO-12400.
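    Searching RNAJunction by inter-helix angle reduces to comparing the angle between two helix-axis vectors with a query range. The sketch assumes each junction record already carries two axis direction vectors. The record layout and function names are hypothetical and do not reflect the RNAJunction schema or web interface.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two 3D direction vectors (e.g. helix axes)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos_t = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp rounding error before acos
    return math.degrees(math.acos(cos_t))


def filter_by_angle(junctions, lo, hi):
    """Names of junctions whose inter-helix angle lies within [lo, hi] degrees.

    Each record is assumed to be (name, axis1, axis2); this layout is hypothetical.
    """
    return [name for name, a1, a2 in junctions if lo <= angle_between(a1, a2) <= hi]


junctions = [("j1", (1, 0, 0), (0, 1, 0)),      # roughly perpendicular helices
             ("j2", (1, 0, 0), (1, 1e-3, 0))]   # nearly coaxial helices
print(filter_by_angle(junctions, 60, 120))
```

    The sign of a fitted helix axis is arbitrary, so a practical query might also need to consider the supplementary angle (180° minus the computed value).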
  • Multi-scale Simulation of RNA Catalytic Activity
    George Giambasu (University of Minnesota, Twin Cities)
    Joint work with Taisung Lee and Darrin M. York (Department of Chemistry, University of Minnesota).

    We present a series of multi-scale simulation studies of RNA
    catalysis. The results of several series of molecular dynamics (MD)
    and QM/MM simulations of the full-length hammerhead ribozyme and the
    L1 Ligase ribozyme are presented. For the hammerhead ribozyme, we have
    used simulations to investigate the role of metal ions and the possible
    solvent structure in the crystal, and to study/predict the effects of
    mutations at the C3 and G8 sites. For the L1 Ligase, we have studied
    the details of a major conformational change prior to the reaction and
    possible conformations of the ligation site in the reactant state.
    These simulations (each 50 to 100 ns long, totalling more than 1.5 μs)
    are at least one to two orders of magnitude longer than any previously
    reported simulations, and they have yielded a significant number of new
    insights.
  • Efficient Algorithms for Probing the RNA Mutation Landscape (with Waldispuehl, Devadas, Berger)
    Peter Clote (Boston College)
    The diversity and importance of the role played by RNAs in the regulation and development of the cell
    has now been demonstrated. This broad range of functions is achieved through specific structures which
    have been (presumably) optimized through evolution. The existence of a well-founded energy function
    for RNA has enabled accurate ab initio secondary structure prediction. State-of-the-art methods, such as
    McCaskill's algorithm, use a statistical mechanics framework based on computing the partition function over
    the canonical ensemble of all possible secondary structures of a given sequence. Unfortunately, these
    techniques do not permit any modification of the input sequence during their execution and thus cannot
    investigate the mutation landscape of this sequence.
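    McCaskill-style methods sum Boltzmann weights over all structures with a cubic-time recursion. That core idea can be shown with a deliberately simplified energy model. The assumptions are: every Watson-Crick or G-U pair contributes the same hypothetical `pair_energy`, hairpins need at least `min_loop` unpaired bases, and no loop or stacking energies are used. This is not McCaskill's full algorithm. Like the methods described above, it keeps the input sequence fixed.

```python
import math

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_energy=-1.0, min_loop=3, kT=0.6163):
    """Partition function over all secondary structures of `seq` (toy energy model).

    Every allowed base pair contributes `pair_energy` kcal/mol; kT ~ 0.6163 kcal/mol
    at 37 C. Recursion: Z(i, j) = Z(i, j-1)
        + sum over k of [k pairs with j] * Z(i, k-1) * Z(k+1, j-1) * w
    """
    n = len(seq)
    w = math.exp(-pair_energy / kT)  # Boltzmann weight of a single base pair
    Z = [[1.0] * n for _ in range(n)]

    def z(i, j):
        return 1.0 if j < i else Z[i][j]  # empty interval has weight 1

    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            total = z(i, j - 1)  # position j unpaired
            # Position j pairs with k, leaving at least min_loop bases in the hairpin.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    total += z(i, k - 1) * z(k + 1, j - 1) * w
            Z[i][j] = total
    return z(0, n - 1)


print(partition_function("GAAAC"))       # unpaired state plus one G-C hairpin
print(partition_function("GGGAAAUCC"))
```

    With `pair_energy=0.0`, every structure gets weight 1, so the same recursion simply counts the allowed structures. This is a handy sanity check when replacing the toy energies with a real model.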
  • Annotated Tertiary Interaction Motifs in RNA Structures
    Christian Laing (New York University)
    RNA tertiary motifs play an important role in RNA folding. To understand the
    complex organization of RNA tertiary interactions, we compiled a dataset
    containing 54 high-resolution RNA crystal structures. Seven RNA tertiary
    motifs (coaxial helix, A-minor, ribose zipper, pseudoknot, kissing hairpin,
    tRNA D-loop:T-loop and tetraloop-tetraloop receptor) were searched for using
    different computer programs. In the non-redundant RNA dataset, 605 RNA
    tertiary interactions were found. Most of these 3D interactions occur in the
    16S and 23S rRNAs. An exhaustive search for these motifs reveals a diversity
    of interactions. Correlations between motifs (e.g. pseudoknot or coaxial helix
    with A-minor) show that they can form composite motifs. These findings
    may lead to tertiary structure constraints useful for RNA 3D prediction.
  • Collective Properties of Evolving Populations of RNA Molecules
    Michael Stich (Instituto Nacional de Técnica Aeroespacial)
    RNA molecules, through their dual appearance as sequence and
    structure, represent a suitable model to study evolutionary properties
    of quasispecies. The essential ingredient in this model is the
    differentiation between genotype (molecular sequences which are
    affected by mutation) and phenotype (molecular structure, affected by
    selection). This framework allows a quantitative analysis of
    organizational properties of quasispecies as they adapt to different
    environments, such as their robustness, the effect of the degeneracy
    of the sequence space, or adaptation under different mutation
    rates and the associated error threshold.
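    Classical quasispecies theory gives an approximate form of the error threshold. A master sequence of length L with superiority σ (its replication advantage over the rest of the population) is maintained only if the per-site error rate p stays below roughly ln(σ)/L. The sketch implements that textbook approximation. It ignores back mutation and the neutral networks that RNA sequence-structure maps introduce, and it is not the model analysed in the study. The example values are arbitrary.

```python
import math


def max_error_rate(length, superiority):
    """Approximate error threshold: p_max ~ ln(sigma) / L.

    length: sequence length L; superiority: sigma > 1, the selective advantage
    of the master sequence over the rest of the population.
    """
    if superiority <= 1:
        raise ValueError("superiority must exceed 1")
    return math.log(superiority) / length


def max_length(error_rate, superiority):
    """Longest sequence maintainable at a per-site error rate p: L ~ ln(sigma) / p."""
    if superiority <= 1:
        raise ValueError("superiority must exceed 1")
    return math.log(superiority) / error_rate


print(max_error_rate(100, 10.0))  # ~0.023 errors per site per replication
print(max_length(1e-3, 10.0))     # ~2300 nucleotides
```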
  • Computational Models for RNA Silencing Pathways under Time-dependent Transgene Transcription Rates
    Roderick Melnik (Wilfrid Laurier University)
    Joint work with Jack Yang and Roy Mahapatra.

    The synthesis of dsRNA is analyzed using a pathway model with amplifications caused by aberrant RNAs. The transgene influx rates are assumed to be time-decaying and Gaussian functions of time. The dynamics of transgene-induced RNA silencing is investigated with a system of coupled non-autonomous nonlinear differential equations that describes the process phenomenologically. Silencing is detected after a period of transcription. The important contributions of several parameters, including those leading to bifurcation patterns, are discussed through a series of numerical examples.
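    The abstract does not state the equations, so the model below is a made-up stand-in. It only shows what "non-autonomous" means in practice: the transcription influx I(t) enters the right-hand side as an explicit function of time. The two variables (transgene mRNA m and dsRNA d), all rate constants, and the Gaussian pulse parameters are hypothetical. The system is integrated with the classical fourth-order Runge-Kutta method. A time-decaying influx can be tried by passing, for example, `lambda t: math.exp(-0.1 * t)` in place of `gaussian_influx`.

```python
import math

# Hypothetical toy model; the abstract does not give the authors' equations.


def gaussian_influx(t, peak=1.0, t0=10.0, width=3.0):
    """Transgene transcription influx I(t) as a Gaussian pulse in time."""
    return peak * math.exp(-((t - t0) ** 2) / (2.0 * width ** 2))


def rhs(t, y, influx, k_m=0.1, k_d=0.05, alpha=0.2, beta=0.5):
    """Right-hand side of a toy non-autonomous system.

    m: transgene mRNA, d: dsRNA.
      dm/dt = I(t) - k_m*m - beta*m*d   (transcription, decay, dsRNA-guided cleavage)
      dd/dt = alpha*m - k_d*d            (dsRNA production from mRNA, decay)
    """
    m, d = y
    return [influx(t) - k_m * m - beta * m * d,
            alpha * m - k_d * d]


def rk4(f, y0, t0, t1, steps):
    """Integrate dy/dt = f(t, y) with the classical fourth-order Runge-Kutta method."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    trajectory = [(t, list(y))]
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + e)
             for yi, a, b, c, e in zip(y, k1, k2, k3, k4)]
        t += h
        trajectory.append((t, list(y)))
    return trajectory


traj = rk4(lambda t, y: rhs(t, y, gaussian_influx), [0.0, 0.0], 0.0, 60.0, 600)
peak_m = max(y[0] for _, y in traj)
print(f"peak mRNA level {peak_m:.3f}, final dsRNA level {traj[-1][1][1]:.3f}")
```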
  • Locomotif: From Graphical Motif Description to RNA Motif Search
    Jens Reeder (Universität Bielefeld)
    Motivated by the recent rise of interest in small regulatory
    RNAs, we present Locomotif - a new approach for locating RNA
    motifs that goes beyond the previous ones in three ways: (1)
    Motif search is based on efficient dynamic programming
    algorithms, incorporating the established thermodynamic model
    of RNA secondary structure formation. (2) Motifs are described
    graphically, using a Java-based editor, and search algorithms
    are derived from the graphics in a fully automatic way. The
    editor lets users draw secondary structures annotated with size and
    sequence information. These drawings closely resemble the established
    but informal way in which RNA motifs are communicated in the
    literature, so the learning effort for
    Locomotif users is minimal. (3) Locomotif employs a
    client-server approach. Motifs are designed by the user
    locally. Search programs are generated and compiled on a
    bioinformatics server. They are made available both for
    execution on the server, and for download as C source code plus
    an appropriate make-file.

    Availability: Locomotif is available at http://bibiserv.techfak.uni-bielefeld.de/locomotif.
  • Local Pairwise Structural RNA Alignments by Pruning of the Dynamical Programming Matrix
    Jan Gorodkin (University of Copenhagen)
    Joint work with Jakob H. Havgaard and Elfar Torarinsson (Division of Genetics and Bioinformatics, IBHV, University of Copenhagen, Frederiksberg, Denmark).

    The Sankoff algorithm for simultaneously folding and aligning
    RNA sequences is computationally very heavy. Recently, a number of
    groups have applied various constraints to reduce the
    computational requirements to reasonable levels. Whereas the
    original Sankoff algorithm, as well as many of its
    implementations, only conducts global alignment, the FOLDALIGN
    implementation performs both local and global structural
    alignment.
    The most recent version of FOLDALIGN introduces pruning of the
    dynamical programming matrix as a simple and effective heuristic
    that lowers the time and memory requirements significantly
    without lowering the predictive performance. FOLDALIGN is
    currently one of the few Sankoff-based algorithms capable of
    conducting local alignments while remaining a practical tool. It
    has also been used in a genome-wide screen for putative RNA
    structures in corresponding, but unaligned, regions of the human
    and mouse genomes.
    In addition to the pairwise version of FOLDALIGN, we have also made
    a multiple alignment method that takes either the pairwise
    alignments or McCaskill base-pair probability matrices as input.


    References

    - Fast pairwise structural RNA alignments by pruning of the
    dynamical programming matrix. J. H. Havgaard, E. Torarinsson and
    J. Gorodkin. PLoS Computational Biology, in press.

    - Multiple structural alignment and clustering of RNA sequences.
    E. Torarinsson, J. H. Havgaard and J. Gorodkin. Bioinformatics,
    23:926-932, 2007.

    - Thousands of corresponding human and mouse genomic regions
    unalignable in primary sequence contain common RNA structure.
    E. Torarinsson, M. Sawera, J. H. Havgaard, M. Fredholm and
    J. Gorodkin. Genome Research, 16:885-889, 2006.
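    For scale, here are the standard complexity figures for the unconstrained pairwise Sankoff recursion (general textbook values, not taken from this abstract): for two sequences of length n it needs O(n^6) time and O(n^4) memory. Ignoring constant factors, a pair of 100-nt sequences gives the following back-of-the-envelope numbers, which is the cost that constraints such as FOLDALIGN's matrix pruning aim to cut.

```latex
\begin{aligned}
\text{time: } O(n^6), &\qquad \text{memory: } O(n^4) \\
n = 100:\quad n^6 = (10^2)^6 = 10^{12}\ \text{steps}, &\qquad n^4 = (10^2)^4 = 10^{8}\ \text{cells}
\end{aligned}
```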
  • Efficient Parameter Estimation for RNA Secondary Structure Prediction
    Mirela Andronescu (University of British Columbia)
    Joint work with Anne Condon, Holger H. Hoos, David H. Mathews, and
    Kevin P. Murphy.

    Motivation: Accurate prediction of RNA secondary structure from the
    base sequence is an unsolved computational challenge. The accuracy of
    predictions made by free energy minimization is limited by the
    quality of the energy parameters in the underlying free energy model.
    The most widely used model, the Turner99 model, has hundreds of
    parameters, and so a robust parameter estimation scheme should
    efficiently handle large data sets with thousands of structures.
    Moreover, the estimation scheme should also be trained using available
    experimental free energy data in addition to structural data.

    Results: In this work, we present constraint generation (CG), the
    first computational approach to RNA free energy parameter estimation
    that can be efficiently trained on large sets of structural as well as
    thermodynamic data. Our constraint generation approach employs a novel
    iterative scheme, whereby the energy values are first computed as the
    solution to a constrained optimization problem. Then the
    newly-computed energy parameters are used to update the constraints on
    the optimization function, so as to better optimize the energy
    parameters in the next iteration. Using our method on biologically
    sound data, we obtain revised parameters for the Turner99 energy
    model. We show that by using our new parameters, we obtain
    significant improvements in prediction accuracy over current
    state-of-the-art methods.


    Reference:

    Mirela Andronescu, Anne Condon, Holger H. Hoos, David H. Mathews, and
    Kevin P. Murphy, Efficient parameter estimation for RNA secondary
    structure prediction, Bioinformatics. 2007 Jul 1;23(13):i19-28.
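    The iterative constraint-generation loop can be sketched on a toy problem, assuming a linear energy model in which each candidate structure is summarized by a vector of feature counts (energy = parameters · counts). Two simplifications are deliberate and are assumptions, not the authors' method. First, candidate structures are listed explicitly instead of being produced by an MFE folding routine. Second, the constrained optimization step is replaced by simple perceptron-style updates. All function names are hypothetical.

```python
# Toy linear energy model: a structure is a tuple of feature counts (e.g. how
# often each loop or stacking parameter is used); its energy is theta . counts.
def energy(theta, counts):
    return sum(t * c for t, c in zip(theta, counts))

def solve_constraints(constraints, theta, margin=1.0, lr=0.1, epochs=500):
    """Stand-in for the constrained optimization step (an assumption): perceptron-style
    updates until E(decoy) - E(native) >= margin for every stored (native, decoy) pair."""
    theta = list(theta)
    for _ in range(epochs):
        violated = False
        for native, decoy in constraints:
            if energy(theta, decoy) - energy(theta, native) < margin:
                # Gradient of E(decoy) - E(native) with respect to theta is (decoy - native).
                theta = [t + lr * (dc - nc) for t, dc, nc in zip(theta, decoy, native)]
                violated = True
        if not violated:
            break
    return theta

def constraint_generation(training, theta, max_iter=50):
    """training: list of (native_counts, candidate_counts_list).
    Each round predicts the lowest-energy candidate under the current parameters,
    adds a constraint for every newly mispredicted example, then re-solves."""
    constraints = []
    for _ in range(max_iter):
        added = 0
        for native, candidates in training:
            predicted = min(candidates, key=lambda c: energy(theta, c))
            if predicted != native and (native, predicted) not in constraints:
                constraints.append((native, predicted))
                added += 1
        if added == 0:
            break
        theta = solve_constraints(constraints, theta)
    return theta

if __name__ == "__main__":
    native = (1, 0, 2)
    decoys = [(0, 1, 2), (2, 0, 0)]
    theta = constraint_generation([(native, [native] + decoys)], [0.0, -1.0, 0.0])
    print(theta, [energy(theta, c) for c in [native] + decoys])
```

    In this toy run the second decoy only becomes the predicted structure after the first constraint has been enforced. This is why constraints are generated iteratively rather than all at once.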
  • Improved RNA Gene Predictions through Dinucleotide Controlled Randomization of Multiple Sequence Alignments
    Stefan Washietl (Universität Wien)
    Tanja Gesell (1) & Stefan Washietl (2,3)

    1. Center for Integrative Bioinformatics, Max Perutz Laboratories, Vienna

    2. EMBL-European Bioinformatics Institute, Hinxton, United Kingdom

    3. Department of Theoretical Chemistry, University of Vienna, Austria

    Most noncoding RNA gene prediction programs are based on the detection of conserved RNA secondary structures in multiple alignments [1]. Although this approach seems to be the most promising, the main problem of current algorithms is the large number of false-positive predictions, particularly in large vertebrate genomes.

    As the available algorithms assume a mononucleotide background model, a major source of erroneously predicted RNA structures is the biased dinucleotide content found, e.g., in vertebrate genomes. While there are well-known algorithms for randomizing single sequences while preserving dinucleotide content, no such algorithms exist for multiple alignments. We present a novel algorithm addressing this problem.

    Our approach, which involves in silico evolution along a phylogenetic tree, has two key features: (i) We make use of a new evolutionary model that considers site-specific and overlapping dependencies [2]. This enables us to simulate alignments with a given dinucleotide (or higher-order nucleotide) content. (ii) The model includes site-specific rate factors that preserve critical conservation patterns of the original alignments. We developed a time-efficient, distance-based approximation method to estimate a tree under this complex model, which is used as a guide for simulating new alignments.

    Based on this improved null model, we have implemented a noncoding RNA gene prediction algorithm called SISSIz, which builds upon the RNAalifold and AlifoldZ programs. The new dinucleotide-based program shows significantly improved accuracy over its mononucleotide counterparts on a vertebrate test set.

    [1] Washietl S., Hofacker I.L., Lukasser M., Huettenhofer A., Stadler P.F. Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol. (2005), 23:1383-90.

    [2] Gesell T., von Haeseler A. In silico sequence evolution with site-specific interactions along phylogenetic trees. Bioinformatics. (2006), 22:716-22
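    For reference, here is the "well-known" single-sequence case the abstract builds on. The minimal sketch below is in the spirit of Altschul and Erickson's Euler-path method and is not the SISSIz alignment algorithm. Each dinucleotide ab becomes a directed edge a → b. For every vertex except the final character, a random "last exit" edge is chosen, and the choice is retried until these edges drain into the final character. The remaining edges are shuffled, and the resulting Eulerian path is walked. Function names are hypothetical; `rng` may be the `random` module or a seeded `random.Random`.

```python
import random
from collections import Counter, defaultdict

def _last_edges_form_tree(last_edge, end):
    """True if following each vertex's chosen last edge always reaches `end` (no cycles)."""
    for start in last_edge:
        v, seen = start, set()
        while v != end:
            if v in seen:
                return False
            seen.add(v)
            v = last_edge[v]
    return True

def dinucleotide_shuffle(seq, rng=random):
    """Random permutation of `seq` with identical dinucleotide counts."""
    if len(seq) < 3:
        return seq
    first, end = seq[0], seq[-1]
    # Multigraph: one edge a -> b for every dinucleotide ab in the sequence.
    out = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        out[a].append(b)
    # Pick a last exit edge for every vertex except `end`; retry until they form a tree.
    while True:
        last_edge = {v: rng.choice(succ) for v, succ in out.items() if v != end}
        if _last_edges_form_tree(last_edge, end):
            break
    # Shuffle the other edges of each vertex and append its chosen last edge.
    order = {}
    for v, succ in out.items():
        rest = list(succ)
        if v != end:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        order[v] = rest + ([last_edge[v]] if v != end else [])
    # Walk the Eulerian path from the first character, consuming edges in order.
    pos = Counter()
    result, v = [first], first
    while pos[v] < len(order.get(v, [])):
        nxt = order[v][pos[v]]
        pos[v] += 1
        result.append(nxt)
        v = nxt
    return "".join(result)

if __name__ == "__main__":
    rng = random.Random(0)
    print(dinucleotide_shuffle("GGACUUAGCGAUCCGAUAGGCUA", rng))
```

    The last-edge tree guarantees that the walk cannot get stuck before every edge is used, so the output always has the original length, first and last character, and dinucleotide counts.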
  • Computational Comparative Genomics for Discovery of Cis-regulatory RNAs in Bacteria
    Walter Ruzzo (University of Washington)
    Discovery of novel functional noncoding RNA is a multifaceted problem. Motif representation, inference and search are all important, as is incorporation of relevant biological knowledge. With careful attention to all of these, we have developed a comparative genomics pipeline for discovery of cis-regulatory RNA elements in bacteria. We represent motifs using covariance models (CMs), as in the Rfam database. Motif inference in unaligned sequences with extraneous flanking regions (i.e., local alignment) relies on CMfinder [2]. We apply it to intergenic regions upstream of homologous genes in different bacteria, since cis-regulatory elements are often found and conserved there [3]. We use Ravenna [1] for efficient, sensitive CM search to identify additional instances. This is critical since (i) a given RNA element often regulates multiple genes in a pathway, not just homologs, and (ii) more examples allow us to refine the model (also via CMfinder), in turn enabling further discoveries. This strategy recovers most known RNAs in Firmicutes [3]. More importantly, we discovered six new likely riboswitch families, most of them experimentally verified, plus over 20 other elements in a wide variety of bacteria [3,4]. Collectively, these RNAs are involved in diverse but individually specific cellular processes, such as ribosome biogenesis, molybdenum cofactor biosynthesis and the citric acid cycle. One of the more surprising finds is a widespread riboswitch that apparently regulates such disparate processes as natural competence in Vibrio cholerae and the use of metal ions as electron acceptors in Geobacter sulfurreducens. These candidate RNAs add to the growing list of RNA motifs involved in multiple cellular processes and suggest that many additional RNAs remain to be discovered. Many computational challenges also remain.

    [1] Weinberg and Ruzzo. Sequence-based heuristics for faster annotation of non-coding RNA families. Bioinformatics, 2006, 22(1):35-39

    [2] Yao, Weinberg and Ruzzo. CMfinder--A Covariance Model Based RNA Motif Finding Algorithm. Bioinformatics, 2006, 22(4): 445-452.

    [3] Yao, Barrick, Weinberg, Neph, Breaker, Tompa and Ruzzo. A Computational Pipeline for High Throughput Discovery of cis-Regulatory Noncoding RNA in Prokaryotes. PLoS Computational Biology. 3(7): e126, July 6, 2007.

    [4] Weinberg, Barrick, Yao, Roth, Kim, Gore, Wang, Lee, Block, Sudarsan, Neph, Tompa, Ruzzo and Breaker. Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline. Nucl. Acids Res., July 2007 35: 4809-4819.
  • Riboswitches, RNA Conformational Switches and Prokaryotic Gene Regulation (with Eva Freyhult (a) and Vincent Moulton (b))
    Peter Clote (Boston College)
    a. Linnaeus Centre for Bioinformatics, Uppsala University, 75124 Uppsala, Sweden, eva.freyhult@lcb.uu.se

    b. School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK, vincent.moulton@cmp.uea.ac.uk

    c. Department of Biology, Boston College, Chestnut Hill, MA 02467, USA, clote@bc.edu

    This work is funded in part by NSF DBI-0543506.

    Metabolite-sensing 5′-UTRs (untranslated regions) of certain mRNAs, called
    riboswitches, have been shown to undergo a conformational change upon
    ligand binding, which can thereby up- or down-regulate the corresponding
    protein product. For instance, upon binding guanine, the
    G-box riboswitch in the 5′ UTR of the XPT gene of Bacillus subtilis
    undergoes a conformational change to form a terminator loop, thereby
    prematurely terminating transcription of the XPT gene. Since XPT is involved
    in guanine metabolism, this is an example of negative autoregulation by a
    riboswitch. Although riboswitches have been postulated to be an ancient
    genetic regulatory system, first developed in bacteria, the remarkable
    discovery of Cheah et al. (Nature 2007) suggests that eukaryotes may have
    co-opted riboswitches to control alternative splicing of genes.
    Here we describe a new algorithm, RNAbor (Freyhult, Moulton, Clote,
    Bioinformatics 2007), which gives information on possible conformational
    switches by computing the Boltzmann probability of structural neighbors of
    a given RNA secondary structure. Given a secondary structure S of an RNA
    sequence s, another secondary structure T of s is called a δ-neighbor of S
    if T and S differ by exactly δ base pairs.
    RNAbor computes the number (N_δ), the Boltzmann partition function (Z_δ)
    and the minimum free energy (MFE_δ) and corresponding structure over the
    collection of all δ-neighbors of S. This computation is done simultaneously
    for all δ ≤ m, in run time O(m^2 n^3) and memory O(m n^2), where n is the
    sequence length. We apply RNAbor to the detection of possible RNA
    conformational switches and compare RNAbor with an existing switch
    detection method. We also provide examples of how RNAbor can at times
    improve the accuracy of secondary structure prediction.
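    To make the δ-neighbor definition concrete, the brute-force sketch below enumerates every secondary structure of a tiny sequence. For each δ, it counts how many structures lie at base-pair distance δ (the size of the symmetric difference of the two pair sets) from a reference structure S. This is exponential-time enumeration for illustration only, not RNAbor's O(m^2 n^3) dynamic program, and it computes only the counts N_δ, not energies. The allowed pairs (Watson-Crick plus GU) and the minimum hairpin size of 3 are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases enclosed by a hairpin

def parse_dot_bracket(db):
    """Return the set of base pairs (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return frozenset(pairs)

def bp_distance(s, t):
    """Number of base pairs present in exactly one of the two structures."""
    return len(s ^ t)

def all_structures(seq):
    """Every secondary structure of seq, each generated exactly once (tiny inputs only)."""
    @lru_cache(maxsize=None)
    def rec(i, j):
        if i >= j:
            return (frozenset(),)
        result = list(rec(i + 1, j))  # case: base i is unpaired
        for k in range(i + MIN_HAIRPIN + 1, j + 1):  # case: base i pairs with base k
            if (seq[i], seq[k]) in PAIRS:
                for inner in rec(i + 1, k - 1):
                    for outer in rec(k + 1, j):
                        result.append(inner | outer | {(i, k)})
        return tuple(result)
    return rec(0, len(seq) - 1)

def neighbor_counts(seq, db):
    """N_delta for every delta: number of structures of seq that are delta-neighbors of db."""
    s = parse_dot_bracket(db)
    return Counter(bp_distance(s, t) for t in all_structures(seq))

if __name__ == "__main__":
    for delta, n in sorted(neighbor_counts("GGGAAAUCC", "(((...)))").items()):
        print(delta, n)
```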
  • RNA Dinucleotide Step Parameters
    Mauricio Esguerra (Rutgers, The State University of New Jersey)
    We present a first view of the space of conformations adopted by RNA in the currently best-resolved structure of the large ribosomal subunit, using the dinucleotide ‘step’ parameters computed with the 3DNA software. We have explored how the base-step parameters for the 16 possible dinucleotide steps of RNA vary in helical vs. non-helical regions.
  • Estimating the Fraction of Non-coding RNAs in Mammalian Transcriptomes
    Yurong Xin (New York University)
    Recent studies of mammalian transcriptomes have identified numerous RNA
    transcripts that do not code for proteins; their identity, however, is
    largely unknown. Here we explore an approach based on sequence
    randomness patterns to discern different RNA classes. The relative
    z-score we use helps distinguish the known ncRNA class from the genomic,
    intergenic, and intronic classes. This leads us to a fractional ncRNA
    measure for putative ncRNA datasets, which we model as a mixture of
    genuine ncRNAs and other transcripts derived from genomic, intergenic
    and intronic sequences. We use this model to analyze six representative
    datasets, identified by the FANTOM3 project and by two computational
    approaches based on comparative analysis (RNAz and EvoFold). Our
    analysis suggests fewer ncRNAs than estimated by DNA sequencing and
    comparative analysis, but validating our approach and its predictions
    will require more extensive experimental RNA data.
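    The following minimal sketch shows how a mixture model turns per-class scores into a fraction. It assumes that a single pooled background class stands in for the genomic, intergenic and intronic classes, and that the mean relative z-score is the summary statistic. Both are assumptions, since the abstract does not specify its estimator. If a dataset's mean score is f·μ_ncRNA + (1 − f)·μ_background, then solving for f gives the ncRNA fraction. The function name is hypothetical.

```python
def ncrna_fraction(dataset_mean, ncrna_mean, background_mean):
    """Two-component mixture estimate (an assumption, not the authors' estimator):
    dataset_mean = f * ncrna_mean + (1 - f) * background_mean, solved for f
    and clipped to [0, 1] so noisy inputs still yield a valid fraction."""
    if ncrna_mean == background_mean:
        raise ValueError("reference classes must have different mean scores")
    f = (dataset_mean - background_mean) / (ncrna_mean - background_mean)
    return min(1.0, max(0.0, f))

if __name__ == "__main__":
    # Illustrative numbers only: a dataset halfway between the two reference means.
    print(ncrna_fraction(-1.0, ncrna_mean=-2.0, background_mean=0.0))
```

    With several background classes, the same idea generalizes to a least-squares fit of mixing weights, at the cost of needing more than one summary statistic per dataset.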