December 9 - 13, 2013
Keywords of the presentation: DNA, knots, links, proteins
We'll discuss recent work on knotted and linked DNA molecules. Using several case studies as examples, we'll consider the topological techniques used to model the processes that knot and link DNA. We'll explore the biological ramifications of DNA knotting and linking, and how the results of these topological models can inform experimentalists.
Keywords of the presentation: Rat hippocampus, spatial representation, topological coding, predictive model, hidden Markov model, variational Bayes
The hippocampus plays an important role in representing space (for spatial navigation) and time (for episodic memory). Spatial representation of the environment is pivotal for navigation in rodents and primates. Two types of maps, topographical and topological, may be used for spatial representation. Rodent hippocampal place cells exhibit spatially selective firing patterns in an environment that can be decoded to determine the animal’s location, heading, and past and future trajectory. We recorded ensembles of hippocampal neurons as rodents freely foraged in one- and two-dimensional spatial environments, and we used a "decode-to-uncover" strategy to examine the temporally structured patterns embedded in the ensemble spiking activity in the absence of observed spatial correlates during rodent navigation. Specifically, the spatial environment was represented by a finite discrete state space. Trajectories across spatial locations ("states") were associated with consistent hippocampal ensemble spiking patterns, which were characterized by a state transition matrix of a hidden Markov model. We incorporated informative structured priors and applied variational Bayesian inference. From the inferred state transition matrix, we derived a topology graph that defined the connectivity in the state space. In addition, the model supports both qualitative and quantitative assessment. We also investigated the topographic versus topological contributions to spatial representation of hippocampal population codes. In contrast to a topographic code, our decoding analyses support the efficiency of topological coding in the presence of sparse sample size and fuzzy space mapping.
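As a rough illustration of the last modeling step, the sketch below turns a state transition matrix into an undirected connectivity graph by keeping transitions whose probability reaches a cutoff. The function name `topology_graph`, the `threshold` parameter, and the toy matrix are assumptions made for this example; the abstract does not say how the graph was actually derived from the inferred matrix.

```python
def topology_graph(transition, threshold=0.05):
    """Return an undirected adjacency map from a row-stochastic transition matrix.

    Two distinct states are joined when the transition probability in either
    direction is at least `threshold` (a hypothetical cutoff).
    """
    n = len(transition)
    adjacency = {state: set() for state in range(n)}
    for i in range(n):
        for j in range(n):
            if i != j and transition[i][j] >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


# Toy example: four states along a linear track, where the animal either
# stays in place or moves to a neighboring state.
toy_transition = [
    [0.8, 0.2, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.1],
    [0.0, 0.0, 0.2, 0.8],
]
track_graph = topology_graph(toy_transition)
print(track_graph)
```

For the toy matrix the recovered graph is a path through the four states, which matches the connectivity of a linear track.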
Finally, we discuss (i) the philosophical question of how the brain interprets the world (from a reductionist perspective) using ensemble spikes alone, and (ii) how this work applies to the analysis of other hippocampal-neocortical recordings and sleep-associated ensemble spike data.
We computationally model the effects of a type II topoisomerase strand passage on a ring polymer. In doing so, we generate self-avoiding polygons (SAPs) on the simple cubic lattice through the use of composite Markov chain computer simulations. We investigate two specific strand passage structures, the theta-structure and the symmetric structure, to compare their limiting knot transition probabilities. We also show evidence that the probability of going from knot type K to knot type K#K' is independent of the initial knot type K.
We present a model for DNA packaging simulations in bacteriophages that includes the effect of the packaging motor; we also show packaging results from an implementation on the parallel computing platform OpenCL. We simulate the motor with a kinetic Monte Carlo algorithm that feeds the DNA into the virus and couple it with a damped coarse-grained molecular mechanics simulation of the DNA to show how twisting the DNA affects the overall chirality, or writhe, of the packaged DNA molecule.
Keywords of the presentation: neuroscience, correlation matrices, topological data analysis
Experimental neuroscience is undergoing a period of rapid progress in the collection of neural activity and connectivity data. This promises to allow more direct testing of a variety of theoretical ideas, and thus advance our understanding of "how the brain works." Detecting meaningful structure in neural data, however, remains a significant challenge. A major obstacle is that these data often measure quantities that are related to more "fundamental" variables by an unknown nonlinear transformation. This transformation obscures the underlying structure, diminishing the power of traditional linear algebra-flavored tools. Methods from computational topology, however, are often capable of detecting the hidden structure. We adapt these methods for the analysis of correlation matrices, and illustrate their use for testing the "coding space" hypothesis on neural data.
For an animal to successfully navigate its environment, it must form
an accurate internal representation of its surroundings.
Location-specific firing and co-firing of the hippocampal place cells
play a crucial role in spatial cognition, which according to our
recently published model is more akin to a subway map than a street
map, i.e., is primarily topological. We tested our topological model
of hippocampal activity, varying several parameters in computer
simulations of rat trajectories in distinct test environments. Using a
computational algorithm based on recently developed tools from
Persistent Homology theory, we find that the patterns of neuronal
co-firing can, in fact, convey topological information about the
environment in a biologically realistic length of time. Notably, our
simulations reveal a “learning region” —a sweet spot for spatial
learning—that highlights the interplay between parameters in producing
hippocampal states that are more or less adept at map formation. For
example, within the learning region a smaller number of neurons firing
can be compensated by adjustments in firing rate or place field size,
but beyond a certain point map formation begins to fail. This notion
of a learning region provides a coherent theoretical lens through
which to view conditions that impair spatial learning by altering
place cell firing rates or spatial specificity.
Our ability to navigate our environments relies on our ability to form
an internal representation of the spaces we're in. Since the discovery
that certain hippocampal neurons fire in a location-specific way, we
have known that these "place cells" serve a central role in forming
this internal spatial map, but how they represent spatial information,
and even what kind of information they encode, remains mysterious.
(Perhaps the cells form something akin to a street map, with distances
and angles, but they could also form something more akin to a subway
map, with a focus on connectivity.) We reasoned that, because
downstream brain regions must rely on place cell firing patterns alone
(they have no direct access to the environment), the temporal pattern
of neuronal firing must be key. Furthermore, because co-firing of two
or more place cells implies spatial overlap of their respective place
fields, a map encoded by co-firing should be based on connectivity and
adjacency rather than distances and angles, i.e., it will be a
topological map. Based on these considerations, we modeled hippocampal
activity with a computational algorithm we designed using methods
derived from Persistent Homology theory and algebraic topology. We
found not only that an ensemble of place cells can, in fact, "learn"
the environment (form a topologically accurate map), but that it does
so within parameters of place cell number, firing rate, and place
field size that are uncannily close to the values observed in
biological experiments. Beyond these parameters, which define the
"learning region," spatial map formation fails. Moreover, we find that the
learning region enlarges as we make the computational model more
realistic, e.g., by adding the parameter of theta precession. The
structure and dynamics of learning region formation provide a coherent
theoretical lens through which to view both normal spatial learning
and conditions that impair it.
In Fall 2013 I am teaching a free online course, MATH:7450 (22M:305)
Topics in Topology: Scientific and Engineering Applications of
Algebraic Topology, offered through the Mathematics Department and
Division of Continuing Education at the University of Iowa.
Goal: To prepare students and other researchers for the IMA Thematic
Year on Scientific and Engineering Applications of Algebraic Topology,
but all interested participants are welcome.
Target Audience: Anyone interested in topological data analysis
including graduate students, faculty, industrial researchers in
bioinformatics, biology, business, computer science, cosmology,
engineering, imaging, mathematics, neurology, physics, and statistics.
If you are interested in helping to teach a similar course in the
spring, please let me know.
More information about the Fall 2013 course can be found at
The application of persistent homology to molecular sequence data was introduced by Chan et al., [PNAS 2013], where recombination rates in viral populations were estimated by computing Lp norms on barcode diagrams. It was shown that persistent homology provides an intuitive quantification of reticulate evolution in molecular sequence data by measuring deviations from tree-like additivity. While that approach has proved successful at capturing large scale patterns of reticulate evolution, the sensitivity for detecting specific reticulate events is much lower. Here we introduce an approach to imputing latent ancestors into the data that increases the quantitative signal from persistent homology, at the expense of obscuring the direct interpretation of topological loops as reticulate events. We observe that complexes built from this construction have a simple decomposition into squares, cubes, and higher dimensional hypercubes.
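The "tree-like additivity" mentioned above can be checked directly with the classical four-point condition: a distance matrix is additive exactly when, for every four taxa, the two largest of the three pairwise sums are equal. The sketch below is not the persistent homology computation of the abstract. It only measures how far a distance matrix is from additivity, and the function name and toy matrices are assumptions made for this example.

```python
from itertools import combinations


def four_point_deviation(dist):
    """Largest violation of the four-point condition over all quadruples.

    Returns 0 for an additive (tree-like) distance matrix.
    """
    worst = 0.0
    for a, b, c, d in combinations(range(len(dist)), 4):
        sums = sorted([
            dist[a][b] + dist[c][d],
            dist[a][c] + dist[b][d],
            dist[a][d] + dist[b][c],
        ])
        # For a tree metric the two largest sums coincide.
        worst = max(worst, sums[2] - sums[1])
    return worst


# Tree-like example: a star tree whose leaves hang at these branch lengths.
branch = [1.0, 2.0, 3.0, 4.0, 5.0]
star_tree = [[0.0 if i == j else branch[i] + branch[j] for j in range(5)]
             for i in range(5)]

# Non-tree-like example: path distances around a cycle of four taxa.
four_cycle = [
    [0.0, 1.0, 2.0, 1.0],
    [1.0, 0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 2.0, 1.0, 0.0],
]

print(four_point_deviation(star_tree), four_point_deviation(four_cycle))
```

The cycle is the simplest picture of a reticulate event: its distances cannot be fitted by any tree, so the deviation is strictly positive.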
Keywords of the presentation: molecular symmetries, chirality, flexible molecules
Understanding molecular symmetries has many important applications in chemistry. Symmetry is used in interpreting results in crystallography, spectroscopy, and quantum chemistry, as well as in classifying molecules. Mirror image symmetry is particularly important in predicting reactions and designing new pharmaceutical products.
The group of rigid symmetries of a molecule known as the point group is widely used for classifying molecular symmetries. However, some molecules can rotate around particular bonds, and others are large enough to be somewhat flexible. Such molecules may have symmetries which are not included in the point group, and may even interconvert with their mirror image as a result of their flexibility. In this case, a topological approach to understanding molecular symmetries is more useful than a geometric one. This talk will present a survey of results about topological symmetries of molecular structures.
Keywords of the presentation: persistence, structure of neuron assemblies, cricket cercus
We use topological data analysis to investigate the three-dimensional spatial structure of the locus of afferent neuron terminals in the terminal ganglion of the cricket Acheta domesticus. Each afferent neuron innervates a filiform hair positioned on a cercus, a protruding appendage at the rear of the animal. The hairs transduce air motion into the neural signal that the cricket uses to respond to its environment.
We stratify the hairs (and the corresponding afferent terminals) into classes depending on hair length and position. Our analysis uncovers significant structure in the relative position of these terminal classes, which suggests that this structure is functionally relevant. Our method is designed to handle significant experimental and developmental noise.
One common and substantial difficulty encountered when testing our conceptual understanding of neuroscience in the lab is that observable variables are often related to what we believe is "really happening" by some unknown nonlinear transformation. Such nonlinearities are difficult to analyze using traditional tools which rely on linear algebra. Here, we construct from a correlation matrix a filtered sequence of simplicial complexes which is necessarily invariant under monotonic transformations of the matrix entries. Using persistent homology to extract quantitative measures of these families, we show that certain (potentially hidden) correlation structures in the matrix entries -- such as those arising from distances in Euclidean spaces -- can be readily distinguished from random controls. As an application, we show that neural data from the rat hippocampus is consistent with the existence of a Euclidean "coding space" across a variety of behaviors.
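The invariance claim can be seen in a small sketch: if edges are added in decreasing order of correlation, only the ordering of the matrix entries matters, so any strictly increasing transformation of the entries yields the same sequence of graphs. The example below tracks only the number of connected components (Betti number 0) of that sequence, using a union-find structure. The higher-dimensional homology of the clique complexes used in the abstract is not computed here, and the function names and toy matrix are assumptions.

```python
def edge_order(corr):
    """Pairs (i, j), i < j, sorted from most to least correlated."""
    n = len(corr)
    edges = [(corr[i][j], i, j) for i in range(n) for j in range(i + 1, n)]
    edges.sort(key=lambda e: -e[0])
    return [(i, j) for _, i, j in edges]


def betti0_curve(corr):
    """Number of connected components after each edge is added."""
    n = len(corr)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    curve = []
    for i, j in edge_order(corr):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            components -= 1
        curve.append(components)
    return curve


corr = [
    [1.00, 0.90, 0.20, 0.10],
    [0.90, 1.00, 0.30, 0.15],
    [0.20, 0.30, 1.00, 0.80],
    [0.10, 0.15, 0.80, 1.00],
]
# A strictly increasing nonlinear distortion of every entry.
warped = [[x ** 3 for x in row] for row in corr]

print(betti0_curve(corr))
print(betti0_curve(warped))
```

Both calls print the same curve, because cubing the entries changes their values but not their order.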
The biomedical importance of small RNA molecules continues to grow. Yet, even at this length scale, reliably predicting the native base pairs remains a significant open problem. The ability to sample secondary structures efficiently from the Gibbs distribution yields a strong signal of high probability pairings. However, further analysis is needed to identify important correlations in these large data sets. RNA profiling is a new method which identifies the most probable combinations of base pairs across the ensemble of possible secondary structures. Our combinatorial approach is straightforward, stable, and clearly separates structural signal from thermodynamic noise.
Keywords of the presentation: regulatory network, Morse decomposition, global attractor, time series, Conley-Morse database
Complex network structures frequently appear in biological systems such as gene regulatory networks, circadian rhythm models, and signal transduction circuits. As a mathematical formulation of such complex biological networks, Fiedler, Mochizuki and their collaborators (JDDE 2013) recently defined a class of ODEs associated with a finite digraph called a regulatory network, and proved that its dynamics on the global attractor can in principle be faithfully monitored by information from a (potentially much) smaller set of nodes, called the feedback vertex set of the graph.
In this talk, I will use their theory to give a method for detecting more detailed information on the dynamics of regulatory networks, namely the Morse decomposition of the global attractor. The main idea is to take time series data from the feedback vertex set of a regulatory network and construct a combinatorial multi-valued map, to which we apply the so-called Conley-Morse Database method.
As a test example, we study Mirsky’s mathematical model for mammalian circadian rhythm which can be represented as a regulatory network with 21 nodes, and show that numerically generated time series data from its feedback vertex set consisting of 7 nodes correctly detect a Morse decomposition in the global attractor, including 1 stable periodic orbit, 2 unstable periodic orbits, and 1 unstable fixed point.
This is joint work with B. Fiedler, A. Mochizuki, G. Kurosawa, and H. Oka.
Keywords of the presentation: healthcare, patient sub-populations, biomarker, personalized medicine
Data has shape. Shape has meaning. I will discuss how Topological Data Analysis (TDA) has been applied to various biological problems such as identifying patient populations that might respond better to certain treatments, understanding the underlying etiology of a disease such as cancer and studying drug response at the single cell level.
Recent Nobel prizes highlight the contribution of computation in the field of biochemistry. The Big Data era approaches computational modeling by integrating classical physics and chemistry with data-driven computation. Using knowledge derived from the electrostatic models of Warshel and Levitt, or the dynamic simulations of Karplus, patterns in data lead to predictive models of molecular function and mechanism. We show how massive computation can enable costly computations across large protein data sets. Model optimization also shows dramatic speedup with the use of high-throughput machine learning. By exhaustive sampling of the model space, we explore its “shape” and pursue better methods for feature selection and model robustness.
Keywords of the presentation: hypothesis testing, disease subtypes, network topology
The past few years have witnessed the development of a range of mathematical data analysis approaches to understand large data. These approaches necessarily identify specific aspects of the topology and geometry of the data, and involve dimensionality reduction processes. I will discuss methods that combine multiple such approaches to underscore the topology of the data.
Applications to real data from disease will be discussed.
A Laplacian matrix is a matrix associated with a graph. Laplacian matrices have several important properties derived from their second-smallest eigenvalue, which is called the algebraic connectivity. The notion of algebraic connectivity is part of a bioinformatics algorithm called RNAmute. In this poster we present theorems of Miroslav Fiedler that are used to prove properties of these matrices. We then apply RNAmute to HIV-1 RNA sequences to predict possible mutations in the sequences.
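To make the definition concrete, the sketch below builds the Laplacian (degree matrix minus adjacency matrix) of a small graph and estimates its algebraic connectivity. It is not taken from RNAmute. To stay dependency-free it uses a shifted power iteration with the constant eigenvector projected out, where a linear algebra library would normally be used, and it assumes the starting vector is not orthogonal to the eigenvector being sought. All names are assumptions made for this example.

```python
import math


def laplacian(n, edges):
    """Graph Laplacian L = D - A for an undirected graph on n vertices."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap


def _centered_unit(vec):
    """Remove the component along the all-ones vector and normalize."""
    mean = sum(vec) / len(vec)
    vec = [x - mean for x in vec]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def algebraic_connectivity(lap, iterations=5000):
    """Estimate the second-smallest Laplacian eigenvalue (needs n >= 2)."""
    n = len(lap)
    # Eigenvalues of L are at most twice the maximum degree, so every
    # eigenvalue of (shift * I - L) is at least 1.
    shift = 2.0 * max(lap[i][i] for i in range(n)) + 1.0
    vec = _centered_unit([float(i) for i in range(n)])
    for _ in range(iterations):
        vec = [shift * vec[i] - sum(lap[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        vec = _centered_unit(vec)
    # Rayleigh quotient of L at the converged unit vector.
    return sum(vec[i] * lap[i][j] * vec[j] for i in range(n) for j in range(n))


path_graph = laplacian(4, [(0, 1), (1, 2), (2, 3)])
two_pieces = laplacian(4, [(0, 1), (2, 3)])
print(algebraic_connectivity(path_graph), algebraic_connectivity(two_pieces))
```

The disconnected graph gives a value of (numerically) zero, reflecting Fiedler's result that the algebraic connectivity is positive exactly when the graph is connected.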
*Joint work with Rudy Dehaney.
Keywords of the presentation: RNA structure prediction
After reviewing some basic properties of RNA, we show how the problem of RNA folding can be formulated in terms of a matrix field theory. As a consequence, RNA secondary structures can be classified according to their topological genus. After presenting some combinatorics results about RNA structure, we present an overview of the genus distribution of all experimentally known RNA structures. These concepts are used to design two powerful algorithms for the prediction of RNA structures with pseudoknots.
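The genus of a pairing diagram can be computed combinatorially, as in the minimal sketch below. It uses the standard fatgraph convention in which the backbone is closed into a single vertex, so that Euler's formula reads 2 - 2g = 1 - n + r for n base pairs and r boundary components. The function name and the input format (a list of paired positions) are assumptions for this example, and this is not one of the prediction algorithms mentioned in the abstract.

```python
def genus(pairs):
    """Topological genus of an RNA pairing diagram.

    `pairs` is a list of (i, j) base pairs; each position may occur in at
    most one pair. Unpaired bases are ignored because they do not change
    the genus.
    """
    if not pairs:
        return 0
    positions = sorted(p for pair in pairs for p in pair)
    if len(set(positions)) != len(positions):
        raise ValueError("each position may be paired at most once")
    # Relabel paired positions 0..m-1 in backbone order.
    index = {p: k for k, p in enumerate(positions)}
    m = len(positions)
    partner = [0] * m
    for i, j in pairs:
        partner[index[i]] = index[j]
        partner[index[j]] = index[i]
    # Trace boundary components: cross the arc, then move one step along
    # the closed backbone.
    seen = [False] * m
    boundaries = 0
    for start in range(m):
        if seen[start]:
            continue
        boundaries += 1
        k = start
        while not seen[k]:
            seen[k] = True
            k = (partner[k] + 1) % m
    # Euler characteristic with one vertex, len(pairs) edges, `boundaries` faces.
    return (1 + len(pairs) - boundaries) // 2


nested = [(0, 9), (1, 8), (2, 7)]            # a plain hairpin stem
h_type = [(0, 5), (3, 9)]                    # two crossing arcs
kissing_hairpins = [(0, 4), (2, 8), (6, 10)] # middle arc crosses both others
print(genus(nested), genus(h_type), genus(kissing_hairpins))
```

Pseudoknot-free structures have genus 0, while the H-type pseudoknot and the kissing hairpin both have genus 1.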
Data generated in such areas as medical imaging and evolutionary biology are frequently tree-shaped, and thus non-Euclidean in nature. As a result, standard techniques for analyzing data in Euclidean spaces become inappropriate, and new methods must be used. One such framework is the space of phylogenetic trees constructed by Billera, Holmes, and Vogtmann. This space is non-positively curved (hyperbolic), so there is a unique geodesic path (shortest path) between any two trees and a well-defined notion of a mean tree for a given set of trees. Furthermore, this geodesic path can be computed in polynomial time, leading to a practical algorithm for computing the mean and variance. We look at the mean and variance of distributions of phylogenetic trees that arise in tree inference, and compare them with existing measures of consensus and variance.
Keywords of the presentation: Riemannian manifolds, Cartan connection, Computational anatomy
Computational anatomy is an emerging discipline at the interface of geometry, statistics, image analysis and medicine that aims at analysing and modelling the biological variability of organ shapes at the population level. The goal is to model the mean anatomy and its normal variation among a population and to discover morphological differences between normal and pathological populations. For instance, the analysis of population-wise structural brain changes with aging in Alzheimer's disease requires first the analysis of longitudinal morphological changes for a specific subject. This can be evaluated through non-rigid registration. Second, to perform a longitudinal group-wise analysis, the subject-specific longitudinal trajectories need to be transported to a common reference (using some parallel transport).
To reach this goal, one needs to design a consistent statistical framework on manifolds and Lie groups. The geometric structure considered so far was that of metric and more specifically Riemannian geometry. Roughly speaking, the main steps are to redefine the mean as the minimizer of an intrinsic quantity: the sum of squared Riemannian distances to the data points. When the Fréchet mean is determined, one can pull back the distribution to the tangent space at the mean to define higher order moments like the covariance matrix.
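The two steps described above can be sketched on the simplest curved example, the unit sphere, where the exponential and logarithm maps have closed forms. The code below is an illustrative fixed-point iteration (move the current estimate along the average of the log-mapped data), not the method used in the talk. The function names are assumptions, and the covariance is expressed in ambient 3D coordinates of the tangent vectors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_map(p, q):
    """Tangent vector at p pointing toward q, with geodesic length."""
    c = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(c)
    u = [qi - c * pi for pi, qi in zip(p, q)]
    norm_u = math.sqrt(dot(u, u))
    if norm_u < 1e-12:  # q equals p (antipodal points are not handled)
        return [0.0, 0.0, 0.0]
    return [theta * ui / norm_u for ui in u]


def exp_map(p, v):
    """Point reached from p by following the geodesic with velocity v."""
    norm_v = math.sqrt(dot(v, v))
    if norm_v < 1e-12:
        return list(p)
    return [math.cos(norm_v) * pi + math.sin(norm_v) * vi / norm_v
            for pi, vi in zip(p, v)]


def frechet_mean(points, iterations=100, tol=1e-12):
    """Minimize the sum of squared geodesic distances to `points`."""
    mean = list(points[0])
    for _ in range(iterations):
        logs = [log_map(mean, q) for q in points]
        step = [sum(v[k] for v in logs) / len(points) for k in range(3)]
        mean = exp_map(mean, step)
        if math.sqrt(dot(step, step)) < tol:
            break
    return mean


def tangent_covariance(points, mean):
    """Covariance of the data pulled back to the tangent space at the mean."""
    logs = [log_map(mean, q) for q in points]
    return [[sum(v[i] * v[j] for v in logs) / len(points) for j in range(3)]
            for i in range(3)]


def on_sphere(colatitude, longitude):
    return [math.sin(colatitude) * math.cos(longitude),
            math.sin(colatitude) * math.sin(longitude),
            math.cos(colatitude)]


# Three points placed symmetrically around the north pole.
points = [on_sphere(0.3, 2.0 * math.pi * k / 3.0) for k in range(3)]
mean = frechet_mean(points)
cov = tangent_covariance(points, mean)
print(mean, cov[0][0] + cov[1][1] + cov[2][2])
```

By symmetry the mean of the three sample points is the north pole, and the trace of the tangent-space covariance equals the mean squared geodesic distance to it.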
In the context of medical shape analysis, the powerful framework of Riemannian (right) invariant metrics on groups of diffeomorphisms (aka LDDMM) has often been investigated for such analyses in computational anatomy. In parallel, efficient image registration methods and discrete parallel transport methods based on diffeomorphisms parameterized by stationary velocity fields (SVF) (DARTEL, log-demons, Schild's ladder etc.) have been developed with great success from the practical point of view but with less theoretical support.
In this talk, I will detail the Riemannian framework for geometric statistics and partially extend it to affine connection spaces, and more particularly to Lie groups provided with the canonical Cartan-Schouten connection (a non-metric connection). In finite dimension, this provides strong theoretical bases for the use of one-parameter subgroups. The generalization to infinite dimensions would ground the SVF framework. From the practical point of view, we show that it leads to quite simple and very efficient models of atrophy of the brain in Alzheimer's disease. Learning the topological invariants of these noisy deformation fields is now the next step, where computational topology has a role to play.
Biomolecular simulations that run on high performance computing (HPC) architectures generate petabytes of output. This data is too voluminous for typical analytic methods, so keeping humans in the loop is often appropriate for zero-th order analyses. The scenario presented here is for humans to view dynamic visualization synchronized to an ongoing simulation. Topological characteristics of the writhing molecules are important indicators of crucial events, so that novel algorithms to ensure ambient isotopic equivalence between the frames viewed and the underlying model are important.
Keywords of the presentation: Topology, Persistent Homology, Evolution, Genome, Virus
The tree structure is currently the accepted paradigm to represent evolutionary relationships between organisms, species or other taxa. However, horizontal, or reticulate, genomic exchanges are pervasive in nature and confound characterization of phylogenetic trees. Drawing from algebraic topology, we present a unique evolutionary framework that comprehensively captures both clonal and reticulate evolution. We show that whereas clonal evolution can be summarized as a tree, reticulate evolution exhibits nontrivial topology of dimension greater than zero. Our method effectively characterizes clonal evolution, reassortment, and recombination in RNA viruses. Beyond detecting reticulate evolution, we succinctly recapitulate the history of complex genetic exchanges involving more than two parental strains, such as the triple reassortment of H7N9 avian influenza and the formation of circulating HIV-1 recombinants. In addition, we identify recurrent, large-scale patterns of reticulate evolution, including frequent PB2-PB1-PA-NP cosegregation during avian influenza reassortment. Finally, we bound the rate of reticulate events (i.e., 20 reassortments per year in avian influenza). Our method provides an evolutionary perspective that not only captures reticulate events precluding phylogeny, but also indicates the evolutionary scales where phylogenetic inference could be accurate.
Keywords of the presentation: cell complex, fatgraph, shape, genus reduction
In this talk we introduce the basic construction of topological RNA structures.
We introduce shapes and the associated shape polynomial and its connection to
RNA folding. We then establish the connection to unicellular maps and outline
the combinatorial constructions that facilitate genus induction. We furthermore
show applications of this framework to the uniform generation of RNA structures
of fixed topological genus and how to deal with RNA-RNA interaction structures.
Keywords of the presentation: MRI, fragile X syndrome, topological data analysis, generalized Reeb graph
Fragile X syndrome (FXS), due to mutations of the FMR1 gene, is the most common known inherited cause of developmental disability as well as the most common single-gene risk factor for autism. In this talk I will give a brief description of the algorithm used by Mapper/Iris to produce a Reeb-like graph representation associated to a dataset, and then describe how its application to structural MRI data for a population of children with FXS has led to the potential identification of higher and lower functioning subgroups within this population.
In evolutionary biology, the ancestry of individuals or species is typically depicted as a tree. Tree models assume that evolution proceeds clonally, meaning that each individual inherits genetic material from a single parent. Processes of recombination and hybridization violate this assumption, and so they are hard to detect using methods that start with a tree. Here we develop a more general, non-treelike model, based in persistent homology, to estimate the occurrence of both mutation and recombination in evolving populations. We then apply this model to cases of HIV evolution occurring within individual hosts. We find variation in recombination rate among individuals that may correspond to variation in HIV-related symptoms.
Keywords of the presentation: DNA topology, tangle method, topology simplification
Newly replicated circular chromosomes are topologically linked. Controlling these topological changes, and returning the chromosomes to an unlinked monomeric state is essential to cell survival. XerCD-dif-FtsK recombination acts in the replication termination region of the Escherichia coli chromosome to remove links introduced during replication. We use topological methods to show definitively that there is a unique shortest pathway of unlinking by XerCD-dif-FtsK that strictly reduces the complexity of the links at every step. We delineate the mechanism of action of the enzymes at each step along this pathway and provide a 3D interpretation of the results.
Keywords of the presentation: Polymer chain, DNA topology, DNA statistical properties
The talk will review developments in the field of DNA-related topological problems: knots and links formed by double-stranded DNA molecules. It will start from the purely theoretical problem of calculating the equilibrium probability of knots in a polymer chain. Although solving this problem was an achievement in polymer statistical physics, it did not look useful for anything else at that time. Eventually, however, it helped greatly in the studies of the general properties of DNA and its topological transformations catalyzed by site-specific recombinases and DNA topoisomerases. Some examples of these applications will be briefly considered. The second half of the talk will concentrate on an amazing property of type II DNA topoisomerases: the ability of these enzymes to reduce fractions of knots and links in circular DNA molecules below the level that corresponds to thermodynamic equilibrium.
Keywords of the presentation: contour tree, landscape metaphor, energy landscape, visualization
The rapid improvements in computational power have enabled researchers to produce large amounts of molecular simulation data. Hence there is a pressing need to be able to analyze such data to enhance our understanding of molecular dynamics. However, given their massive size and typically high-dimensionality, it is hard to directly traverse and explore these data. In this talk, I will describe our recent work toward building a visualization platform to facilitate interactive exploration of the high-dimensional molecular simulation data. Our tools are based on a topological concept called the contour tree. Specifically, a set of molecular simulation data can be considered as a sample of the so-called protein energy landscape. Using the contour tree idea, we construct a two-dimensional terrain as a metaphor for the high-dimensional protein energy landscape. This two-dimensional terrain preserves certain topological information of the high dimensional landscape, and provides an intuitive environment where users can now easily inspect the high-dimensional data set.
This is joint work with W. Harvey, I-H. Park, C. Li, O. Rubel, V. Pascucci, and P.-T. Bremer.
A major feature of biological sciences in the 21st Century will be their transition from phenomenological and descriptive disciplines to quantitative and predictive ones. However, the emergence of complexity in self-organizing biological systems poses formidable challenges to their quantitative description because of the excessively high dimensionality. A crucial question is how to reduce the number of degrees of freedom while preserving the fundamental physics in complex biological systems. We discuss a multiscale, multiphysics and multidomain paradigm for biomolecular systems. We describe macromolecular systems, such as proteins, DNA, ion channels, and membranes, by a number of approaches, including static atoms, molecular mechanics, quantum mechanics and elastic mechanics, while treating the aqueous environment as a dielectric continuum or electrolytic fluid. We use differential geometry to couple various microscopic and macroscopic domains on an equal footing. Based on the variational principle, we derive the coupled Poisson-Boltzmann, Nernst-Planck, Kohn-Sham, Laplace-Beltrami, Newton, elasticity and/or Navier-Stokes equations for the structure, dynamics and transport of protein, protein-ligand binding and ion-channel systems.
Implicit solvent methods, describing the biomolecules of interest in discrete detail and taking a mean field approximation for solvent properties, have become popular for interactions and solvation computation because of the reduction in degrees of freedom and the cost for numerical simulations. However, current computation of biomolecular solvation confronts many fundamental limitations and severe challenges, such as ad hoc assumptions about solvent-solute interfaces to define some of the most important components of the solvation model. We have developed novel geometric flow approaches to determine the continuum-discrete interface and for the solvation analysis of small compounds and biomolecules.
The intrinsic volumes generalize both Euler characteristic and volume, quantifying the “size” of a set in various ways.
Lifting the intrinsic volumes from sets to functions over sets, we obtain the Hadwiger Integrals, a family of integrals that generalize both the Euler integral and the Lebesgue integral.
The classic Hadwiger Theorem says that the intrinsic volumes form a basis for the space of all valuations on sets.
An analogous result holds for valuations on functions: with certain assumptions, any valuation on functions can be expressed in terms of Hadwiger integrals.
These integrals provide various notions of the size of a function, which are potentially useful for analyzing data arising from sensor networks, cell dynamics, image processing, and other areas.
This poster provides an overview of the intrinsic volumes, Hadwiger integrals, and possible applications.
We propose the flexibility-rigidity index (FRI) method for protein flexibility analysis. The FRI is accurate and efficient for the prediction of flexibility and fluctuation of macromolecules compared to similar tools such as GNM. The average correlation score for B-factor prediction for 365 structures is 0.661 for FRI vs. 0.565 for GNM. The computational complexity of FRI scales as O(N), while that of methods requiring matrix decomposition is approximately O(N^3). FRI allows flexibility or rigidity to be visualized in either atomic discrete or atomic continuous representations of macromolecular structures. The continuous atomic rigidity from FRI is used in the multiscale modeling of continuum elasticity with atomic rigidity (CEWAR) and for visualization.
Multiscale modeling is of paramount importance to the understanding of biomolecular structure, function, dynamics and transport. Geometric modeling provides structural representations of molecular data from the Protein Data Bank (PDB) and the Electron Microscopy Data Bank (EMDB). Commonly used geometric models, such as the molecular surface (MS), the van der Waals surface, and the solvent accessible surface, are ad hoc divisions of solvent and solute regions and lead to troublesome geometric singularities. At a fundamental level, solvent and solute electron densities overlap each other and there is no sharp solvent-solute interface. We discuss our variational multiscale models and associated geometric modeling of biomolecular complexes, based on differential geometry of surfaces and geometric measure theory. Our models give rise to singularity-free surface representation, curvature characterization, electrostatic mapping, solvation energy and binding affinity analysis of biomolecules.