March 05-09, 2007
Supervised learning artifical neural network algorithms
for optimizing mechanical properties of elastin-like polypeptide
hydrogels for cartilage repair
March 05, 2007 4:30 pm - 6:30 pm
Joint work with Dana L. Nettles3, Kimberly Trabbic Carlson3,
Ashutosh Chilkoti3, Lori A. Setton3,4, Mansoor A. Haider1,2
Elastin-like polypeptide (ELP) hydrogels are a class of biomaterials that
have potential utility as a biocompatible scaffold for filling defects due
to osteoarthritis and for regenerating cartilage. Because of the facility
to genetically engineer elastin sequence, there are almost endless
possible configurations of ELPs and conformations of the networks formed
after crosslinking. ELP biomaterial function will exhibit a complex
dependence on these polymer characteristics that impacts properties
expected to affect cartilage regeneration, such as mechanical load
support. These complex structure-function relationships for crosslinked
ELP hydrogels are not well described. A method for predicting the
mechanical properties of ELP hydrogels was developed based on structural
properties and Supervised Artificial Neural Network (ANN) modeling. The
ANN Model used concentration, molecular weight, crosslink density, and
sample number to predict the dynamic shear modulus and loss angle of the
hydrogels. The ANN was implemented in a custom compiled code based on the
Scaled Conjugate Gradient minimization algorithm and a Monte Carlo Method
was used to expand the dataset. The ANN was trained using a varying
subsets of the full dataset (22 formulations), with the complementary
subset used for validation. Trained networks demonstrated excellent
accuracy in prediction of hydrogel dynamic shear modulus at physiological
temperature, based on polymer design and predictions were robust with
respect to statistical variations. The results are used to show the
validity of an intermediate screening process using ANNs to obtain the
optimal mechanical properties for the ELP.
1 Biomathematics Graduate Program, North Carolina State University, 2
Department of Mathematics, North Carolina State University, 3 Department
of Biomedical Engineering, Duke University, 4 Department of Surgery, Duke
Medical Center
Second ChancesMarch 05, 2007 3:40 pm - 4:10 pm
Second ChancesMarch 06, 2007 4:00 pm - 4:30 pm
Second ChancesMarch 07, 2007 11:30 am - 12:00 pm
Second ChancesMarch 08, 2007 4:00 pm - 4:30 pm
Second Chances and closing remarks.March 09, 2007 4:00 pm - 4:30 pm
Phylogenetic models: algebra and evolutionMarch 06, 2007 9:00 am - 9:50 am
Molecular phylogenetics is concerned with inferring evolutionary
relationships (phylogenetic trees) from biological sequences (such as
aligned DNA sequences for a gene shared by a collection of species).
The probabilistic models of sequence evolution that underly statistical
approaches in this field exhibit a rich algebraic structure.
After an introduction to the inference problem and phylogenetic
models, this talk will survey some of the highlights of current
algebraic understanding. Results on the important statistical issue
of identifiability of phylogenetic models will be emphasized, as the
algebraic viewpoint has been crucial to obtaining such results.
Gene interactions and the geometry of fitness landscapesMarch 07, 2007 10:30 am - 11:20 am
The relationship between the shape of a fitness landscape and the underlying gene interactions, or epistasis, has been extensively studied in the two-locus case. Epistasis has been linked to biological important properties such as the advantage of sex. Gene interactions among multiple loci are usually reduced to two-way interactions. Here, we present a geometric theory of shapes of fitness landscapes for multiple loci. We investigate the dynamics of evolving populations on fitness landscapes and the predictive power of the geometric shape for the speed of adaptation. Finally, we discuss applications to fitness data from viruses and bacteria.
Using algebraic geometry for phylogenetic reconstructionMarch 08, 2007 10:30 am - 11:20 am
Many statistical models of evolution can be viewed as
algebraic varieties. The generators of the ideal associated to a model
and a phylogenetic tree are called invariants. The invariants of an
statistical model of evolution should allow to determine what is the
tree formed by a set of living species.
We will present a method of phylogenetic inference based on invariants
and we will discuss why algebraic geometry should be considered as a
powerful tool for phylogenetic reconstruction. The performance of the
method has been studied for quartet trees and the Kimura 3-parameter
model and it will be compared to widely known phylogenetic
reconstruction methods such as Maximum likelihood estimate and
Neighbor-Joining.
Stability and instability in polynomial equations
arising from complex chemical reaction networks: some underlying
mathematicsMarch 05, 2007 2:50 pm - 3:40 pm
Chemical reaction network models give rise to
polynomial
dynamical systems that are usually high dimensional,
nonlinear, and
have many unknown parameters. Due to the presence of these
unknown
parameters (such as reaction rate constants) direct numerical
simulation of the chemical dynamics is practically
impossible. On
the other hand, we will show that important properties of
these
systems are determined only by the network structure, and do
not
depend on the unknown parameters. Also, we will show how some
of
these results can be generalized to systems of polynomial
equations
that are not necessarily derived from chemical kinetics. In
particular, we will point out connections with classical
problems
in algebraic geometry, such as the real Jacobian conjecture.
This
talk describes joint work with Martin Feinberg, and can be
regarded
as a continuation of his earlier talk.
A flow that computes the best positive semi-definite approximation of a symmetric matrixMarch 05, 2007 4:30 pm - 6:30 pm
We work in the space of n-by-n real symmetric
matrices with the Frobenius inner product.
Consider the following problem:
Problem: Positive semi-definite
approximation. Given an n-by-n real symmetric matrix
A, find the positive semi-definite matrix
which is closest to A.
I discuss the following differential equation
in the space of symmetric matrices:
X′ = (A-X)X2 + X2(A-X) .
The corresponding flow preserves inertia.
In particular, if the initial value X(0)=M
is a positive definite matrix then X(t)
is positive definite for all t>0. I
show that the distance between A
and X(t) decreases as t increases.
I also show that if A has distinct
nonzero eigenvalues (which is a generic
condition) then the solution X(t)
converges to the positive semi-definite
matrix which is closest to A.
Multiple solutions to the likelihood equations in
the Behrens-Fisher problem
March 05, 2007 4:30 pm - 6:30 pm
The Behrens-Fisher problem concerns testing the statistical hypothesis
of equality of the means of two normal populations with possibly
different variances. This problem
furnishes one of the simplest statistical models for which the likelihood
equations may have more than one real solution. In fact, with
probability one, the equations have either one or three real solutions.
Using the cubic discriminant, we study the large-sample probability of
one versus three solutions.
Likelihood ratio tests and singularitiesMarch 05, 2007 10:40 am - 11:30 am
Many statistical hypotheses can be formulated in terms of polynomial equalities and inequalities in the unknown parameters and thus correspond to semi-algebraic subsets of the parameter space. We consider large sample asymptotics for the likelihood ratio test of such hypotheses in models that satisfy standard probabilistic regularity conditions. We show that the assumptions of Chernoff's theorem hold for semi-algebraic sets such that the asymptotics are determined by the tangent cone at the true parameter point. At boundary points or singularities, the tangent cone need not be a linear space such that non-standard limiting distributions may arise. Besides the well-known mixtures of chi-square distributions, such non-standard limits are shown to include the distributions of minima of chi-square random variables. Via algebraic tangent cones, connections to eigenvalues of Wishart matrices are found in factor analysis.
Metric learning for phylogenetic invariants
March 05, 2007 4:30 pm - 6:30 pm
We introduce new methods for phylogenetic tree construction by using
machine learning to optimize the power of phylogenetic invariants.
Phylogenetic invariants are polynomials in the joint probabilities
which vanish under a model of evolution on a phylogenetic tree. We
give algorithms for selecting a good set of invariants and for
learning a metric on this set of invariants which optimally
distinguishes the different models. Our learning algorithms involve
semidefinite programming on data simulated over a wide range of
parameters. Simulations on trees with four leaves under the
Jukes-Cantor and Kimura 3-parameter models show that our method
improves on other uses of invariants and is competitive with
neighbor-joining. Our main biological result is that the trained
invariants can perform substantially better than neighbor joining on
quartet trees with short interior edges.
This is joint work with Yuan Yao (Stanford).
Stability and instability in polynomial equations
arising from complex chemical reaction networks: the big pictureMarch 05, 2007 1:30 pm - 2:20 pm
In nature there are millions of distinct networks of chemical reactions that might present themselves for study at one time or another. Written at the level of elementary reactions taken with classical mass action kinetics, each new network gives rise to its own (usually large) system of polynomial equations for the species concentrations. In this way, chemistry presents a huge and bewildering array of polynomial systems, each determined in a precise way by the underlying network up to parameter values (e.g., rate constants). Polynomial systems in general, even simple ones, are known to be rich sources of interesting and sometimes wild dynamical behavior. It would appear, then, that chemistry too should be a rich source of dynamical exotica.
Yet there is a remarkable amount of stability in chemistry. Indeed, chemists and chemical engineers generally expect homogeneous isothermal reactors, even complex ones, to admit precisely one (globally attractive) equilibrium. Although this tacit doctrine is supported by a long observational record, there are certainly instances of homogeneous isothermal reactors that give rise, for example, to multiple equilibria. The vast landscape of chemical reaction networks, then, appears to have wide regions of intrinsic stability (regardless of parameter values) punctuated by far smaller regions in which instability might be extant (for at least certain parameter values).
In this talk, I will present some recent joint work with Gheorghe Craciun that goes a long way toward explaining this landscape — in particular, toward explaining how biological chemistry "escapes" the stability doctrine to (literally) "make life interesting." A subsequent talk by Craciun will emphasize more mathematical detail.
Algebraic statistics and the analysis of contingency tables: Old wine in new bottles?March 09, 2007 3:00 pm - 3:50 pm
The past decade has seen considerable interest in the reformulation of statistical models and methods for the analysis of contingency tables using the language and results of algebraic and polyhedral geometry. But as algebraic statistics has developed, new ideas have emerged that have changed how we view a number of statistical problems. This talk reviews some of these recent advances and suggests some challenges for collaborative research, especially those involving large scale databases.
Subspace arrangements in theory and practiceMarch 09, 2007 1:30 pm - 2:20 pm
A subspace arrangement is a union of a finite number of subspaces of a
vector space. We will discuss the importance of subspace arrangements first
as mathematical objects and now as a popular class of models for
engineering.
We will then introduce some of new theoretical results that were motivated
from practice. Using these results we will address the computational issue
about how to extract subspace arrangements from noisy or corrupted data.
Finally we will turn to the importance of subspace arrangements by briefly
discussing the connections to sparse representations, manifold learning,
etc...
Bifurcations in coupled systemsMarch 06, 2007 1:30 pm - 2:20 pm
A coupled cell system is a collection of interacting dynamical systems.
Coupled cell models assume that the output from each cell is important and
that signals from two or more cells can be compared so that patterns of
synchrony can emerge. We ask: How much of the qualitative dynamics observed
in coupled cells is the product of network architecture and how much depends
on the specific equations? Speficially we study the structure of
synchrony-breaking bifurcations in these systems.
The ideas will be illustrated through a series of examples and theorems.
One example shows how a frequency filter / amplifier can be built from a
small three-cell feed forward network; and a second illustrates patterns
of synchrony in lattice dynamical systems. One theorem gives necessary
and sufficient conditions for synchrony in terms of network architecture;
and a second shows that synchronous dynamics may itself be viewed as
dynamics in a coupled cell system through a quotient construction.
Math Matters - IMA Public Lecture: Patterns Patterns Everywhere
March 07, 2007 7:00 pm - 8:00 pm
Regular patterns appear all around us: from vast geological formations to the ripples in a vibrating coffee cup, from the gaits of trotting horses to tongues of flames, and even in visual hallucinations. The mathematical notion of symmetry is a key to understanding how and why these patterns form. In this lecture Professor Golubitsky will show some of these fascinating patterns and explain how mathematical symmetry enters the picture.
Modelling with mass-action kinetics and beyondMarch 08, 2007 1:30 pm - 2:20 pm
Mass-action kinetics is a powerful tool to describe events created by collission of molecules or individuals in a well-mixed environment giving them locally the same probability to meet each other. Moreover this probability is only dependent on the concentration of the mutual partners.
Mass action systems can be found in chemistry, cell biology, but also game theory and economics. Mathematically this gives rise to dynamical systems of a special type, more specific of polynomial type. I will give an overview how this property can be used to determine different types of bifurcations, for example the ocurrence of bistability, or oscillations via a Hopf bifurcation. All tools will be borrowing methods from algebraic geometry. Finally I will give an outlook what usually goes wrong in the modelling part while using mass-action kinetics if biochemical or cellular molecular events are considered. Finally the talk ends with a fresh look on mass-action kinetics applied to a spatial setting.
Polynomial dynamical systems over finite fields, with
applications to modeling and simulation of biological networksMarch 07, 2007 9:00 am - 9:50 am
Time-discrete dynamical systems with a finite state space
have been used as models of biological systems since the
use/invention of cellular automata by von Neumann in his attempt to
model a self-replicating organism in the 1950s. More recently, they
have appeared as models of a variety of biological systems, from
gene regulatory networks to large-scale epidemiological networks. This
talk will focus on theoretical and computational results about
polynomial dynamical systems using tools from computational
algebra and algebraic geometry.
Generalized maximum likelihood estimates for exponential familiesMarch 06, 2007 10:30 am - 11:20 am
Exponential families underpin numerous models of statistics
and information geometry that have significant applications.
For a standard full exponential family, or its canonically convex
subfamily, if the corresponding likelihood function from a sample
has a maximizer t* then, by the maximum likelihood principle, the
data are judged to be generated by the probability measure P* from
the family that is parameterized by t*. Since the likelihood depends
on data only through their mean, in this way the mean is mapped to
P*. In a joint work with Imre Csiszar, Budapest, we study an
extension of this mapping, the generalized maximum likelihood
estimator. It assigns to each point of the space at which the
likelihood function is bounded above, a probability measure from
the closure of the family in variation distance. A detailed
description, complete characterization of domain and range, and
additional results will be presented, not imposing any regularity
assumptions.
Geometry of rank tests
March 05, 2007 4:30 pm - 6:30 pm
We investigate the polyhedral geometry of conditional probability and
undirected graphical models, developing new statistical procedures
called convex rank tests. The polytope associated to an undirected
graphical conditional independence model is the graph associahedron.
The convex rank test defined by the dual semigraphoid to the n-cycle
graphical model is applied to microarray data analysis to detect
periodic gene expression.
Toric ideals of phylogenetic invariants for the general group-based model on claw trees
March 05, 2007 4:30 pm - 6:30 pm
We address the problem of studying the toric ideals of
phylogenetic invariants for a general group-based model on an
arbitrary claw tree. We focus on the group
2 and
choose a natural recursive approach that extends to other
groups. The study of the lattice associated with each
phylogenetic ideal produces a list of circuits that generate
the corresponding lattice basis ideal. In addition, we
describe explicitly a quadratic lexicographic Gröbner basis
of the toric ideal of invariants for the claw tree on an
arbitrary number of leaves. Combined with a result of Sturmfels
and Sullivant, this implies that the phylogenetic ideal of
every tree for the group
2 has a quadratic
Gröbner basis. Hence, the coordinate ring of the toric
variety is a Koszul algebra.
This is joint work with Julia Chifman, University of Kentucky.
Information geometry and algebraic statisticsMarch 08, 2007 3:00 pm - 3:50 pm
Recent presentations of Information Geometry (IG), e.g. Amari and Nagaoka (2000), consider general statistical models and general sample spaces. However, the seminal discussion by Cenkov (transl. 1982) is based on finite sample spaces, as it is in Algebraic Statistics (AS).
This talk will first review basic IG from the point of view of AS. In the second part, it discusses the issue of computation IG quantities and presents a few examples.
Gaussian path diagramsMarch 08, 2007 9:00 am - 9:50 am
In the 1920's the geneticist Sewall Wright introduced a class of
Gaussian statistical models represented by graphs containing directed
and bi-directed edges, known as path diagrams. These models have been
used extensively in psychometrics and econometrics where they are
called structural equation models.
I will first describe the subclass of bow-free acyclic path diagrams, which have desirable statistical properties. I will then characterize a subclass of models that are characterized by their Markov properties. Lastly I will outline recent work aimed at characterizing non-Markovian constraints that may arise.
(This is joint work with Mathias Drton, Michael Eichler and Masashi Miyamura.)
Application of algebraic statistics for statistical disclosure
limitationMarch 06, 2007 3:00 pm - 3:50 pm
Statistical disclosure limitation applies statistical tools to the
problems of limiting sensitive information releases about individuals and
groups that are part of statistical databases while allowing for proper
statistical inference. The limited releases can be in a form of arbitrary
collections of marginal and conditional distributions, and odds ratios for
contingency tables. Given this information, we discuss how tools from
algebraic geometry can be used to give both complete and incomplete
characterization of discrete distributions for contingency tables. These
problems also lead to linear and non-linear integer optimization
formulations. We discuss some practical implication, and challenges, of
using algebraic statistics for data privacy and confidentiality problems.
Open problems in algebraic statisticsMarch 05, 2007 9:15 am - 10:05 am
This talk introduces five or six mathematical problems whose solution would likely be a significant contribution to the emerging interactions between algebraic geometry, statistics, and computational biology.
Conditional independence for Gaussian random variables is not finitely axiomatizable
March 05, 2007 4:30 pm - 6:30 pm
It is known that for general distributions, there is no finite list of conditional independence axioms that can be used to deduce all implications among a collection of conditional independence statements. We show the same result holds among the class of Gaussian random variables by exhibiting, for each n>3, a collection of n independence statements on n random variables, which, in the Gaussian case imply that X_1 is independent of X_2, but such that no subset implies that X_1 is independent of X_2. The proof depends on the fact that conditional independence models for Gaussian random variables are algebraic varieties in the cone of positive definite matrices and makes use of binomial primary decomposition.
Linkage problems and real algebraic geometry
March 05, 2007 4:30 pm - 6:30 pm
Joint work with Reinhard Steffens.
Linkages are graphs whose edges are rigid bars, and they arise
as a natural model in many applications in computational
geometry,
molecular biology and robotics. Studying linkages naturally
leads
to a variety of questions in real algebraic geometry, such as:
- Given a rigid graph with prescribed edge lengths, how
many
embeddings are there?
- Given a 1-degree-of-freedom linkage, how can one
characterize
and compute the trajectory of the vertices?
From the real algebraic point of view, these questions are
questions
of specially-structured real algebraic varieties. On the poster
we
exhibit some techniques from sparse elimination theory to
analyze these problems. In particular, we show that certain
bounds (e.g. for Henneberg-type graphs) naturally arise from
mixed volumes and Bernstein's theorem.
On phylogenetic trees – a geometer's view March 09, 2007 9:00 am - 9:50 am
I will discuss geometric methods of investigating phylogenetic trees. In a joint project with Weronika Buczynska we investigate projective varieties which are binary symmetric models of trivalent phylogenetic trees. They have Gorenstein terminal singularities and are Fano varieties. Moreover any two such varieties which are of the same dimension are deformation equivalent, that is, they are in the same connected component of the Hilbert scheme of the projective space whose coordinates are indexed by subsets of their leaves.
A combinatorial test for significant codivergence between cool-season
grasses and their symbiotic fungal endophytes March 09, 2007 10:30 am - 11:20 am
Symbioses of grasses and fungal endophytes constitute an
interesting model
for evolution of mutualism and parasitism. Grasses of all
subfamilies can
harbor systemic infections by fungi of the family
Clavicipitaceae. Subfamily
Poöideae is specifically associated with epichloë
endophytes (species
of Epichloë and their asexual derivatives, the
Neotyphodium
species) in
intimate symbioses often characterized by highly efficient
vertical
transmission in seeds, and bioprotective benefits conferred by
the
symbionts to their hosts. These remarkable symbioses have been
identified
in most grass tribes spanning the taxonomic range of the
subfamily. Here we
examine the possibility of codivergence in the phylogenetic
histories of
Poöideae and epichloë. We introduce a method of analysis to
detect significant codivergence even in the absence of strict
cospeciation, and to address problems in previously developed methods. Relative ages
of
corresponding cladogenesis events were determined from
ultrametric maximum
likelihood H (host) and P (parasite = symbiont) trees by
an algorithm
called MRCALink (most recent common ancestor link), an
improvement over
previous methods that greatly weight deep over shallow H and
P node
pairs.
We then compared the inferred correspondence of MRCA ages in
the H and
P trees to the spaces of trees estimated from 10,000 randomly
generated
H and P tree pairs. Analysis of the complete dataset, which
included a
broad host-range species and some likely host transfers
(jumps), did not
indicate significant codivergence. However, when likely host
jumps were
removed the analysis indicated highly significant codivergence.
Interestingly,
early cladogenesis events in the Poöideae corresponded to
early
cladogenesis
events in epichloë, suggesting concomitant origins of the
Poöideae and
this unusual symbiotic system.
This is joint work with C. L. Schardl, K. D. Craven, A.
Lindstrom, and
A. Stromberg.
Classifying disease models using regular polyhedral subdivisions
March 05, 2007 4:30 pm - 6:30 pm
Genes play a complicated role in how likely one is to get a certain disease. Biologists would like to model how one's genotype affects their likelihood of illness. We propose a new classification of two-locus disease models, where each model corresponds to an induced subdivision of a point configuration (basically a picture of connected dots). Our models reflect epistasis, or gene interaction. This work is joint with Ingileif Hallgrimsdottir. For more information, see our preprint at arXiv:q-bio.QM/0612044.
Maximum likelihood estimation in latent class
March 05, 2007 4:30 pm - 6:30 pm
Latent class models have been used to explain the heterogeneity of the observed relationship among a set of categorical variables and have received more and more attention as a powerful methodology for analyzing discrete data. The central goal of our work is to study the existence and computation of maximum likelihood estimates (MLEs) for these models, which are cardinal for assessment of goodness of fit and model selection. Our study is at the interface between the fields of algebraic statistics and machine learning.
Traditionally, the expectation maximization (EM) algorithm has been applied to compute the MLEs of a latent class model. However, the solutions provided by the EM correspond to local maxima only, so, although we are able to compute them effectively, we still lack methods for assessing uniqueness and existence of the MLEs. Another interesting problem in statistics is the identifiability of the model. When a model is unidentifiable, it is necessary to adjust the number of degrees of freedom in order to apply correctly goodness-of-fit tests. In our work, we show that both the existence and identifiability problems are closely related to the geometric properties of the latent class models. Therefore, studying the algebraic varieties and ideals arising from these models is particularly relevant to our problem. We include a number of examples as a way of opening a discussion on a general method for addressing both MLE existence and identifiability in latent class models.