March 5-9, 2007
Molecular phylogenetics is concerned with inferring evolutionary
relationships (phylogenetic trees) from biological sequences (such as
aligned DNA sequences for a gene shared by a collection of species).
The probabilistic models of sequence evolution that underlie statistical
approaches in this field exhibit a rich algebraic structure.
After an introduction to the inference problem and phylogenetic
models, this talk will survey some of the highlights of current
algebraic understanding. Results on the important statistical issue
of identifiability of phylogenetic models will be emphasized, as the
algebraic viewpoint has been crucial to obtaining such results.
The relationship between the shape of a fitness landscape and the underlying gene interactions, or epistasis, has been extensively studied in the two-locus case. Epistasis has been linked to biologically important properties such as the advantage of sex. Gene interactions among multiple loci are usually reduced to two-way interactions. Here, we present a geometric theory of shapes of fitness landscapes for multiple loci. We investigate the dynamics of evolving populations on fitness landscapes and the predictive power of the geometric shape for the speed of adaptation. Finally, we discuss applications to fitness data from viruses and bacteria.
Many statistical models of evolution can be viewed as
algebraic varieties. The generators of the ideal associated to a model
and a phylogenetic tree are called invariants. The invariants of a
statistical model of evolution should make it possible to determine
the tree relating a set of living species.
We will present a method of phylogenetic inference based on invariants
and we will discuss why algebraic geometry should be considered as a
powerful tool for phylogenetic reconstruction. The performance of the
method has been studied for quartet trees and the Kimura 3-parameter
model and it will be compared to widely known phylogenetic
reconstruction methods such as Maximum likelihood estimate and
Chemical reaction network models give rise to
dynamical systems that are usually high dimensional and
have many unknown parameters. Due to the presence of these
parameters (such as reaction rate constants), direct numerical
simulation of the chemical dynamics is practically impossible. On
the other hand, we will show that important properties of these
systems are determined only by the network structure, and do not
depend on the unknown parameters. Also, we will show how some of
these results can be generalized to systems of polynomial equations
that are not necessarily derived from chemical kinetics. In
particular, we will point out connections with classical problems
in algebraic geometry, such as the real Jacobian conjecture. This
talk describes joint work with Martin Feinberg, and can be regarded
as a continuation of his earlier talk.
We work in the space of n-by-n real symmetric
matrices with the Frobenius inner product.
Consider the following problem:
Problem: Positive semi-definite
approximation. Given an n-by-n real symmetric matrix
A, find the positive semi-definite matrix
which is closest to A.
I discuss the following differential equation
in the space of symmetric matrices:
X′ = (A-X)X^2 + X^2(A-X) .
The corresponding flow preserves inertia.
In particular, if the initial value X(0)=M
is a positive definite matrix then X(t)
is positive definite for all t>0. I
show that the distance between A
and X(t) decreases as t increases.
I also show that if A has distinct
nonzero eigenvalues (which is a generic
condition) then the solution X(t)
converges to the positive semi-definite
matrix which is closest to A.
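The behaviour claimed for this flow can be checked numerically. The following is a minimal sketch, not the speaker's method: it assumes an explicit Euler discretization with a hand-picked step size, the initial value X(0) = I, and a hypothetical 3-by-3 test matrix A with eigenvalues 2, 0.5 and -1. The reference answer uses the standard fact that the closest positive semi-definite matrix in the Frobenius norm is obtained by setting the negative eigenvalues of A to zero.

```python
import numpy as np

def nearest_psd(A):
    """Closest PSD matrix in Frobenius norm: clip negative eigenvalues to zero."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0.0, None)) @ V.T

def flow_rhs(A, X):
    """Right-hand side of X' = (A - X) X^2 + X^2 (A - X)."""
    X2 = X @ X
    D = A - X
    return D @ X2 + X2 @ D

def integrate_flow(A, X0, step=0.01, n_steps=20000):
    """Explicit Euler integration (an assumption of this sketch).

    Returns the final iterate and the Frobenius distances ||A - X(t)||.
    """
    X = X0.copy()
    dists = [np.linalg.norm(A - X)]
    for _ in range(n_steps):
        X = X + step * flow_rhs(A, X)
        X = 0.5 * (X + X.T)  # remove round-off asymmetry
        dists.append(np.linalg.norm(A - X))
    return X, np.array(dists)

# Hypothetical test matrix with distinct nonzero eigenvalues 2, 0.5, -1.
v = np.ones((3, 1))
Q = np.eye(3) - 2.0 * (v @ v.T) / 3.0   # Householder reflection (orthogonal)
A = Q @ np.diag([2.0, 0.5, -1.0]) @ Q.T

X_final, dists = integrate_flow(A, np.eye(3))
gap = np.linalg.norm(X_final - nearest_psd(A))
print("distance to closest PSD matrix after integration:", gap)
```

In this example the component along the negative eigenvalue of A decays only algebraically in t, so the iterate approaches the semi-definite limit slowly while remaining positive definite at every finite time.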
Many statistical hypotheses can be formulated in terms of polynomial equalities and inequalities in the unknown parameters and thus correspond to semi-algebraic subsets of the parameter space. We consider large sample asymptotics for the likelihood ratio test of such hypotheses in models that satisfy standard probabilistic regularity conditions. We show that the assumptions of Chernoff's theorem hold for semi-algebraic sets, so that the asymptotics are determined by the tangent cone at the true parameter point. At boundary points or singularities, the tangent cone need not be a linear space, so non-standard limiting distributions may arise. Besides the well-known mixtures of chi-square distributions, such non-standard limits are shown to include the distributions of minima of chi-square random variables. Via algebraic tangent cones, connections to eigenvalues of Wishart matrices are found in factor analysis.
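A small simulation can illustrate how a minimum of chi-square variables arises at a singularity. The sketch below is an illustrative assumption, not an example taken from the abstract: it takes a bivariate normal mean model with identity covariance and the hypothesis theta1 * theta2 = 0, whose null set is the union of the two coordinate axes. At the singular point theta = 0 the likelihood ratio statistic is the squared distance from the observation to that union, which is min(z1^2, z2^2).

```python
import numpy as np

CHI2_1_Q95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 d.f.

def lrt_union_of_axes(z):
    """LRT statistic for H0: theta1 * theta2 = 0 given z ~ N(theta, I_2).

    With identity covariance the statistic is the squared Euclidean distance
    from z to the null set (the two coordinate axes), i.e. min(z1^2, z2^2).
    """
    return np.minimum(z[:, 0] ** 2, z[:, 1] ** 2)

rng = np.random.default_rng(0)
z = rng.standard_normal((200_000, 2))   # draws at the singular point theta = 0
stat = lrt_union_of_axes(z)

# Using the chi-square(1) cutoff that would apply at a smooth point of H0.
reject_rate = np.mean(stat > CHI2_1_Q95)
print("rejection rate with the chi-square(1) cutoff:", reject_rate)
```

Since the two squared coordinates are independent chi-square(1) variables, the exceedance probability of their minimum is 0.05 squared, or 0.0025, so a test calibrated with the usual chi-square(1) cutoff is very conservative at the singular point.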
The Behrens-Fisher problem concerns testing the statistical hypothesis
of equality of the means of two normal populations with possibly
different variances. This problem
furnishes one of the simplest statistical models for which the likelihood
equations may have more than one real solution. In fact, with
probability one, the equations have either one or three real solutions.
Using the cubic discriminant, we study the large-sample probability of
one versus three solutions.
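The count of real solutions can be sketched in a few lines. Under the null hypothesis of a common mean mu, profiling out the two variances leaves a stationarity condition in mu alone, and clearing denominators gives a cubic. The function names and the summary statistics in the usage note are hypothetical, chosen for illustration; m_i denotes the sample mean and v_i the sample variance normalised by n_i.

```python
import numpy as np

def likelihood_cubic(n1, m1, v1, n2, m2, v2):
    """Coefficients (highest degree first) of the cubic in the common mean mu.

    Profiling out the variances gives the stationarity condition
        n1 (m1 - mu) / (v1 + (m1 - mu)^2) + n2 (m2 - mu) / (v2 + (m2 - mu)^2) = 0,
    and multiplying through by both denominators yields the cubic below.
    """
    lin1, lin2 = [-1.0, m1], [-1.0, m2]
    quad1 = [1.0, -2.0 * m1, v1 + m1 ** 2]
    quad2 = [1.0, -2.0 * m2, v2 + m2 ** 2]
    return n1 * np.polymul(lin1, quad2) + n2 * np.polymul(lin2, quad1)

def cubic_discriminant(coeffs):
    """Discriminant of a*x^3 + b*x^2 + c*x + d."""
    a, b, c, d = coeffs
    return (18 * a * b * c * d - 4 * b ** 3 * d + b ** 2 * c ** 2
            - 4 * a * c ** 3 - 27 * a ** 2 * d ** 2)

def count_real_solutions(n1, m1, v1, n2, m2, v2):
    """3 if the discriminant is positive, else 1 (a zero discriminant,
    which has probability zero, is lumped with the one-solution case)."""
    disc = cubic_discriminant(likelihood_cubic(n1, m1, v1, n2, m2, v2))
    return 3 if disc > 0 else 1
```

For two groups of size 10 with means -1 and 1 and common variance v, the cubic reduces to -20 mu^3 + 20 (1 - v) mu, so count_real_solutions(10, -1.0, 0.25, 10, 1.0, 0.25) gives 3 and count_real_solutions(10, -1.0, 4.0, 10, 1.0, 4.0) gives 1: three solutions appear when the means are far apart relative to the within-group spread.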
We introduce new methods for phylogenetic tree construction by using
machine learning to optimize the power of phylogenetic invariants.
Phylogenetic invariants are polynomials in the joint probabilities
which vanish under a model of evolution on a phylogenetic tree. We
give algorithms for selecting a good set of invariants and for
learning a metric on this set of invariants which optimally
distinguishes the different models. Our learning algorithms involve
semidefinite programming on data simulated over a wide range of
parameters. Simulations on trees with four leaves under the
Jukes-Cantor and Kimura 3-parameter models show that our method
improves on other uses of invariants and is competitive with
neighbor-joining. Our main biological result is that the trained
invariants can perform substantially better than neighbor joining on
quartet trees with short interior edges.
This is joint work with Yuan Yao (Stanford).
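As a concrete instance of polynomials in the joint probabilities that vanish on one tree, the sketch below uses the well-known rank condition on flattenings rather than the trained invariants of the talk. It assumes a two-state general Markov model on the quartet tree 12|34 (not the Jukes-Cantor or Kimura models named above), with made-up parameter values. For the split matching the tree, the 4-by-4 flattening of the joint distribution has rank at most 2, so all of its 3-by-3 minors are invariants; for a split that does not match the tree the flattening has full rank for these parameters.

```python
import numpy as np

def quartet_joint(pi, M1, M2, M3, M4, M5):
    """Joint leaf distribution P[i,j,k,l] for the quartet tree 12|34.

    Hidden node a (root, distribution pi) is adjacent to leaves 1 and 2,
    hidden node b to leaves 3 and 4; M5 is the transition matrix on the
    interior edge a -> b, and M1..M4 are the pendant-edge matrices.
    """
    return np.einsum("a,ai,aj,ab,bk,bl->ijkl", pi, M1, M2, M5, M3, M4)

def flattening(P, split):
    """4x4 flattening of a 2x2x2x2 tensor along a split such as ((0,1),(2,3))."""
    rows, cols = split
    return P.transpose(*rows, *cols).reshape(4, 4)

# Hypothetical two-state parameters chosen only for illustration.
pi = np.array([0.6, 0.4])
M1 = np.array([[0.9, 0.1], [0.2, 0.8]])
M2 = np.array([[0.85, 0.15], [0.1, 0.9]])
M3 = np.array([[0.8, 0.2], [0.25, 0.75]])
M4 = np.array([[0.9, 0.1], [0.3, 0.7]])
M5 = np.array([[0.8, 0.2], [0.3, 0.7]])

P = quartet_joint(pi, M1, M2, M3, M4, M5)
sv_true = np.linalg.svd(flattening(P, ((0, 1), (2, 3))), compute_uv=False)
sv_wrong = np.linalg.svd(flattening(P, ((0, 2), (1, 3))), compute_uv=False)
print("singular values, split 12|34:", sv_true)
print("singular values, split 13|24:", sv_wrong)
```

With noisy empirical frequencies in place of P the third and fourth singular values no longer vanish exactly, which is one reason a method must decide how to weigh invariants against each other.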
In nature there are millions of distinct networks of chemical reactions that might present themselves for study at one time or another. Written at the level of elementary reactions taken with classical mass action kinetics, each new network gives rise to its own (usually large) system of polynomial equations for the species concentrations. In this way, chemistry presents a huge and bewildering array of polynomial systems, each determined in a precise way by the underlying network up to parameter values (e.g., rate constants). Polynomial systems in general, even simple ones, are known to be rich sources of interesting and sometimes wild dynamical behavior. It would appear, then, that chemistry too should be a rich source of dynamical exotica.
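To make the passage from a network to a polynomial system concrete, here is a minimal sketch of mass action kinetics. The network A + B <-> C, the species ordering and the rate constants are hypothetical, chosen only for illustration. Each reaction contributes its rate constant times a monomial in the reactant concentrations, multiplied by the net change in species counts.

```python
import numpy as np

def mass_action_rhs(conc, reactants, products, rates):
    """dc/dt = sum over reactions r of k_r * prod_s c_s**y[r, s] * (y'[r] - y[r]).

    reactants (y) and products (y') hold one row of stoichiometric
    coefficients per reaction; rates holds the rate constants k_r.
    """
    conc = np.asarray(conc, dtype=float)
    monomials = np.prod(conc ** reactants, axis=1)   # one monomial per reaction
    return (rates * monomials) @ (products - reactants)

# Hypothetical network A + B <-> C, species order (A, B, C).
reactants = np.array([[1, 1, 0],    # A + B -> C
                      [0, 0, 1]])   # C -> A + B
products = np.array([[0, 0, 1],
                     [1, 1, 0]])
rates = np.array([2.0, 0.5])        # made-up rate constants

dcdt = mass_action_rhs([1.0, 3.0, 4.0], reactants, products, rates)
print(dcdt)
```

At the concentrations (1, 3, 4) the forward reaction runs at rate 2 * 1 * 3 = 6 and the reverse at 0.5 * 4 = 2, so the right-hand side is (-4, -4, 4). The network fixes which monomials appear; only the coefficients depend on the rate constants.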
Yet there is a remarkable amount of stability in chemistry. Indeed, chemists and chemical engineers generally expect homogeneous isothermal reactors, even complex ones, to admit precisely one (globally attractive) equilibrium. Although this tacit doctrine is supported by a long observational record, there are certainly instances of homogeneous isothermal reactors that give rise, for example, to multiple equilibria. The vast landscape of chemical reaction networks, then, appears to have wide regions of intrinsic stability (regardless of parameter values) punctuated by far smaller regions in which instability might be extant (for at least certain parameter values).
In this talk, I will present some recent joint work with Gheorghe Craciun that goes a long way toward explaining this landscape — in particular, toward explaining how biological chemistry "escapes" the stability doctrine to (literally) "make life interesting." A subsequent talk by Craciun will emphasize more mathematical detail.
The past decade has seen considerable interest in the reformulation of statistical models and methods for the analysis of contingency tables using the language and results of algebraic and polyhedral geometry. But as algebraic statistics has developed, new ideas have emerged that have changed how we view a number of statistical problems. This talk reviews some of these recent advances and suggests some challenges for collaborative research, especially those involving large-scale databases.
A subspace arrangement is a union of a finite number of subspaces of a
vector space. We will discuss the importance of subspace arrangements first
as mathematical objects and now as a popular class of models for
We will then introduce some new theoretical results that were motivated
by practice. Using these results we will address the computational issue
of how to extract subspace arrangements from noisy or corrupted data.
Finally we will turn to the importance of subspace arrangements by briefly
discussing the connections to sparse representations, manifold learning,