<span class=strong>IMA Tea and Poster Session</span>
Monday, March 5, 2007 - 4:30pm - 6:30pm
- Geometry of Rank Tests
Jason Morton (University of California, Berkeley)
We investigate the polyhedral geometry of conditional probability and
undirected graphical models, developing new statistical procedures
called convex rank tests. The polytope associated to an undirected
graphical conditional independence model is the graph associahedron.
The convex rank test defined by the dual semigraphoid to the n-cycle
graphical model is applied to microarray data analysis to detect
periodic gene expression.
- Maximum Likelihood Estimation in Latent Class Models
Yi Zhou (Carnegie Mellon University)
Latent class models have been used to explain the heterogeneity of the observed relationships among a set of categorical variables and have received growing attention as a powerful methodology for analyzing discrete data. The central goal of our work is to study the existence and computation of maximum likelihood estimates (MLEs) for these models, which are essential for assessing goodness of fit and for model selection. Our study is at the interface between the fields of algebraic statistics and machine learning.
Traditionally, the expectation maximization (EM) algorithm has been applied to compute the MLEs of a latent class model. However, the solutions provided by EM correspond to local maxima only, so although we can compute them effectively, we still lack methods for assessing the uniqueness and existence of the MLEs. Another interesting problem in statistics is the identifiability of the model. When a model is unidentifiable, the number of degrees of freedom must be adjusted in order to apply goodness-of-fit tests correctly. In our work, we show that both the existence and identifiability problems are closely related to the geometric properties of the latent class models. Therefore, studying the algebraic varieties and ideals arising from these models is particularly relevant to our problem. We include a number of examples as a way of opening a discussion on a general method for addressing both MLE existence and identifiability in latent class models.
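As a rough illustration of the EM iteration mentioned above, the following sketch fits a latent class model with binary items. It is not the authors' code: the function names (`simulate`, `log_likelihood`, `em_latent_class`), the synthetic data, and all parameter values are assumptions made for this example only. Running it from several random starts is one simple way to see that EM may stop at different local maxima.

```python
import numpy as np

def simulate(n, pi, theta, seed=0):
    """Draw n synthetic binary response vectors from a latent class model."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=n, p=pi)          # hidden class of each unit
    return (rng.random((n, theta.shape[1])) < theta[z]).astype(float)

def log_likelihood(X, pi, theta):
    """Return the observed-data log-likelihood and the (n, k) log joint terms."""
    log_px = X @ np.log(theta).T + (1.0 - X) @ np.log(1.0 - theta).T
    joint = log_px + np.log(pi + 1e-300)           # guard against log(0)
    m = joint.max(axis=1, keepdims=True)
    ll = float(np.sum(m[:, 0] + np.log(np.exp(joint - m).sum(axis=1))))
    return ll, joint

def em_latent_class(X, k, n_iter=200, seed=0):
    """EM for k latent classes; returns weights, item probabilities, history."""
    rng = np.random.default_rng(seed)
    n, n_items = X.shape
    pi = np.full(k, 1.0 / k)
    theta = rng.uniform(0.2, 0.8, size=(k, n_items))  # random starting point
    history = []
    for _ in range(n_iter):
        ll, joint = log_likelihood(X, pi, theta)
        history.append(ll)
        # E-step: posterior class memberships (responsibilities).
        resp = np.exp(joint - joint.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted relative frequencies, kept away from 0 and 1.
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = (resp.T @ X) / np.maximum(nk, 1e-12)[:, None]
        theta = np.clip(theta, 1e-6, 1.0 - 1e-6)
    history.append(log_likelihood(X, pi, theta)[0])
    return pi, theta, np.array(history)

# Synthetic example (assumed values, not data from the poster).
true_pi = np.array([0.4, 0.6])
true_theta = np.array([[0.9, 0.8, 0.9, 0.7],
                       [0.2, 0.1, 0.3, 0.2]])
X = simulate(300, true_pi, true_theta, seed=1)
for start in range(3):
    _, _, hist = em_latent_class(X, k=2, n_iter=100, seed=start)
    print("start", start, "final log-likelihood", hist[-1])
```

Each run only guarantees that the log-likelihood does not decrease along the iteration. Comparing the final values across starts gives no certificate that a global maximum exists or has been found, which is the gap the abstract describes.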
- Supervised Learning Artificial Neural Network Algorithms for Optimizing Mechanical Properties of Elastin-like Polypeptide Hydrogels for Cartilage Repair
Joint work with Dana L. Nettles (3), Kimberly Trabbic Carlson (3),
Ashutosh Chilkoti (3), Lori A. Setton (3,4), Mansoor A. Haider (1,2)
Elastin-like polypeptide (ELP) hydrogels are a class of biomaterials that
have potential utility as a biocompatible scaffold for filling defects due
to osteoarthritis and for regenerating cartilage. Because of the facility
to genetically engineer elastin sequence, there are almost endless
possible configurations of ELPs and conformations of the networks formed
after crosslinking. ELP biomaterial function will exhibit a complex
dependence on these polymer characteristics that impacts properties
expected to affect cartilage regeneration, such as mechanical load
support. These complex structure-function relationships for crosslinked
ELP hydrogels are not well described. A method for predicting the
mechanical properties of ELP hydrogels was developed based on structural
properties and Supervised Artificial Neural Network (ANN) modeling. The
ANN Model used concentration, molecular weight, crosslink density, and
sample number to predict the dynamic shear modulus and loss angle of the
hydrogels. The ANN was implemented in a custom compiled code based on the
Scaled Conjugate Gradient minimization algorithm, and a Monte Carlo method
was used to expand the dataset. The ANN was trained using varying
subsets of the full dataset (22 formulations), with the complementary
subset used for validation. Trained networks demonstrated excellent
accuracy in prediction of hydrogel dynamic shear modulus at physiological
temperature, based on polymer design, and predictions were robust with
respect to statistical variations. The results are used to show the
validity of an intermediate screening process using ANNs to obtain the
optimal mechanical properties for the ELP.
(1) Biomathematics Graduate Program, North Carolina State University; (2)
Department of Mathematics, North Carolina State University; (3) Department
of Biomedical Engineering, Duke University; (4) Department of Surgery, Duke
- Classifying Disease Models Using Regular Polyhedral Subdivisions
Debbie Yuster (Columbia University)
Genes play a complicated role in how likely one is to get a certain disease. Biologists would like to model how one's genotype affects their likelihood of illness. We propose a new classification of two-locus disease models, where each model corresponds to an induced subdivision of a point configuration (basically a picture of connected dots). Our models reflect epistasis, or gene interaction. This work is joint with Ingileif Hallgrimsdottir. For more information, see our preprint at arXiv:q-bio.QM/0612044.
- Multiple Solutions to the Likelihood Equations in the Behrens-Fisher Problem
Mathias Drton (University of Chicago)
The Behrens-Fisher problem concerns testing the statistical hypothesis
of equality of the means of two normal populations with possibly
different variances. This problem
furnishes one of the simplest statistical models for which the likelihood
equations may have more than one real solution. In fact, with
probability one, the equations have either one or three real solutions.
Using the cubic discriminant, we study the large-sample probability of
one versus three solutions.
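The one-versus-three dichotomy can be made concrete with a short sketch. Under the hypothesis of a common mean mu, substituting the variance estimates s_i^2 + (xbar_i - mu)^2 into the likelihood equations leaves a single cubic in mu: n_1(xbar_1 - mu)(s_2^2 + (xbar_2 - mu)^2) + n_2(xbar_2 - mu)(s_1^2 + (xbar_1 - mu)^2) = 0, where s_i^2 is the variance about the sample mean with divisor n_i. The code below builds this cubic and evaluates the standard cubic discriminant. The function names and the two toy samples are assumptions for illustration, not material from the poster.

```python
import numpy as np

def likelihood_cubic(x, y):
    """Cubic in mu whose real roots solve the likelihood equations under H0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    m1, m2 = x.mean(), y.mean()
    v1, v2 = x.var(), y.var()            # divisor n (ddof=0)
    d1 = np.poly1d([-1.0, m1])           # polynomial m1 - mu
    d2 = np.poly1d([-1.0, m2])           # polynomial m2 - mu
    return n1 * d1 * (d2 * d2 + v2) + n2 * d2 * (d1 * d1 + v1)

def cubic_discriminant(p):
    """Discriminant of a*t^3 + b*t^2 + c*t + d: > 0 three real roots, < 0 one."""
    a, b, c, d = p.coeffs
    return (18 * a * b * c * d - 4 * b**3 * d + b**2 * c**2
            - 4 * a * c**3 - 27 * a**2 * d**2)

def real_solutions(x, y, tol=1e-8):
    """Sorted real roots of the likelihood cubic."""
    r = likelihood_cubic(x, y).roots
    return np.sort(r[np.abs(r.imag) < tol].real)

# Toy samples (assumed): similar means, then widely separated means.
near = ([0.0, 1.0, 2.0], [0.5, 1.5, 2.5])
far = ([-0.1, 0.0, 0.1], [9.9, 10.0, 10.1])
for name, (x, y) in (("near", near), ("far", far)):
    p = likelihood_cubic(x, y)
    print(name, "discriminant sign:", np.sign(cubic_discriminant(p)),
          "real solutions:", real_solutions(x, y))
```

In these toy samples the sign of the discriminant switches from negative to positive as the sample means move apart relative to the within-sample spread, so the second sample pair yields three real solutions.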
- A Flow that Computes the Best Positive Semi-definite Approximation of a Symmetric Matrix
Kenneth Driessel (Iowa State University)
We work in the space of n-by-n real symmetric
matrices with the Frobenius inner product.
Consider the following problem:
Problem: Positive semi-definite
approximation. Given an n-by-n real symmetric matrix
A, find the positive semi-definite matrix
which is closest to A.
I discuss the following differential equation
in the space of symmetric matrices:
X′ = (A-X)X^2 + X^2(A-X).
The corresponding flow preserves inertia.
In particular, if the initial value X(0)=M
is a positive definite matrix then X(t)
is positive definite for all t>0. I
show that the distance between A
and X(t) decreases as t increases.
I also show that if A has distinct
nonzero eigenvalues (which is a generic
condition) then the solution X(t)
converges to the positive semi-definite
matrix which is closest to A.
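A small numerical experiment can illustrate these claims. The sketch below integrates X′ = (A-X)X^2 + X^2(A-X) with a classical fourth-order Runge-Kutta scheme and compares the result with the Frobenius-nearest positive semi-definite matrix, computed directly by zeroing the negative eigenvalues of A. The integrator, step size, test matrix, and function names are assumptions chosen for this example. They are not taken from the poster.

```python
import numpy as np

def nearest_psd(A):
    """Frobenius-nearest PSD matrix, obtained by zeroing negative eigenvalues."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0.0, None)) @ V.T

def flow_rhs(X, A):
    """Right-hand side of X' = (A - X) X^2 + X^2 (A - X)."""
    X2 = X @ X
    D = A - X
    return D @ X2 + X2 @ D

def integrate_flow(A, X0, step=0.01, n_steps=5000):
    """Integrate with classical RK4; return X(T) and the distances ||A - X(t)||_F."""
    X = X0.copy()
    distances = [np.linalg.norm(A - X)]
    for _ in range(n_steps):
        k1 = flow_rhs(X, A)
        k2 = flow_rhs(X + 0.5 * step * k1, A)
        k3 = flow_rhs(X + 0.5 * step * k2, A)
        k4 = flow_rhs(X + step * k3, A)
        X = X + (step / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        X = 0.5 * (X + X.T)              # remove round-off asymmetry
        distances.append(np.linalg.norm(A - X))
    return X, np.array(distances)

# Assumed test matrix with distinct nonzero eigenvalues 3, 0.5 and -1.
A = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.0, 0.0, 0.5]])
X_final, distances = integrate_flow(A, np.eye(3))
print("smallest eigenvalue of X(T):", np.linalg.eigvalsh(X_final).min())
print("distance from X(T) to nearest PSD matrix:",
      np.linalg.norm(X_final - nearest_psd(A)))
```

In this example the iterate stays positive definite and the recorded distances decrease, as the abstract states. The approach to the singular limit is slow, because the eigenvalue that tends to zero decays only algebraically in t, so a fixed-step integrator like this one is a demonstration and not a practical solver.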
- Conditional Independence for Gaussian Random Variables is not Finitely Axiomatizable
Seth Sullivant (Harvard University)
It is known that for general distributions, there is no finite list of conditional independence axioms that can be used to deduce all implications among a collection of conditional independence statements. We show the same result holds among the class of Gaussian random variables by exhibiting, for each n>3, a collection of n independence statements on n random variables, which, in the Gaussian case imply that X_1 is independent of X_2, but such that no subset implies that X_1 is independent of X_2. The proof depends on the fact that conditional independence models for Gaussian random variables are algebraic varieties in the cone of positive definite matrices and makes use of binomial primary decomposition.
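The algebraic viewpoint rests on a standard fact: for a Gaussian vector with covariance matrix Sigma, X_i is independent of X_j given X_K exactly when the submatrix of Sigma with rows {i} ∪ K and columns {j} ∪ K has determinant zero. The sketch below checks this on a three-variable Markov chain. The helper name `ci_minor`, the zero-based indexing, and the example covariance are assumptions for illustration, and the example is not the family of statements constructed in the poster.

```python
import numpy as np

def ci_minor(Sigma, i, j, K):
    """Determinant whose vanishing encodes X_i independent of X_j given X_K."""
    rows = [i] + list(K)
    cols = [j] + list(K)
    return float(np.linalg.det(Sigma[np.ix_(rows, cols)]))

# Covariance of a stationary Markov chain X_0 -> X_1 -> X_2 (assumed example):
# Sigma[a, b] = rho ** |a - b|.
rho = 0.5
idx = np.arange(3)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])

print("X_0 vs X_2 given X_1:", ci_minor(Sigma, 0, 2, [1]))   # vanishes
print("X_0 vs X_2 marginally:", ci_minor(Sigma, 0, 2, []))   # equals rho**2
```

Each conditional independence statement therefore contributes one polynomial equation in the entries of Sigma, which is why collections of such statements cut out algebraic varieties inside the cone of positive definite matrices.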
- Toric Ideals of Phylogenetic Invariants for the General Group-based Model on Claw Trees
We address the problem of studying the toric ideals of
phylogenetic invariants for a general group-based model on an
arbitrary claw tree. We focus on the group Z_2 and
choose a natural recursive approach that extends to other
groups. The study of the lattice associated with each
phylogenetic ideal produces a list of circuits that generate
the corresponding lattice basis ideal. In addition, we
describe explicitly a quadratic lexicographic Gröbner basis
of the toric ideal of invariants for the claw tree on an
arbitrary number of leaves. Combined with a result of Sturmfels
and Sullivant, this implies that the phylogenetic ideal of
every tree for the group Z_2 has a quadratic
Gröbner basis. Hence, the coordinate ring of the toric
variety is a Koszul algebra.
This is joint work with Julia Chifman, University of Kentucky.
- Metric Learning for Phylogenetic Invariants
Nicholas Eriksson (Stanford University)
We introduce new methods for phylogenetic tree construction by using
machine learning to optimize the power of phylogenetic invariants.
Phylogenetic invariants are polynomials in the joint probabilities
which vanish under a model of evolution on a phylogenetic tree. We
give algorithms for selecting a good set of invariants and for
learning a metric on this set of invariants which optimally
distinguishes the different models. Our learning algorithms involve
semidefinite programming on data simulated over a wide range of
parameters. Simulations on trees with four leaves under the
Jukes-Cantor and Kimura 3-parameter models show that our method
improves on other uses of invariants and is competitive with
neighbor-joining. Our main biological result is that the trained
invariants can perform substantially better than neighbor joining on
quartet trees with short interior edges.
This is joint work with Yuan Yao (Stanford).
- Linkage Problems and Real Algebraic Geometry
Thorsten Theobald (Johann Wolfgang Goethe-Universität Frankfurt)
Joint work with Reinhard Steffens.
Linkages are graphs whose edges are rigid bars, and they arise
as a natural model in many applications in computational
molecular biology and robotics. Studying linkages naturally leads
to a variety of questions in real algebraic geometry, such as:
- Given a rigid graph with prescribed edge lengths, how many
embeddings are there?
- Given a 1-degree-of-freedom linkage, how can one
and compute the trajectory of the vertices?
From the real algebraic point of view, these questions are
of specially-structured real algebraic varieties. On the poster we
exhibit some techniques from sparse elimination theory to
analyze these problems. In particular, we show that certain
bounds (e.g. for Henneberg-type graphs) naturally arise from
mixed volumes and Bernstein's theorem.