Institute for Mathematics and Its Applications
Talk abstract:
Felsenstein introduced the use of the bootstrap in phylogenetic analysis. Criticisms of the efficacy of this approach were addressed by Efron, Halloran, and Holmes, who pointed out some common misunderstandings of what the bootstrap is supposed to accomplish. They also pointed out that in this high-dimensional problem, the bootstrap provides an asymptotic approximation to a P-value for the test of the hypothesis that an observed clade is not a proper clade of the true phylogeny. Their illustration uses the Escalante-Ayala data on 11 species of Plasmodium (malaria) with 1620 loci on part of the genome. Their analysis is confined to the 223 nonmonotypic loci, i.e., loci where the 11-dimensional vector had more than one distinct component. Of these 223 loci, 119 were singletons, that is, vectors that appeared only once. This implies that the observed vectors cover only about 45% of the probability distribution of nonmonotypic vectors. It suggests that in this highly discrete and discontinuous problem of phylogenetic analysis, the sample size is not large enough for the asymptotic first approximation to be meaningful. A simulation on a synchronous model, resembling that of the Plasmodium data but exaggerating the low coverage to 30%, shows that the bootstrap can be very overoptimistic in assessing the reliability of observed clades.
This is joint work with Susan Holmes.
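
The abstract does not say how the coverage figure was obtained. As a rough check only, one standard estimator of sample coverage is the Good-Turing estimate, 1 - n1/n, where n1 is the number of patterns seen exactly once and n is the sample size. The Python sketch below assumes that estimator; the function names are hypothetical and are not taken from the talk. With the counts quoted in the abstract (119 singletons among 223 loci) it gives roughly 0.47, in line with the "about 45%" stated above.

```python
from collections import Counter


def coverage_from_counts(n_singletons, n_total):
    """Good-Turing coverage estimate 1 - n1/n from summary counts.

    Assumption: this is one common estimator of sample coverage; the
    abstract does not state which estimator the authors used.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 1.0 - n_singletons / n_total


def good_turing_coverage(patterns):
    """Same estimate computed from a sequence of hashable site patterns,
    e.g. tuples holding the 11 character states observed at each locus."""
    counts = Counter(patterns)
    n_total = sum(counts.values())
    # Singletons are patterns that occur exactly once in the sample.
    n_singletons = sum(1 for c in counts.values() if c == 1)
    return coverage_from_counts(n_singletons, n_total)


if __name__ == "__main__":
    # Counts quoted in the abstract: 119 singletons among 223 loci.
    print(round(coverage_from_counts(119, 223), 3))
```

A low estimate of this kind means that a large share of the probability mass lies on patterns never seen in the sample, which is exactly the mass that resampling the observed loci cannot reproduce.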