Probability
and Statistics in Complex Systems: Genomics, Networks, and Financial
Engineering, September 1, 2003 - June 30, 2004
Abstracts:
October
20-24, 2003
Max
Alekseyev
(Department of Computer Science, University of California, San
Diego) maxal@cs.ucsd.edu
Genome
Halving Problem (poster session)
Joint
work with Pavel Pevzner.
The
Genome Halving Problem is motivated by an evolutionary mechanism that
duplicates the entire genome. The result of such a duplication,
a so-called perfectly duplicated genome, contains two identical
copies of each chromosome. The genome is then subject to reversal
and/or translocation rearrangement operations. For a given rearranged
duplicated genome, the Genome Halving Problem attempts to recover
its closest perfectly duplicated ancestor. A solution to this
problem is used as a building block for more sophisticated genome
rearrangement algorithms.
The
Genome Halving Problem was first introduced and solved in a series
of papers by Nadia El-Mabrouk and David Sankoff. Their algorithm
is rather complex and, to the best of our knowledge, it was
never implemented as a computer program. In our work we present
a new, simpler and more general algorithm for the Genome Halving
Problem as well as its implementation in C++.
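To make the objects in this abstract concrete, the following minimal sketch models a chromosome as a list of signed integers, applies a reversal, and tests whether a genome is perfectly duplicated. It is not the halving algorithm from the poster; the list representation and the function names are assumptions made only for this illustration.

```python
from collections import Counter

def reversal(chromosome, i, j):
    """Reverse the segment chromosome[i:j] and flip the sign of each gene in it."""
    flipped = [-g for g in reversed(chromosome[i:j])]
    return chromosome[:i] + flipped + chromosome[j:]

def canonical(chromosome):
    """A linear chromosome read from either end is the same object,
    so pick one of the two readings as its canonical form."""
    forward = tuple(chromosome)
    backward = tuple(-g for g in reversed(chromosome))
    return min(forward, backward)

def is_perfectly_duplicated(genome):
    """True if the chromosomes can be paired into identical copies."""
    counts = Counter(canonical(c) for c in genome)
    return all(n % 2 == 0 for n in counts.values())

# Hypothetical toy genome: two copies of each of two chromosomes.
ancestor = [[1, 2, 3], [1, 2, 3], [4, 5], [4, 5]]
# One reversal on the first chromosome breaks the perfect duplication.
rearranged = [reversal(ancestor[0], 1, 3)] + ancestor[1:]
```

Here `is_perfectly_duplicated(ancestor)` holds and `is_perfectly_duplicated(rearranged)` does not; the halving problem asks for the perfectly duplicated genome that is closest to `rearranged` in rearrangement distance.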

Lars
Arvestad (Stockholm Bioinformatics Center and Department of
Numerical Analysis and Computer Science, Royal Institute of
Technology (KTH)) lars.arvestad@sbc.su.se
http://www.nada.kth.se/~arve
New
Methods for Estimating Amino Acid Replacement Rates (poster
session) Long Version with
Figure: pdf
ps
Two
new methods for estimating replacement rate matrices from protein
sequence alignments are presented and shown to perform better
than another recent method, Müller-Vingron's resolvent
method, in a variety of settings. Furthermore, the best method
is demonstrated to be robust on small datasets and practical
also on very large datasets of real data. Neither short nor
divergent sequence pairs have to be discarded, making the method
economical with data.

Anne
Bergeron (Département d'informatique de l'UQAM, Universite
du Quebec a Montreal) bergeron.anne@uqam.ca
Easy
Ways to Clear Hurdles (poster session)
pdf
ps

Guillaume
Bourque (Centre de Recherche Mathematiques, Universite
de Montreal) bourque@crm.umontreal.ca
A
Comparative Approach for Multiple Gene Network Inference Using
Time-Series Gene Expression Data
Long Version with Figure pdf
ps
Slides: html
pdf
ps
ppt
We
present a method for gene network inference and revision based
on time-series data. Gene networks are modeled using linear
differential equations and a generalized stepwise multiple linear
regression algorithm is used to recover the interaction coefficients.
Our system was designed for the recovery of gene interactions
concurrently in many gene regulatory networks related by a graph
or a tree. Suppose we are studying a certain regulatory network
in different species of known phylogeny. We can think of the
different networks as being related to each other in that way
and use this information. Alternatively, we might be interested
in the development stages of this network or we could be studying
the same system but in different tissues related at a different
level. The idea is that, given gene expression data for each
species, or each stage of development, or each tissue, we seek
to recover each individual network while minimizing a cost based
on the differences along the edges of the graph or the tree.
We show how this comparative framework allows new insights and
facilitates the gene network inference process.
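The core modeling step, a linear differential equation whose interaction coefficients are recovered by regression, can be sketched for a single network as follows. This is a simplified illustration that uses ordinary least squares with forward differences; it is not the generalized stepwise regression or the tree-based comparative cost described in the abstract. The model form dx/dt = W x and all names and values are assumptions.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting.
    A singular system raises ZeroDivisionError."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_interactions(series, dt):
    """Least-squares estimate of W in dx/dt = W x.

    series[t][i] is the expression level of gene i at time step t;
    derivatives are approximated by forward differences.
    """
    n = len(series[0])
    xs = series[:-1]
    dxs = [[(b - a) / dt for a, b in zip(s0, s1)]
           for s0, s1 in zip(series, series[1:])]
    # Normal equations (X^T X) w_i = X^T dx_i, one system per target gene i.
    xtx = [[sum(x[j] * x[k] for x in xs) for k in range(n)] for j in range(n)]
    w = []
    for i in range(n):
        xty = [sum(x[j] * dx[i] for x, dx in zip(xs, dxs)) for j in range(n)]
        w.append(solve(xtx, xty))
    return w

def simulate(w, x0, dt, steps):
    """Generate a noise-free series with Euler steps (synthetic test data only)."""
    series = [x0]
    for _ in range(steps):
        x = series[-1]
        series.append([xi + dt * sum(wij * xj for wij, xj in zip(row, x))
                       for xi, row in zip(x, w)])
    return series

# Hypothetical two-gene network: gene 1 activates gene 2, both decay.
true_w = [[-0.5, 0.0], [0.8, -0.3]]
data = simulate(true_w, [1.0, 0.2], dt=0.1, steps=20)
estimate = fit_interactions(data, dt=0.1)
```

On this noise-free synthetic series `estimate` should agree with `true_w` up to rounding error. A comparative version would fit several such matrices jointly and add a penalty on their differences along the edges of the graph or tree.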

Fiona
Brinkman (Simon Fraser University, Burnaby, BC, Canada)
brinkman@sfu.ca
Analysis
of Horizontal Gene Transfers of Potential Relevance to Microbial
Virulence
Slides: html
pdf
ps
ppt
We
have been using genome-wide bioinformatic approaches to identify
horizontal gene transfers that are of interest for their potential
role in bacterial virulence and the evolution of pathogenic
microbes. Analyses of both bacteria-eukaryotic and bacteria-bacteria
gene transfers are summarized, revealing possible patterns in
the types of genes most often transferred between species. The
implications are discussed, both in the context of the evolution
of virulence and what is likely the most effective approach
to control infectious disease agents.

Steven
B. Cannon (Plant Biology Department, University of
Minnesota, St. Paul, MN 55108, USA.) cann0010@tc.umn.edu
Distinguishing
Orthologs from Paralogs by Integrating Comparative Genome Data
and Gene Phylogenies (poster session)
Background:
In eukaryotic genomes, most genes are members of gene families.
When comparing genes from two species, therefore, most genes
in one species will be homologous to multiple genes in the second.
This often makes it difficult to distinguish orthologs (separated
through speciation) from paralogs (separated by other types
of gene duplication). Combining phylogenetic relationships and
genomic position in both genomes helps to distinguish between
these scenarios. This kind of comparison can also help to describe
how gene families have evolved within a single genome that has
undergone polyploidy or other large-scale duplications, as in
the case of Arabidopsis thaliana and probably most plant genomes.
Results:
We describe a suite of programs called OrthoParaMap that makes
genomic comparisons, identifies syntenic regions, determines
whether sets of genes in a gene family are related through speciation
or internal chromosomal duplications, maps this information
onto phylogenetic trees, and infers internal nodes within the
phylogenetic tree that may represent local as opposed to speciation
or segmental duplication. We describe the application of the
software using three examples: the melanoma-associated antigen
(MAGE) gene family on the X chromosomes of mouse and human;
the 20S proteasome subunit gene family in Arabidopsis, and the
major latex protein gene family in Arabidopsis.
Conclusion:
OrthoParaMap combines comparative genomic positional information
and phylogenetic reconstructions to identify which gene duplications
are likely to have arisen through internal genomic duplications
(such as polyploidy), through speciation, or through local duplications
(such as unequal crossing-over). The software is freely available
at http://www.tc.umn.edu/~cann0010/Software.html
Joint
work with Georgiana May (1,2)
and Nevin D. Young (1,3).
(1) Plant Biology Department, University of Minnesota, St. Paul,
MN 55108, USA
(2) Ecology, Evolution, and Behavior Department, University
of Minnesota, St. Paul, MN 55108, USA
(3) Plant Pathology Department, University of Minnesota,
St. Paul, MN 55108, USA

Dimitra
Chalkia (Department of Biology, The Pennsylvania
State University, University Park, USA) duc136@psu.edu
Phylogenetic
Analysis of Formin Homology Proteins in Arabidopsis Thaliana
and Oryza Sativa (poster session)
Joint
work with Tatiana Bibikova, Simon
Gilroy, Wojciech Makalowski.
The
plant cell cytoskeleton plays an important role in many cellular
processes, including cell polarity establishment and cytokinesis.
Proteins that regulate cytoskeletal assembly are likely to be
a part of the signaling cascade that governs plant cell morphogenesis.
Formins are members of a large protein family that is defined
by the presence of the highly conserved Formin Homology II (FH2)
domain. In a wide range of organisms, including vertebrates,
arthropods, nematodes and fungi, formins have been implicated
in the regulation of cytoskeletal assembly and in the control
of cytokinesis and cell polarity establishment and maintenance.
The genomes of Arabidopsis thaliana and Oryza sativa contain
putative formin-like proteins based on the presence of an FH2
domain. Arabidopsis thaliana formins have been tentatively sub-divided
into two clades: Type I and Type II, based on the FH2 domain
alignment. We have extended this analysis to cover both Arabidopsis
and rice and have provided an evolutionary context for these
plant formin families. Our phylogenetic analysis shows that formins
are divided into two distinct clades in plants. This phylogenetic
clustering is also supported by the structural features of these
proteins. This division of plant formins into two distinct
groups seems to predate the split of monocots/eudicots. The
detailed evolutionary relationships of plant formins remain
unclear. The placement of fungal formins at the basal position
of the tree is in accordance with the most recent proposed phylogenetic
scheme for eukaryotes. Animal and plant formins cluster together,
and split into two major groups. This clustering may suggest
that their last common ancestor already had at least two different
types of formins.

Avril
Coghlan (Department of Genetics, Trinity College
Dublin, Ireland) avril.coghlan@ucd.ie
Origins
of Recently Gained Introns in Caenorhabditis
Slides:
html
pdf
ps
ppt
Joint
work with Kenneth H. Wolfe.
The
genomes of the nematodes Caenorhabditis elegans and C. briggsae
both contain about 100,000 introns, of which about 6000 are
unique to one species or the other. To study the origins of
new introns, we used a rigorous method involving phylogenetic
comparisons to animal orthologs and other nematode paralogs
to identify cases where an intron content difference between
C. elegans and C. briggsae was almost certainly caused by intron
insertion rather than deletion. We identified 57 putative recently
gained introns in C. briggsae and 112 in C. elegans. Novel introns
have a stronger exon splice site consensus sequence than the
general population of introns, and they show the same preference
for phase 0 sites in codons over phases 1 and 2 as seen in the
general population. More of the novel introns are inserted in
genes that are expressed in the germline than expected by chance.
As compared to matched control sets of C. briggsae introns,
the novel introns in C. briggsae are more likely to contain
an annotated repeat element (1.7-fold; P = 0.011), and the ends
of the intron are more likely to be close to the ends of the
repeat element (1.5-fold; P = 0.029). Similar but weaker trends
are also seen in C. elegans novel introns. One family of C. briggsae
repeat elements, which is related to the Helitron class of putative
nonautonomous transposons, is found in significantly more novel
introns than reference introns (P < 1e-05). These results support
the hypothesis that novel introns originate as a result of transposable
element insertions into proto-splice site consensus sites in
germline-expressed genes.

Ramana
V. Davuluri (Human Cancer Genetics Program, Comprehensive
Cancer Center, Department of Molecular Virology, Immunology
& Medical Genetics, The Ohio State University, 420 W 12th Avenue,
TMRF 524, Columbus, OH 43210, USA) davuluri-1@medctr.osu.edu
Mammalian
Promoter Database: A Computational Platform for Comparative
Genomics of Mammalian Transcriptional Regulation (poster
session)
Joint
work with Hao Sun, Saranyan
K. Palaniswamy, Twyla T. Pohar,
and Victor Jin.
Transcription
in mammalian cells is a highly complex process that involves
multiple layers of general and gene-specific transcription factors.
Although extensive molecular research has been providing important
details about several transcription factors and their binding
sites in the target gene promoters, the information generated
over the years is highly fragmented. In order to better integrate
this vast amount of information with the genome sequences, we
have developed a new database called MPromDb (Mammalian Promoter
Database), an information resource of mammalian gene regulatory
regions. MPromDb (Version 1.0) contains 28,306 experimentally
supported and 32,121 computationally annotated promoters, and
mapping of 4,231 experimentally known binding sites, with links
to published literature. Each promoter sequence in MPromDb is
presented in the form of an image map with annotations of first
exon, cis-regulatory elements and plots of CpG scores, with
interactive contextual menus for easy navigation. MPromDb provides
a platform for comparative genomics of transcriptional regulation,
since promoters of orthologous genes are linked with each other
and displayed in the same record. The current version contains
9,331 human-mouse orthologous pairs. The database can be searched
for promoter sequences, transcription factors, and their direct
target genes, through a user-friendly web interface at http://bioinformatics.med.ohio-state.edu/MPromDb.

Dannie
Durand (Departments of Biological Sciences and Computer
Science, Carnegie Mellon University) durand@cmu.edu
Gene
Clusters in Comparative Genomics: Accident or Design? Slides:
pdf
Large scale gene duplication, the duplication of whole genomes
and subchromosomal regions, is a major force driving the evolution
of genetic functional innovation. Whole genome duplications
are widely believed to have played an important role in the
evolution of the maize, yeast and vertebrate genomes. Two or
more linked clusters of similar genes found in distinct regions
on the same genome are often presented as evidence of large
scale duplication. However, as the gene order and the gene complement
of duplicated regions diverge progressively due to insertions,
deletions and rearrangements, it becomes increasingly difficult
to distinguish remnants of common ancestral gene order from
coincidental similarities in genomic organization. In this talk,
I present computational approaches to validating gene clusters
in comparative genomics.

Evan
Eichler (Department of Genetics, Case Western Reserve
University) eee@po.cwru.edu
Recent
Segmental Duplications and the Fragile Breakage Model of Human
Genome Evolution
It
has been estimated that 5% of the human genome consists of interspersed
duplicated material that has arisen over the last 30-40 million
years of evolution. A large proportion of these duplications
exhibits an extraordinarily high degree of sequence identity
at the nucleotide level (>95%) and are interspersed over large
genomic distances (>1 Mb). The distribution of these duplications
is non-random in the human genome. Through processes of non-allelic
homologous recombination, these same regions are targets for
rapid evolutionary turnover creating hotspots of mammalian chromosomal
evolution and sites of genomic instability associated with disease
within the human population. Preliminary analyses have suggested
that the amount of segmental duplication may be a relatively
unique property of our genome. We have developed systematic
experimental and computational tools to examine duplication
content from human and other sequenced vertebrate species. An
analysis of the breakpoints of these duplications shows a significant
enrichment of Alu-repeat elements, providing new insight into
their mechanism of origin and preeminence within the primate
genome. In addition, based on our analysis of syntenic breakpoints
between the mouse and human genome, we find that 25% (122/461)
of mouse-human synteny breakpoints contain 10 kb of duplicated
sequence. This association is highly significant (P<0.0001)
when compared to a simulated random breakage model. These data
support a non-random model of chromosomal evolution that implicates
a predominance of both small-scale duplication and large-scale
evolutionary rearrangements within specific regions of the human
genome. Such properties should be considered when trying to
reconstruct the evolutionary history of mammalian genomes.

Nadia
El-Mabrouk (Department of Computer Science, University
of Montreal) mabrouk@IRO.UMontreal.CA
Reconstructing
the Ancestor of a Modern Genome with Multigene Families
Slides: html
pdf
ps
ppt
Given
a particular model of evolution and an optimization criterion,
the problem is to recover an ancestor of a modern genome modeled
as an ordered sequence of signed genes. One direct application
is to infer gene orders at the ancestral nodes of a phylogenetic
tree. Implicit in the rearrangement literature is that each
gene is present exactly once in each genome. This hypothesis
is clearly not guaranteed to hold for divergent species containing several
copies of highly paralogous and orthologous genes. In this presentation,
we consider models of genome evolution that take multigene families
into account.
We
first present a genome-wide doubling event. Genome duplication
is an important source of new gene functions and novel physiological
pathways. Originally (ancestrally), a duplicated genome contains
two identical copies of each chromosome, but through genomic
rearrangements, this simple doubled structure is disrupted.
At the time of observation, each of the chromosomes resulting
from the accumulation of rearrangements can be decomposed into
a succession of conserved segments, such that each segment appears
exactly twice in the genome. We present exact algorithms for
reconstructing the ancestral doubled genome in linear time,
minimizing the number of inversions and/or translocations required
to derive the observed order of genes along the present-day
chromosomes.
The
second part of the presentation will concern a model of duplications
at a regional level. In this model, chromosomal regions (one
or more genes) are duplicated from one location of the genome
to another. Studies from human genomic sequence indicate that
many of these segments have been duplicatively transposed in
very recent evolutionary time. The implicit hypothesis is that
a genome with multigene families has an ancestor containing
exactly one copy of each gene that has evolved through a series
of duplication transpositions and substring inversions. We present
an algorithm for reconstructing an ancestral genome giving rise
to the minimal number of duplication transpositions and reversals.
We then show how to use this algorithm to recover gene orders
at the ancestral nodes of a phylogenetic tree.

Allan
G. Force (Benaroya Research Institute at Virginia
Mason) force@benaroyaresearch.org
Origin
of Subfunctions and Modular Genes
Slides: html
pdf
ps
ppt
Evolutionary
explanations for the origin of modular genetic and developmental
pathways almost always invoke some sort of long-term selective
advantage, e.g., as a functional prerequisite to the evolution
of phenotypic complexity or as an enhancer of evolvability.
However, simple theoretical results demonstrate that even in
the absence of any direct selective advantage, genetic modularity
can spontaneously emerge through the acquisition of new gene
subfunctions. Provided that population size is sufficiently
small, random genetic drift and mutation can conspire to produce
changes in the underlying genetic architecture of a species
without necessarily altering the phenotype. Extensive genetic
modularity may then accrue in a near-neutral fashion in permissive
population-genetic environments, potentially opening novel
pathways to morphological evolution. These results provide additional
support for the proposition that many aspects of gene and genome
complexity in multicellular eukaryotes may have arisen passively
as population size reductions accompanied an increase in organism
size, with the adaptive exploitation of such complexity occurring
secondarily.

Anant
Godbole (Mathematics Department, East Tennessee State
University) godbolea@mail.etsu.edu
Distributional
Approximations in Genome Reconstruction (poster
session) pdf
ps

Steve
Goldstein
(Laboratory for Molecular and Computational Genomics, University
of Wisconsin-Madison) steveg@lmcg.wisc.edu
Graph
Compression Algorithms for Efficiently Comparing Genomes (poster
session)
Joint
work with Adam Briska, Shiguo
Zhou, and David C. Schwartz.
Optical
Mapping is a system capable of producing genome-wide ordered
restriction maps. Such a restriction map provides a description
of an organism's genome, a description not unlike the sequence
of the genome, albeit at a coarser resolution. Just as comparisons
of whole genome sequences are leading to an exciting array of
biological advances, comparisons of optical maps will provide
a wealth of valuable information.
Now
that optical mapping has entered the high-throughput era, there
is a need for software to compare restriction maps of closely
related organisms. We present an algorithmic framework for this
task, closely modeled after DNA sequence comparison algorithms.
The major challenge lies in adapting the exact matching phase
of the sequence algorithms to handle the imprecision inherent
in determining restriction fragment lengths. Our graph-based
approach not only overcomes this challenge, but also can be
applied to sequence algorithms, providing advantages over suffix-tree
approaches.
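The difficulty named above, exact matching that must tolerate imprecise fragment lengths, can be illustrated with a brute-force sketch. It is not the graph-based approach of the poster: it simply reports positions where k consecutive fragments of two maps agree within a relative tolerance. The tolerance rule, the parameter values, and the example maps are assumptions.

```python
def fragments_match(a, b, tolerance=0.1):
    """Two fragment lengths match if they differ by at most
    `tolerance` times the larger of the two."""
    return abs(a - b) <= tolerance * max(a, b)

def seed_matches(map_a, map_b, k=3, tolerance=0.1):
    """Return all (i, j) such that the k fragments of map_a starting at i
    match, one by one, the k fragments of map_b starting at j."""
    hits = []
    for i in range(len(map_a) - k + 1):
        for j in range(len(map_b) - k + 1):
            if all(fragments_match(map_a[i + d], map_b[j + d], tolerance)
                   for d in range(k)):
                hits.append((i, j))
    return hits

# Hypothetical restriction maps: ordered fragment lengths in kb.
map_a = [12.0, 30.5, 8.2, 45.0, 19.7]
map_b = [5.0, 31.0, 8.0, 44.0, 20.5, 60.0]
hits = seed_matches(map_a, map_b)
```

This quadratic scan plays the role that exact seed matching plays in sequence comparison. It ignores missing or spurious cut sites, which a practical method would also have to handle.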

Josefa
González (Departament de Genètica i Microbiologia,
Universitat Autònoma de Barcelona, 08193, Bellaterra (Barcelona),
Spain) icgm2@blues.uab.es
Duplicative
and Conservative Transpositions of the Larval Serum Protein
1 Genes in the Genus Drosophila (poster session)
Joint
work with Ferran Casals and Alfredo
Ruiz.
In
the genus Drosophila, homologous chromosomal elements show a
remarkable conservation of gene content but not of gene order,
indicating that paracentric inversions are the most common kind
of genomic change. Detailed physical maps of chromosomes X,
2 and 4 of Drosophila repleta and D. buzzatii, both belonging
to the Drosophila subgenus, were constructed and their gene
rearrangements compared with the homologous chromosomes in D.
melanogaster. We estimated that 393 paracentric inversions have
been fixed in the whole genome since the divergence between
D. repleta and D. melanogaster, which amounts to an average rate
of 0.053 disruptions/Mb/myr. Only two exceptions to the chromosomal
homologies were found and we have further analyzed one of them:
the transposition of the Larval serum protein 1 (Lsp1) genes.
Comparative molecular analysis of the transposed genes and their
flanking regions can help to elucidate the time, direction and
mechanism of gene transposition. In the D. melanogaster genome,
three Lsp1 genes, alpha, beta and gamma, are present and each
is located on a different chromosome. We have characterized
the molecular organization of Lsp1 genes in D. buzzatii and
in D. pseudoobscura, a species of the Sophophora subgenus. Our
results show that only two Lsp1 genes (beta and gamma) exist
in these two species suggesting that the duplicative transposition
generating Lsp1alpha, took place <30 myr ago in the D. melanogaster
lineage. D. buzzatii and D. pseudoobscura show the same chromosomal
localization and genomic organization, different from that of
D. melanogaster for the Lsp1beta and Lsp1gamma genes. Thus we
conclude that this is likely to be the ancestral organization
and both genes must have conservatively transposed in the D.
melanogaster lineage <30 myr ago. Finally, the duplicative transposition
which gave rise to Lsp1beta and Lsp1gamma must have occurred
before the divergence of the three Drosophila species (40-62
myr ago). Overall, at least two duplicative and two conservative
transpositions are necessary to explain the present chromosomal
distribution of Lsp1 genes in the three Drosophila species.
In D. buzzatii and D. pseudoobscura, Lsp1beta and Lsp1gamma
are localized close to snRNA or tRNA genes. RNA genes have been
implicated in the origin of chromosomal rearrangements in prokaryotes
and yeasts, and we find clear evidence for a role of snRNA genes
in the transposition of Lsp1beta genes in Drosophila. Analysis
of the 5' non-coding regions of the Lsp1beta and Lsp1gamma genes
has led to the identification of the putative cis-acting regulatory regions
of these genes which seemingly transposed along with the coding
sequences.

Roderic
Guigó (Institut Municipal d'Investigacio Medica
(IMIM/UPF/CRG)) rguigo@imim.es
Comparative
Gene Prediction
Slides:
html
pdf
ps
ppt
Comparative
genomics is emerging as a powerful tool to characterize complex
genomes. Gene prediction, in particular, has benefited from
the availability of genome sequences from organisms across the
whole eukaryotic spectrum. The comparison of the human and mouse
genome sequences, for instance, has contributed substantially
to refine the gene content of the human (and mouse) genomes.
In my talk, I will stress how comparative genomics may be particularly
useful to identify genes which deviate from the standard characteristics,
and that, for this reason, may escape identification by other
means.

Tzvika
Hartman (Department of Computer Science and Applied
Mathematics, Weizmann Institute of Science, Rehovot, Israel)
tzvi@wisdom.weizmann.ac.il
A
Simpler 1.5-Approximation Algorithm for Sorting by Transpositions
Extended Version: pdf
ps
Joint
work with Ron Shamir (School of
Computer Science, Tel-Aviv University, Tel-Aviv, Israel).
In
this work we study the problem of sorting by transpositions.
First, we prove that the problem of sorting circular permutations
by transpositions is equivalent to the problem of sorting linear
ones. Hence, all algorithms for sorting linear permutations
by transpositions can be used to sort circular permutations.
Then, we derive our main result: A new quadratic 1.5-approximation
algorithm, which is considerably simpler than the extant algorithms
of Bafna and Pevzner (1998) and Christie (1999). Thus, the algorithm
achieves running time which is equal to the best known, with
the advantage of being much simpler. Moreover, the analysis
of the algorithm is significantly less involved, and provides
a good starting point for studying related open problems.
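For readers unfamiliar with the operation, a transposition exchanges two adjacent blocks of a permutation. The sketch below implements the operation and the standard breakpoint count; it does not reproduce the 1.5-approximation algorithm of this work. Since a single transposition can remove at most three breakpoints, the breakpoint count divided by three, rounded up, is a simple lower bound on the transposition distance. The function names are assumptions.

```python
def transposition(perm, i, j, k):
    """Exchange the adjacent blocks perm[i:j] and perm[j:k]."""
    if not 0 <= i < j < k <= len(perm):
        raise ValueError("need 0 <= i < j < k <= len(perm)")
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]

def breakpoints(perm):
    """Count adjacent pairs (a, b) in 0, perm, n+1 with b != a + 1."""
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b != a + 1)

def lower_bound(perm):
    """Each transposition removes at most three breakpoints."""
    return -(-breakpoints(perm) // 3)  # ceiling division

perm = [3, 4, 1, 2]
sorted_perm = transposition(perm, 0, 2, 4)  # swap blocks [3, 4] and [1, 2]
```

In this example `perm` has three breakpoints, so the bound is one, and the single transposition shown sorts it.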

Elizabeth
Ann Housworth (Departments of Mathematics and Biology,
Indiana University) ehouswor@indiana.edu
Measures
of Conserved Synteny
Slides: html
pdf
ps
ppt
Measures
of conserved synteny are important for estimating the relative
rates of chromosomal evolution in various lineages. We present
a natural way to view the synteny conservation between two species
from an Oxford grid--an r x c table summarizing the number of
orthologous genes on each of the chromosomes 1 through r of
the first species that are on each of the chromosomes 1 through
c of the second species. This viewpoint suggests a natural statistic,
which we call syntenic correlation, designed to measure the
amount of synteny conservation between two species. This measure
allows syntenic conservation to be compared across many pairs
of species. We also discuss incorporating the dependency of
the numbers of orthologues observed in the chromosome pairings
between the two species into the estimates of the true number
of conserved syntenies given the observed number of conserved
syntenies.
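An Oxford grid as described above can be built directly from a list of ortholog locations. The sketch below constructs the r x c table and counts cells that qualify as conserved syntenies. The syntenic correlation statistic itself is not defined in this abstract and is not reproduced here; the threshold `min_genes` and the example data are assumptions.

```python
def oxford_grid(orthologs, r, c):
    """Build an r x c table in which cell [i][j] counts the orthologous
    genes found on chromosome i+1 of the first species and on
    chromosome j+1 of the second species."""
    grid = [[0] * c for _ in range(r)]
    for chrom_a, chrom_b in orthologs:
        grid[chrom_a - 1][chrom_b - 1] += 1
    return grid

def conserved_syntenies(grid, min_genes=2):
    """Count chromosome pairs sharing at least `min_genes` orthologs
    (the threshold is an assumption of this sketch)."""
    return sum(1 for row in grid for count in row if count >= min_genes)

# Hypothetical data: (chromosome in species A, chromosome in species B)
# for each orthologous gene pair.
orthologs = [(1, 1), (1, 1), (1, 2), (2, 2), (2, 2), (2, 2), (3, 1)]
grid = oxford_grid(orthologs, r=3, c=2)
observed = conserved_syntenies(grid)
```

The resulting table is an ordinary contingency table, which is what makes a correlation-style summary of synteny conservation natural.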

Jens
Lagergren (SBC (Stockholm Bioinformatics Center),
& KTH (Kungliga Tekniska Högskolan)) jensl@nada.kth.se
http://www.nada.kth.se/~jensl/
Bayesian
Gene/Species Tree Reconciliation and Orthology Analysis Using
MCMC
Comparative genomics in general and orthology analysis in particular
are becoming increasingly important parts of gene function prediction.
Previously, orthology analysis and reconciliation have been performed
only with respect to the parsimony model. This discards many
plausible solutions and sometimes precludes finding the correct
one. In many other areas in bioinformatics probabilistic models
have proven to be both more realistic and powerful than parsimony
models.
We introduce a probabilistic gene evolution model based on a
birth-death process in which a gene tree evolves "inside"
a species tree. Based on this model, we develop a tool with
the capacity to perform practical orthology analysis, based
on Fitch's original definition, and more generally for reconciling
pairs of gene and species trees. Our gene evolution model is
biologically sound and intuitively attractive. We develop a
Bayesian analysis based on MCMC which facilitates approximation
of an a posteriori distribution for reconciliations. That is,
we can find the most probable reconciliations and estimate the
probability of any reconciliation, given the observed gene tree.
This also gives a way to estimate the probability that a pair
of genes are orthologs. To the best of our knowledge, this is
the first successful introduction of this type of probabilistic
methods, which flourish in phylogeny analysis, into reconciliation
and orthology analysis.
The MCMC algorithm has been implemented and performs very well
on synthetic as well as biological data. Using standard correspondences,
our results carry over to allele trees as well as biogeography.

Bret
Larget (Departments of Statistics and of Botany,
University of Wisconsin - Madison) larget@stat.wisc.edu
A
Statistical Approach to the Estimation of Phylogeny from Genome
Arrangements
Slides: pdf
The
determination of evolutionary relationships is a fundamental
problem in evolutionary biology. Genome arrangement data offers
a source of information for estimating phylogenetic trees that
may be especially useful for distantly related species. A statistical
approach to phylogenetic information is concerned with assessment
of uncertainty in estimated phylogenetic trees. We describe
a Bayesian framework for phylogenetic inference from genome
arrangement data using Markov chain Monte Carlo and discuss
our results on several data sets.

Emmanuelle
Lerat (Department of Ecology and Evolutionary Biology,
University of Arizona) lerat@email.arizona.edu
Lateral
Gene Transfers and Organismal Phylogeny in Bacteria: Implications
for Ancestral Genome Reconstruction
Genome
reconstruction is of particular interest from a biological perspective.
This knowledge can illuminate the history of events that led
to the present contents and organization of genomes. The principle
of reconstruction methods is the inference of rearrangements
that occurred during the history of the genome. This makes the
strong assumption that genes are faithfully transmitted with
their genome through generations. However in bacteria, lateral
(or horizontal) gene transfers (LGT) are known to be very numerous.
LGT might be an obstacle in the attempt to establish genome
history, because some homologous genes may be transmitted between
different species. It has even been argued that LGT may prevent
the establishment of organismal relationships based on individual
gene phylogenies. Thus to reconstruct ancestral genomes in bacteria
seems to be particularly hazardous unless LGT is taken into
account. It is therefore very important to test the hypothesis
of vertical transmission of the genes used in genome reconstruction.
In order to determine the impact of LGT on the potential organismal
phylogeny, an approach to multigene phylogeny using complete
genomes is necessary to identify the genes that have been, without
ambiguity, vertically transmitted and that are thus good candidates
to be used in genome reconstruction. This will allow a real
biological interpretation of the genome reconstruction but also
facilitate the reconstruction itself.

Michael
Lynch (Department of Biology, Indiana University,
Bloomington, IN) mlynch@bio.indiana.edu
http://www.bio.indiana.edu/facultyresearch/faculty/Lynch.html
The
Origins of Genome Complexity
Slides: html
pdf
ps
ppt
Complete
genomic sequences from diverse phylogenetic lineages reveal
striking increases in genome complexity across the prokaryote
to unicellular eukaryote to multicellular eukaryote boundaries.
The changes include gradual growth in gene number resulting
from the retention of duplicate genes, more abrupt increases
in the abundance of spliceosomal introns and mobile genetic
elements, and enhanced modularity of gene regulation. A case
can be made that many of these changes emerged passively in
response to substantial long-term population-size reductions
that accompanied increases in organism size and magnified the
power of random genetic drift. Under this model, much of the
restructuring of eukaryotic genome organization and the roots
of many aspects of organismal complexity were initiated by nonadaptive
processes. Although the mutational changes necessary for genomic
modification are initiated by molecular processes, the population-genetic
environment ultimately defines the permissible paths of evolution.
The simple genomes of most microbial species can be understood
in this context, without invoking direct selection for streamlined
genomes, and direct selection for complexity need not be invoked
to explain genomic expansion in multicellular species.

Robert
(Bob) Mau (Departments of Animal Health and Biomedical
Sciences/Oncology University of Wisconsin-Madison) robertm@genome.wisc.edu
Inferring
Orthologous Regions via a Pseudo-Gibbs Sampler: Finding the
Pieces of the Rearrangement Puzzle (poster
session) pdf
doc
Joint
work with Aaron Darling, Frederick
R. Blattner, and Nicole T. Perna.

Aoife
McLysaght (Department of Genetics, Trinity College
Dublin, Ireland) amclysag@uci.edu
Poxviruses
and Adaptive Genome Evolution
Slides: html
pdf
ps
ppt
We
used complete sequence from twenty poxviruses to investigate
the evolution of these virus genomes. We examined the pattern
of genome content and genome arrangement evolution in the context
of the virus phylogeny. We also examined the patterns of positive
selection acting on genes in these genomes. We show that the
rate of genome evolution is not constant over time, and that
it may be possible to relate patterns of genome evolution and
adaptive evolution acting on genes.

Daniel
P. Miranker (Department of Computer Sciences, University
of Texas - Austin) miranker@cs.utexas.edu
Application
of MoBIoS for Conserved Primer Pair Discovery (poster
session)
Joint
work with Weijia Xu, Wenguo
Liu, and C. Randal Linder.
MoBIoS,
a Molecular Biological Information System, is a next-generation
database management system focused on scalable retrieval and
mining of unorthodox biological data types that are poorly supported
by relational database systems. MoBIoS comprises built-in data
types for biological sequences and Mass Spectra. The MoBIoS
storage manager extends traditional database systems by including
built-in support for hierarchical clustering and nearest-neighbor
and range search in metric spaces. In addition to built-in metrics
to support sequence homology and protein identification, users
may add their own metrics.
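The range and nearest-neighbor searches described above can be illustrated with a minimal Python sketch. It is not MoBIoS code: the function names, the use of Hamming distance as the metric, and the single-pivot pruning step are all assumptions chosen for illustration. The pruning relies only on the triangle inequality, which is the property that makes metric-space indexing possible.

```python
from typing import Callable, List, Sequence


def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length strings (a true metric)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def range_search(db: Sequence[str], query: str, radius: int,
                 dist: Callable[[str, str], int] = hamming) -> List[str]:
    """Linear-scan range search: every item within `radius` of `query`."""
    return [s for s in db if dist(s, query) <= radius]


def range_search_with_pivot(db: Sequence[str], query: str, radius: int,
                            pivot: str,
                            dist: Callable[[str, str], int] = hamming) -> List[str]:
    """Range search that skips items using the triangle inequality.

    For any metric, |d(q, p) - d(s, p)| <= d(q, s). If the left side already
    exceeds `radius`, s cannot be a hit. A real index would store d(s, p)
    ahead of time; here it is recomputed to keep the sketch short.
    """
    d_qp = dist(query, pivot)
    hits = []
    for s in db:
        if abs(d_qp - dist(s, pivot)) > radius:
            continue  # pruned without comparing s to the query
        if dist(s, query) <= radius:
            hits.append(s)
    return hits


def nearest_neighbor(db: Sequence[str], query: str,
                     dist: Callable[[str, str], int] = hamming) -> str:
    """Linear-scan nearest neighbor (first item wins on ties)."""
    return min(db, key=lambda s: dist(s, query))


if __name__ == "__main__":
    db = ["ACGT", "ACGA", "TTTT", "ACCA"]
    print(range_search(db, "ACGT", 1))
    print(range_search_with_pivot(db, "ACGT", 1, pivot="TTTT"))
    print(nearest_neighbor(db, "ACGA"))
```

Any user-supplied function can be passed as `dist`, provided it satisfies the metric axioms; otherwise the pivot pruning may discard true hits.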
We
report on the first biological application of MoBIoS: a comparative
study of the entire genomes of the plants rice and Arabidopsis
to determine conserved pairs of strings of DNA that could be
used to prime polymerase chain reactions (PCRs). Identification
of such a set of paired conserved primers would allow amplification
of evolutionarily homologous DNA regions in a taxonomically
broad set of seed plants. The ability to amplify homologous
regions in a widely divergent set of species has a number of
applications, e.g., phylogenetic reconstruction and comparison
of protein evolution in a broad set of organisms. Ultimately,
this approach to identifying conserved primer pairs could provide
the community of systematists with a universal set of DNA sequences
that can be used for assembling the tree of life.

William
J. Murphy (SAIC-Frederick, Inc., Laboratory of Genomic
Diversity, National Cancer Institute Frederick, Maryland 21702)
murphywi@ncifcrf.gov
Reconstructing
the Genomic Architecture of Mammalian Ancestors Using Multispecies
Comparative Maps
Rapidly
developing comparative maps in selected mammal species are providing
an opportunity to reconstruct the genomic architecture of mammalian
ancestors and study rearrangements that transformed this ancestral
genome into existing mammalian genomes. Here we apply the recently
developed Multiple Genome Rearrangement algorithm (MGR) to human,
mouse, cat and cattle comparative maps (with 311-470 shared
markers) to impute the ancestral mammalian genome. Reconstructed
ancestors consist of 70-100 conserved segments shared across
the genomes that have been exchanged by rearrangement events
along the ordinal lineages leading to modern species genomes.
Genomic distances between species, dominated by inversions (reversals)
and translocations, are presented in a first multispecies attempt
using ordered mapping data to reconstruct the evolutionary exchanges
that preceded modern placental mammal genomes.
Joint work with Guillaume Bourque
(Centre de Recherches Mathématiques, Université de Montréal,
Montréal, Canada H3C 3J7), Glenn Tesler,
Pavel Pevzner (Department of Computer
Science and Engineering, University of California, San Diego
La Jolla, California 92093-0114), and Stephen
J. O'Brien (Laboratory of Genomic Diversity, National
Cancer Institute Frederick, Maryland 21702).

Luay
Nakhleh (Department of Computer Sciences, The University
of Texas at Austin) nakhleh@cs.utexas.edu
Reconstructing
Reticulate Evolution in Species (poster
session)
In
1997, Wayne Maddison made an important observation that led
to a separate analysis approach for phylogeny reconstruction.
In his seminal paper, Maddison observed that gene trees that
are related by reticulation can be combined into a network via
the computation of the minimum number of certain branch moves;
this number is called the SPR (for Subtree Prune and Regraft)
distance. The two main challenges for Maddison's approach are
(1)
computational: computing the SPR distance between two trees
is hard.
(2)
systematic: in practice, it is very hard to obtain the correct
gene trees.
In
this poster we present our solutions to these two challenges.
We address phylogenetic networks with constrained reticulation.
For such networks, and trees induced by them, we present an
efficient algorithm for measuring the SPR distance, as well
as reconstructing the network from the given trees. We address
the systematic challenge by considering a set of "good" gene
trees instead of a single gene tree. We present results from
extensive simulation studies that we conducted. Those results
show that our method improves significantly on Maddison's
and clearly outperforms methods based on combined
analysis of datasets.
This
is joint work with Tandy Warnow
and Randy Linder.

Nikolas
Nikolaidis (Institute of Molecular Evolutionary Genetics
and Department of Biology, Pennsylvania State University, University
Park, PA 16802, USA) nxn7@psu.edu
Evolution
of the Hsp70 Gene Superfamily in Two Sibling Species of Nematodes
Caenorhabditis elegans and C. briggsae (poster
session)
Joint
work with Masatoshi Nei.
The
Hsp70 gene superfamily of C. briggsae was characterized to
investigate its evolutionary relationship with the
Hsp70 superfamily of its sibling species C. elegans. The phylogenetic
analyses also included genes from Drosophila melanogaster and
Saccharomyces cerevisiae to clarify the long-term evolution
of hsp70s. The Hsp70s are classified into three monophyletic
groups according to their sub-cellular localization, namely,
cytoplasm (CYT), endoplasmic reticulum (ER) and mitochondria
(MT). The Hsp110 genes can be classified into the polyphyletic
CYT group and the monophyletic ER group. The two nematode species
encode two Hsp70 and two Hsp110 proteins localized in the ER
and their highly heat-inducible genes contain introns. The different
Hsp70 and Hsp110 groups appear to evolve following the model
of independent or divergent evolution. These models can also
explain the evolution of the ER and MT genes. On the other hand,
the CYT genes are divided into heat-inducible and constitutively
expressed genes. The constitutively expressed genes probably
have evolved by the birth-and-death process and the rates of
gene birth-and-death are different among all organisms studied.
The heat-inducible genes show an intra-species phylogenetic
clustering, suggesting sequence homogenization, probably by
gene conversion-like events. In addition, these genes show high
levels of sequence conservation in both intra- and inter-species
comparisons, and in most comparisons the amino acid sequence
similarity was higher than the nucleotide. These results suggest
that purifying selection also played a crucial role in sequence
conservation of the Hsp70s. Therefore, we suggest that the CYT
heat-inducible genes have apparently followed a mixed evolutionary
pattern with a combination of purifying selection, birth and
death, and gene conversion-like events.

Stephen
J. O'Brien
(Chief, Laboratory of Genomic Diversity, National Cancer Institute-Frederick)
obrien@ncifcrf.gov
The
Landscape of Comparative Genomics in Mammals
Dense
genetic maps of human, mouse and rat genomes that are based
on coding genes, microsatellite and single nucleotide polymorphism
(SNP) markers have been complemented by precise gene homologue
alignment with moderate resolution maps of livestock, companion
animals and additional mammal species. Comparative genetic assessment
expands the utility of these maps in gene discovery, in functional
genomics, and in tracking the evolutionary forces that sculptured
the genome organization of modern mammalian species.
Ross
Overbeek (Fellowship for Interpretation of Genomes-FIG)
Ross@theFIG.info
Exploiting
Gene Clusters to Curate Annotations
Slides: html
pdf
ps
ppt
Previously,
we argued that gene clustering on prokaryotic genomes was the
key to locating "missing genes," and we demonstrated that the
technique worked remarkably well. The use of clusters is also
the key to straightening out many of the assignments that could
not be made precisely based only on similarities and motifs.
We will consider the case of gene clusters related to leucine
degradation as an example; they occur in phylogenetically diverse
organisms, and many of the genes involved currently have inaccurate
or imprecise annotations. Comparative analysis of clusters,
as well as occurrence profiles, can be used to methodically
construct chains of assignments that follow from a few basic
observations. This sets the stage where a single carefully chosen
wet lab confirmation can confirm or reject a large number of
assignments, often removing ambiguities from tens if not hundreds
of genes.
Pavel
A. Pevzner (Department of Computer Science and Engineering,
University of California at San Diego) ppevzner@cs.ucsd.edu
http://www-cse.ucsd.edu/users/ppevzner/
Transforming
Men into Mice: Lessons from Human and Mouse Genomic Sequences
Despite
some differences in appearance and habits, men and mice are
genetically very similar. In a pioneering paper, Nadeau and
Taylor (1984) estimated that surprisingly few genomic rearrangements
(about 200) have happened since the divergence of human and
mouse 75 million years ago.
The
genomic sequences of human and mouse provide evidence for a
larger number of rearrangements than previously thought and
shed some light on previously unknown features of mammalian
evolution. In particular, they provide evidence for extensive
re-use of breakpoints from the same relatively short regions
and reveal great variability in the rate of micro-rearrangements
along the genome. Our analysis also implies the existence of
a large number of very short "hidden" synteny blocks that
were invisible in comparative mapping data and were ignored
in previous studies of chromosome evolution. These results suggest
a new model of chromosome evolution that postulates that breakpoints
are chosen from relatively short fragile regions that have much
higher propensity for rearrangements than the rest of the genome.
This
is joint work with Glenn Tesler.
Ron
Y. Pinter (Department of Computer Science, Technion,
Israel Institute of Technology) pinter@csa.cs.technion.ac.il
Evaluating
a Class of Length-Sensitive Algorithms for Sorting by Reversal
Slides:
html
pdf
ps
ppt
Sorting
by reversal (SBR) has been used extensively in comparative genomic
studies [3]. Traditionally, bioinformaticians have been trying
to minimize the number of reversals and they evaluate results
by looking at the trace generated by the algorithm and asking
whether it makes biological sense. We have introduced a length
sensitive cost measure in an attempt to model the likelihood
of reversals based on their length. In this model the cost f(x)
of each reversal depends on the length, x, of the reversed sequence;
the overall cost of the SBR process is the total of the individual
reversal costs.
Initially
[4] we looked at f(x)=x, offering a QuickSort-like algorithm
which guarantees a provably good approximation of the minimal
SBR cost (finding the minimal cost is NP-hard). In response,
several biologists suggested we look at the family of functions
f(x)=x**alpha. We have developed a class of algorithms [1] that
find an approximate cost for any positive value of the exponent
alpha, but the question of which value of alpha is best is of
great interest.
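The length-sensitive cost model above can be made concrete with a short Python sketch. It only evaluates the cost of a given sequence of reversals under f(x) = x**alpha; it does not implement the approximation algorithms of [1] or [4]. The function names, the half-open (i, j) index convention, and the use of unsigned permutations are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def apply_reversal(perm: Sequence[int], i: int, j: int) -> List[int]:
    """Reverse the segment perm[i:j] (half-open interval, unsigned genes)."""
    return list(perm[:i]) + list(reversed(perm[i:j])) + list(perm[j:])


def reversal_cost(length: int, alpha: float) -> float:
    """Cost f(x) = x**alpha of reversing a segment of length x."""
    return float(length) ** alpha


def total_cost(reversals: Sequence[Tuple[int, int]], alpha: float) -> float:
    """Overall cost of an SBR trace: the sum of the individual reversal costs."""
    return sum(reversal_cost(j - i, alpha) for i, j in reversals)


def apply_trace(perm: Sequence[int],
                reversals: Sequence[Tuple[int, int]]) -> List[int]:
    """Apply a sequence of reversals in order and return the result."""
    result = list(perm)
    for i, j in reversals:
        result = apply_reversal(result, i, j)
    return result


if __name__ == "__main__":
    perm = [2, 1, 4, 3]
    trace = [(0, 2), (2, 4)]  # two reversals, each of length 2
    print(apply_trace(perm, trace))
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, total_cost(trace, alpha))
```

With alpha = 1 the two length-2 reversals cost 4 in total; with alpha = 2 the same trace costs 8, so larger exponents penalize long reversals more heavily relative to short ones.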
We
decided to make this evaluation by using the cost of sorting
one genome to another as a distance between the genomes that
is fed to a tool that builds phylogenetic trees, and then comparing
the results to evolutionary trees found using other methods.
This gives rise to numerous methodological and algorithmic issues,
such as:
- How many common genes are necessary to draw meaningful conclusions?
- How do we deal with duplicate genes?
- If the number of common genes for the whole dataset under
study is too low, how do we put together partial results (i.e. combining trees
that were built on subsets of the sample) and how small can
the subsets be?
- Do we really need to rebuild the whole tree or can we accumulate
the scores of matches of the partial trees with the reference
tree?
- What similarity score between trees is appropriate for this
study?
- How do we cope with the fact that our algorithms produce only
approximate costs?
But the ultimate question is: how do we scan for the best value
of alpha?
The
poster will describe the method and the results on two datasets,
including the one from [2] which includes 15 genomes, and discuss
their merits.
References
[1]
Michael A. Bender, Dongdong Ge, Simai He, Haodong Hu, Ron Y.
Pinter, Steven Skiena, and Firas Swidan. Improved Bounds on
Sorting with Length-Weighted Reversals. To appear in the Proceedings
of the ACM-SIAM Symposium on Discrete Algorithms (SODA'04),
January 2004.
[2]
William Martin, Tamas Rujan, Erik Richly, Andrea Hansen, Sabine
Cornelsen, Thomas Lins, Dario Leister, Bettina Stoebe, Masami
Hasegawa, and David Penny. Evolutionary analysis of Arabidopsis,
cyanobacterial, and chloroplast genomes reveals plastid phylogeny
and thousands of cyanobacterial genes in the nucleus. Proc.
Natl. Acad. Sci. USA. September 17, 2002; 99 (19): 12246-12251.
[3]
Pavel A. Pevzner: "Computational Molecular Biology - an Algorithmic
Approach", MIT Press, 2000.
[4]
Ron Y. Pinter and Steven Skiena. Genomic Sorting with Length-Weighted
Reversals. Genome Informatics 13: 103-111 (2002).
Joint
work with Michael A. Bender*, Yaniv
Berliner**, Dongdong Ge*,
Simai He*, Haodong
Hu*, Michael Shmoish**,
Meir Shoham**, Steven
Skiena*, and Firas Swidan**.
*
Dept. of Computer Science, SUNY Stony Brook, NY 11794-4400.
** Dept. of Computer Science, Technion, Israel Institute of
Technology, Haifa 32000, Israel
Igor
V. Sharakhov (Center for Tropical Disease Research
and Training, University of Notre Dame, Notre Dame, IN 46556-0369,
USA) isharahk@nd.edu
High
Rates of Genome Rearrangements in Malaria Mosquitoes, Anopheles
gambiae and A. funestus
Slides: html
pdf
ps
ppt
The
rates of chromosomal evolution vary among different genomic
segments and eukaryotic lineages [1]. A comparative genomic
study between Drosophila melanogaster and Anopheles gambiae
shows extensive reshuffling of gene order within chromosomes
[2]. Genus Drosophila has a very high rate of paracentric inversions
[3]. Our study determines rates of chromosomal rearrangement
in genus Anopheles. Anopheles gambiae and A. funestus, important
vectors of malaria in tropical Africa, are in the same subgenus
and diverged about as recently as humans and chimpanzees (~5
million years ago) [4]. Using fluorescence in situ hybridization
(FISH), we mapped A. funestus cDNA clones on the five arms of
the polytene chromosome complement. Of 157 cDNAs used as probes,
116 mapped to single chromosomal locations on the A. funestus
cytogenetic map, and the remainder hybridized in multiple locations.
Those 116 cDNAs were mapped in silico to the completely sequenced
A. gambiae genome. The relative positions of sequences with
unique map locations in both species support the hypothesized
chromosome arm homologies and the reciprocal whole arm translocation
between 2L and 3R, postulated previously on the basis of relative
length and banding pattern [5]. Correspondence between chromosome
arms was contradicted by only two of the cDNAs examined in this
study. Within corresponding arms, paracentric inversions have
had a major impact on genome architecture since the divergence
of these species. Gene order has not been preserved along the
length of any chromosome arm, although there are conserved segments
in some regions near centromeres where the rate of meiotic recombination
may be reduced. Inversions have involved large as well as relatively
small chromosomal segments. One of three small inversions at
the distal end of 2R includes a rearrangement involving the
8C region in A. gambiae that contains the major Plasmodium-refractoriness
locus Pen1 [6]. What has been the extent of rearrangement of
gene order between these species? The number of inversion events
can be estimated by considering the mean length of conserved
segments, because this length decreases with each inversion
fixed since the divergence of A. gambiae and A. funestus from
a common ancestor. The method of Nadeau and Taylor [7] was applied
to estimate mean lengths of all conserved segments in the genome,
based on the nucleotide distance in A. gambiae between the outermost
markers that defined the segments observed in our sample. An
assumption of the method, that rearrangements fixed during evolution
are randomly distributed in the genome, seems unlikely given
the extraordinary concentration of polymorphic inversions on
2R in both lineages. Of eight polymorphic inversions described
in A. gambiae, seven occur on chromosome 2R [8]. Similarly,
11 of 15 polymorphic inversions found in A. funestus involve
2R [9]. Accordingly, we assessed each arm independently. The
estimated mean lengths of all conserved segments on each arm,
defined with respect to A. gambiae, were X, 2.0 ± 0.2 megabases
(Mb); 2R, 0.9 ± 0.2 Mb; 2L, 2.2 ± 0.4 Mb; 3R, 2.2 ± 1.0 Mb;
and 3L, 1.1 ± 0.4 Mb. In a slight departure from Nadeau and
Taylor [7], each rearrangement was assumed to be an inversion
requiring two disruption events. Therefore, n inversions result
in 2n + 1 conserved segments. The number of inversions on each
arm was 5 ± 1, 36 ± 9, 11 ± 3, 11 ± 3, and 19 ± 5, respectively.
Assuming a divergence time of 5 million years [4], the rate
of fixation per My for each chromosome arm can be estimated
as 0.5, 3.6, 1.1, 1.1, and 1.9, respectively (or 7 when estimated
across the genome). When normalized to account for differences
in chromosome length, the number of inversions per Mb per My
for X, 2R, 2L, 3R, and 3L was estimated as 0.023, 0.057, 0.022,
0.021, and 0.044, respectively (0.031 genome-wide). This rate
is even more extreme than the genome-wide estimate for Drosophila
[3]. Moreover, our results indicate that 2R has a higher rate
of rearrangement than other arms. It is clear that tightly linked
genes in A. gambiae are unlikely to be similarly linked in A.
funestus, particularly on 2R. The estimate of mean conserved
segment length derived for each arm can be used to predict the
probability of linkage in A. funestus, given the known distance
between genes in A. gambiae and the assumption of random distribution
of breakpoints [7]. As an example, the probability that genes
1 Mb apart on 2R in A. gambiae are linked on 2R in A. funestus
is only 0.31. Polymorphic inversions on chromosome 2R are widespread
within A. gambiae and A. funestus and are believed to indicate
adaptations to different environmental niches [8, 9]. Identification
of genes encoded within these inversions could provide clues
to factors determining mosquito behavior and vectorial capacity.
Thus, the main features of genome rearrangements in malaria
mosquitoes, A. gambiae and A. funestus, can be summarized as
follows: (1) the reciprocal whole-arm translocation has preserved
synteny (the co-occurrence of genes on the same arm) at the whole-arm level;
(2) a high rate of paracentric inversions, especially on 2R, has
had a major impact on extensive gene order reshuffling. Our
results suggest that the success of positional cloning or interspecific
microarray experiments may be limited to either very closely
related anopheline species or small genomic fragments. Further
comparative studies of these two genomes will provide valuable
insights into the mechanism and effects of chromosomal rearrangements.
This study was supported by grants from NIH (AI48842) to N.J.B.
and from the Indiana 21st Century Research & Technology Fund
to F.H.C.
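The arithmetic behind the reported rates can be retraced with a small Python sketch. The inversion counts, the divergence time, and the 2n + 1 relation are taken from the abstract; two things are assumptions of this sketch rather than statements from the text. First, the reported per-My rates are consistent with dividing each count by twice the divergence time (events accumulating along both lineages). Second, the linkage probability is computed as exp(-d / L), which follows from treating breakpoints as a Poisson process with mean segment length L. With the rounded 2R mean of 0.9 Mb this gives roughly 0.33 for genes 1 Mb apart, close to but not identical to the 0.31 quoted above, presumably because the abstract used unrounded values.

```python
import math

# Inversion counts per chromosome arm, as reported in the abstract.
INVERSIONS = {"X": 5, "2R": 36, "2L": 11, "3R": 11, "3L": 19}
DIVERGENCE_MY = 5.0  # divergence time in million years, from the abstract


def conserved_segments(n_inversions: int) -> int:
    """Each inversion needs two breaks, so n inversions give 2n + 1 segments."""
    return 2 * n_inversions + 1


def fixation_rate_per_my(n_inversions: int,
                         divergence_my: float = DIVERGENCE_MY) -> float:
    """ASSUMPTION: events accumulate along both lineages, i.e. over 2 * T."""
    return n_inversions / (2.0 * divergence_my)


def linkage_probability(distance_mb: float, mean_segment_mb: float) -> float:
    """ASSUMPTION: Poisson breakpoints, so P(no break within d) = exp(-d / L)."""
    return math.exp(-distance_mb / mean_segment_mb)


rates = {arm: fixation_rate_per_my(n) for arm, n in INVERSIONS.items()}

if __name__ == "__main__":
    for arm, n in INVERSIONS.items():
        print(arm, conserved_segments(n), rates[arm])
    print(round(linkage_probability(1.0, 0.9), 2))
```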
References:
1.
E. Eichler, D. Sankoff, Science 301, 5634 (2003).
2. E. M. Zdobnov et al., Science 298, 149 (2002).
3. J. González, J. M. Ranz, A. Ruiz, Genetics 161, 1137 (2002)
4. I. V. Sharakhov et al., Science 298, 182 (2002).
5. I. V. Sharakhov, M. V. Sharakhova, C. M. Mbogo, L. L. Koekemoer,
G. Yan, Genetics 159, 211 (2001)
6. L. Zheng, et al., Science 276, 425 (1997)
7. J. H. Nadeau and B. A. Taylor, Proc. Natl. Acad. Sci. U.S.A.
81, 814 (1984)
8. M. Coluzzi, A. Sabatini, V. Petrarca, M. A. Di Deco, Trans.
R. Soc. Trop. Med. Hyg. 73, 483 (1979)
9. I. Dia, D. Boccolini, C. Antonio-Nkondjio, C. Costantini,
D. Fontenille, Parassitologia 42, 227 (2000)
Joint
work with Andrew C. Serazin (1),
Olga G. Grushko (1),
Ali Dana (1), Neil
Lobo (1), Maureen E.
Hillenmeyer (1), Richard
Westerman (2), Jeanne
Romero-Severson (3), Carlo
Costantini (4), N'Fale
Sagnon (4), Frank H.
Collins (1), and Nora J.
Besansky (1).
(1)
Center for Tropical Disease Research and Training, University
of Notre Dame, Notre Dame, IN 46556-0369, USA.
(2) Horticulture Department, Purdue University, West Lafayette,
IN 47907-1159, USA.
(3) Department of Forestry and Natural Resources, Purdue
University, West Lafayette, IN 47907-1165, USA.
(4) Centre National de Recherche et de Formation sur le
Paludisme, Ouagadougou, Burkina Faso.
Amal
A. Shervington (Biological Sciences Department, Forensic
Sciences Department, University of Central Lancashire, Preston,
PR1 2HE. UK) aashervington@uclan.ac.uk
Induced
CYP1A1 Gene Expression in Lung Cancer Cell Lines (poster
session)
Joint
work with Kulthum Mohammed.
The
gene CYP1A1 (cytochrome P450, family 1, subfamily A, polypeptide 1) encodes
a member of the cytochrome P450 superfamily of enzymes. The
cytochrome P450 proteins are monooxygenases that catalyze numerous
reactions involved in drug metabolism and synthesis of cholesterol,
steroids and lipids. The enzyme is reported to be present predominantly
in extrahepatic tissues in humans and in experimental animals
(1). CYP1A1 is of toxicological importance because it catalyses
the bioactivation of polyaromatic hydrocarbon (PAH) constituents,
e.g. benzo[a]pyrene and other combustion products abundant in
tobacco smoke, to mutagens and carcinogens (2).
Several
studies of the oncogenic significance of CYP1A1 have found a correlation
between inducibility of the enzyme and lung cancer susceptibility
in smokers (3). The expression and activity of CYP1A1 were examined
using either peripheral blood lymphocytes as a surrogate for lung
cancer tissue (3) or lung biopsy specimens from human subjects.
CYP1A1 transcripts were detected in lung cancer tissue either
by reverse transcription polymerase chain reaction (RT-PCR)
or northern blot hybridization (4).
In
our laboratory we used four different lung cell lines: A549
adenocarcinoma; H460 large cell carcinoma; COR-L23/5010 drug-resistant
large cell carcinoma; and CCD-32Lu normal lung cells
as a control. We measured the level of CYP1A1 transcript using
the LightCycler (quantitative PCR). mRNA extracted from 10^6
cells using an mRNA capture kit (Roche) was used to generate cDNA
by the Reverse Transcription System (Roche) with CYP1A1 primers
(designed using the primer3 web site) and amplified by the LightCycler
using CYP1A1
The
expected size of the CYP1A1 amplicon was 166 bp; it was expressed
at a highly induced level in the A549 adenocarcinoma and to
a lesser extent in the H460 large cell carcinoma. Very faint bands
can be seen in the L23/5010 drug-resistant large cell carcinoma.
No CYP1A1 can be detected in the normal lung cells. An amplicon
of 300 bp was amplified only in the control and not in the cancerous
cell lines. Further work is required to characterise the 300 bp
band and to identify its significance.
Our
results have shown an induced level of CYP1A1 in the adenocarcinoma
cell line which is absent from the control, indicating that
CYP1A1 is expressed at an elevated level in some cancer cell lines
but not in the control.
Numerous
reports have emphasised the induction level of CYP1A1 in
peripheral blood lymphocytes and lung cancer tissue, but there
have been few or no reports on the level of CYP1A1 in established
cancer such as cancerous cell lines. Our study has shown an elevated
level of CYP1A1 in some of the cancerous cell lines, which may
suggest an active role for CYP1A1 in the maintenance of
cancer.
Jijun
Tang and Bernard M.E. Moret
(Department of Computer Science, University of New Mexico, Albuquerque,
NM 87131) jtang@trucha01.hpc.unm.edu
Large-scale
Phylogeny Reconstruction from Arbitrary Gene-order Data
pdf
ps
Slides:
pdf
Elisabeth
R.M. Tillier (Ontario Cancer Institute / University
of Toronto) e.tillier@utoronto.ca
http://www.uhnres.utoronto.ca/tillier/
Models
and Methods for Phylogenomics
I
will present a number of new approaches to some fundamental
problems in comparative genomics and sequence analysis:
1.
Using gene order information to confirm orthologous identifications.
2.
Using phylogenetic profiles for the phylogenetic analysis of
whole genomes.
3.
Development of substitution models for the analysis of protein
and RNA sequences.
Li-San
Wang (University of Pennsylvania) lisan@cs.utexas.edu
Distance-Based
Genome Rearrangement Phylogeny
Slides: html
pdf
ps
ppt
Evolution
operates on whole genomes through mutations that change the
order and strandedness of genes within the genomes. These events
are examples of "rare genomic changes," which have low frequency
and high signal-to-noise ratio. Thus analyses of gene-order
data present new opportunities for discoveries about deep evolutionary
events, provided that sufficiently accurate methods can be developed
to reconstruct evolutionary trees.
In
this talk I will present our results in distance-based genome
rearrangement phylogeny reconstruction. We approach the problem
by developing new statistically-based true evolutionary distance
estimators. These estimators are based on the distributions
of genomic distances including breakpoint and inversion distances
under Markov Models. In our simulation study, we obtain highly
accurate trees by using these new distance estimators, even
when the amount of evolution in the dataset is high.
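As a point of reference for the genomic distances mentioned above, the following Python sketch computes the observed breakpoint distance between two signed linear gene orders. It is not one of the statistical estimators presented in the talk; it only shows the raw quantity that such estimators would take as input. The function names and the framing of each genome with 0 and n + 1 are conventions assumed for this illustration.

```python
from typing import List, Sequence, Set, Tuple


def framed(genome: Sequence[int]) -> List[int]:
    """Add the fixed end markers 0 and n + 1 to a signed linear gene order."""
    return [0] + list(genome) + [len(genome) + 1]


def adjacencies(genome: Sequence[int]) -> Set[Tuple[int, int]]:
    """All adjacencies of a genome, read on both strands.

    The pair (a, b) read on one strand is the pair (-b, -a) on the other,
    so both forms are stored.
    """
    adj = set()
    g = framed(genome)
    for a, b in zip(g, g[1:]):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def breakpoint_distance(g1: Sequence[int], g2: Sequence[int]) -> int:
    """Number of adjacencies of g1 that are not preserved in g2."""
    adj2 = adjacencies(g2)
    g = framed(g1)
    return sum((a, b) not in adj2 for a, b in zip(g, g[1:]))


if __name__ == "__main__":
    ancestor = [1, 2, 3, 4, 5]
    inverted = [1, -3, -2, 4, 5]  # genes 2..3 inverted
    print(breakpoint_distance(ancestor, ancestor))
    print(breakpoint_distance(ancestor, inverted))
```

A single inversion creates at most two breakpoints, which is why the observed breakpoint count underestimates the true number of events once breakpoints start to be reused.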
This
is joint work with Robert K. Jansen
and Tandy Warnow at the University
of Texas, and Bernard M.E. Moret at the University of New Mexico.
Derek
E. Wildman (Center for Molecular Medicine & Genetics,
Wayne State University School of Medicine, Detroit, MI 48214)
dwildman@genetics.wayne.edu
An
Objective View of Humankind's Place in Primate Evolution
Joint
work with Monica Uddin, Guozhen
Liu, Lawrence I. Grossman,
and Morris Goodman.
In
order to accurately place humankind in a phylogenetic classification
of Primates it is necessary to know the phylogenetic relationships
among all members of the order. We present the phylogenetic
relationships and times of divergence for extant members of
the order as determined by DNA nucleotide sequence data, and
we focus particularly on the relationships within the family
Hominidae. Local molecular clock analyses using fossil calibrations
calculate that the time of origin for the order Primates as
a crown group is 63 million years ago. Anthropoid primates (New
World monkeys, Old World monkeys, and apes including humans)
originated approximately 40 million years ago.
Phylogenetic
and local molecular clock analyses from a sample of 97 genes
show that humans and chimpanzees form a clade that most recently
shared a common ancestor between 5 and 6 million years ago.
These coding DNA data separate the human-chimpanzee clade from
the gorilla clade between 6 and 7 million years ago. This African
ape clade separated from the orangutan clade between 13 and
15 million years ago. We calculated the percent nonsynonymous
DNA identity between humans and chimpanzees to be 99.4%, synonymous
identity to be 98.4%, and total DNA sequence identity to be
99.1%. Interestingly, phylogenetic analysis grouped humans and
chimpanzees together when only nonsynonymous sites were analyzed.
This result suggests that at the protein level humans and chimpanzees
are functionally more similar to each other than either taxon
is to any other ape. Additionally, of these 97 genes, 30 show
evidence of positive selection during the descent of catarrhine
primates. An equal number (n=14) of these genes show elevated
nonsynonymous rates of substitution on the human and chimpanzee
lineages.
Divergences
between humans and chimpanzees are placed in perspective by
comparing their date of divergence with those found across the
class Mammalia. The age of genus level crown groups for mammals
ranged from 2 to 21 million years old. The mean crown group
time of origin is approximately 8 million years ago, and the
95% confidence interval falls between 6.61 and 9.71 million
years ago. Thus, humans and chimpanzees share a more recent
common ancestor than do many congeneric groups of mammals.
Tiffani
L. Williams (Department of Computer Science, University
of New Mexico, Albuquerque, NM 87131) tlw@cs.unm.edu
Searching
for Optimal Trees Under Maximum Parsimony (poster
session) pdf
ps
Kenneth
H. Wolfe (Department of Genetics, University of Dublin,
Trinity College) khwolfe@tcd.ie
http://www.gen.tcd.ie/khwolfe/
Genome
Evolution and Sorting Out Ancient Polyploidy in Yeasts
Yeasts
are a good model system for investigating gene order and chromosomal
evolution because their genomes are compact and relatively eas
ing in a metabolic pathway was put together during the evolution
of species that can grow vigorously without oxygen.
Stacia
K. Wyman (Department of Computer Sciences, University
of Texas at Austin) stacia@cs.utexas.edu
http://www.cs.utexas.edu/users/stacia
Comparative
Chloroplast Genomics of Seed Plants: Annotation and Analysis
of Genomic Sequences ( |