Talk abstract:
Algorithms for Detection and Analysis of Tandem Repeats in DNA Sequences
Gary Benson
Department of Biomathematical Sciences
Mount Sinai School of Medicine
New York, NY 10029-6574
212-241-5777 Work
212-860-4630 FAX
benson@ecology.biomath.mssm.edu
The ultimate goal of the human genome project is to understand
the functioning of living organisms at the molecular, cellular
and higher levels. Such understanding holds enormous promise
for early detection and treatment of disease. The first step
in the genome project has been to sequence the DNA of a variety
of organisms, thereby generating an immense quantity of data.
Discovering the function of this DNA will depend in large part
on computational and mathematical analysis. A very informative
type of analysis is the search for repetitive patterns in DNA.
DNA is subject to a variety of mutational mechanisms, some
of which have the effect of copying part of the DNA from one
location in the sequence into another location. Over time, these
originally identical copies diverge because of additional mutations.
Evolution, which is ever opportunistic, has used these duplication
and mutation events to create families of duplicated genes,
modified genes and new genes, and to extend and adapt
regulatory control structures. Recognizing these duplicated
pieces has in many cases simplified the functional analysis
of DNA.
One of the less well understood mutational mechanisms is tandem
duplication. In this process, a stretch of nucleotides is duplicated
to produce two or more adjacent copies, resulting in a tandem
repeat. Over time, the copies undergo additional mutations, so
that typically multiple approximate tandem copies are present.
Tandem repeats occur frequently in the human genome, including
in the centromeres and telomeres, which are important chromosomal
components. They have been shown to cause inherited human diseases,
may play a variety of regulatory and evolutionary roles, and
because of their polymorphic character, are important laboratory
tools for linkage analysis and DNA fingerprinting. In this talk
I will discuss an efficient algorithm for detecting tandem repeats
in genomic sequence data. Detection is based on k-tuple matching
and a collection of statistical filtering criteria.
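The k-tuple idea can be illustrated with a small sketch. It assumes that a tandem repeat with period d shows up as many k-tuples that reappear exactly d positions later, so it counts such matches for each candidate distance d over the last d positions. The function name `find_tandem_candidates` and its parameters `k`, `max_period` and `min_fraction` are hypothetical. The fixed fraction threshold is only a placeholder for the statistically derived filtering criteria described above, and the whole sketch is not the algorithm presented in the talk.

```python
import math
from collections import defaultdict, deque
from typing import Dict


def find_tandem_candidates(seq: str, k: int = 4, max_period: int = 20,
                           min_fraction: float = 0.8) -> Dict[int, int]:
    """Map each candidate period d to the first position where it was flagged.

    Position i is a k-tuple match at distance d when seq[i:i+k] equals
    seq[i-d:i-d+k]. Period d is flagged when at least ceil(min_fraction * d)
    such matches lie in the last d positions. The fixed fraction is a
    hypothetical stand-in for statistically derived thresholds.
    """
    seq = seq.upper()
    recent_starts = defaultdict(deque)  # k-tuple -> recent start positions
    matches = defaultdict(deque)        # d -> positions of matches at distance d
    first_hit: Dict[int, int] = {}
    for i in range(len(seq) - k + 1):
        tup = seq[i:i + k]
        starts = recent_starts[tup]
        # Forget occurrences too far back to indicate a period <= max_period.
        while starts and i - starts[0] > max_period:
            starts.popleft()
        for j in starts:
            d = i - j
            window = matches[d]
            window.append(i)
            # Keep only matches that start within the last d positions.
            while window[0] <= i - d:
                window.popleft()
            if len(window) >= math.ceil(min_fraction * d):
                first_hit.setdefault(d, i)
        starts.append(i)
    return first_hit
```

On "CAG" repeated eight times, the sketch flags period 3 first and then the multiples 6 and 9, because each copy also matches the copies two and three units back. A practical detector would need further criteria to pick the true period and to delimit the repeat, for example by aligning the candidate region against a consensus pattern.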
An interesting feature of tandem repeats is that the duplicated
copies are preserved together, making it possible to do "phylogenetic
analysis" on a single sequence. This involves using the
pattern of mutations among the copies to determine a minimal
or a most likely history for the repeat. A history tries to
describe the interwoven pattern of duplication and mutation
events in such a way as to minimize the number of identical
mutations that arise independently. In this talk I will also
describe approaches to algorithmic reconstruction of a tandem
repeat history.
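One standard way to score how "minimal" a candidate history is: parsimony. It counts the fewest point mutations a given branching history requires, so a history that forces the same mutation to arise independently on separate branches pays for it with extra changes. The sketch below computes a Fitch parsimony score for copies of a repeat that are assumed to be already aligned and gap-free. Candidate histories are written as hypothetical nested tuples of copy indices. The sketch ignores insertions and deletions. It also ignores the ordering constraints that tandem duplication places on real histories, where each duplication copies a contiguous block of adjacent copies.

```python
from typing import Sequence, Set, Tuple, Union

# A (hypothetical) history: either a copy index (leaf) or a pair of sub-histories.
Tree = Union[int, tuple]


def fitch(tree: Tree, column: Sequence[str]) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for one alignment column.

    Returns the candidate ancestral states at this node and the number of
    mutations needed in the subtree below it.
    """
    if isinstance(tree, int):
        return {column[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, column)
    right_states, right_cost = fitch(right, column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one mutation is needed on one of the two branches.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree: Tree, copies: Sequence[str]) -> int:
    """Total mutations the history needs, summed over aligned positions."""
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be aligned to equal length")
    return sum(fitch(tree, [c[pos] for c in copies])[1] for pos in range(length))


# Four hypothetical aligned copies of a 3-base repeat unit.
copies = ["AAC", "AAG", "TTC", "TTG"]
candidates = [((0, 1), (2, 3)), ((0, 2), (1, 3))]
best = min(candidates, key=lambda t: parsimony_score(t, copies))
```

Here the first history needs 4 mutations and the second needs 5, so the first is preferred under parsimony. Finding a "most likely" history instead would require a probabilistic model of mutation and duplication.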
IMA "Hot Topics" Workshop: Challenges and Opportunities in Genomics:
Production, Storage, Mining and Use
1998-1999 Mathematics in Biology