The
proposed program is devoted to the application of probability
and statistics to problems in three areas: the genome sciences,
networks and financial engineering. These application areas
are all associated with complex systems, and strategies for
system analysis will serve as an organizing principle for the
program. (By complex systems we mean systems with a very
large number of interacting parts such that the interactions
are nonlinear in the sense that we cannot predict the behavior
of the system simply by understanding the behavior of the component
parts.) Furthermore, these areas share the common feature that
they are systems for which a huge amount of data is available. Mathematical
models developed for these systems must be informed by this
data if they are to provide a basis for scientific understanding
of the systems and for critical decision-making about them.
The mathematical and statistical foundations of this program
will include stochastic modeling and simulation, statistics,
and massive data set analysis, as well as dynamical systems,
network and graph theory, optimization, control, design of computer
and physical experiments, and statistical visualization. The
program will be particularly appropriate for probability/statistics
postdocs and long-term participants with some background in
at least one of the three major areas of application and an
interest in developing the integration tools that will provide
them with an entrée into modeling/data integration
issues in the other areas. There will be extensive tutorials
in the application areas.
The
health of human populations and biosystems, information networks,
and financial systems is fundamental to the success of modern
civilization. A map of the human genome is nearly in hand,
but we are just beginning to understand how to harness it. (For
example, how does the one-dimensional information encoded in
DNA lead to the immensely complicated three-dimensional structure
of proteins, the protein folding problem?) Trying to understand
the function of a single gene leads to a complicated set of
analyses that require the integration of stochastic and biological
models with noisy, high-dimensional data coming from multiple
sources. Integration of this information for all of the genes,
through comparative and evolutionary genomics, is critical in
determining the role of gene-related diseases in the human population,
and in combating these diseases. A federation of 6000 autonomous
networks, called the Internet, and wireless communications are
on their way to providing anywhere, anytime multimedia communication. Interconnected
power networks linking independent power producers with consumers
in an environment with diminishing regulation raise many new
questions regarding both the physical and economic operation
of the electric power system. In few of these systems is there
centralized control. How can we ensure that they work properly? In
finance, the sequence of events triggered by Russia's default
on its government bonds in August 1998 demonstrated
that the global financial system is an extraordinarily complex
network of relations involving broker/dealers, banks, institutional
investors, and other counterparties. The global volatility triggered
by the default is a wake-up call to society on the importance
of a deeper understanding and control of financial systems.
In all these areas, issues such as network topology, the "degree
of connectedness", computational complexity, and the probability
of systemic failure are relevant, as is the capacity to sample
a system and store large amounts of data. Furthermore,
system constraints create complex dependencies amongst elements
of the sampled data. For example, coordinated gene expression
causes DNA chip measurements to exhibit strong positive dependence
amongst genes in common biochemical pathways, communication
traffic with many sources and destinations shares common bottleneck
links, and serial dependence is clearly present in financial
time-series data.
The
mathematical sciences, and particularly probabilistic and statistical
methods, are key to understanding the dependencies of these
systems. Interacting stochastic systems and cellular automata,
as well as dynamical systems and partial differential equations,
are examples of mathematical structures directed at understanding
how one part of a system influences other parts and how those
influences propagate. Historically, limited computational power
restricted the size and complexity of the systems that could
be usefully modeled and, in many settings, limited data made
it difficult or impossible to evaluate the appropriateness and
accuracy of proposed models. An explosion in computational power
and other technical advances that support collection of large
amounts of data have radically altered this situation. Simultaneous
with and in part because of these technological advances, new
areas of application have emerged that require the understanding
of systems whose size and complexity test the limits of even
the most recent computational, mathematical, and statistical
methodologies.
Understanding
in the diverse areas of genomics, communication networks, and
financial engineering will benefit from the broad view which
we propose to adopt in the one-year IMA program: a view based
on the development and analysis of stochastic models and the
implementation of the essential statistical analysis using advanced
computational methods. To avoid stochastic modeling is to proceed
at some peril. Pieces of a system might be considered
in isolation using rudimentary data analysis, but this isolated
approach may not provide the most efficient analysis and, more
importantly, may not allow certain critical questions to be
addressed. It is a basic premise of stochastic modeling
that data are viewed as the realization of a stochastic process.
An appropriate modeling framework allows inferences about unknown
system elements or decisions about how to manage the system
to be expressed in terms of the stochastic process. Indeed,
if we can work out properties of the underlying stochastic process,
we may come to a better understanding of the entire system.
To do so requires sophisticated mathematical techniques.
Computational algorithms are crucial for implementing the calculations
suggested by the stochastic models. Advanced computer systems
not only allow us to collect more data, but they also allow
us to run much more sophisticated analyses than have been possible
previously. Furthermore, statistical methods provide us
not only with estimates or predictions of unknown quantities,
but also with precise statements about our corresponding uncertainty.
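As a minimal illustration of treating data as the realization of a stochastic process, the sketch below assumes a first-order autoregressive model X_t = phi * X_{t-1} + eps_t with Gaussian noise (a hypothetical choice, not a model specified by the program). It simulates one path, then reports the least-squares estimate of phi together with an approximate 95% confidence interval, so the estimate comes with a statement of its uncertainty.

```python
import math
import random

def simulate_ar1(phi, sigma, n, seed=0):
    """Simulate X_t = phi * X_{t-1} + eps_t with eps_t ~ N(0, sigma^2), starting at X_0 = 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x

def estimate_ar1(x):
    """Least-squares estimate of phi with an approximate (normal-theory) 95% confidence interval."""
    prev, curr = x[:-1], x[1:]
    sxx = sum(p * p for p in prev)
    phi_hat = sum(p * c for p, c in zip(prev, curr)) / sxx
    residuals = [c - phi_hat * p for p, c in zip(prev, curr)]
    sigma2_hat = sum(r * r for r in residuals) / (len(residuals) - 1)
    se = math.sqrt(sigma2_hat / sxx)  # standard error of phi_hat
    return phi_hat, (phi_hat - 1.96 * se, phi_hat + 1.96 * se)

# Hypothetical parameters chosen only for illustration.
path = simulate_ar1(phi=0.7, sigma=1.0, n=2000, seed=42)
phi_hat, (lo, hi) = estimate_ar1(path)
print(f"phi_hat = {phi_hat:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The same pattern (posit a process, fit it, quantify the error) carries over to the far richer models discussed in this program; only the model and the inference machinery change.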
Participating
postdocs will require and will acquire skills in probability
and the mathematical analysis of stochastic models, in the development
of appropriate stochastic models by careful study of subject
matter, in statistical inference, and in computational methods
such as optimization and Monte Carlo.
Fall
Quarter (September-December 2003)
Mathematical
& Statistical Problems in Genome Sciences
A
working draft of the human genome was made publicly available
in summer 2000, with a final sequence, erring less often than
once per 10,000 bases, to follow within two years. More than
twenty microbial genomes are already complete, and the sequencing
of both plant and animal model organisms is well underway. Coupled
with the availability of sequence data are technologies that
enable us to measure the simultaneous gene expression pattern
in a cell. Obtaining such a mass of data will mark the beginning
of a period of exceptional knowledge discovery in biology. Eric
Lander of the Whitehead Institute has likened the effect on
biology of these new resources to the effect on chemistry of
the periodic table. Having a global view, knowing all the genes,
their function, their common alleles, and the biochemical pathways
in which they participate will have a profound effect on science
and medicine. Mathematics and statistics could have an even larger
impact on the processing and analysis of genome data than the
substantial effect they have already had on the fields of
molecular biology and genetics. (See Calculating
the Secrets of Life: Contributions of the Mathematical Sciences
to Molecular Biology, Eric S. Lander and Michael S. Waterman,
Editors; Committee on the Mathematical Sciences in Genome and
Protein Structure Research, National Research Council, 1995.)
The function of most genes is unknown, and stochastic modeling
may improve inference of gene function from expression profiles
in functional genomics. Stochastic models have long
been used in evolutionary modeling, and new models and computational
methods are needed to cope with whole genome comparisons in
comparative genomics. Statistical methods are being refined
in genetic mapping studies that now can in principle consider
hundreds of thousands of markers in an attempt to find genes
affecting complex diseases. The problem of inferring protein
structure is long-standing and continues to demand the most
sophisticated mathematical, statistical, and computational approaches.
There are undoubtedly many more mathematical and statistical
problems that will arise from the genome sciences.
The
purpose of the term is to uncover emerging problems in computational
molecular biology (CMB). CMB has a long history that includes problems
such as sequence alignment, sequencing, and physical mapping.
Each of these has a well-developed set of methods for
its analysis (such as BLAST for sequence similarity search). Here we address the next generation
of problems. Two recent examples should serve to illustrate
the possibilities: SNP (single nucleotide polymorphism) detection,
and DNA microarrays. SNPs are single-base positions in DNA at which
individuals differ. There are now several molecular technologies for
high throughput SNP detection. SNPs are used as markers
for disease gene mapping, and currently play a central role
in drug design in pharmacogenomics. Analyzing these data
has provided a number of challenging statistical problems, in
part because SNPs are not usually a random survey of molecular
variation in the genome. DNA microarrays provide a way to study
the relative expression levels of genes in different biological
backgrounds (e.g., cell cycle data, tumor presence/absence).
Parallel assays of expression levels for many thousands of genes
simultaneously result in high-dimensional, noisy data. Problems
involving image analysis, clustering and modeling expression
profiles are central to many varied and important uses of arrays
in human genetics and molecular biology.
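To make "clustering expression profiles" concrete, the sketch below runs plain k-means (Lloyd's algorithm) on a tiny, made-up matrix with one row per gene and one column per experimental condition. The data, the choice k = 2, and the Euclidean distance are all illustrative assumptions; real microarray analyses typically normalize the data first and often use correlation-based distances or model-based clustering instead.

```python
import numpy as np

def kmeans(profiles, k, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of `profiles` (genes x conditions)."""
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct, randomly chosen gene profiles.
    centers = profiles[rng.choice(len(profiles), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each gene to its nearest center (squared Euclidean distance).
        dists = ((profiles[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean profile of its genes; keep the old center if empty.
        new_centers = np.array([
            profiles[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Made-up log-expression values for six genes under four conditions.
profiles = np.array([
    [2.1, 2.0, 0.1, 0.0],   # hypothetical genes induced early
    [1.9, 2.2, 0.0, 0.2],
    [2.0, 1.8, 0.2, 0.1],
    [0.1, 0.0, 2.0, 2.1],   # hypothetical genes induced late
    [0.0, 0.2, 1.9, 2.2],
    [0.2, 0.1, 2.1, 1.9],
])
labels, centers = kmeans(profiles, k=2)
print(labels)
```

Genes assigned to the same cluster are candidates for co-regulation, which is exactly the kind of dependence among genes in common pathways described earlier.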
We
have outlined four possible workshops below. By the nature of
the technologies involved, there are a number of overlapping
topics. As in any other emerging science, the problems that
are now of interest will probably be replaced by others by 2003,
so these topics should be treated as illustrative.
Opening
Tutorials/Kickoff
The
program will open with a one-week tutorial on "Tools for
Model and Data Integration in the Genome Sciences" (including
a Statistics Tutorial "Refresher in S-Plus" that
will provide useful background for the entire year program),
followed by a brief minisymposium on "Information integration
technologies for complex systems." The tutorial will be
aimed at the postdocs and others with a probability/statistics
background. The purpose of the tutorial is 1) to prepare the
IMA postdocs and other IMA participants for the Genomics program,
2) to provide graduate students and faculty from universities
everywhere (particularly the IMA participating Institutions)
with an entrée into modeling/data integration problems
in the Genome Sciences and 3) to publicize to the wider community
the importance and intellectual excitement involved in the understanding
of these complex systems.
Mathematical Topics in the Genomics Program:
Exploratory multivariate analysis (e.g., clustering; model-based
methods), log-linear models, hidden Markov models, Markov chain
Monte Carlo, graphical models for networks, likelihoods and
optimization. Image analysis. Branching process models of cell
growth and replication. Stochastic models on trees. Inference
for dependent data generated by networks.
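Of the topics listed, hidden Markov models are among the most widely used in sequence analysis. The sketch below implements the standard forward algorithm, with per-step scaling to avoid underflow, for the log-likelihood of an observed DNA sequence. The two-state "AT-rich"/"GC-rich" model and all of its probabilities are hypothetical values chosen only for illustration.

```python
import math

def forward_log_likelihood(obs, states, start_p, trans_p, emit_p):
    """Log P(obs) under an HMM, computed by the scaled forward algorithm."""
    # alpha[s] is proportional to P(state s at step t, observations up to t).
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = {
                s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][obs[t]]
                for s in states
            }
        scale = sum(alpha.values())      # P(obs[t] | obs[:t])
        log_lik += math.log(scale)
        alpha = {s: a / scale for s, a in alpha.items()}
    return log_lik

# Hypothetical two-state model of sequence composition.
states = ("AT", "GC")
start_p = {"AT": 0.5, "GC": 0.5}
trans_p = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
emit_p = {
    "AT": {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15},
    "GC": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},
}
print(forward_log_likelihood("ATGCGCGCAT", states, start_p, trans_p, emit_p))
```

The likelihood computed this way is the building block for parameter estimation (e.g., by EM) and for comparing competing models of the same sequence.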
Winter
Quarter (January - March 2004)
Communication
Networks
The
Internet and other communication networks are growing and changing
in such a way that they present a rapidly moving target for
modeling and data collection and analysis. The problems
associated with designing, engineering, and managing such rapidly
moving and constantly evolving systems have shaped much of networking
research in the past and are likely to play an even more important
role in the future as the problems acquire a central element
of scale, extending well beyond what has previously been considered. For
example, there has been a great deal of interest and progress
in the past decade in measuring, modeling and understanding
the properties and performance implications of actual traffic
flows as they traverse individual links or routers within the
network. However, with the imminent deployment of novel scalable
network measurement infrastructures and newly-designed large-scale
network simulators, we will have access to a new generation
of data sets of the highest-quality network measurements that are
of unprecedented volume, are simultaneously collected from a
very large number of points within the network, and have
extraordinarily rich semantic context. This transition from the
traditional single link/router-centered view to a more global
or network-wide perspective will have profound implications
for trying to describe and understand the dynamic nature of
large-scale, complex internetworks such as the global Internet,
where the interesting problems are those of interactions, correlations,
and heterogeneities in time, space, and across the different
networking layers. While these next-generation data sets can
be fully expected to continue to reveal tantalizing variability,
intriguing fluctuations, and unexpected behaviors, they will
also raise many new data analysis and modeling issues and challenge
the use of established and well-understood techniques. In particular,
the problems of explaining why and how some of the observed
phenomena occur, of predicting the stability and performance
of truly large-scale networks under alternative future scenarios,
and of recommending long-term control strategies are certain
to generate new research activities in the mathematical and
physical sciences and will remain with us for years to come.
Of course, by 2003 the important questions may look
very different from those of today, but complex models and
massive amounts of data will almost certainly remain central.
The
winter program will open with a short course and tutorial on
"The Internet for Mathematicians" and "Measurement,
Modeling and Analysis of the Internet." The purpose
of the short course and tutorial is 1) to prepare the IMA postdocs
and other IMA participants for the Communication Networks program,
2) to provide graduate students and faculty from universities
everywhere (particularly the IMA participating Institutions)
with an entrée into modeling/data integration problems
in Communication Networks and 3) to publicize to the wider
community the importance and intellectual excitement involved
in the understanding of these complex systems.
This
component of the program is concerned with advanced mathematical,
statistical and computational methods in finance and econometrics.
Finance has been profoundly influenced by relatively new ideas
on how to measure the risk and return of investments. Real progress
has come by combining new concepts in finance and risk-management
with advanced mathematical modeling and an exponential increase
in computing power. A more recent development has been the lowering
of the cost of acquiring data and information via the Internet.
Improved data access allows modelers to implement sophisticated
systems that support real-time decisions about investing,
managing risk, and allocating capital.
Mathematical
Challenges--Inverse Problems in Asset Pricing Theory.
Financial economics has a very
elegant way of characterizing a system of prices that is consistent
with no arbitrage (no free lunch): namely, the existence of
a probability measure on future market scenarios such that any
contingent claim can be priced as the expected value of its
discounted future cash flows. This result is due to K. Arrow and
G. Debreu. In modern finance, the Arrow-Debreu paradigm is used
for pricing and hedging instruments that share the same underlying
risks. These include, most prominently, derivative securities.
Derivatives always exist in a universe in which the underlying
asset or assets are present. The power of the Arrow-Debreu
measure is that it allows us (i) to price derivatives in relation
to the underlying security and (ii) to make sure that these
prices are not subject to arbitrage, i.e., that we are not systematically
losing money by trading at certain levels. The other remarkable
feature of the Arrow-Debreu measures is that they form an "interpolation"
between the prices of liquidly traded assets and less liquid
assets for which price discovery is more difficult.
In particular, perturbing the probability measure that characterizes
the equilibrium gives useful information about the market risk
of trading positions. The problem of interest is:
Construct
Arrow-Debreu probabilities that are consistent with concrete
market situations involving several traded assets and multiple
trading dates.
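The pricing principle behind this problem is explicit in the simplest possible market. The sketch below assumes one stock, one trading date, and two states: a riskless rate r, and a stock that moves from S0 to either u*S0 or d*S0 with d < 1 + r < u (all numbers are hypothetical). No arbitrage then pins down a unique pricing probability q = (1 + r - d)/(u - d) for the up state, the Arrow-Debreu state prices are q/(1 + r) and (1 - q)/(1 + r), and every claim on the stock is priced as its discounted expected payoff under q.

```python
def risk_neutral_prob(u, d, r):
    """Pricing (Arrow-Debreu) probability of the up state in a one-period binomial market."""
    if not d < 1 + r < u:
        raise ValueError("no arbitrage requires d < 1 + r < u")
    return (1 + r - d) / (u - d)

def price_claim(payoff, s0, u, d, r):
    """Price a claim paying payoff(S1) as its discounted expectation under the pricing measure."""
    q = risk_neutral_prob(u, d, r)
    return (q * payoff(u * s0) + (1 - q) * payoff(d * s0)) / (1 + r)

# Hypothetical market parameters.
s0, u, d, r = 100.0, 1.2, 0.9, 0.05
call = price_claim(lambda s: max(s - 100.0, 0.0), s0, u, d, r)  # call struck at 100
stock = price_claim(lambda s: s, s0, u, d, r)                    # consistency: recovers S0
print(f"call = {call:.4f}, stock = {stock:.4f}")
```

Repricing the stock itself is a useful sanity check: a measure that fails to recover the price of the underlying would admit arbitrage. The research problem stated above is to carry out this construction when there are many assets, many dates, and far more scenarios than observed prices.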
So
far, only low-dimensional systems have been implemented.
It is only in the last few years that we have enough computing
power and theoretical understanding to begin to implement large-scale
systems. The development of mathematical and computational tools
to solve this problem is very important since it is at the crossroads
between Asset-Pricing Theory and financial applications of the
theory. Its solution should drive the development of new
modeling, statistical and computational methodology, and since
similar inverse problems arise in other areas, methods developed
here should find broader application. Development of new methods
in the context of finance has the added benefit of ensuring
that their validity will be thoroughly tested through implementation
in the markets.
In
the simplest models, prices are given as expectations of functions
of a diffusion process. The problem then becomes finding a diffusion
process satisfying several moment-type constraints. For
example, one may be given m option prices and the characteristics
of these contracts. The goal is to find a diffusion measure
that is consistent with the observed prices (in the Arrow-Debreu
sense). This is a mathematically ill-posed problem that
is essentially equivalent to recovering a probability measure from a few of
its moments. Either no solution exists or there are many
possible solutions, and even when a solution can be selected, it
need not depend continuously on the data.
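One standard way to regularize this moment problem (offered here only as an illustration, not as the method the program will adopt) is to select, among all probabilities that reproduce the observed prices, the one closest in relative entropy to a prior. On a discrete grid of terminal prices the solution has the form p_i proportional to prior_i * exp(sum_j lam_j * g_j(x_i)), and the multipliers lam solve a smooth convex dual problem. The sketch below assumes a zero interest rate, a hypothetical lognormal-shaped prior, and made-up "market" prices for the underlying and three calls, generated from a different distribution so that a consistent measure is known to exist.

```python
import numpy as np

def calibrate_min_entropy(prior, payoffs, prices, n_iter=100, tol=1e-10):
    """Probabilities closest to `prior` in relative entropy with E_p[payoffs[j]] == prices[j].

    prior: shape (n,), positive weights summing to 1 over a grid of scenarios.
    payoffs: shape (m, n), payoffs[j, i] = payoff of instrument j in scenario i.
    prices: shape (m,), observed prices (zero interest rate assumed).
    Minimizes the dual F(lam) = log sum_i prior_i exp(lam . payoffs[:, i]) - lam . prices
    by Newton's method with a backtracking line search.
    """
    def dual_and_probs(lam):
        logw = np.log(prior) + lam @ payoffs
        shift = logw.max()                       # guard against overflow in exp
        w = np.exp(logw - shift)
        z = w.sum()
        return np.log(z) + shift - lam @ prices, w / z

    lam = np.zeros(len(prices))
    f, p = dual_and_probs(lam)
    for _ in range(n_iter):
        grad = payoffs @ p - prices              # model prices minus market prices
        if np.max(np.abs(grad)) < tol:
            break
        centered = payoffs - (payoffs @ p)[:, None]
        hess = (centered * p) @ centered.T       # covariance of payoffs under p
        step = np.linalg.solve(hess + 1e-12 * np.eye(len(lam)), grad)
        t = 1.0
        while True:                              # backtrack until sufficient decrease
            f_new, p_new = dual_and_probs(lam - t * step)
            if f_new <= f - 0.25 * t * (grad @ step) or t < 1e-8:
                break
            t *= 0.5
        lam, f, p = lam - t * step, f_new, p_new
    return p

# Hypothetical example: terminal prices expressed as a fraction of today's price.
grid = np.linspace(0.5, 1.5, 201)
prior = np.exp(-0.5 * (np.log(grid) / 0.2) ** 2) / grid        # lognormal-shaped prior
prior /= prior.sum()
strikes = [0.9, 1.0, 1.1]
payoffs = np.vstack([grid] + [np.maximum(grid - k, 0.0) for k in strikes])
# Made-up "market" prices, generated from a wider, shifted distribution.
target = np.exp(-0.5 * ((np.log(grid) + 0.02) / 0.25) ** 2) / grid
target /= target.sum()
prices = payoffs @ target

p = calibrate_min_entropy(prior, payoffs, prices)
print("max pricing error:", np.max(np.abs(payoffs @ p - prices)))
```

The entropy penalty is what turns the ill-posed problem into a well-posed one: among the many consistent measures it picks a unique one that varies smoothly with the prices. If the quoted prices admit arbitrage, no consistent measure exists and the dual has no minimizer, so the iteration will not converge.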
Since
the early 1990s, several solutions have been suggested.
Some are parametric in nature and exploit the structure of the
equation in clever ways. Unfortunately, these approaches
are restricted for the most part to model problems in one dimension
and to parametric families of distributions that are not suitable
for realistic problems. In reality, the models
used by large broker-dealers in financial derivatives make use
of multiple risk factors, so we are dealing with multidimensional
diffusions and with complex constraints. The question
then becomes:
Design
stable numerical algorithms for selecting and calibrating financial
models to market data that can be applied in the presence of
multiple risk factors and many market constraints.
Scientific
Interest. The scientific issues that arise in inverse
problems in finance are not merely algorithmic. They touch
upon the foundations of the field of financial economics and
serve to validate or to invalidate ideas that remain untested
in the markets. Here there is a big difference from physics
and engineering. Whereas in the latter it is possible
to repeat experiments under similar conditions and the models
are basically mathematizations of physical laws, we know that
no experiment in finance or economics can be reproduced exactly
as in the past. We do not even know the relevant state
variables, and consequently, the modeling of pricing probabilities
and the selection problem become much more challenging and important
than in most inverse problems in physics.
Mathematical Challenges--Monte
Carlo Simulation: Asset Pricing, Risk-Management and Asset
Allocation in High-dimensional Systems. This
second area of problems has been exploding for several years,
and recently, there have been several very important developments.
Longstaff-Schwartz-Carriere
algorithm for solving free-boundary problems in MC simulation.
This breakthrough was long awaited by practitioners. Monte
Carlo (MC) simulation is naturally suited to linear problems (evaluation
of high-dimensional integrals). The use of MC for American-style
options requires a new idea, known as Least Squares Monte Carlo,
which essentially performs dynamic programming on a set of non-recombining
paths with high accuracy. The mathematical theory of these
clever algorithms, and the study of their bias, have advanced
rapidly since the original paper of Longstaff and Schwartz
(1998) came out. The main issue is:
Develop
a coherent analysis of Least Squares Monte Carlo algorithms
for American options in high-dimensional economies. Develop
a theory for understanding numerical errors and statistical
biases that arise from dynamic estimation of conditional expectations,
early-exercise dates, etc.
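The idea behind Least Squares Monte Carlo can be sketched for a single-asset American put under geometric Brownian motion. Working backward over a grid of exercise dates, the conditional expectation of continuing is estimated by regressing discounted future cash flows on functions of the current price (in-the-money paths only), and a path is exercised when the immediate payoff beats that estimate. The quadratic polynomial basis, the parameter values, and the function name are assumptions made here for brevity; they are not taken from the original paper.

```python
import numpy as np

def lsmc_american_put(s0, strike, r, sigma, T, n_steps=50, n_paths=50_000, seed=0):
    """Least Squares Monte Carlo (Longstaff-Schwartz style) price of an American put under GBM."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    disc = np.exp(-r * dt)
    # Simulate GBM paths; column j holds the price at time (j + 1) * dt.
    z = rng.standard_normal((n_paths, n_steps))
    log_paths = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
    paths = s0 * np.exp(log_paths)
    # Cash flow if the option is held to maturity.
    cash = np.maximum(strike - paths[:, -1], 0.0)
    # Backward induction over the earlier exercise dates.
    for t in range(n_steps - 2, -1, -1):
        cash *= disc                                  # discount future cash flows to this date
        s = paths[:, t]
        exercise = np.maximum(strike - s, 0.0)
        itm = exercise > 0                            # regress only on in-the-money paths
        if itm.sum() > 3:
            x = s[itm] / strike                       # rescale for numerical conditioning
            basis = np.column_stack([np.ones_like(x), x, x**2])
            coef, *_ = np.linalg.lstsq(basis, cash[itm], rcond=None)
            continuation = basis @ coef               # estimated conditional expectation
            idx = np.flatnonzero(itm)[exercise[itm] > continuation]
            cash[idx] = exercise[idx]                 # exercise now on these paths
    return disc * cash.mean()                         # discount from the first date to time 0

price = lsmc_american_put(s0=36.0, strike=40.0, r=0.06, sigma=0.2, T=1.0)
print(f"American put (LSMC estimate): {price:.3f}")
```

With these parameters the estimate should land above the corresponding European put value (about 3.84 by Black-Scholes), reflecting the early-exercise premium. The errors the research problem refers to are visible here: the choice of regression basis, the finite number of paths used to estimate each conditional expectation, and the discrete set of exercise dates all bias the result.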
Large-Scale
Dynamic Asset Allocation Models. Since the intertemporal
CAPM of Merton and Sharpe, researchers have tried to apply
dynamic programming ideas to solve allocation problems under
different investment horizons and budget constraints.
The theory is well developed, but several factors
suggest that there will be much more activity here.
First, the academic papers assume that there are only one or
two assets, that strategies are self-financing and that utilities
are homogeneous. All these assumptions are highly unrealistic.
Although the papers have been written and the (Nobel)
prizes handed out, we expect that computers will finally allow
us to actually run investment strategies that are diversified
among dozens of assets, with realistic, complex scenarios and
intertemporal reallocation in response to real-life events. The
goal is:
Develop
platforms for large-scale asset allocation models (~20
to 50 variables) that produce verifiable results. Include
in these models the possibility of decision making by investors
and state-contingent optimization. Include non-self-financing
portfolios.
The
spring program will open with a one-week tutorial.
The purpose of the tutorial is 1) to prepare the IMA postdocs
and other IMA participants for the Financial Engineering program,
2) to provide graduate students and faculty from universities
everywhere (particularly the IMA participating Institutions) with
an entrée into modeling and data integration problems in
Financial Engineering and 3) to publicize to the wider community
the importance and intellectual excitement involved in the understanding
of these complex systems.
The
following scientists are confirmed or highly likely as long-term
visitors during the program. Other long-term visitors are currently
being arranged.
Name
Department
Affiliation
Period of Visit
Scot Adams
IMA
University of Minnesota
9/1/02 - 7/1/04
Soohan Ahn
Department of Statistics
Seoul National University (SRCCS)
9/8/03 - 2/19/04
Montaz Ali
School of Computational and Applied Mathematics
Witwatersrand University
11/1/02 - 10/24/03
Yusuf Bilgin Altundas
Schlumberger-Doll Research
9/3/02 - 12/22/03
Greg Anderson
School of Mathematics
University of Minnesota
9/1/03 - 6/30/04
Douglas N. Arnold
IMA
University of Minnesota
7/15/01 - 8/31/05
Donald G. Aronson
IMA
University of Minnesota
9/1/02 - 8/31/05
Gerard Awanou
IMA
University of Minnesota
9/2/03 - 8/31/05
Hee-Jeong Baek
Department of Mathematics
Seoul National University (BK 21 Math-SNU)
3/13/04 - 6/30/04
Karen Ball
University of Minnesota
9/2/03 - 6/25/04
Antar Bandyopadhyay
University of Minnesota
9/3/03 - 8/31/04
Peter Bank
Department of Mathematics
Humboldt University of Berlin
3/28/04 - 5/7/04
Maury Bramson
Dept. of Math
University of Minnesota
9/1/03 - 6/30/04
Olga Brezhneva
University of Minnesota
9/3/02 - 8/16/04
Rene Carmona
Operations Research & Financial Engineering
Princeton University
4/13/04 - 6/30/04
Laura Chihara
Department of Mathematics and Computer Science
Carleton College
9/1/03 - 12/31/03
Hi Jun Choe
Department of Mathematics
Yonsei University
12/29/03 - 2/28/04
Wanyang Dai
Department of Mathematics
Nanjing University
1/4/04 - 3/8/04
Josep Elgueta
Departament Matematica Aplicada II
Universitat Politecnica de Catalunya
6/5/04 - 6/20/04
Hans Foellmer
Institut für Mathematik
Humboldt Universität zu Berlin
4/10/04 - 7/10/04
Shmuel Friedland
Department of Mathematics, Statistics, and Computer Science
University of Illinois - Chicago
9/3/03 - 6/30/04
Tim Garoni
IMA
University of Minnesota
8/25/03 - 8/31/05
Balaji Gopalakrishnan
University of Minnesota
9/3/02 - 12/19/03
Anne Gundel
Institut für Mathematik
Humboldt University Berlin
4/11/04 - 5/13/04
Chuan-Hsiang Han
IMA
University of Minnesota
9/2/03 - 8/31/05
Mark Handcock
Department of Statistics
University of Washington
11/3/03 - 11/24/03
David C. Heath
Department of Mathematical Sciences
Carnegie Mellon University
4/10/04 - 5/29/04
Ulrich Horst
Institut für Mathematik
Humboldt Universität zu Berlin
4/14/04 - 6/27/04
Fern Hunt
Mathematical and Computational Sciences Division
National Institute of Standards and Technology
9/14/03 - 9/30/03
David R. Hunter
Department of Statistics
Pennsylvania State University
11/1/03 - 11/22/03
Naresh Jain
School of Mathematics
University of Minnesota
9/1/03 - 6/30/04
Lili Ju
University of Minnesota
9/3/02 - 8/8/04
Christina Kendziorski
Department of Biostatistics and Medical Informatic