|
Probability
and Statistics in Complex Systems: Genomics, Networks, and Financial
Engineering, September 1, 2003 - June 30, 2004
Abstracts:
IMA
Workshop:
January
12-16, 2004
Group
Photos Material
from Talks

François
Baccelli (INRIA-ENS, Ecole Normale Superieure)
Francois.Baccelli@ens.fr
http://www.di.ens.fr/~baccelli/
Mean-Field
Interaction Models for Large TCP Networks
Slides:
pdf
This
presentation will review various dynamical interaction models
allowing one to analyze the throughputs obtained by a large
collection of TCP controlled flows sharing many links and
routers, from the sole knowledge of the network parameters
(capacity, buffer sizes, topology) and of the characteristics
of each flow (RTT, route, etc.).
In
the droptail case, the mean-field limit can be described geometrically
as a billiards in the Euclidean space. This billiards has
as many dimensions as the number of flow classes and as many
reflection facets as there are routers and links. This allows
one to determine the possible stationary behaviors of the
interacting flows and provides new ways of assessing TCP's
fairness.
The
RED case can also be investigated by such mean-field techniques.
In the single link case, this allows one to determine in closed
form the stationary distribution of the stationary throughputs
obtained by the flows.
When
aggregated, the traffic generated by these models exhibits
TCP and network-induced fluctuations that will be compared
to statistical properties observed on real traces.

Andre
Broido (CAIDA) broido@caida.org
Applications of network spectroscopy
Interpretation of spectra is practiced since
Noah's time. Quantum mechanics, chemistry, geophysics and
communication owe it many of their successes. Spectroscopy
paradigm is now becoming popular among Internet scientists.
Properties of sources, links and switches (e.g. Layer 2 techologies,
link bitrates, paths, OSes) can be inferred from periods and
quantizations of packet timings and header fields. Network
conditions from traffic congestion to DDoS attacks are studied
with these and related techniques of spectral analysis.
In this talk, we give an overview of results obtained by other
groups working in the field. We then describe two applications
that identify network properties via delay spectra. In one
case, we prove that spurious DNS updates at root and blackhole
servers come from the Microsoft Win2000 DNS implementation.
The method is based on a binary autocorrelation of interarrival
times. In another case, we apply spectroscopy to the bitrate
estimation of an end-user access in DSL and cable infrastructures.
We use entropy minimization for Radon transformed marginals
of the delay distributions and packet arrival times and find
delay quanta that identify provisioned bitrates.
This is a joint work with Evi Nemeth,
Ryan King, and K.C. Claffy

Mark Coates (Department of Electrical and Computer
Engineering, McGill University) coates@tsp.ece.mcgill.ca
Deterministic Packet Marking for Congestion Price Estimation
(poster session)
Several recent price-based congestion control schemes require
relatively accurate path price estimates for successful operation.
The proposed addition of the two-bit Explicit Congestion Notification
(ECN) field in the IP header provides routers with a mechanism
for conveying price information. Recently, two proposals have
emerged for probabilistic packet marking at the routers; the
proposals allow receivers to estimate path price from the
fraction of marked packets. In this poster we introduce an
alternative deterministic marking scheme for encoding path
price. Under our approach, each router quantizes the price
of its outgoing link to a fixed number of bits. We then make
use of the IP identification (IPid) field to map data packets
to different probe types, and each probe type calculates a
partial sum of the path price bits. A router deduces its marking
behaviour according to the IPid and the TTL (Time To Live)
field of each packet. We evaluate the performance of our algorithm
in terms of its error in representing the end-to-end price,
and compare it to probabilistic marking. We show that based
on empirical Internet traffic characteristics, our algorithm
performs better when estimating path price using small blocks
of packets.

Mark
Crovella (Department of Computer Science, Boston
University; currently at Laboratoire d'Informatique Paris VI
(LIP6)) crovella@cs.bu.edu
http://www.cs.bu.edu/faculty/crovella
Structural
Analysis of Network Traffic Flows
Slides:
pdf
Network
traffic arises from the superposition of Origin-Destination
(OD) flows. Hence, a thorough understanding of OD flows is essential
for modeling network traffic, and for addressing a wide variety
of problems including traffic engineering, traffic matrix estimation,
capacity planning, forecasting and anomaly detection. However,
to date, OD flows have not been closely studied, and there is
very little known about their properties. In this talk, I will
present the first analysis of complete sets of OD flow timeseries,
taken from two different backbone networks (Abilene and Sprint-Europe).
Using Principal Component Analysis (PCA), we have found that
the set of OD flows has small intrinsic dimension. In fact,
even in a network with over a hundred OD flows, these flows
can be accurately captured in time using a small number (10
or less) of independent components or dimensions.
I
will then show how to use PCA to systematically decompose the
structure of OD flow timeseries into three main constituents:
common periodic trends, short-lived bursts, and noise. Such
a decomposition provides insight into how the various constituents
contribute to the overall structure of OD flows. Finally, I
will explore the extent to which this decomposition varies over
time.
This
is a joint work with Anukool Lakhina,
Konstantina Papagiannaki, Christophe
Diot, Eric D. Kolaczyk and
Nina Taft.

Wanyang Dai (Department of Mathematics, Nanjing
University) nan5lu8@netra.nju.edu.cn
Networks with Multicast Services and Multitype Input Sources
(poster session)
Multicast and differentiated services are among the most important
communication operations and are highly demanded in integrated
services packet networks like Internet. When multicast operations
are involved, network internal traffic sources generated by
multiple copies of a packet in a switch (or a router) and multi-routing
requirements of the copies will largely increase the complexity
of network performance modeling, analysis, control and scheduling.
How to suitably model multicast operations will be a key step
in resolving all of the related problems. A way will be presented
to address this issue for a network involving both multitype
external input sources and shared-memory multicast switches.
Issues in general settings and research topics along this direction
will be raised.

Nicholas
G. Duffield (AT&T Labs-Research) duffield@research.att.com
http://www.research.att.com/info/duffield
Challenges
for Using Sampled Traffic Measurements
Slides: pdf
Traffic measurements are increasingly sampled due to ever growing
line rates and concomitant traffic volumes. On the other hand,
measurement-based applications increasingly depend on fine grained
traffic characterization. Can these applications work effectively
with existing sampled measurements? And if not, can we better
match sampling techniques to applications? This talk describes
the challenges and limitations for using sampled traffic measurements,
and some recent approaches that move beyond traffic sampling
methods in predominant use today.

Sally
Floyd
(ICSI) floyd@icir.org
http://www.icir.org/floyd/
Internet
Research Needs a Critical Perspective Towards Models
Slides:
html
pdf
ps
ppt
The
underlying modeling assumptions used in analysis, simulations,
and experiments can have a huge effect on the results of Internet
research. However, as a community we often do not understand
how the underlying modeling assumptions affect results for the
specific research issues under investigation. Further, in a
diverse and fast-changing Internet, we often don't understand
which modeling assumptions are relevant for the current Internet,
much less for the future Internet in which our newly-proposed
protocols will find themselves. This talk does not provide answers,
but argues for a more critical evaluation of network models
as a key issue in network research.

Mor Harchol-Balter (School of
Computer Science, Carnegie Mellon University) harchol@cs.cmu.edu
Improving Response Times for Static and Dynamic Web Requests
(poster session)
We consider how to improve client response times at web sites
for both static and dynamic web requests.
For the case of static "Get FILE" requests, we
propose a simple solution: implementing Shortest-Remaining-Processing-Time
(SRPT) scheduling in the Linux kernel. We demonstrate SRPT scheduling
improves the expected response time of *every* request [TOCS03,
USITS99]. The result is based on analytical work [SIGMETRICS01,
SIGMETRICS03] which proves that, counter-to-intuition, helping
short requests does not necessarily result in hurting long requests.
We also show that SRPT scheduling is particularly effective
at combatting transient overload [ITC03].
For dynamic requests (e.g., e-Commerce, online
shopping) response times can be much higher (tens of seconds),
due to competition between requests in the backend database.
Here we implement preemptive priority scheduling of the database
lock queues, providing very low response times for high-priority
customers (big spenders), while hardly penalizing low-priority
customers [ICDE04, SIGMOD04-in-submission].
FOR REFERENCES SEE: www.cs.cmu.edu/~harchol/.

Kevin Jeffay (Departments of
Computer Science & Statistics, University of North Carolina
at Chapel Hill) jeffay@cs.unc.edu
A Non-Parametric Approach to Generation and Validation of Synthetic
Network Traffic (poster session)
In order to perform valid experiments, network simulators and
testbeds require source-level traffic generators that correspond
to valid, contemporary models of application or user behavior.
Unfortunately, the Internet is evolving far more rapidly than
our present ability to understand and model the mix and use of
applications that account for the majority of bytes transferred
on the Internet. We describe a method that automatically generates
a source-level model of the mix of TCP applications found on a
network link, and show that this model is suitable for generating
synthetic traffic in a simulation that is statistically equivalent
to traffic in the measured network. The method is novel in that
it constructs the model in a non-parametric manner, without any
knowledge of the applications present in the network. We validate
the method empirically uses traces from our institution and Abilene
backbones. The importance of having a method that allows one to
faithfully reproduce the source-level characteristics of actual
network flows in experiments is illustrated via a case study of
active queue management mechanisms (AQM). We demonstrate that
dramatically different, and contradictory, conclusions are reached
regarding the effectiveness of AQM depending on the types of TCP
connections generated in the simulation. This serves as an existence
proof that our inability to perform experiments with realistic
synthetic traffic makes it difficult, if not impossible, to conclude
anything fundamental given the experimental procedures followed
at present in most network simulations.
This is a joint work with F. Hernandez-Campos,
A. Nobel, and F.D. Smith

Laurent
Massoulié (Microsoft Research Ltd.) lmassoul@microsoft.com
http://research.microsoft.com/users/lmassoul/
End-System
Emulation of Low-Priority Data Transport
Slides:
html
pdf
ps
ppt
This
talk is concerned with providing low priority data transfer
across wide-area networks. Although this is straightforward
when the underlying network supports service differentiation,
it becomes more challenging without such network support. We
will describe an application level approach to providing such
service differentiation in the current Internet, solely based
on receivers adapting their receive window. This approach is
motivated and analysed using fluid flow models and a utility
maximization framework, which allows us to identify resulting
rate allocations, and asymptotic stability, for candidate adaptation
strategies.
This is a joint work with Peter Key
and Bing Wang.

George Michailidis
(Department of Statistics, The University of Michigan) gmichail@umich.edu
Active Network Tomography: Design and Estimation Issues
(poster session)
Estimation of quality of service parameters, such as packet
loss rates and delay distributions, is of considerable importance
to network administrators and service providers. We consider
the active network tomography problem where the goal is to estimate
packet loss rates and delay distributions of all internal links
in a network from measurements obtained from nodes located on
its periphery. This is an example of a large-scale statistical
inverse problem. We introduce a new flexible class of probing
schemes, which is shown to be computationally efficient for
collecting the necessary data in global network monitoring problems.
Several methods of estimation for loss rates and delay distributions
will be described, including the use of EM-algorithms for the
Maximum Likelihood estimators and several classes of least-squares
estimates for loss rates. Some practical issues regarding the
design of probing experiments on different network topologies
and results from simulation studies will also be discussed.

David
M. Nicol
(Department of Electrical and Computer Engineering, University
of Illinois at Urbana-Champaign) nicol@crhc.uiuc.edu
Models
of Internet Worm Defense
Slides:
pdf
Internet
worms propagate by scanning IP address space, looking for vulnerable
hosts. On finding a susceptible host, the worm infects it, essentially
replicating itself, and the newly infected host begins scanning,
itself. Epidemic models have been used to describe worm propagation,
and have the attraction of capturing at a gross scale the pattern
of worm spread. We are interested both in modeling worm propagation,
and the effect of worm defenses on that propagation. We consider
models of both passive and active defenses (e.g. counter-worms),
with an eye towards comparing their effectiveness with respect
towards both the number of hosts ultimately infected, and the
overall impact on the network of scan behavior. We consider
models where the worm is slow enough so that network topology
does not matter, and models that explicitly account for topology,
bandwidth constraints, and failures of the infrastructure.

Robert
Nowak (University of Wisconsin-Madison) nowak@eceserv0.ece.wisc.edu
Network
Tomography from Multiple Sources
Slides: html
pdf
ps
ppt
The
problem of identifying network topology and inferring link-level
loss/delay parameters from end-to-end measurements is commonly
referred to as network tomography. This talk reviews the basic
principles of network tomography and describes a new approach
based on collaborative probing from multiple sources. Most work
in network tomography to date is based on probing a network
from a single source. However, using multiple sources can potentially
provide a more accurate and refined characterization of the
network. A novel probing strategy is proposed which utilizes
end-to-end packet arrival order and loss/delay metrics in order
to jointly estimate network topology and link-level parameters.
The theoretical performance of the estimator is studied via
an asymptotic analysis, and experiments demonstrate the potential
of our method in practical, small-sample settings.
This is a joint work with Michael Rabbat
and Mark Coates.

Andrew
M. Odlyzko
(Digital Technology Center, University of Minnesota) odlyzko@dtc.umn.edu
http://www.dtc.umn.edu/~odlyzko
The
Economics of the Internet
Slides:
html
pdf
ps
ppt
The Internet is governed by a multiplicity of feedback loops
operating on wildly varying time scales. Economics plays a key
role, as it affects decisions of users, network managers, and
financial decision makers. A survey of some of the main economic
factors, and how they are likely to influence the development
of the Internet, will be presented.
Jennifer
Rexford
(AT&T Labs-Research) jrex@research.att.com
Dynamics
of Hot-Potato Routing in IP Networks
Slides:
html
pdf
ps
ppt
Paper: pdf
The
separation of intradomain and interdomain routing is a key feature
of the Internet routing architecture. However, intradomain routing
protocols such as OSPF and IS-IS do have a (sometimes significant)
influence on the path-selection process in the Border Gateway
Protocol (BGP). This talk presents an analysis of the influence
of OSPF on BGP routing a large tier-1 ISP network. We propose
a general methodology for associating BGP update messages with
events visible in OSPF. Then, we apply our methodology to streams
of OSPF link-state advertisements and BGP update messages from
AT&T's domestic IP backbone. Our analysis shows that (i) "hot
potato" routing is sometimes a significant source of BGP updates,
(ii) BGP updates can lag 60 seconds behind the related OSPF
event, which can cause delays in forwarding-plane convergence,
(iii) OSPF-triggered BGP updates have a nearly uniform distribution
across destination prefixes, and (iv) the fraction of BGP messages
triggered by OSPF varies significantly across time and router
locations, with important implications on external monitoring
of BGP. We also describe how certain network designs and operational
practices increase the impact that internal OSPF events have
on BGP routing.
This
is a joint work with Renata Teixeira,
Aman Shaikh, and Tim
Griffin.
James
Roberts (France Télécom R&D) james.roberts@francetelecom.com
Flow Level Modelling and Control of IP Traffic
Sides:
pdf
It
proves most convenient to characterize and model IP traffic
at flow level. We distinguish streaming flows, where the network
requirement is to preserve the transmitted signal by limiting
packet loss and delay, and elastic flows, where the requirement
is to transmit a quantity of data as fast as possible. In the
talk we will review recent results from stochastic flow level
models for elastic traffic and for a mixture of elastic and
streaming traffic. Of particular interest are the conditions
under which performance measures are insensitive to precise
traffic characteristics.
The
modelling results enable an understanding of how performance
targets can be realizable and provide guidelines for network
sizing. They also suggest that current proposals for QoS differentiation
do not take sufficient account of the statistical nature of
traffic. This leads us to propose an alternative flow-aware
architecture based on the joint use of per-flow admission control
and fair queueing in routers. The resulting enhanced best effort
network meets the respective performance requirements of streaming
and elastic flows without the need for explicit packet marking.
We describe the proposed router architecture called Cross-protect
and discuss scalability issues.

Sanjay Shakkottai (Department
of ECE, University of Texas at Austin) shakkott@ece.utexas.edu
Time-scale Decomposition and Equivalent Rate Based Marking
(poster session)
Differential equation models for Internet congestion control algorithms
have been widely used to understand network dynamics and the design
of router algorithms. These models use a fluid approximation for
user data traffic, and describe the dynamics of the router queue
and user adaptation through coupled differential equations.
We show that the randomness due to short and
unresponsive flows in the Internet is sufficient to decouple
the dynamics of the router queues from those of the end controllers.
This implies that a time-scale decomposition naturally occurs
such that the dynamics of the router manifest only through their
statistical steady-state behavior.
The interaction between the routers and flows
occur through marking, where routers indicate congestion by
appropriately marking packets during congestion. We show that
the time-scale decomposition implies that a queue-length based
marking function such as Random Early Detection (RED) or Random
Exponential Marking (REM) have an equivalent form which depend
only on the data arrival rate from the end-systems and do not
depend on the queue dynamics. This leads to much simpler dynamics
of the differential equation models (there is no queueing dynamics
to consider), which enables easier simulation (the state space
is reduced) and analysis.
This is a joint work with
Yung Yi and Supratim Deb
Yuval
Shavitt
(Department of Electrical Engineering - Systems, Tel Aviv University,
Tel Aviv 69978, Israel) shavitt@eng.tau.ac.il
http://www.eng.tau.ac.il/~shavitt
Internet
Dynamics: Missing Data, Missing Models
In
the recent years there were several passive and active measurement
efforts that attempted to acquire an accurate connectivity map
of the Internet. I will argue that none reached the point that
allows us to safely determine what the Internet AS graph looks
like. Even the definition of what constitutes a connection between
two ASs is debatable, I will suggest one myself.
However,
an accurate topology description by itself is not sufficient
to explain many dynamic aspects of the Internet operation and
we need to develop models that combine topology, constraints,
and network dynamics. In this talk I will introduce our vision
for creating such a model, discuss one simple example of the
Internet resiliency to attacks and failures, and hint at some
of our future work in this direction.
R.
Srikant (Coordinated Science Lab. and Department
of Electrical and Computer Engineering, University of Illinois,
Urbana-Champaign) srikant@ifp.uiuc.edu
http://comm.csl.uiuc.edu/~srikant
Congestion
Control in Overlay Networks
Slides: html
pdf
ps
ppt
Recently,
there has been much interest in overlaying the Internet with
application-level routers to improve network performance. Such
overlay networks would allow sources to perform multi-path routing.
In the first part of the talk, following Kelly, Mauloo and Tan
and Vinnicombe, we will discuss the design of stable congestion
control algorithms to take advantange of multi-path routing
in overlay networks.
Deterministic
models of congestion control capture the congestion indication
mechanisms at the routers in two different ways: rate-based
models, where the queue-length at the router does not explicitly
appear in the model, and queue-based models, where the queue
length at the router is explicitly a part of the model. In the
second part of the talk, we will present some conjectures on
how parameter choices in active management mechanisms could
lead to these very different models.
The
first Part is a joint work with Huaizhong
Han, Chris Hollot, Srinivas
Shakkottai and Don Towsley.
The second Part is a joint work with Supratim
Deb.
Donald Towsley (Department
of Computer Science, University of Massachusetts) towsley@newworld.cs.umass.edu
http://www-net.cs.umass.edu/personnel/towsley.html
Analysis and Detection of Internet Worms
Slides:
pdf
In recent years, fast spreading worms have become major threats
to the security of the Internet. In order to defend against future
worms, it is important to understand how they propagate and how
different scanning strategies affect their propagation. In this
talk, we analyze worm propagation behavior under various scanning
strategies, such as idealized scan, uniform scan, divide-and-conquer
scan, local preference scan, sequential scan,etc. We also address
the problem of worm detection. Based on the premise that one should
look for the exponential growth trend, we develop Kalman filters
to detect the propagation of a worm at an early stage. Last, we
address some of the issues that arise in applying this technique
to different scanning strategies.
This is a joint work with W. Gong and C. Zou.
Darryl
Veitch
(University of Melbourne, currently on sabbatical at Sprint
ATL) dveitch@sprintlabs.com Homepage
in Melbourne: http://www.cubinlab.ee.mu.oz.au/~darryl
Synchronising
Software Clocks on the Internet
Slides: pdf
Accurate
software clocks in PCs which can be reliably and inexpensively
synchronised are essential for many aspects of networking, including
active probing based network measurement and many real-time
network applications. Best effort solutions using existing PC
software clocks synchronised with the standard Network Time
Protocol (NTP) algorithms are not robust enough, nor accurate
enough for many purposes, whereas GPS based synchronisation
is money and effort intensive. In this talk a CPU clock counter
(TSC register) based software clock will be described which
has many intrinsic advantages, thanks to the high performance
of modern off the shelf hardware. Principles and algorithms
enabling a robust, accurate synchronisation based on the existing
NTP server network will be described, and illustrated using
4 months of real data collected in 4 different host-server environments.
The result is an alternative remote-synchronised software clock
with substantially enhanced performance.
Bill
Woodcock
(Research Director, Packet Clearing House) woody@pch.net
Design
and Deployment of a Global Data-Collection Infrastructure
Bill
will discuss Packet Clearing House's experiences in their last
ten years of collecting data about Internet topology, focusing
specifically on the practical aspects of constructing a supportable,
global-scale data collection platform. PCH is a research institute
and development aid agency, which has worked to support both
the academic and operational communities' needs for robust and
instrumented Internet infrastructure. PCH's data-collection
and experiment-hosting platforms are present at more than thirty
Internet exchanges around the world, including both large regional
exchanges in cities like Palo Alto, Tokyo, London, and Johannesburg,
and smaller local exchanges in Kathmandu, Dar es Salaam, Dhaka,
and Wellington. These are collectively managed instances of
an identical installation package, and participate in the global
BGP routing system using unicast IPv4, IPv6, anycast, and multicast
routing.
Bill
Woodcock
(Research Director, Packet Clearing House) woody@pch.net
Interaction
between Peering, Routing, Topology and Economics
Bill
will discuss ways of differentiating between peering and transit
from public datasets, and the relationships between economics
and the topology of the global network as it evolves.
Group
Photos Material
from Talks
Probability
and Statistics in Complex Systems: Genomics, Networks, and Financial
Engineering, September 1, 2003 - June 30, 2004
|