HOME    »    PROGRAMS/ACTIVITIES    »    Annual Thematic Program
Measurement, Modeling and Analysis of the Internet
January 12-16, 2004

Probability and Statistics in Complex Systems: Genomics, Networks, and Financial Engineering, September 1, 2003 - June 30, 2004

François Baccelli (INRIA-ENS, Ecole Normale Superieure)  Francois.Baccelli@ens.fr  http://www.di.ens.fr/~baccelli/

Mean-Field Interaction Models for Large TCP Networks
Slides:   pdf

This presentation will review various dynamical interaction models allowing one to analyze the throughputs obtained by a large collection of TCP controlled flows sharing many links and routers, from the sole knowledge of the network parameters (capacity, buffer sizes, topology) and of the characteristics of each flow (RTT, route, etc.).

In the droptail case, the mean-field limit can be described geometrically as a billiards in the Euclidean space. This billiards has as many dimensions as the number of flow classes and as many reflection facets as there are routers and links. This allows one to determine the possible stationary behaviors of the interacting flows and provides new ways of assessing TCP's fairness.

The RED case can also be investigated by such mean-field techniques. In the single link case, this allows one to determine in closed form the stationary distribution of the stationary throughputs obtained by the flows.

When aggregated, the traffic generated by these models exhibits TCP and network-induced fluctuations that will be compared to statistical properties observed on real traces.

Andre Broido (CAIDA)  broido@caida.org

Applications of network spectroscopy

Interpretation of spectra is practiced since Noah's time. Quantum mechanics, chemistry, geophysics and communication owe it many of their successes. Spectroscopy paradigm is now becoming popular among Internet scientists. Properties of sources, links and switches (e.g. Layer 2 techologies, link bitrates, paths, OSes) can be inferred from periods and quantizations of packet timings and header fields. Network conditions from traffic congestion to DDoS attacks are studied with these and related techniques of spectral analysis.

In this talk, we give an overview of results obtained by other groups working in the field. We then describe two applications that identify network properties via delay spectra. In one case, we prove that spurious DNS updates at root and blackhole servers come from the Microsoft Win2000 DNS implementation. The method is based on a binary autocorrelation of interarrival times. In another case, we apply spectroscopy to the bitrate estimation of an end-user access in DSL and cable infrastructures. We use entropy minimization for Radon transformed marginals of the delay distributions and packet arrival times and find delay quanta that identify provisioned bitrates.
This is a joint work with Evi Nemeth, Ryan King, and K.C. Claffy

Mark Coates (Department of Electrical and Computer Engineering, McGill University)  coates@tsp.ece.mcgill.ca

Deterministic Packet Marking for Congestion Price Estimation (poster session)

Several recent price-based congestion control schemes require relatively accurate path price estimates for successful operation. The proposed addition of the two-bit Explicit Congestion Notification (ECN) field in the IP header provides routers with a mechanism for conveying price information. Recently, two proposals have emerged for probabilistic packet marking at the routers; the proposals allow receivers to estimate path price from the fraction of marked packets. In this poster we introduce an alternative deterministic marking scheme for encoding path price. Under our approach, each router quantizes the price of its outgoing link to a fixed number of bits. We then make use of the IP identification (IPid) field to map data packets to different probe types, and each probe type calculates a partial sum of the path price bits. A router deduces its marking behaviour according to the IPid and the TTL (Time To Live) field of each packet. We evaluate the performance of our algorithm in terms of its error in representing the end-to-end price, and compare it to probabilistic marking. We show that based on empirical Internet traffic characteristics, our algorithm performs better when estimating path price using small blocks of packets.

Mark Crovella (Department of Computer Science, Boston University; currently at Laboratoire d'Informatique Paris VI (LIP6))  crovella@cs.bu.edu  http://www.cs.bu.edu/faculty/crovella

Structural Analysis of Network Traffic Flows
Slides:   pdf

Network traffic arises from the superposition of Origin-Destination (OD) flows. Hence, a thorough understanding of OD flows is essential for modeling network traffic, and for addressing a wide variety of problems including traffic engineering, traffic matrix estimation, capacity planning, forecasting and anomaly detection. However, to date, OD flows have not been closely studied, and there is very little known about their properties. In this talk, I will present the first analysis of complete sets of OD flow timeseries, taken from two different backbone networks (Abilene and Sprint-Europe). Using Principal Component Analysis (PCA), we have found that the set of OD flows has small intrinsic dimension. In fact, even in a network with over a hundred OD flows, these flows can be accurately captured in time using a small number (10 or less) of independent components or dimensions.

I will then show how to use PCA to systematically decompose the structure of OD flow timeseries into three main constituents: common periodic trends, short-lived bursts, and noise. Such a decomposition provides insight into how the various constituents contribute to the overall structure of OD flows. Finally, I will explore the extent to which this decomposition varies over time.

This is a joint work with Anukool Lakhina, Konstantina Papagiannaki, Christophe Diot, Eric D. Kolaczyk and Nina Taft.

Wanyang Dai (Department of Mathematics, Nanjing University)  nan5lu8@netra.nju.edu.cn

Networks with Multicast Services and Multitype Input Sources (poster session)

Multicast and differentiated services are among the most important communication operations and are highly demanded in integrated services packet networks like Internet. When multicast operations are involved, network internal traffic sources generated by multiple copies of a packet in a switch (or a router) and multi-routing requirements of the copies will largely increase the complexity of network performance modeling, analysis, control and scheduling. How to suitably model multicast operations will be a key step in resolving all of the related problems. A way will be presented to address this issue for a network involving both multitype external input sources and shared-memory multicast switches. Issues in general settings and research topics along this direction will be raised.

Nicholas G. Duffield (AT&T Labs-Research)  duffield@research.att.com  http://www.research.att.com/info/duffield

Challenges for Using Sampled Traffic Measurements

Traffic measurements are increasingly sampled due to ever growing line rates and concomitant traffic volumes. On the other hand, measurement-based applications increasingly depend on fine grained traffic characterization. Can these applications work effectively with existing sampled measurements? And if not, can we better match sampling techniques to applications? This talk describes the challenges and limitations for using sampled traffic measurements, and some recent approaches that move beyond traffic sampling methods in predominant use today.

Sally Floyd (ICSI)  floyd@icir.org  http://www.icir.org/floyd/

Internet Research Needs a Critical Perspective Towards Models
Slides:    html    pdf    ps    ppt

The underlying modeling assumptions used in analysis, simulations, and experiments can have a huge effect on the results of Internet research. However, as a community we often do not understand how the underlying modeling assumptions affect results for the specific research issues under investigation. Further, in a diverse and fast-changing Internet, we often don't understand which modeling assumptions are relevant for the current Internet, much less for the future Internet in which our newly-proposed protocols will find themselves. This talk does not provide answers, but argues for a more critical evaluation of network models as a key issue in network research.

Mor Harchol-Balter (School of Computer Science, Carnegie Mellon University)  harchol@cs.cmu.edu

Improving Response Times for Static and Dynamic Web Requests (poster session)

We consider how to improve client response times at web sites for both static and dynamic web requests.

For the case of static "Get FILE" requests, we propose a simple solution: implementing Shortest-Remaining-Processing-Time (SRPT) scheduling in the Linux kernel. We demonstrate SRPT scheduling improves the expected response time of *every* request [TOCS03, USITS99]. The result is based on analytical work [SIGMETRICS01, SIGMETRICS03] which proves that, counter-to-intuition, helping short requests does not necessarily result in hurting long requests. We also show that SRPT scheduling is particularly effective at combatting transient overload [ITC03].

For dynamic requests (e.g., e-Commerce, online shopping) response times can be much higher (tens of seconds), due to competition between requests in the backend database. Here we implement preemptive priority scheduling of the database lock queues, providing very low response times for high-priority customers (big spenders), while hardly penalizing low-priority customers [ICDE04, SIGMOD04-in-submission].

FOR REFERENCES SEE: www.cs.cmu.edu/~harchol/.

Kevin Jeffay (Departments of Computer Science & Statistics, University of North Carolina at Chapel Hill)  jeffay@cs.unc.edu

A Non-Parametric Approach to Generation and Validation of Synthetic Network Traffic (poster session)

In order to perform valid experiments, network simulators and testbeds require source-level traffic generators that correspond to valid, contemporary models of application or user behavior. Unfortunately, the Internet is evolving far more rapidly than our present ability to understand and model the mix and use of applications that account for the majority of bytes transferred on the Internet. We describe a method that automatically generates a source-level model of the mix of TCP applications found on a network link, and show that this model is suitable for generating synthetic traffic in a simulation that is statistically equivalent to traffic in the measured network. The method is novel in that it constructs the model in a non-parametric manner, without any knowledge of the applications present in the network. We validate the method empirically uses traces from our institution and Abilene backbones. The importance of having a method that allows one to faithfully reproduce the source-level characteristics of actual network flows in experiments is illustrated via a case study of active queue management mechanisms (AQM). We demonstrate that dramatically different, and contradictory, conclusions are reached regarding the effectiveness of AQM depending on the types of TCP connections generated in the simulation. This serves as an existence proof that our inability to perform experiments with realistic synthetic traffic makes it difficult, if not impossible, to conclude anything fundamental given the experimental procedures followed at present in most network simulations.

This is a joint work with F. Hernandez-Campos, A. Nobel, and F.D. Smith

Laurent Massoulié (Microsoft Research Ltd.)  lmassoul@microsoft.com  http://research.microsoft.com/users/lmassoul/

End-System Emulation of Low-Priority Data Transport
Slides:   html    pdf    ps    ppt

This talk is concerned with providing low priority data transfer across wide-area networks. Although this is straightforward when the underlying network supports service differentiation, it becomes more challenging without such network support. We will describe an application level approach to providing such service differentiation in the current Internet, solely based on receivers adapting their receive window. This approach is motivated and analysed using fluid flow models and a utility maximization framework, which allows us to identify resulting rate allocations, and asymptotic stability, for candidate adaptation strategies.

This is a joint work with Peter Key and Bing Wang.

George Michailidis (Department of Statistics, The University of Michigan)  gmichail@umich.edu

Active Network Tomography: Design and Estimation Issues  (poster session)

Estimation of quality of service parameters, such as packet loss rates and delay distributions, is of considerable importance to network administrators and service providers. We consider the active network tomography problem where the goal is to estimate packet loss rates and delay distributions of all internal links in a network from measurements obtained from nodes located on its periphery. This is an example of a large-scale statistical inverse problem. We introduce a new flexible class of probing schemes, which is shown to be computationally efficient for collecting the necessary data in global network monitoring problems. Several methods of estimation for loss rates and delay distributions will be described, including the use of EM-algorithms for the Maximum Likelihood estimators and several classes of least-squares estimates for loss rates. Some practical issues regarding the design of probing experiments on different network topologies and results from simulation studies will also be discussed.

David M. Nicol (Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign)  nicol@crhc.uiuc.edu

Models of Internet Worm Defense
Slides:  pdf

Internet worms propagate by scanning IP address space, looking for vulnerable hosts. On finding a susceptible host, the worm infects it, essentially replicating itself, and the newly infected host begins scanning, itself. Epidemic models have been used to describe worm propagation, and have the attraction of capturing at a gross scale the pattern of worm spread. We are interested both in modeling worm propagation, and the effect of worm defenses on that propagation. We consider models of both passive and active defenses (e.g. counter-worms), with an eye towards comparing their effectiveness with respect towards both the number of hosts ultimately infected, and the overall impact on the network of scan behavior. We consider models where the worm is slow enough so that network topology does not matter, and models that explicitly account for topology, bandwidth constraints, and failures of the infrastructure.

Robert Nowak (University of Wisconsin-Madison)  nowak@eceserv0.ece.wisc.edu

Network Tomography from Multiple Sources
Slides:   html    pdf    ps    ppt

The problem of identifying network topology and inferring link-level loss/delay parameters from end-to-end measurements is commonly referred to as network tomography. This talk reviews the basic principles of network tomography and describes a new approach based on collaborative probing from multiple sources. Most work in network tomography to date is based on probing a network from a single source. However, using multiple sources can potentially provide a more accurate and refined characterization of the network. A novel probing strategy is proposed which utilizes end-to-end packet arrival order and loss/delay metrics in order to jointly estimate network topology and link-level parameters. The theoretical performance of the estimator is studied via an asymptotic analysis, and experiments demonstrate the potential of our method in practical, small-sample settings.

This is a joint work with Michael Rabbat and Mark Coates.

Andrew M. Odlyzko (Digital Technology Center, University of Minnesota)  odlyzko@dtc.umn.edu  http://www.dtc.umn.edu/~odlyzko

The Economics of the Internet
Slides:   html    pdf    ps    ppt

The Internet is governed by a multiplicity of feedback loops operating on wildly varying time scales. Economics plays a key role, as it affects decisions of users, network managers, and financial decision makers. A survey of some of the main economic factors, and how they are likely to influence the development of the Internet, will be presented.

Jennifer Rexford (AT&T Labs-Research)  jrex@research.att.com

Dynamics of Hot-Potato Routing in IP Networks
Slides:   html    pdf    ps    ppt
Paper:   pdf

The separation of intradomain and interdomain routing is a key feature of the Internet routing architecture. However, intradomain routing protocols such as OSPF and IS-IS do have a (sometimes significant) influence on the path-selection process in the Border Gateway Protocol (BGP). This talk presents an analysis of the influence of OSPF on BGP routing a large tier-1 ISP network. We propose a general methodology for associating BGP update messages with events visible in OSPF. Then, we apply our methodology to streams of OSPF link-state advertisements and BGP update messages from AT&T's domestic IP backbone. Our analysis shows that (i) "hot potato" routing is sometimes a significant source of BGP updates, (ii) BGP updates can lag 60 seconds behind the related OSPF event, which can cause delays in forwarding-plane convergence, (iii) OSPF-triggered BGP updates have a nearly uniform distribution across destination prefixes, and (iv) the fraction of BGP messages triggered by OSPF varies significantly across time and router locations, with important implications on external monitoring of BGP. We also describe how certain network designs and operational practices increase the impact that internal OSPF events have on BGP routing.

This is a joint work with Renata Teixeira, Aman Shaikh, and Tim Griffin.

James Roberts (France Télécom R&D)  james.roberts@francetelecom.com

Flow Level Modelling and Control of IP Traffic
Sides:   pdf

It proves most convenient to characterize and model IP traffic at flow level. We distinguish streaming flows, where the network requirement is to preserve the transmitted signal by limiting packet loss and delay, and elastic flows, where the requirement is to transmit a quantity of data as fast as possible. In the talk we will review recent results from stochastic flow level models for elastic traffic and for a mixture of elastic and streaming traffic. Of particular interest are the conditions under which performance measures are insensitive to precise traffic characteristics.

The modelling results enable an understanding of how performance targets can be realizable and provide guidelines for network sizing. They also suggest that current proposals for QoS differentiation do not take sufficient account of the statistical nature of traffic. This leads us to propose an alternative flow-aware architecture based on the joint use of per-flow admission control and fair queueing in routers. The resulting enhanced best effort network meets the respective performance requirements of streaming and elastic flows without the need for explicit packet marking. We describe the proposed router architecture called Cross-protect and discuss scalability issues.

Sanjay Shakkottai (Department of ECE, University of Texas at Austin)  shakkott@ece.utexas.edu

Time-scale Decomposition and Equivalent Rate Based Marking (poster session)

Differential equation models for Internet congestion control algorithms have been widely used to understand network dynamics and the design of router algorithms. These models use a fluid approximation for user data traffic, and describe the dynamics of the router queue and user adaptation through coupled differential equations.

We show that the randomness due to short and unresponsive flows in the Internet is sufficient to decouple the dynamics of the router queues from those of the end controllers. This implies that a time-scale decomposition naturally occurs such that the dynamics of the router manifest only through their statistical steady-state behavior.

The interaction between the routers and flows occur through marking, where routers indicate congestion by appropriately marking packets during congestion. We show that the time-scale decomposition implies that a queue-length based marking function such as Random Early Detection (RED) or Random Exponential Marking (REM) have an equivalent form which depend only on the data arrival rate from the end-systems and do not depend on the queue dynamics. This leads to much simpler dynamics of the differential equation models (there is no queueing dynamics to consider), which enables easier simulation (the state space is reduced) and analysis.

This is a joint work with Yung Yi and Supratim Deb

Yuval Shavitt (Department of Electrical Engineering - Systems, Tel Aviv University, Tel Aviv 69978, Israel)  shavitt@eng.tau.ac.il  http://www.eng.tau.ac.il/~shavitt

Internet Dynamics: Missing Data, Missing Models

In the recent years there were several passive and active measurement efforts that attempted to acquire an accurate connectivity map of the Internet. I will argue that none reached the point that allows us to safely determine what the Internet AS graph looks like. Even the definition of what constitutes a connection between two ASs is debatable, I will suggest one myself.

However, an accurate topology description by itself is not sufficient to explain many dynamic aspects of the Internet operation and we need to develop models that combine topology, constraints, and network dynamics. In this talk I will introduce our vision for creating such a model, discuss one simple example of the Internet resiliency to attacks and failures, and hint at some of our future work in this direction.

R. Srikant (Coordinated Science Lab. and Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign)  srikant@ifp.uiuc.edu  http://comm.csl.uiuc.edu/~srikant

Congestion Control in Overlay Networks
  html    pdf    ps    ppt

Recently, there has been much interest in overlaying the Internet with application-level routers to improve network performance. Such overlay networks would allow sources to perform multi-path routing. In the first part of the talk, following Kelly, Mauloo and Tan and Vinnicombe, we will discuss the design of stable congestion control algorithms to take advantange of multi-path routing in overlay networks.

Deterministic models of congestion control capture the congestion indication mechanisms at the routers in two different ways: rate-based models, where the queue-length at the router does not explicitly appear in the model, and queue-based models, where the queue length at the router is explicitly a part of the model. In the second part of the talk, we will present some conjectures on how parameter choices in active management mechanisms could lead to these very different models.

The first Part is a joint work with Huaizhong Han, Chris Hollot, Srinivas Shakkottai and Don Towsley. The second Part is a joint work with Supratim Deb.

Donald Towsley (Department of Computer Science, University of Massachusetts)  towsley@newworld.cs.umass.edu  http://www-net.cs.umass.edu/personnel/towsley.html

Analysis and Detection of Internet Worms
Slides:   pdf

In recent years, fast spreading worms have become major threats to the security of the Internet. In order to defend against future worms, it is important to understand how they propagate and how different scanning strategies affect their propagation. In this talk, we analyze worm propagation behavior under various scanning strategies, such as idealized scan, uniform scan, divide-and-conquer scan, local preference scan, sequential scan,etc. We also address the problem of worm detection. Based on the premise that one should look for the exponential growth trend, we develop Kalman filters to detect the propagation of a worm at an early stage. Last, we address some of the issues that arise in applying this technique to different scanning strategies.

This is a joint work with W. Gong and C. Zou.

Darryl Veitch (University of Melbourne, currently on sabbatical at Sprint ATL) dveitch@sprintlabs.com  Homepage in Melbourne: http://www.cubinlab.ee.mu.oz.au/~darryl

Synchronising Software Clocks on the Internet

Accurate software clocks in PCs which can be reliably and inexpensively synchronised are essential for many aspects of networking, including active probing based network measurement and many real-time network applications. Best effort solutions using existing PC software clocks synchronised with the standard Network Time Protocol (NTP) algorithms are not robust enough, nor accurate enough for many purposes, whereas GPS based synchronisation is money and effort intensive. In this talk a CPU clock counter (TSC register) based software clock will be described which has many intrinsic advantages, thanks to the high performance of modern off the shelf hardware. Principles and algorithms enabling a robust, accurate synchronisation based on the existing NTP server network will be described, and illustrated using 4 months of real data collected in 4 different host-server environments. The result is an alternative remote-synchronised software clock with substantially enhanced performance.

Bill Woodcock (Research Director, Packet Clearing House)  woody@pch.net

Design and Deployment of a Global Data-Collection Infrastructure

Bill will discuss Packet Clearing House's experiences in their last ten years of collecting data about Internet topology, focusing specifically on the practical aspects of constructing a supportable, global-scale data collection platform. PCH is a research institute and development aid agency, which has worked to support both the academic and operational communities' needs for robust and instrumented Internet infrastructure. PCH's data-collection and experiment-hosting platforms are present at more than thirty Internet exchanges around the world, including both large regional exchanges in cities like Palo Alto, Tokyo, London, and Johannesburg, and smaller local exchanges in Kathmandu, Dar es Salaam, Dhaka, and Wellington. These are collectively managed instances of an identical installation package, and participate in the global BGP routing system using unicast IPv4, IPv6, anycast, and multicast routing.

Bill Woodcock (Research Director, Packet Clearing House)  woody@pch.net

Interaction between Peering, Routing, Topology and Economics

Bill will discuss ways of differentiating between peering and transit from public datasets, and the relationships between economics and the topology of the global network as it evolves.

Connect With Us: