IMA Tutorial (part II):
Measurement and modeling of the web and related data sets

Setup

Context

Focus Areas

One view of the Internet: Inter-Domain Connectivity

Another view of the web: the hyperlink graph

Getting started – structure at the hyperlink level

Terminology

Data

Breadth-first search from random starts

A Picture of (~200M) pages

Some distance measurements

Facts (about the crawl)

Analysis of power law

Component sizes

Other observed power laws in the web

More Characterization: Self-Similarity

Ways to Slice the Web

Self-Similarity on the Web

In particular…

Is this surprising?

A structural explanation

The Navigational Backbone

Information Extraction from Large Graphs

Overview

Many approaches to this problem

General approach

Web Communities

Communities and cores

Other footprint structures

Subgraph enumeration

Enumerating cores

Results for cores

The cores are interesting

Elementary Schools in Japan

So…

A word on evolution

Example

More bursts

Integrating bursts and graph analysis