IMA Tutorial (part II): Measurement and modeling of the web and related data sets
Setup
Context
Focus Areas
One view of the Internet:
Inter-Domain Connectivity
Another view of the web: the hyperlink graph
Getting started – structure at the hyperlink level
Terminology
Data
Breadth-first search from random starts
A Picture of (~200M) pages.
Some distance measurements
Facts (about the crawl).
Analysis of power law
Component sizes.
Other observed power laws in the web
More Characterization: Self-Similarity
Ways to Slice the Web
Self-Similarity on the Web
In particular…
Is this surprising?
A structural explanation
The Navigational Backbone
Information Extraction from Large Graphs
Overview
Many approaches to this problem
General approach
Web Communities
Communities and cores
Other footprint structures
Subgraph enumeration
Enumerating cores
Results for cores
The cores are interesting
Elementary Schools in Japan
So…
A word on evolution
Example
More bursts
Integrating bursts and graph analysis