Tutorial - Network Science and the Allure of Big Data: Internet Connectivity as a Case Study

Monday, February 27, 2012 - 9:00am - 10:00am
Keller 3-180
Walter Willinger (AT&T Laboratories - Research)
Network science prides itself on the fact that many of the new ideas that
have contributed to its enormous popularity are based on (big) data and meticulous observations. Unfortunately, an inconvenient truth about most
large-scale, real-world, and highly engineered or highly evolved systems is
that the many things we can and do measure about these systems are generally
not the quantities we want to measure. Largely unwilling to deal with this
inconvenient truth, network science has more or less succeeded in making this discrepancy a non-issue by labeling scientific tasks such as assessing data quality or ensuring data hygiene as unnecessary, small-minded, and even
counter-productive for scientific discovery.

The Internet is a prime example of a large-scale, highly engineered
real-world system where the available observations are anything but meticulous
and where ignoring this inconvenient truth has resulted in new models, theories,
and predictions that - despite their general appeal and headline-grabbing
nature - have nothing to do with reality and quickly collapse when scrutinized
with carefully vetted measurements and readily available domain knowledge.
Using a number of widely used and easily available datasets of different
types of Internet connectivity measurements, I will illustrate in this talk
what it means to get to know your data (i.e., to assess its quality) and to
develop a network science that is serious about big data and adamant about
its proper use for scientific discovery.