Tutorial: The Data Stream Model

Thursday, February 16, 2012 - 9:00am - 10:00am
Keller 3-180
David Woodruff (IBM Research Division)
The data stream model has emerged as a way of analyzing algorithmic efficiency in the presence of massive data sets. Typically the algorithm is allowed a few (usually one) passes over the input, and must use limited memory and have very fast per input item processing time. I will give a survey of algorithms and lower bounds in this area, with an emphasis on problems such as norm estimation, numerical linear algebra, empirical entropy, l_p-sampling, and heavy hitters. Time-permitting I'll also discuss the extension to functional monitoring, in which there are multiple sites each with a data stream and a central coordinator should continuously maintain a statistic over the union of the data streams.
MSC Code: