User Data: The End of Anonymity, the Beginning of Privacy

Wednesday, May 9, 2012 - 10:30am - 11:30am
Keller 3-180
Vitaly Shmatikov (The University of Texas at Austin)
"We do not collect personally identifiable information..." "This dataset
has been de-identified prior to release..." From advertisers tracking Web
clicks to biomedical researchers sharing clinical records, anonymization
is the main privacy protection mechanism used for sensitive user data.

I will argue that the distinction between personally identifiable and
non-personally identifiable information is fallacious by showing how to
infer private information from fully anonymized data in three settings:
(1) records of individual transactions and preferences, illustrated by the
Netflix Prize dataset, (2) social networks, and (3) recommender systems,
where temporal changes in aggregate statistics allow accurate inference
of hidden individual transactions.

I will then outline a program for data privacy research. It includes
several challenging problems in the design and implementation of
privacy-preserving systems, domain-specific algorithmic research,
as well as policy and economic issues.