Statistics for Re-Identification in Network Models

Tuesday, February 28, 2012 - 12:15pm - 12:30pm
Keller 3-180
Shawndra Hill (University of Pennsylvania)
Re-identification is the process of matching records or behaviors that belong to the same individual, sometimes when the individual is acting anonymously. The ability to re-identify individuals from their social network behavior, that is, their interactions with others on a social network, has many real-world implications in areas such as fraud detection, online target marketing, and author attribution. We consider statistics for re-identification in social network data from three popular network models: Erdős-Rényi, small-world, and scale-free. Many researchers who have worked on these statistical physics models, while cognizant of the inherent stochasticity of the problem, have inadequately addressed statistical estimation and inference. We view re-identification, in this setting, as a hypothesis test of network similarity modulo a network data model. In this paper, we offer a formal statistical framework for re-identification, using first principles and the algorithmic specification of these models. Using our framework, we illustrate the method and its performance on three network data examples: simulations, the Enron emails, and a telecommunications dataset.
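
To make the idea of a hypothesis test of network similarity concrete, here is a minimal sketch under the simplest of the three models. It is an illustration only and not the framework presented in the talk. It assumes an Erdős-Rényi null in which, conditional on their degrees, two unrelated identities pick their contacts independently and uniformly at random from a shared pool of candidates, so the number of shared contacts follows a hypergeometric distribution. The function name `overlap_p_value` and the parameter `n_candidates` are hypothetical.

```python
from math import comb


def overlap_p_value(neighbors_a, neighbors_b, n_candidates):
    """P(overlap >= observed) under a hypergeometric null.

    Assumption: both neighbor sets are drawn uniformly at random,
    independently, from the same pool of n_candidates nodes.
    """
    a, b = len(neighbors_a), len(neighbors_b)
    if a > n_candidates or b > n_candidates:
        raise ValueError("a neighbor set is larger than the candidate pool")
    observed = len(neighbors_a & neighbors_b)
    # Number of ways to choose b contacts from the pool.
    total = comb(n_candidates, b)
    # Sum the upper tail: k shared contacts out of a, b - k from the rest.
    tail = sum(
        comb(a, k) * comb(n_candidates - a, b - k)
        for k in range(observed, min(a, b) + 1)
    )
    return tail / total


# Two identities sharing 2 of 3 contacts in a pool of 10 candidates.
p = overlap_p_value({1, 2, 3}, {2, 3, 4}, n_candidates=10)
```

A small p-value would suggest that the two identities share more contacts than chance alone explains under this null; the small-world and scale-free models would need a different null distribution, for example one estimated by simulation.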