Privacy and Reproducibility in Data Science
Friday, September 16, 2016 - 11:10am - 12:00pm
Exploratory data analysis is fun but dangerous. Observations alone, no matter how many, can rarely justify causal inferences. Simple calculations show that, even when researchers play strictly by the current rules of empirical science, a shocking percentage of the conclusions they reach will be wrong. Those same calculations show that reproducing hypothesis tests can make their conclusions much more reliable. The Sloan Foundation actively supports efforts to make empirical research more reproducible, including the development of mathematical approaches to privacy-preserving research. Recent and surprising theorems show that, even when privacy is not an issue, some of the techniques developed to protect confidential information can also protect against false discovery due to multiple hypothesis testing and exploratory data analysis.
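The abstract does not spell out the "simple calculations" it mentions. One common back-of-the-envelope version, offered here only as a sketch, applies Bayes' rule to significance testing. The inputs below (10% of tested hypotheses true, 80% power, a 0.05 significance threshold) are illustrative assumptions and are not taken from the talk, and the function name `false_discovery_share` is hypothetical. Replications are assumed to be independent and to use the same power and threshold.

```python
def false_discovery_share(prior, power, alpha, replications=0):
    """Share of findings that pass every test yet are actually false.

    prior        -- assumed fraction of tested hypotheses that are true
    power        -- assumed chance a true effect reaches significance
    alpha        -- significance threshold (chance a null effect passes)
    replications -- number of additional independent tests a finding
                    must also pass (assumed same power and alpha)
    """
    tests = replications + 1
    # Probability mass of true effects that pass all tests.
    true_hits = prior * power ** tests
    # Probability mass of null effects that pass all tests by chance.
    false_hits = (1 - prior) * alpha ** tests
    return false_hits / (true_hits + false_hits)


# Illustrative inputs only; none of these numbers come from the talk.
single = false_discovery_share(prior=0.10, power=0.80, alpha=0.05)
replicated = false_discovery_share(prior=0.10, power=0.80, alpha=0.05,
                                   replications=1)

print(f"False share, one test:        {single:.1%}")      # 36.0%
print(f"False share, one replication: {replicated:.1%}")  # 3.4%
```

Under these assumed inputs, roughly a third of significant results would be false even with every test run correctly, and requiring one independent replication cuts that share by about a factor of ten. Different assumptions about the prior or the power shift the numbers considerably, so the output is a demonstration of the mechanism and not an estimate for any particular field.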