# Special Seminar: Some Statistical Problems Related to Big Data

Thursday, October 22, 2015 - 3:00pm - 4:30pm

Rapson 31

Boaz Nadler (Weizmann Institute of Science)

Joint seminar with the UMN School of Statistics. Social hour at 3 p.m., with the seminar beginning at 3:30 p.m. in Rapson 31 (new location).

The era of big data raises several statistical and computational challenges.

In this talk we will discuss two such challenges, motivated in part by specific applications.

Challenge i) Statistical Losses in Distributed Learning and Inference. In various applications the data is so large that it does not fit on a single machine, or processing it on a single machine would be too slow. Motivated by the popularity of the map-reduce scheme, we study the statistical losses of distributed machine learning. In particular, we analyze the case where the data is randomly distributed among m machines, each of which computes its own estimator, which is then sent to a central node for merging.
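The split-and-merge setting above can be illustrated with a minimal sketch (an illustration only, not the method analyzed in the talk): the data is randomly partitioned among m "machines", each computes a local estimator (here simply the sample mean), and a central node averages the local estimates.

```python
import numpy as np

def local_estimator(chunk):
    """Each machine's estimator: the sample mean of its own chunk."""
    return chunk.mean()

def distributed_estimate(data, m, seed=0):
    """Randomly partition `data` among m machines and average their estimates."""
    rng = np.random.default_rng(seed)
    chunks = np.array_split(rng.permutation(data), m)
    return np.mean([local_estimator(c) for c in chunks])

data = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=10_000)
centralized = data.mean()
merged = distributed_estimate(data, m=10)
# For a linear estimator like the mean (with equal-size chunks), merging
# loses nothing relative to the centralized estimate; for nonlinear
# estimators the two generally differ, and that gap is the kind of
# statistical loss the talk studies.
```

For the sample mean the merged and centralized estimates coincide up to floating-point error; the interesting regime is nonlinear estimators, where averaging m local solutions need not match the full-data solution.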

Challenge ii) How well can one perform a learning or inference task under severe computational constraints that do not even allow processing all of the collected data? We study this problem in the context of edge detection in noisy images. We will present possibly the first sub-linear time edge detection algorithm and analyze the inherent trade-off between computational budget and detection performance.
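A hypothetical toy sketch (not the algorithm presented in the talk) of the sub-linear idea: rather than scanning all h*w pixels, examine only a few randomly chosen rows and test for a large horizontal intensity jump. The work is O(n_rows * w), which is sub-linear in the number of pixels when n_rows is much smaller than h, and a larger sampling budget improves the detection probability.

```python
import numpy as np

def detect_edge_subsampled(image, n_rows, threshold, seed=0):
    """Return True if any randomly sampled row contains an adjacent-pixel
    intensity jump above `threshold`. Only n_rows rows are examined."""
    rng = np.random.default_rng(seed)
    h, _ = image.shape
    for r in rng.integers(0, h, size=n_rows):
        if np.abs(np.diff(image[r])).max() > threshold:
            return True
    return False

rng = np.random.default_rng(1)
flat = rng.normal(0.0, 0.05, size=(256, 256))  # pure noise, no edge
edged = flat.copy()
edged[:, 128:] += 1.0                          # add a vertical step edge

# Threshold set well above typical noise jumps but below the edge jump.
found_edge = detect_edge_subsampled(edged, n_rows=5, threshold=0.5)
found_flat = detect_edge_subsampled(flat, n_rows=5, threshold=0.5)
```

The trade-off the abstract mentions shows up directly here: with fewer sampled rows (a smaller budget) the detector is cheaper but more likely to miss a weak or localized edge.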
