# Poster Session and Reception

Wednesday, September 14, 2016 - 5:15pm - 6:30pm

Lind 400

**Communication Efficient Distributed Kernel Principal Component Analysis**

Yingyu Liang (Princeton University)

Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. In the presence of a large volume of high dimensional data collected in a distributed fashion, it becomes very costly to communicate all of this data to a single data center and then perform kernel PCA. Can we perform kernel PCA on the entire dataset in a distributed and communication efficient fashion while maintaining provable and strong guarantees in solution quality?

In this paper, we give an affirmative answer to the question by developing a communication efficient algorithm to perform kernel PCA in the distributed setting. The algorithm is a clever combination of subspace embedding and adaptive sampling techniques, and we show that the algorithm can take as input an arbitrary configuration of distributed datasets, and compute a set of global kernel principal components with relative error guarantees independent of the dimension of the feature space or the total number of data points. In particular, the communication cost is independent of the number of data points and almost matches the lower bound for any constant accuracy. Furthermore, we experimented the algorithm with large-scale real world datasets and showed that the algorithm produces a high quality kernel PCA solution while using significantly less communication than alternative approaches.**Tracking Storms from Space: A Data Science Approach**

Ardeshir Ebtehaj (University of Minnesota, Twin Cities)

This poster presents a data science approach for passive microwave retrieval of precipitation from space. An algorithm is presented [called Shrunken locally linear embedding Algorithm for Retrieval of Precipitation (ShARP)], which relies on a sparsity promoting regularization and makes use of two joint dictionaries of coincident rainfall profiles and their corresponding upwelling spectral brightness temperatures. A sequential detection–estimation strategy is adopted, which assumes that similar rainfall intensity values and their spectral radiances live close to some sufficiently smooth manifolds with analogous local geometry. The detection step employs a nearest neighbor classification rule, whereas the estimation scheme is equipped with a constrained shrinkage estimator to ensure the stability of retrieval and some physical consistency. The algorithm is examined using coincident observations of the active precipitation radar and the passive microwave imager on board the NASA’s TRMM satellite. We focus on a radiometrically complex region, partly covering the Tibetan highlands, Himalayas, and Ganges– Brahmaputra–Meghna River basins, which is unique in terms of its diverse land surface radiation regime and precipitation types. Promising results are presented using ShARP over snow-covered land surfaces and in the vicinity of coastlines, in comparison with the standard TRMM passive precipitation product. The results show that modern data science approaches can significantly reduce the rainfall overestimation due to the background snow contamination and markedly improve detection and retrieval of rainfall in the vicinity of coastlines. During the calendar year 2013, compared to the standard products, it is demonstrated that over the study domain the root-mean-squared error can be reduced up to 38% annually, while the improvement can reach up to 70% during the cold months of the year.**Random Matrices with Heavy-tailed Entries: Sub-Gaussian Mean Estimators and Applications**

Stas Minsker (University of Southern California)

Estimation of covariance matrix has attracted a lot of attention of the statistical research community over the years, partially due to important applications such as Principal Component Analysis. However, frequently used empirical covariance estimator (and its modifications) is very sensitive to outliers in the data. As P. Huber wrote in 1964, “...This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one? As is now well known, the sample mean then may have a catastrophically bad performance…”

Motivated by Tukey's question, we develop a new estimator of the (element-wise) mean of a random matrix, which includes covariance estimation problem as a special case. Assuming that the entries of a matrix possess only finite second moment, this new estimator admits sub-Gaussian or sub-exponential concentration around the unknown mean in the operator norm. We will present key ideas behind our construction, as well as applications to covariance estimation and matrix completion problems.**Global Monitoring of Surface Water Dynamics: A Data-Driven Approach**

Anuj Karpatne (University of Minnesota, Twin Cities)

Freshwater, which is only available in inland water bodies such as lakes, reservoirs, and rivers, is increasingly becoming scarce across the world and this scarcity is posing a global threat to human sustainability. A system for global monitoring of surface water bodies can be useful for policy-makers and the scientific community to effectively manage water resources and understand the interplay between water dynamics, climate change, and human actions. The promise of data-driven approaches coupled with the growing availability of remote sensing data presents numerous opportunities for global monitoring of inland water bodies.

However, there are a number of challenges in analyzing remote sensing data at a global scale, e.g. the presence of heterogeneity in the data across space and time, and the high degree of noise and missing values. These challenges have restricted the application of existing water monitoring approaches to local and regional scales.

We are developing novel data analytics solutions that address these challenges to create a global monitoring system of surface water dynamics. As a first step towards this system, we have created a preliminary version of a web-viewer for visualizing changes occurring in water bodies: http://z.umn.edu/monitoringwater. This viewer is able to capture a variety of dynamics occurring in surface water, e.g. shrinking surface water bodies in Brazil and California due to droughts, expansion of glacial lakes in Tibet due to melting of glaciers, construction of dams and reservoirs at a global scale, and changes in river morphology such as river migration and delta erosion (see http://z.umn.edu/waterslides for further details). We envision this system to be a key enabler in detecting changes in surface water and understanding their relationships with climate change and human actions.**Plantation Mapping in Southeast Asia Using Remote Sensing Data**

Xiaowei Jia (University of Minnesota, Twin Cities)

Plantation mapping is critical for understanding and addressing deforestation, a key driver of climate change and ecosystem degradation. Unfortunately, most plantation maps are limited to small areas for specific years because they rely on visual inspection of imagery. Here we propose an automated approach for annual mapping of plantations using remote sensing data. Due to the heterogeneity of land cover classes, we propose a novel ensemble learning method that simultaneously uses training samples from multiple land cover classes over different years. After the ensemble learning, we further improve the performance by post-processing using a Hidden Markov Model. With the experiments on MODIS data, we demonstrate the superiority of the proposed method over multiple baselines. In addition, we conduct extensive validation by comparing the detected plantation by our approach with the existing datasets developed through visual interpretation by expert observers. Based on the random sampling and the comparison with high-resolution images, the precision (i.e. user’s accuracy) and recall (i.e. producer’s accuracy) of our generated map are around 85.53% and 81.51%, respectively, and the overall accuracy is 95.20%.**Learning with Imperfect Labels: Application to Monitoring Fires in Tropical Forests Using Satellite Data**

Varun Mithal (University of Minnesota, Twin Cities)

Many real-world problems involve learning predictive models for rare classes in settings where gold standard labels for training samples are unavailable but imperfect labels are available for all instances. We present RAre class Prediction in absence of True labels (RAPT), a three step predictive modeling framework for classifying rare class in such problem settings. The first step of the RAPT framework learns a classifier that jointly optimizes precision and recall by only using imperfectly labeled training samples. We show that, under certain assumptions on the imperfect labels, the quality of this classifier is almost as good as the one constructed using gold standard labels. The second and third steps of the framework make use of the fact that imperfect labels are available for all instances to further improve the precision and recall of the rare class. We applied the RAPT framework to generate a new burned area product for the tropical forests in South America and South-east Asia using publicly available satellite data: Moderate Resolution Imaging Spectroradiometer (MODIS) multispectral surface reflectance and Active Fire hotspots. The total burned area detected by RAPT in this region between the years 2000-2014 is 2,286,385 MODIS pixels (approximately 571 K sq. km.), which is more than three times compared to the estimates by the state-of-the art product from NASA: MODIS MCD64A1 (742,886 MODIS pixels). Our validation results, obtained using multiple lines of evidence, indicate that the events reported in our product are indeed true burn events that are missed by the state-of-art burned area products.**ConceFT: Concentration of Frequency and Time via a Multitapered Synchrosqueezed Transform**

Yi Wang (Syracuse University)

A new method is proposed to determine the time-frequency content of time-dependent signals consisting of multiple oscillatory components, with time-varying amplitudes and instantaneous frequencies. Numerical experiments as well as a theoretical analysis are presented to assess its effectiveness.**Consistency and Convergence Rate for Nearest Subspace Classifier**

Yi Wang (Syracuse University)

The Nearest subspace classifier (NSS) finds an estimation of the underlying subspace within each class and assigns data points to the class that corresponds to its nearest subspace. This paper mainly studies how well NSS can be generalized to new samples. It is proved that NSS is strongly consistent and has rate of convergence O(1/sqrt(n)) under certain assumptions. Some simulations are presented eventually to verify the theoretical results.**Spatiotemporal Sampling in Discrete Evolution Processes**

Sui Tang (Johns Hopkins University)

Let $f \in \ell^2(I)$ be a signal at time $t = 0$ of a discrete evolution process controlled by a bounded linear operator A that produces the signals $Af, A^2f, \cdots $ at times $t=1,2,\cdots$. Let $Y =\{f(i), Af(i), \cdots, A^{l_i}f(i) : i \in \Omega \subset I\}$ be the spatiotemporal samples taken at various time levels. The problem under consideration is to find necessary and sufficient conditions on $A, \Omega, l_i$ in order to recover any $f \in \ell^2(I)$ from the measurements $Y$. This is the so called \textit{Dynamical Sampling Problem} in which we seek to recover a signal $f$ by combining coarse samples of $f$ and its futures states $A^lf$. Various version of dynamical sampling problems exhibit features similar to many fundamental problem such as frame theory, blind deconvolution, interpolation problem in Hardy space. In this poster, we will study the problem and show some recent results.**Learning Adaptive Multiscale Approximations to Data and Functions near Low-Dimensional Sets**

Wenjing Liao (Johns Hopkins University)

In the setting where a data set in $R^D$ consists of samples from a probability measure concentrated on or near an unknown low-dimensional manifold, we consider two sets of problems: geometric approximation of the manifold and regression of a function f on the manifold. In the first case, we construct multiscale low-dimensional empirical approximations of the manifold, which are adaptive when the manifold has geometric regularity that may vary at different locations and scales, and give performance guarantees. In the second case we exploit the empirical geometric approximations of the manifold to construct multiscale approximations of the function. We prove guarantees showing that we attain the same learning rates as if the function was defined on a Euclidean domain of the intrinsic dimension, instead of an unknown manifold.**Data Driven Partitioning for Distributed Learning**

Colin White (Carnegie Mellon University)

In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points are often belonging to the same or similar classes, and more generally, classification rules of high accuracy tend to be locally simple but globally complex, we propose data dependent dispatching that takes advantage of such structures. Our main technical contribution is to provide algorithms with provable guarantees for data-dependent dispatching. We overcome novel challenges in combinatorial optimization to satisfy important conditions for accurate distributed learning, including fault tolerance and balancedness. We empirically demonstrate that our method strongly scales and achieves significantly higher accuracy than random partitioning on both synthetic and real-world image and advertising datasets**A Representation Theory Perspective on Simultaneous Alignment and Classification with Applications in Cryo-EM**

Roy Lederman (Princeton University)

Single particle Cryo-electron microscopy (EM) recently joined X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy as a high-resolution structural method for biological macromolecules. Cryo-EM does not require crystallization necessary for X-ray crystallography, and unlike NMR it is not limited to small size molecules. In single particle Cryo-EM the 3-D structure is determined from many noisy 2-D projection images of individual, ideally identical frozen-hydrated macromolecules whose orientations and positions are random and unknown. Cryo-EM has been named Method of the Year 2015 by the journal Nature Methods after recent advancements in detector technology led to a breakthrough in near-atomic resolution reconstruction of molecules whose structure cannot be obtained by other techniques.

One of the great opportunities in Cryo-EM is studying heterogeneous samples, containing two or more distinct types or conformations of molecules. Taking advantage of this opportunity presents an algorithmic challenge: determining both the class and orientation of each particle; this often requires an initial guess of the structures and iterative estimations of the structures and particle orientations and class labels.

We propose an algorithm for simultaneous alignment and classification with the goal of simultaneously classifying Cryo-EM images and aligning them within their respective classes. Our proposed approach can also be extended to the case of continuous heterogeneity.**Multi-Dimensional Sublinear Sparse Fourier Algorithm**

Bosu Choi (Michigan State University)

We introduce the development of a sublinear sparse Fourier algorithm for high-dimensional data. In “Adaptive Sublinear Time Fourier Algorithm by D. Lawlor, Y. Wang and A. Christlieb (2013) , an efficient algorithm with empirically O(klogk) runtime and O(k) sampling complexity for the one-dimensional sparse FFT was developed for signals of bandwidth N, where k is the number of significant modes such that k << N.

In this work we develop an efficient algorithm for sparse FFT for higher dimensional signals, extending some of the ideas in the paper mentioned above. Note a higher dimensional signal can always be unwrapped into a one dimensional signal, but when the dimension gets larger unwrapping a higher dimensional signal into a one dimensional array is far too expensive to be realistic. Our approach here introduces two new concepts; “partial unwrapping and “tilting. These two ideas allow us to efficiently compute the sparse FFT of higher dimensional signals.**Stability and Convergence Tradeoff of Iterative Optimzation Algorihtms**

Yuansi Chen (University of California, Berkeley)

Algorithmic stability is known to provide generalization guarantees for algorithms. When applied to optimization algorithms, it offers an alternative understanding to their computational convergence rate. We show that algorithmic stability can be established for many popular first order optimization methods in the smooth convex setting. Furthermore, we reveal the fact that an optimization algorithm's stability trades off with its optimal convergence rate. Based on an information-theoretical argument, the stability upper bound determines the asymptotic convergence rate lower bound. Our analysis also provides stability based early-stopping criterion for large scale learning problems.