# Poster Session and Reception

Tuesday, May 19, 2015 - 4:30pm - 6:00pm

Lind 400

**Optimal Detection of Random Walks on Graphs: Performance Analysis via Statistical Physics**

Yue Lu (Harvard University)

We study the problem of detecting a random walk on a graph from a sequence of noisy measurements at every node. There are two hypotheses: either every observation is just meaningless zero-mean Gaussian noise, or at each time step exactly one node has an elevated mean, with its location following a random walk on the graph over time. We want to exploit knowledge of the graph structure and random walk parameters (specified by a Markov chain transition matrix) to detect a possibly very weak signal. The optimal detector is easily derived, and we focus on the harder problem of characterizing its performance through the (type-II) error exponent: the decay rate of the miss probability under a false alarm constraint. The expression for the error exponent resembles the free energy of a spin glass in statistical physics, and we borrow techniques from that field to develop a lower bound. Our fully rigorous analysis uses large deviations theory to show that the lower bound exhibits a phase transition: strong performance is only guaranteed when the signal-to-noise ratio exceeds twice the entropy rate of the random walk. Monte Carlo simulations show that the lower bound fully captures the behavior of the true exponent.

**Large Submatrix Detection in Gaussian Random Matrices**

Quan Li (Massachusetts Institute of Technology)

The Iterative Search Algorithm is a widely applicable and computationally efficient approach for finding large average submatrices of a real-valued matrix in the exploratory analysis of high-dimensional data. It alternately updates rows and columns until no further improvement in the average of the $k\times k$ submatrix can be obtained. However, there is no theoretical understanding of when this algorithm converges to a global maximum.
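The alternating update described above can be sketched in a few lines; this is an illustrative implementation of the generic row/column scheme, not the authors' exact code, and the stopping rule and random initialization are assumptions:

```python
import numpy as np

def iterative_search(A, k, rng=None, max_iters=1000):
    """Alternating row/column updates to find a large-average k x k submatrix.

    Sketch of the Iterative Search Algorithm: starting from k random columns,
    repeatedly pick the k rows with the largest sums over the current columns,
    then the k columns with the largest sums over those rows, until the
    submatrix average stops improving (a local maximum).
    """
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    cols = rng.choice(n, size=k, replace=False)
    best_avg = -np.inf
    for _ in range(max_iters):
        # k rows with the largest sums restricted to the current columns
        rows = np.argsort(A[:, cols].sum(axis=1))[-k:]
        # k columns with the largest sums restricted to those rows
        cols = np.argsort(A[rows, :].sum(axis=0))[-k:]
        avg = A[np.ix_(rows, cols)].mean()
        if avg <= best_avg:  # no further improvement: local maximum reached
            break
        best_avg = avg
    return rows, cols, best_avg
```

Each half-step can only increase the submatrix average, so the procedure is monotone and must terminate at a fixed point.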

We present the first theoretical analysis of the performance of the Iterative Search Algorithm for finding a large average $k \times k$ submatrix in a Gaussian random $n \times n$ matrix. We show that for all $\epsilon \in (0,1)$ there exists a constant integer $N(\epsilon)$ such that the Iterative Search Algorithm terminates within $N(\epsilon)$ iterations with probability at least $1-\epsilon$ as $n \rightarrow \infty$. This result implies that the Iterative Search Algorithm converges to a local maximum submatrix w.h.p. More specifically, recalling that the average of the global maximum submatrix is $\sqrt{4 \log n / k}(1+o(1))$, the average of the $k\times k$ submatrix the Iterative Search Algorithm converges to is $\sqrt{2\log n / k}(1+o(1))$ w.h.p., leading to a constant factor gap.

**Tighter Low-rank Approximation via Sampling the Leveraged Element**

Srinadh Bhojanapalli (The University of Texas at Austin)

In this work, we propose a new randomized algorithm for computing a low-rank approximation to a given matrix. Taking an approach different from existing literature, our method first involves a specific biased sampling, with an element being chosen based on the leverage scores of its row and column, and then involves weighted alternating minimization over the factored form of the intended low-rank matrix, to minimize error only on these samples. Our method can leverage input sparsity, yet produce approximations in {\em spectral} (as opposed to the weaker Frobenius) norm; this combines the best aspects of otherwise disparate current results.
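The biased sampling step can be illustrated as follows; this sketch uses an exact SVD to compute leverage scores purely for clarity (the function name and the uniform-without-replacement draw over the normalized scores are assumptions of this illustration, not the paper's exact procedure):

```python
import numpy as np

def leverage_biased_sample(A, r, m, rng=None):
    """Sample m entries of A with probability tied to row/column leverage scores.

    Leverage scores here are the squared row norms of the top-r singular
    vector matrices U_r and V_r; entry (i, j) is drawn with probability
    proportional to l_row[i] + l_col[j]. Exact SVD is for illustration only.
    """
    rng = np.random.default_rng(rng)
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    l_row = (U[:, :r] ** 2).sum(axis=1)      # row leverage scores
    l_col = (Vt[:r, :] ** 2).sum(axis=0)     # column leverage scores
    P = l_row[:, None] + l_col[None, :]      # unnormalized entry probabilities
    P /= P.sum()
    idx = rng.choice(A.size, size=m, replace=False, p=P.ravel())
    return np.unravel_index(idx, A.shape)    # (rows, cols) of sampled entries
```

The sampled entries would then be passed to the weighted alternating minimization over the factored form, minimizing error only on these samples.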

This approach also leads to a new method to directly compute a low-rank approximation (in efficient factored form) to the product of two given matrices; it computes a small random set of entries of the product, and then executes weighted alternating minimization (as before) on these. The sampling strategy is different because now we cannot access leverage scores of the product matrix (but instead have to work with the input matrices).

**Population Genetics of Evolution: The Role of Mutations and Change of Environment**

Ioannis Panageas (Georgia Institute of Technology)

We introduce a nonlinear model of sexual evolution and bound its convergence time. Our model augments a classic model of sexual haploid evolution with shocks that perturb the current system state. Finally, we examine the role of mutations and of changes of environment in the survival of haploid populations.

**Collaborative Filtering with Low Regret**

Luis Voloch (Massachusetts Institute of Technology)

The goal of a recommendation system is to recommend to a user an item that they may like without prior knowledge of it. So-called collaborative filtering algorithms have been the cornerstone of most practical recommendation systems, and there are two main paradigms in neighborhood-based collaborative filtering: user-user and item-item. Under the user-user paradigm, a user is recommended an item by finding which items are liked by similar users; in contrast, under the item-item paradigm, items similar to those the user liked are found and subsequently recommended. Much empirical evidence exists that the item-item paradigm works better in many cases [Linden et al. 2003, Koren and Bell 2011].
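A toy version of the item-item paradigm makes the contrast concrete; this is a minimal cosine-similarity illustration on a 0/1 like matrix, not the authors' algorithm, and the scoring rule (summed similarity to liked items) is an assumption of the sketch:

```python
import numpy as np

def item_item_scores(R, user):
    """Toy item-item scoring on a (users x items) 0/1 like matrix.

    Items are compared by cosine similarity of their rating columns, and a
    user's score for an unseen item is the summed similarity to items the
    user already liked. Seen items are masked out.
    """
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0                   # avoid division by zero
    sim = (R.T @ R) / np.outer(norms, norms)  # item-item cosine similarity
    np.fill_diagonal(sim, 0.0)                # ignore self-similarity
    scores = sim @ R[user]                    # aggregate over liked items
    scores[R[user] > 0] = -np.inf             # don't re-recommend seen items
    return scores
```

The user-user paradigm would instead compare the rows of `R` and aggregate the preferences of similar users.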

In this work, motivated to explain this empirical observation as well as to provide a framework for designing and analyzing various recommendation algorithms, we develop a generic model for online recommendations. Under this model, we provide evidence for why the item-item paradigm can achieve fundamentally better performance than the user-user paradigm. As a byproduct, we provide an algorithm that resolves the so-called cold start problem faced by most recommendation systems and achieves a sublinear regret whose exponent depends on the doubling dimension of the space of items.

We shall discuss similarities and differences with prior works on recommendation algorithms, various bandit problems and decision making with mixture distributions.

This is joint work with Guy Bresler and Devavrat Shah.

**Rigidity of a Regular Stochastic Block Model**

Gerandy Brito (University of Washington)

The stochastic block model is a paradigm for studying community detection in random networks. Its clustering properties have been extensively studied in the statistics, physics and computer science literature. In its simplest form, the SBM is a random graph model which divides the vertices into two classes of the same size. An edge is added between two vertices with probability p if they fall in the same class and with probability q if not. One natural question to ask is whether we can recover the labels of the vertices (i.e. the class assignment) by only observing the underlying graph.
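The two-class model just described is easy to sample; this sketch generates the classical (non-regular) SBM for illustration, since the regular variant introduced in the poster additionally constrains vertex degrees:

```python
import numpy as np

def sample_sbm(n, p, q, rng=None):
    """Sample the two-class stochastic block model described above.

    Vertices 0..n/2-1 form one class and n/2..n-1 the other; an edge appears
    with probability p inside a class and q across classes. Returns the
    adjacency matrix and the hidden labels.
    """
    rng = np.random.default_rng(rng)
    labels = np.repeat([0, 1], n // 2)
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample each pair once
    A = (upper | upper.T).astype(int)                 # symmetrize, no self-loops
    return A, labels
```

Recovering `labels` from `A` alone is the community detection question posed above.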

Here we introduce a regular stochastic block model and discuss some of its properties. We show how the rigidity of the model allows us to prove stronger results than those existing for the classical SBM. In particular, we prove that recovering most of the labels correctly is equivalent to recovering all of them.

Joint work with Ioana Dumitriu, Shirshendu Ganguly, Christopher Hoffman and Linh V. Tran.

**Reconstruction of Spreading Couplings from Partial Observations with a Dynamic Message-passing Algorithm**

Andrey Lokhov (Los Alamos National Laboratory)

The problem of network reconstruction and parameter estimation from data has attracted considerable attention in the past several years. Although the static setting has been extensively explored, inference of the diffusion network and spreading couplings has been less studied so far. A number of recent papers introduced efficient algorithms for the dynamic case, based on maximization of the likelihood of observed cascades, assuming that full information for all nodes in the network is available. In this work, we focus on a more realistic and restricted scenario in which only partial information on the cascades is available: either the set of activation times for a limited number of nodes, or the states of nodes for a subset of observation times, or even a mixture of both. New techniques must be introduced in this case of incomplete observations in order to extract the relevant information from the correlations. We show that a robust reconstruction of spreading parameters can be achieved with an algorithm based on recently introduced dynamic message-passing equations for spreading processes. The suggested approach is promising for diverse applications, such as the design of artificial networks with desired properties or model estimation in the case where the structure of the network cannot be directly accessed.
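The data-generating process behind such cascades can be sketched as a discrete-time SI spreading model; this is a minimal illustration of the forward dynamics only (the coupling matrix `alpha` and the discrete-time update are assumptions of the sketch, and the message-passing reconstruction itself is not shown):

```python
import numpy as np

def simulate_cascade(alpha, source, T, rng=None):
    """Simulate one discrete-time SI cascade with couplings alpha[i, j].

    At each step an infected node i activates a susceptible node j with
    probability alpha[i, j]. Returns each node's activation time (np.inf if
    never activated). Masking a subset of these times yields the kind of
    partial observations from which the couplings are to be reconstructed.
    """
    rng = np.random.default_rng(rng)
    n = alpha.shape[0]
    t_act = np.full(n, np.inf)
    t_act[source] = 0
    for t in range(T):
        infected = t_act <= t
        susceptible = ~np.isfinite(t_act)
        for j in np.where(susceptible)[0]:
            # j escapes infection only if every infected neighbor fails to transmit
            p_miss = np.prod(1.0 - alpha[infected, j])
            if rng.random() > p_miss:
                t_act[j] = t + 1
    return t_act
```

Hiding some entries of `t_act` (nodes) or observing only `t_act <= t` at a few times `t` produces the two partial-observation regimes described in the abstract.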