# Reception and Poster Session <br><small>Poster submissions invited from all participants</small>

Wednesday, October 26, 2011 - 4:15pm - 5:30pm

Lind 400

*On the von Luxburg Approximation for Hitting and Commute Times in Large Dense Digraphs*

Gyan Ranjan (University of Minnesota, Twin Cities)

We explore the von Luxburg approximation for hitting and commute times

in random walks between the vertices of a graph.

Recent studies have established, in particular, that the lengths of

random commutes, between a source-destination pair in large dense simple

undirected graphs, can be well approximated in terms of scaled degrees

of the end-point vertices.

These results implicate the relevance of commute times, or the lack of it,

as a distance in machine learning tasks.

Several corrected distances, based on commute times, have therefore been

proposed to rectify these shortcomings.

In this work, our principal contribution is to extend the von Luxburg

approximation to the general case of strongly connected directed graphs,

albeit in terms of the steady-state stationary probability distribution.

Our methodology does not depend on the interplay between undirected

graphs and electrical networks, an analogy on which previous studies rely

heavily and, more importantly, one that does not have a parallel for

directed graphs with which we deal in this work.

Therefore, in so doing, we develop a theoretical framework to further

generalize the von Luxburg approximations and establish conditions under

which these approximations holds for the general case.*Finding the ‘Needle’: Locating Interesting Nodes Using the K-Shortest Paths Algorithm in MapReduce*

J.T. Halbert (TexelTek)

Understanding how nodes interconnect in large graphs is an important problem in many ﬁelds. We wish to ﬁnd connecting nodes between two nodes or two groups of source nodes. In order to ﬁnd these connecting nodes in huge graphs, we have devised a highly parallelized variant of a k-shortest path algorithm that levies the power of the Hadoop distributed computing system and HBase distributed key/value store. We show how our system enables previously unobtainable graph analysis by ﬁnding these connecting nodes in graphs as large as one billion nodes or more on modest commodity hardware in a time frame of just minutes.*Cops and Robbers on Geometric Graphs*

Andrew Beveridge (Macalester College)

In the game of cops and robbers, one robber is pursued by a set of

cops on a graph G. In each round, these agents move between vertices

along the edges of the graph. The cop number c(G) denotes the minimum

number of cops required to catch the robber in finite time. We study

the cop number of geometric graphs. We prove that c(G) connected geometric graph G in R2. We improve on this bound for

random geometric graphs that are sufficiently dense, giving conditions

where one or two cops have a winning strategy.*Simultaneous clustering of time-evolving graphs*

Lars Eldén (Linköping University)

Spectral partitioning is a wellknown method for the clustering and

partitioning of graphs. The Fiedler vector of the graph Laplacian is used

to reorder the vertices of the graph. This makes it possible to find a

partitioning that is rather close to being optimal. Alternatively the

Fiedler vector can be computed using the adjacency matrix of the graph. We

discuss the generalization of this method to tensors, which made up from a

graph that evolves with time. Our approach is based on a generalization of

the Rayleigh quotient characterization of the Fiedler vector. The solution

of the tensor Rayleigh quotient maximization problem is computed using a

tensor-Krylov method. A couple of examples are given, with a clustering of

a sparse tensor obtained from the Enron email data set.*Sparsistency of the Edge Lasso over Graph*

James Sharpnack (Carnegie-Mellon University)

The fused lasso was proposed recently to enable recovery of high-dimensional patterns which are piece-wise constant on a graph, by penalizing the L1-norm of differences of measurements at nodes that share an edge. While there have been some attempts at coming up with efficient algorithms for solving the fused lasso optimization, a theoretical analysis of its performance is mostly lacking except for the simple linear graph topology. In this paper, we investigate sparsistency of fused lasso for general graph structures, i.e. its ability to correctly recover the exact support of piece-wise constant graph-structured patterns asymptotically (for large-scale graphs). To emphasize this distinction over previous work, we will refer to it as Edge Lasso. We focus on the (structured) normal means setting, and our results provide necessary and sufficient conditions on the graph properties as well as the signal-to-noise ratio needed to ensure sparsistency. We examplify our results using simple graph-structured patterns, and demonstrate that in some cases fused lasso is sparsistent at very weak signal-to-noise ratios. In other cases, it performs no better than thresholding the difference of measurements at nodes which share an edge.*Simulating complex networks via algorithms based on (t,r)-regular graphs*

Debra Knisley (East Tennessee State University)

The concept of degree regularity in a graph can be generalized by considering the cardinality of the neighborhood union of a set of vertices of size t. Since simple graphs, the most frequently used models, do not permit self-adjacency, we require the set of vertices of order t to be an independent (edgeless) set. For positive integers t and r , a graph is defined to be (t,r) –regular if every independent set of vertices of order t is collectively adjacent to exactly r vertices in the graph. If t > 1 and r >1 are fixed and the graph is sufficiently large, these graphs exhibit small-world properties, especially for small t. In this work we investigate the spectrum of large (2,r)-regular graphs and we define several algorithms that produce a small-world and/or scale-free graph by systematically rewiring a random graph until it is “almost” a (t,r)-regular graph for small t.*Large Graph Classification: Theory and Statistical Connectomics Applications*

Joshua Vogelstein (Johns Hopkins University)

The field of pattern recognition (classification) is dominated by vector-valued data, in terms of both theory and applications. Here, we develop novel classifiers that natively operate in graph-valued domains; specifically, we develop both parametric and nonparametric classifiers. The parameter classifier, which we dub the “Signal Subgraph Classifier”, has the advantage of consistently yielding an a selection of vertices/edges which are deemed informative with regard to the classification task at hand, under and independent edge model (and more generally). The nonparametric classifier, dubbed the “Graph-Matched Frobenius Norm k-Nearest-Neighbor Classifier”, is proven to be universally consistent, both when vertices are labeled and when they are not. This proof, however, relies on a perfect graph matching algorithm, and graph matching is known to be NP-hard. So, we developed a novel fast inexact graph matching algorithm, based on relaxing the constraint set to its convex hull. We prove that the optimal solution to this relaxation is identical to the original problem’s optimal solution whenever the graphs are undirected and isomorphic, even though our algorithm runs in O(n^3) time. All three algorithms outperform the previous state-of-the-art in their respective domains. For future work, we discuss the possibility of using “Dot Product Embedding” to classifier even larger graphs.*Dot Product Embedding in Large (Errorfully Observed) Graphs with Applications in Statistical Connectomics*

Daniel Sussman (Johns Hopkins University)

A stochastic block model is a kind of heterogeneous Erdos-Renyi random graph model where the probability of an edge between any pair of vertices is only a function of the cluster to which those vertices belong. This is equivalent to a conditionally independent edge random graph model, where each vertex has associated with it one of finitely many latent vectors (one per cluster). We consider a special case of the conditionally independent edge random graph model called the random dot product graph, in which the dot products of the latent vectors associated with each vertex are the edge probabilities. We prove that a dot product embedding (DPE) is sufficient to correctly assign all but a neglible faction of the vertices to their respective clusters. We apply DPE in both an unsupervised and a semi-supervised numerical experiments, demonstrating efficacy in both domains. Moreover, we consider that perhaps, graphs are only observed errorfully. Specifically, there is a quantity/quality trade-off: given a fixed amount of time, one could more accurately sample fewer edges, or less accurately sample more edges. We demonstrate that we can numerically estimate the optimal trade-off for a few specific exploitation tasks.*A Centrality Measure for Directed Graphs Using Average Hitting Cost*

Golshan Golnari (University of Minnesota, Twin Cities)

There are many relations and formulas derived for undirected graphs in the literature while there is much less analysis on directed graphs, although many problems can only be represented by directed graphs.

In this work, using a special asymmetric Laplacian matrix introduced in a previous work, we present a unified formula for “the number of passages through nodes” which is used in our work to measure the centrality of a node in a network. Moreover, we extend the definition of Hitting/Commute cost for undirected graphs to directed graphs and show that commute cost has an interesting relation with commute time. This result is useful in modifying a network and reducing the overall cost.*Online Subspace Estimation and Tracking from Incomplete and Corrupted Data*

Laura Balzano (University of Wisconsin, Madison)

Low-dimensional linear subspace approximations to high-dimensional data are a common approach to handling problems of estimation, detection and prediction, with applications such as network monitoring, collaborative filtering, object tracking in computer vision, and environmental sensing. Corrupt and missing data are the norm in many high-dimensional situations, not only because of errors and failures, but because it may be impossible to collect all the interesting measurements or impractical to compute in real-time using all the data available. Recently developed algorithms for Matrix Completion and Robust PCA have offered tools to find low-dimensional approximations to data even when data are missing and corrupt. However, these algorithms operate on all the available data at once and assume that the underlying low-dimensional subspace remains constant. This poster describes two alternatives, a subspace tracking algorithm called GROUSE (Grassmannian Rank-one Update Subspace Estimation) and its robust counterpart GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algorithm). Both are incremental algorithms that process one incomplete and/or corrupted measurement vector at a time. Thus GROUSE and GRASTA operate with considerably less computation than the other algorithms, while at the same time allowing flexibility for real-time applications such as tracking and anomaly detection. I will present these algorithms and show their application to subspace tracking, matrix completion, robust PCA, and background and foreground separation in video surveillance.*Quilting Stochastic Kronecker Product Graphs to Generate Multiplicative Attribute Graphs*

Hyokun Yun (Purdue University)

We describe the first sub-quadratic sampling algorithm for the Multiplicative Attribute Graph Model (MAGM) of Kim and Leskovec (2010). We exploit the close connection between MAGM and the Kronecker Product Graph Model (KPGM) of Leskovec et al. (2010), and show that to sample a graph from a MAGM it suffices to sample small number of KPGM graphs and quilt them together. Under mild technical conditions our algorithm runs in O(log_2(n)^3 E) time, where n is the number of nodes and E is the number of edges in the sampled graph. We demonstrate the scalability of our algorithm via extensive empirical evaluation; we can sample a MAGM graph with 8 million nodes and 20 billion edges in under 6 hours.*2-cancellative families and codes*

Zoltan Furedi (Hungarian Academy of Sciences (MTA))

There are many instances in Coding Theory when codewords must be restored

from partial information like defected data (error correcting codes) or

some superposition of the strings (these can lead to Sidon sets, sum-free

sets, etc). A family of sets F (and the corresponding family of 0-1

vectors) is called cancellative if A and A\cup B determine B

(in case of A, B\in F). Equivalently, for all A,B,C\in F we have

A\cup B=A\cup C \imply B=C.

A family of sets F (and the corresponding code of 0-1 vectors) is

called t-cancellative if for all distict t+2 members A_1,..., A_t

and B,C in F the union of A_1, ... A_t and B is different from the

union of A_1, ..., A_t and C. For c_1(n) it is know that it equals to

(1.5)^{n+o(n)}.

Let c_t(n) be the size of the largest t-cancellative code on n elements.

We significantly improve the previous upper bounds of Korner and Sinaimeri,

e.g., we show

c_2(n) n_0).

We also consider c_2(n,k), the size of the largest 2-cancellative

k-uniform family on n vertices. This is in fact, a Turan type problem,

and has many connections to other well-known questions like the

Ruzsa-Szemeredi (6,3)-theorem. Using an algebraic construction we show

that

c_2(n,2k)=\Theta (n^k)

for each k when n tends to infinity.