October 24 - 28, 2011
Low-dimensional linear subspace approximations to high-dimensional data are a common approach to handling problems of estimation, detection and prediction, with applications such as network monitoring, collaborative filtering, object tracking in computer vision, and environmental sensing. Corrupt and missing data are the norm in many high-dimensional situations, not only because of errors and failures, but because it may be impossible to collect all the interesting measurements or impractical to compute in real-time using all the data available. Recently developed algorithms for "Matrix Completion" and "Robust PCA" have offered tools to find low-dimensional approximations to data even when data are missing and corrupt. However, these algorithms operate on all the available data at once and assume that the underlying low-dimensional subspace remains constant. This poster describes two alternatives, a subspace tracking algorithm called GROUSE (Grassmannian Rank-one Update Subspace Estimation) and its robust counterpart GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algorithm). Both are incremental algorithms that process one incomplete and/or corrupted measurement vector at a time. Thus GROUSE and GRASTA operate with considerably less computation than the other algorithms, while at the same time allowing flexibility for real-time applications such as tracking and anomaly detection. I will present these algorithms and show their application to subspace tracking, matrix completion, robust PCA, and background and foreground separation in video surveillance.
In the game of cops and robbers, one robber is pursued by a set of
cops on a graph G. In each round, these agents move between vertices
along the edges of the graph. The cop number c(G) denotes the minimum
number of cops required to catch the robber in finite time. We study
the cop number of geometric graphs. We prove that c(G)
Directed graphs are better than undirected graphs in capturing
the connectivity in many applications like the WWW and networks of
wireless devices. For undirected graphs, it is well known how to
obtain the expected inter-vertex average commute times from the graph
Laplacian matrix. We show the same formulas hold in the case of strongly
connected directed graphs. Our result is obtained by deriving a close
relation between the Laplacian with a particular scaling and the so-called
Fundamental matrix for an associated random walk over the graph. We find
that the commute times still form a metric and give bounds in terms of
the stationary probabilities for the random walk. Using a simple model
of wireless devices, we observe that these commute times can differ from
those obtained from a previously proposed symmetrized Laplacian derived
from a related weighted undirected graph.
Keywords of the presentation: compressed sensing, wireless networks, coding theory
We consider a framework for full-duplex communication in ad-hoc wireless networks recently proposed by Dongning Guo. An individual node in the wireless network either transmits or it listens to transmissions from other nodes but it cannot to both at the same time. There might be as many nodes as there are 48 bit MAC addresses but we assume that only a small subset of nodes contribute to the superposition received at any given node in the network.
We use ideas from compressed sensing to show that simultaneous communication is possible across the entire network. Our approach is to manage interference through configuration rather than to eliminate or align it through extensive exchange of fine-grained Channel State Information. Our approach to configuration makes use of old fashioned coding theory.
Spectral partitioning is a wellknown method for the clustering and
partitioning of graphs. The Fiedler vector of the graph Laplacian is used
to reorder the vertices of the graph. This makes it possible to find a
partitioning that is rather close to being optimal. Alternatively the
Fiedler vector can be computed using the adjacency matrix of the graph. We
discuss the generalization of this method to tensors, which made up from a
graph that evolves with time. Our approach is based on a generalization of
the Rayleigh quotient characterization of the Fiedler vector. The solution
of the tensor Rayleigh quotient maximization problem is computed using a
tensor-Krylov method. A couple of examples are given, with a clustering of
a sparse tensor obtained from the Enron email data set.
Graphs are commonly used to represent complex networks, such as the internet or
biological systems. The structure of a graph can be inferred by clustering
vertices based on dissimilarity measures correlated with the underlying
graph-distances. For example, internet hosts can be clustered by measured
latencies or traffic correlations and genes can be clustered according to
responses to varied experimental conditions. These clusters reveal the
structure of the underlying graph; e.g., internet routing or functional
pathways in biological systems. This tutorial talk will discuss theory,
methods, and applications of graph-learning based on clustering. Learning with
missing or incomplete data, which is the norm in practice, will be a key theme
in the discussion. The theory draws from ideas in statistical learning,
spectral and subspace clustering, and matrix completion. The main concepts
will be illustrated with examples from communication and biological network
inference problems.
There are many instances in Coding Theory when codewords must be restored
from partial information like defected data (error correcting codes) or
some superposition of the strings (these can lead to Sidon sets, sum-free
sets, etc). A family of sets F (and the corresponding family of 0-1
vectors) is called cancellative if A and Acup B determine B
(in case of A, Bin F). Equivalently, for all A,B,Cin F we have
Acup B=Acup C imply B=C.
A family of sets F (and the corresponding code of 0-1 vectors) is
called t-cancellative if for all distict t+2 members A_1,..., A_t
and B,C in F the union of A_1, ... A_t and B is different from the
union of A_1, ..., A_t and C. For c_1(n) it is know that it equals to
(1.5)^{n+o(n)}.
Let c_t(n) be the size of the largest t-cancellative code on n elements.
We significantly improve the previous upper bounds of Korner and Sinaimeri,
e.g., we show
c_2(n)< 2^0.322n (for n> n_0).
We also consider c_2(n,k), the size of the largest 2-cancellative
k-uniform family on n vertices. This is in fact, a Turan type problem,
and has many connections to other well-known questions like the
Ruzsa-Szemeredi (6,3)-theorem. Using an algebraic construction we show
that
c_2(n,2k)=Theta (n^k)
for each k when n tends to infinity.
Keywords of the presentation: Bayesian networks, statistical models, Lego, linear programming, compositional models
Learning high-dimensional probability distributions with a very
reduced number of samples is no more difficult than with a great
many. However, arranging for such models to generalize well in the
small-sample domain is hard. Our approach is motivated by
compositional models and Bayesian networks, and designed to adapt to
sample size. We start with a large, overlapping set of elementary
statistical building blocks, or "primitives", which are
low-dimensional marginal distributions learned from data. Subsets of
primitives are combined in a lego-like fashion to construct a
probabilistic graphical model. Model complexity is controlled by
adapting the primitives to the amount of training data and imposing
strong restrictions on merging them into allowable compositions. In
the case of binary forests, structure optimization corresponds to an
integer linear program and the maximizing composition can be computed
for reasonably large numbers of variables.
The spiking dynamics of simultaneously recorded neurons from a small region of cortex reflect the local network structure of excitatory and inhibitory connections between observed neurons, as well as the time varying response of the neurons to their many unobserved and correlated inputs. Inference about the local network is easily contaminated by these unobserved nonstationary influences. We have been exploring conditional inference as an approach for statistically isolating local network dynamics from background nonstationarities. The approach is promising, but there remain many modeling questions and computational challenges.
There are many relations and formulas derived for undirected graphs in the literature while there is much less analysis on directed graphs, although many problems can only be represented by directed graphs.
In this work, using a special asymmetric Laplacian matrix introduced in a previous work, we present a unified formula for “the number of passages through nodes” which is used in our work to measure the centrality of a node in a network. Moreover, we extend the definition of Hitting/Commute cost for undirected graphs to directed graphs and show that commute cost has an interesting relation with commute time. This result is useful in modifying a network and reducing the overall cost.
Keywords of the presentation: image analysis, shape analysis, feature extraction, maps, map networks, spectral graph theory
In this talk we consider certain graphs that arise out of matching shapes or images. We begin by exploring how to define isometrically-invariant descriptors of shape neighborhoods and how these can be used to compute maps between shapes. We then analyze map networks with the goal of improving individual maps connecting the shapes. The fact that maps compose algebraically allows certain novel approaches that are not possible when edges encode only some proximity or similarity of the nodes. Finally we look at much larger networks of images connected via matching feature sets and look at how methods of spectral graph theory and algebraic topology can help both the construction and the simplification of such networks.
Understanding how nodes interconnect in large graphs is an important problem in many ﬁelds. We wish to ﬁnd connecting nodes between two nodes or two groups of source nodes. In order to ﬁnd these connecting nodes in huge graphs, we have devised a highly parallelized variant of a k-shortest path algorithm that levies the power of the Hadoop distributed computing system and HBase distributed key/value store. We show how our system enables previously unobtainable graph analysis by ﬁnding these connecting nodes in graphs as large as one billion nodes or more on modest commodity hardware in a time frame of just minutes.
The concept of degree regularity in a graph can be generalized by considering the cardinality of the neighborhood union of a set of vertices of size t. Since simple graphs, the most frequently used models, do not permit self-adjacency, we require the set of vertices of order t to be an independent (edgeless) set. For positive integers t and r , a graph is defined to be (t,r) –regular if every independent set of vertices of order t is collectively adjacent to exactly r vertices in the graph. If t > 1 and r >1 are fixed and the graph is sufficiently large, these graphs exhibit small-world properties, especially for small t. In this work we investigate the spectrum of large (2,r)-regular graphs and we define several algorithms that produce a small-world and/or scale-free graph by systematically rewiring a random graph until it is “almost” a (t,r)-regular graph for small t.
Keywords of the presentation: Graph Model
Large-scale graph analysis is becoming increasingly prevalent in our quest to understand the massive amounts of data arising from computer communications, social networks, email messages, purchase records, financial transactions, etc. In general, however, large-scale data sets are not available for study because they may contain private data or may relinquish some competitive advantage. Consequently, most researchers must rely on graph models as a proxy for real data. Moreover, even when real data is available, graph models may be utilized to test methodologies at varying scales or under a range of scenarios. Modeling real-world large-scale graphs, at scales upwards of a trillion nodes as proposed by Graph500, is the ultimate objective. In order to scale to large sizes, we require a model whose edges can be generated independently and thus in parallel. On the other hand, large-scale graphs contain many small-scale and even minuscule-scale interactions, generically referred to as community structure, that should be accurately represented in any graph model. Existing graph models either fail to generate edges independently or fail to capture community structure. We present analysis of the Stochastic Kronecker Graph (SKG) model, which is the current generator for the Graph500 benchmark, including a comparison to the Chung-Lu model (also known the edge configuration model). We propose a new approach, the Block Two-Level Erdös-Rényi (BTER) model which naturally contains community structure, even at the minuscule scale, but is able to generate edges independently and thus can scale to trillions of nodes.
Read More...
Motivated by talks from the first day of this workshop, we discuss in more detail modelling data by multiple subspaces, a.k.a., subspace clustering. We emphasize various theoretical results supporting the performance of some of these algorithms. In particular, we study in depth the minimizer obtained by a common energy minimization and its robustness to noise and outliers. We relate this work to recent works on robust PCA. We demonstrate how such theoretical insights guide us in practical choices and discuss some important practical questions raised in early talks of this workshop: choosing optimal local neighborhoods, modeling with missing data or high corruption and practical applications.
This is based on joint works with Arthur Szlam, Yi Wang and Teng Zhang.
Keywords of the presentation: sampling and inference from non-ignorable sampling designs; network parametric and nonparametric modeling; estimation of latent processes on a network; applications to social, biological and information networks
A number of scientific endeavors of national and international interest today involve populations with interacting and/or interfering units.
In these problems, a collection of partial measurements about patterns of interaction and interference (e.g., social structure and familial relations) is available, in addition to the more traditional measurements about unit-level outcomes and covariates. Formal statistical models for the analysis of this type of data have emerged as a major topic of interest in diverse areas of study. Probability models on networks date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online social networking websites such as Facebook and LinkedIn, and a host of more specialized professional networking communities has intensified interest in the study of networks, structured measurements and interference.
In this tutorial, I will review a few ideas and open areas of research that are central to this burgeoning literature, placing emphasis on inference and other core statistical issues. Topics include elements of sampling and inference from non-ignorable (network sampling) designs, parametric and nonparametric modeling, and estimation of latent processes on a network, with hints to the applications to social, biological and information networks that motivate these statistical problems.
Time series of graphs arise in many branches of sciences and applications. We discuss novel techniques for measuring distances between graphs, based on multiscale analysis of random walks, in a way that is both sensitive to localized changes and robust to ``noisy'' perturbations of the graph. From this we derive a framework for analyzing time series of graphs, together with fast algorithms. Finally, we discuss applications to families of graphs with multiscale structure, as well as real data mapped to dynamic graphs.
Recent empirical work has demonstrated that, although there often exists
meaningful "small scale" structure (e.g., clustering structure around a
single individual at the size-scale of roughly 100 individuals) in large
social and information networks, analogous "large scale" structure (e.g.,
meaningful or statistically significant properties of tens or hundreds of
thousands of individuals) either is lacking entirely or is of a form that
is extremely difficult for traditional machine learning and data analysis
tools to identify reliably. For example, there are often small clusters
which provide a "bottleneck" to diffusions (e.g., diffusive-based dynamic
processes of the form of interest in viral marketing applications and
tipping point models of network dynamics); on the other hand, there are
typically no large clusters that have analogous bottlenecks, and thus
diffusion-based metrics (and the associated machine learning and data
analysis tools) are simply much less meaningful (or discriminative or
useful) if one is interested in analyzing the network at large sizes.
This empirical work will be reviewed in some detail; its implications for
extracting insight from large networks with popular machine learning and
data analysis tools will be discussed; and several examples of novel
machine learning and data analysis tools that were developed in response
to these observations will be discussed.
We explore the von Luxburg approximation for hitting and commute times
in random walks between the vertices of a graph.
Recent studies have established, in particular, that the lengths of
random commutes, between a source-destination pair in large dense simple
undirected graphs, can be well approximated in terms of scaled degrees
of the end-point vertices.
These results implicate the relevance of commute times, or the lack of it,
as a distance in machine learning tasks.
Several corrected distances, based on commute times, have therefore been
proposed to rectify these shortcomings.
In this work, our principal contribution is to extend the von Luxburg
approximation to the general case of strongly connected directed graphs,
albeit in terms of the steady-state stationary probability distribution.
Our methodology does not depend on the interplay between undirected
graphs and electrical networks, an analogy on which previous studies rely
heavily and, more importantly, one that does not have a parallel for
directed graphs with which we deal in this work.
Therefore, in so doing, we develop a theoretical framework to further
generalize the von Luxburg approximations and establish conditions under
which these approximations holds for the general case.
Keywords of the presentation: Sparse Signal Processing, Network tomography, Sample Complexity, Random Walks
Sparse signal processing on graphs arises in several applications including IP networks, wireless sensor networks (WSN) and infection propagation. For instance, in IP and WSNs a key problem is to identify a sparse subset of congested links and/or failing sensor nodes from path measurements. We develop sample complexity bounds for the number of path measurements required to recover such sparse subsets associated with the graph. To develop sample complexity scaling bounds we consider random path measurements associated with random walks on large graphs. Our main result is that sample complexity, namely, the number of path measurements required for sparse signal processing on expander-type graphs with path measurements is similar to that required when no such constraints---as in compressive sensing with random projections --- are imposed on the measurements.
In this talk I will describe recent results on large populations of brain connectivity networks. We analyze sex and kinship relations and their effects in brain networks metrics and topologies. The reported results are obtained in collaboration between the team of Paul Thompson at UCLA and my team at the UofM, in particular Neda Jahanshad and Julio Duarte-Carvajalino.
Keywords of the presentation: link prediction, nonparametric time series, consistency,
We propose a non-parametric link prediction algorithm for a sequence of graph snapshots over time. The model predicts links based on the features of its endpoints, as well as those of the local neighborhood around the endpoints. This allows for different types of neighborhoods in a graph, each with its own dynamics (e.g, growing or shrinking communities). We prove the consistency
of our estimator, and give a fast implementation based on locality-sensitive hashing. Experiments with simulated as well as three real-world dynamic graphs show that we outperform the state of the art, especially when sharp fluctuations or non-linearities are present in the graph sequence.
The fused lasso was proposed recently to enable recovery of high-dimensional patterns which are piece-wise constant on a graph, by penalizing the L1-norm of differences of measurements at nodes that share an edge. While there have been some attempts at coming up with efficient algorithms for solving the fused lasso optimization, a theoretical analysis of its performance is mostly lacking except for the simple linear graph topology. In this paper, we investigate sparsistency of fused lasso for general graph structures, i.e. its ability to correctly recover the exact support of piece-wise constant graph-structured patterns asymptotically (for large-scale graphs). To emphasize this distinction over previous work, we will refer to it as Edge Lasso. We focus on the (structured) normal means setting, and our results provide necessary and sufficient conditions on the graph properties as well as the signal-to-noise ratio needed to ensure sparsistency. We examplify our results using simple graph-structured patterns, and demonstrate that in some cases fused lasso is sparsistent at very weak signal-to-noise ratios. In other cases, it performs no better than thresholding the difference of measurements at nodes which share an edge.
Keywords of the presentation: spectral clustering, minimax analysis, noise tolerance, active clustering, biclustering
Despite the empirical success of spectral clustering, its performance under noise and incomplete data is not well understood. This talk will provide a precise characterization of the misclustering rate of spectral clustering for large graphs. Using minimax theory, we establish information theoretic lower bounds on the amount of noise any clustering algorithm can tolerate and demonstrate that spectral clustering is near-optimal. The gap relative to the optimal performance is the statistical price that needs to be paid for computational efficiency. Using this analysis, we propose a novel spectral method which optimizes the number of edge similarities that need to be measured to guarantee consistent recovery of the clusters.
In high dimensional settings, the clusters can be very small relative to the size of the graph and are often characterized by a few relevant features. To enable simultaneous extraction of such a cluster and the relevant features, we consider a sparsity penalized spectral biclustering method and compare its performance to minimax limits.
Keywords of the presentation: tree-like, intermediate structure, hyperbolicity, tree decomposition
Large complex networks naturally represent relationships in a variety of settings, e.g. social interactions, computer/communication networks, and genomic sequences. A significant challenge in analyzing these networks has been understanding the “intermediate structure” – those properties not captured by metrics which are local (e.g. clustering coefficient) or global (e.g. degree distribution). It is often this structure which governs the dynamic evolution of the network and behavior of diffusion-like processes on it. Although there is a large body of empirical evidence suggesting that complex networks are often “tree-like” at intermediate to large size-scales (e.g. work of Boguna et al in physics, Kleinberg on internet routing, and Chung & Lu on power-law graphs), it remains a challenge to take algorithmic advantage of this structure in data analysis. We discuss several approaches and heuristics for quantifying and elucidating tree-like structure in networks, including various tree-decompositions and Gromov's delta-hyperbolicity. These approaches were developed with very different "tree-like" applications in mind, and thus we discuss the strengths and short-comings of each in the context of complex networks and how each might aid in identifying intermediate-scale structure in these graphs.
A networked marketplace like eBay where multiple buyers and sellers participate at the scale of hundreds of millions, interesting network structures exist. These these structures in buying-selling relationship between buyers and sellers influence any typical algorithm on site like search, recommender systems, and reputation computation. Graph structures also exists in query relationship space. We discuss these structures and use of these structures in design of algorithms in search, recommender systems, and trust systems.
A stochastic block model is a kind of heterogeneous Erdos-Renyi random graph model where the probability of an edge between any pair of vertices is only a function of the cluster to which those vertices belong. This is equivalent to a conditionally independent edge random graph model, where each vertex has associated with it one of finitely many latent vectors (one per cluster). We consider a special case of the conditionally independent edge random graph model called the random dot product graph, in which the dot products of the latent vectors associated with each vertex are the edge probabilities. We prove that a dot product embedding (DPE) is sufficient to correctly assign all but a neglible faction of the vertices to their respective clusters. We apply DPE in both an unsupervised and a semi-supervised numerical experiments, demonstrating efficacy in both domains. Moreover, we consider that perhaps, graphs are only observed errorfully. Specifically, there is a quantity/quality trade-off: given a fixed amount of time, one could more accurately sample fewer edges, or less accurately sample more edges. We demonstrate that we can numerically estimate the optimal trade-off for a few specific exploitation tasks.
The field of pattern recognition (classification) is dominated by vector-valued data, in terms of both theory and applications. Here, we develop novel classifiers that natively operate in graph-valued domains; specifically, we develop both parametric and nonparametric classifiers. The parameter classifier, which we dub the “Signal Subgraph Classifier”, has the advantage of consistently yielding an a selection of vertices/edges which are deemed informative with regard to the classification task at hand, under and independent edge model (and more generally). The nonparametric classifier, dubbed the “Graph-Matched Frobenius Norm k-Nearest-Neighbor Classifier”, is proven to be universally consistent, both when vertices are labeled and when they are not. This proof, however, relies on a perfect graph matching algorithm, and graph matching is known to be NP-hard. So, we developed a novel fast inexact graph matching algorithm, based on relaxing the constraint set to its convex hull. We prove that the optimal solution to this relaxation is identical to the original problem’s optimal solution whenever the graphs are undirected and isomorphic, even though our algorithm runs in O(n^3) time. All three algorithms outperform the previous state-of-the-art in their respective domains. For future work, we discuss the possibility of using “Dot Product Embedding” to classifier even larger graphs.
Online optimization methods are useful in a variety of applications with sequential observations of a dynamic environment. Often such methods are designed to minimize an accumulated loss metric, and the analysis techniques are appealing because of their applicability in settings where observations cannot be considered independent or identically distributed, and accurate knowledge of the environmental dynamics cannot be assumed. However, such analyses may mask the role of regularization and adaptivity to environmental changes. This work explores regularized online optimization methods and presents several novel performance bounds. Tracking regret bounds relate the accumulated loss of such an algorithm with that of the best possible dynamic estimate that could be chosen in a batch setting, and risk bounds quantify the roles of both the regularizer and the variability of the (unknown) dynamic environment. The efficacy of the method is demonstrated in an online Ising model selection context applied to U. S. Senate voting data.
We describe the first sub-quadratic sampling algorithm for the Multiplicative Attribute Graph Model (MAGM) of Kim and Leskovec (2010). We exploit the close connection between MAGM and the Kronecker Product Graph Model (KPGM) of Leskovec et al. (2010), and show that to sample a graph from a MAGM it suffices to sample small number of KPGM graphs and quilt them together. Under mild technical conditions our algorithm runs in O(log_2(n)^3 |E|) time, where n is the number of nodes and |E| is the number of edges in the sampled graph. We demonstrate the scalability of our algorithm via extensive empirical evaluation; we can sample a MAGM graph with 8 million nodes and 20 billion edges in under 6 hours.