# Reception and Poster Session

Wednesday, September 28, 2011 - 4:15pm - 5:30pm

Lind 400

*Entropy and Convex Geometry in High Dimensions*

Sergey Bobkov (University of Minnesota, Twin Cities)Mokshay Madiman (Yale University)

We develop an information-theoretic perspective on some questions in high-dimensional convex geometry. First, we describe a new equipartition property for log-concave probability measures, which is a precise formulation of the intuition that log-concave distributions are effectively uniformly distributed on compact sets. Second, we develop some Gaussian comparison results for log-concave measures, including some entropic formulations of the hyperplane conjecture. Third, we establish a new reverse entropy power inequality for convex measures analogous to V. Milman's reverse Brunn-Minkowski inequality.*Noise thresholds for spectral clustering and biclustering in high dimensions*

Aarti Singh (Carnegie-Mellon University)

Despite the empirical success of spectral clustering, its performance under noise and in high dimensions is not well understood. We provide a precise characterization of the misclustering rate of spectral clustering under noisy similarities. Using minimax theory, we establish information theoretic lower bounds on the amount of noise any clustering algorithm can tolerate and demonstrate that spectral clustering is near-optimal. The gap relative to the optimal performance is the statistical price for computational efficiency gained by solving a relaxation of the minimum ratio-cut problem. Additionally, in high dimensional settings, the clusters can be very small and are often characterized by only a few relevant features. To enable simultaneous extraction of such a cluster and the relevant features, we consider a sparsity penalized spectral biclustering method and compare its performance to minimax limits.*Extremal Lipschitz functions in the deviation from the mean inequalities*

Dainius Dzindzalieta (Vilnius State University)

We consider a deviation from the mean in the problem

\begin{equation}

D(x)\=\sup_{f\in \F}\mu\{f-\E_{\mu} f\geq x\},\qquad\ \text{for}\

x\in\R\label{abstr}

\end{equation} for the class $\F$ of all integrable, $f\in L_1(V,d,\mu)$,

Lipschitz functions on a probability metric space $(V,d,\mu)$. Consider the

following closely related isoperimetric problem. Given $t>0$,

\begin{equation}

\text{minimize}\ \mu(A^h)\ \text{over all}\ A\subset V\ \text{with}\ \mu(A)\geq

t, \label{absext}

\end{equation} where $A^h=\{u\in V: d(u,A)\leq h\}$ is an $h$-neighborhood of

$A$. We solve the problem $\eqref{abstr}$ for spaces $(V,d,\mu)$ such that for

every $t \geq 0$ there exists a solution of $\eqref{absext}$ which

does not depend

on $h$.

As corollaries we get exact solutions of $\eqref{abstr}$ for Euclidean unit

sphere $S^{n-1}$ with a geodesic distance and a normalized Haar measure, for

$\R^n$ equipped with a Gaussian measure and for $n$-dimensional cube, rectangle,

powers of tori, Diamond graph equipped with uniform measure and Hamming

distance. We also obtain a general result for probability metric spaces. Let

$\F^{\ast}$ be a family of distance functions $f(u)=-d(A,u)$, with $A\subset V$

measurable. For all probability metric spaces we prove that

$$

\sup_{f\in\F}\mu\{f-\E_{\mu} f\geq x\}=\sup_{f\in\F^{\ast}}\mu\{f-\E_{\mu}

f\geq x\},\qquad\ x\in\R.

$$*Multiscale Geometric and Spectral Analysis of Plane Arrangements*

Guangliang Chen (Duke University)

Modeling high dimensional data by multiple low-dimensional planes

is an important problem in many applications such

as computer vision and pattern recognition. In

the most general setting where only coordinates of

the data are given, the problem asks to determine

the optimal model parameters, estimate the model

planes, and cluster the data accordingly. Though

many algorithms have been proposed, most of them

need to assume prior knowledge of the model parameters

and thus address only the last two components of the problem.

In this paper we propose an accurate and efficient algorithm

based on multiscale SVD analysis and spectral methods to tackle

the problem.*Computable performance bounds on sparsity recovery*

Gongguo Tang (University of Wisconsin, Madison)

In this work, we develop verifiable and computable performance bounds on sparsity recovery. We define a family of goodness measures for arbitrary sensing matrices as a set of optimization problems, and design algorithms based on fixed point theory to compute these goodness measures. The proposed algorithms solve a series of second-order cone programs, or linear programs. As a by-product, we implement an efficient algorithm to verify a sufficient condition for exact sparsity recovery in the noise-free case. We derive performance bounds on the recovery errors in terms of these goodness measures. We also analytically demonstrate that the developed goodness measures are non-degenerate for a large class of random sensing matrices, as long as the number of measurements is relatively large. Numerical experiments show that, compared with the restricted isometry based performance bounds, our error bounds apply to a wider range of problems and are tighter, when the sparsity levels of the signals are relatively low.*Alternating Direction Methods for Sparse and Low-Rank*

Optimization

Shiqian Ma (University of Minnesota, Twin Cities)

We show how alternating direction methods are used recently in

sparse and low-rank optimization problems. We also show the iteration

complexity bounds for two specific types of alternating direction

methods, namely alternating linearizaton and multiple splitting methods.

We prove that the basic versions of both methods require $O(1/eps)$

iterations to obtain an eps-optimal solution, and the accelerated

versions of these methods require $O(1/\sqrt{eps})$ iterations, while

the computational effort needed at each iteration is almost unchanged.

Numerical results on problems with millions of variables and constraints

arising from computer vision, machine learning and image processing are

reported to demonstrate the efficiency of the proposed methods.*Multiscale Geometric Dictionaries for Point-cloud Data*

We develop a novel geometric multiresolution analysis for analyzing intrinsically low dimensional point clouds in high-dimensional spaces, modeled as samples from a d-dimensional set M (in particular, a manifold) embedded in D-dimensional Euclidean space, in the regime d<<D. This type of situation has been recognized as important in various applications, such as the analysis of sounds, images, and gene arrays. In this paper we construct data-dependent multiscale dictionaries that aim at efficient encoding and manipulating of the data. Unlike existing constructions, our construction is fast, and so are the algorithms that map data points to dictionary coefficients and vice versa. In addition, data points have a guaranteed sparsity in terms of the dictionary.*Feature Selection in Directed Acyclic Graphs with Structured Sparsity*

Julien Mairal (University of California, Berkeley)

We consider supervised learning problems where the features are embedded in a directed graph, and whose solutions are known to be a priori sparse. We address the task of selecting variables corresponding to connected components of the graph, and leverage to that effect recent work on structured sparsity. We introduce convex and non-convex regularization functions achieving this goal, which involve an exponential number of groups of variables, and show that when the graph is acyclic the combinatorial problems related to these formulations can be solved with network flow optimization techniques. We present experiments on image and genomic data, demonstrating the benefits of these penalties over existing ones as well as the scalability of our approach for prediction tasks.

This is a joint work with Bin Yu (UC Berkeley).*Sparse Measurement Matrices for Sequential Adaptive Compressive Sensing*

Jarvis Haupt (University of Minnesota, Twin Cities)

The theory of Distilled Sensing (DS) establishes that sequential adaptive sensing provides significant quantifiable improvements in effective signal-to-noise ratio (relative to non-adaptive sensing) in noisy high dimensional testing problems. The DS idea was recently extended to highly undersampled settings typical in compressed sensing (CS), and it was shown that a Compressive Distilled Sensing (CDS) method can produce similar performance gains in noisy CS problems. However, the results obtained in the initial work on CDS were restricted to settings where the unknown sparse signals of interest have limited dynamic range (on the order of a constant). In this work we describe a new adaptive compressive sensing approach which utilizes sequentially designed sparse measurement matrices. Our analysis of this approach shows that the quantifiable benefits of adaptivity in noisy CS problems can be achieved without restrictive conditions on the dynamic range of the signal being acquired.

This is joint work with Rich Baraniuk (Rice University), Rui Castro (Eindhoven University of Technology), and Rob Nowak (University of Wisconsin.*Automatic Modulation Recognition for Spectrum Sensing using Nonuniform Compressive Samples*

Chia Wei Lim (Colorado School of Mines)

The theory of Compressive Sensing (CS) has enabled the efficient acquisition of high-bandwidth (but sparse) signals via nonuniform low-rate sampling protocols. While most work in CS

has focused on reconstructing the high-bandwidth signals from nonuniform low-rate samples, in this work, we consider the task of inferring the modulation of a communications signal directly in

the compressed domain, without requiring signal reconstruction. We show that the Nth power nonlinear features used for Automatic Modulation Recognition (AMR) are compressible in

the Fourier domain, and hence, that AMR of M-ary Phase-Shift-Keying (MPSK) modulated signals is possible by applying the same nonlinear transformation on nonuniform compressive

samples. We provide analytical support for the accurate approximation of AMR features from nonuniform samples, present practical rules for classification of modulation type using these

samples, and validate our proposed rules on simulated data.*K-nearest neighbour graph (kNNG) entropy estimation in high dimensions*

Alfred Hero III (University of Michigan)

Entropy estimation is a common approach to empirically determining the spread of non-spherical distributions and has been applied to areas such as anomaly detection, intrinsic dimension estimation, information fusion, and structure estimation in factor graphs. k-NN graph estimators are a popular choice for estimating entropy with desirable properties including consistency and ease of implementation. However, in the high dimensional setting, the bias of k-NN graph estimators dominates the mean square error (MSE) and decays very slowly in the sample size T at a rate of order T^(-1/d) where d is the dimension. We present a weighted k-NN estimator that improves the MSE convergence rate to T^(-1) independent of dimension. We illustrate the weighted estimator for an anomaly detection problem in wireless sensor networks.*Online Subspace Estimation and Tracking from Incomplete and Corrupted Data*

Laura Balzano (University of Wisconsin, Madison)

Low-dimensional linear subspace approximations to high-dimensional data are a common approach to handling problems of estimation, detection and prediction, with applications such as network monitoring, collaborative filtering, object tracking in computer vision, and environmental sensing. Corrupt and missing data are the norm in many high-dimensional situations, not only because of errors and failures, but because it may be impossible to collect all the interesting measurements or impractical to compute in real-time using all the data available. Recently developed algorithms for Matrix Completion and Robust PCA have offered tools to find low-dimensional approximations to data even when data are missing and corrupt. However, these algorithms operate on all the available data at once and assume that the underlying low-dimensional subspace remains constant. This poster describes two alternatives, a subspace tracking algorithm called GROUSE (Grassmannian Rank-one Update Subspace Estimation) and its robust counterpart GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algorithm). Both are incremental algorithms that process one incomplete and/or corrupted measurement vector at a time. Thus GROUSE and GRASTA operate with considerably less computation than the other algorithms, while at the same time allowing flexibility for real-time applications such as tracking and anomaly detection. I will present these algorithms and show their application to subspace tracking, matrix completion, robust PCA, and background and foreground separation in video surveillance.*Combinatorial selection and least absolute shrinkage via the CLASH algorithm*

Anastasios Kyrillidis (École Polytechnique Fédérale de Lausanne (EPFL))

The least absolute shrinkage and selection operator (Lasso) for linear regression exploits the geometric interplay of the l2-data error objective and the l1-norm constraint to arbitrarily select sparse models. Guiding this uninformed selection process with sparsity models has been precisely the center of attention over the last decade in order to improve learning performance. To this end, we alter the selection process of Lasso in this paper to explicitly leverage combinatorial sparsity models (CSMs) via the combinatorial selection and least absolute shrinkage (CLASH) algorithm. A highlight is the introduction of a new algorithmic definition of CSMs, which we dub as the Polynomial time Modular ε-Approximation Property (PMAPε). PMAPε enables us to determine the impact of approximate combinatorial projections within CLASH. We then provide concrete guidelines how to leverage sets with PMAPε within CLASH, and characterize CLASH's estimation guarantees as a function of ε as well as the set restricted isometry constants of the regression matrix. Finally, we present experimental results using both simulated and real world data to demonstrate the effectiveness of CLASH.*On the Bernstein type inequality and approximation properties of splines and wavelets*

Susanna Spektor (University of Alberta)

Almost any kind of experimental sciences requires the analysis of the data. Depending on the specific application, the collection

of data may consist of measurements, signals, or images. In mathematical framework, all of those objects are functions. One

way to analyze them is by representing into a wavelet decomposition. Wavelets provide reconstruction (approximation) of the original

function (the collection of data). In order to characterize the approximation class, one has to establish Bernstein type

inequalitiy which helps to investigate the magnitude of the coefficients in a wavelet decomposition of the

function. The coefficients tell in what way the analyzing function needs to be modified in order to reconstruct the data.

We analyze wavelet coefficients for both the family of Daubechies orthonormal wavelets and the

family of semiorthogonal spline wavelets in L_p space. In particular, Bernstein type inequalities associated with wavelets

are presented. We obtain a sharp estimate of Bernstein type inequality for splines. We also study the asymptotic

behavior of wavelet coefficients for both families mentioned above. Comparison of these two families is done by using a quantity C_{k,p}

related to the wavelet generators. In addition we determine the exact rate of decay of the m-th order B-spline wavelet and thereby describe

the asymptotic behavior of the wavelet. Finally we will use splines in order to get the asymptotically best constant in Ball's integral inequality.*New Wavelet Coefficient Raster Scannings for Deterministic Compressive Imaging*

Marco Duarte (University of Massachusetts)

The Delsarte-Goethals frame has been proposed for deterministic compressive sensing of sparse and compressible signals. Its performance in compressive imaging applications falls short of that obtained for arbitrary sparse vectors. Prior work has proposed specially tailored signal recovery algorithms that partition the recovery of the input vector into clustered and unclustered portions. We present a formal analysis of the Delsarte-Goethals frame that shows that its hampered performance in compressive imaging is due to the presence of clustered sparse vectors in its null space. Such a clustered structure is characteristic in commonly-used raster scanning of 2-D wavelet representations. We also show that a simple randomized raster scanning of the wavelet coefficient vector can remedy these shortcomings, allowing the use of standard recovery algorithms. Additionally, we propose an alternative deterministic raster scanning that achieves similar recovery performance. Experimental results confirm the improvements in recovery performance for both the noiseless and noisy measurement regimes.

This is joint work with Sina Jafarpour and Robert Calderbank.*Matched Filtering from Limited Frequency Samples*

Armin Eftekhari (Colorado School of Mines)

In this paper, we study a simple correlation-based strategy for estimating the unknown delay and amplitude of a signal based on a small number of noisy, randomly chosen frequency-domain samples. We model the output of this compressive matched filter as a random process whose mean equals the scaled, shifted autocorrelation function of the template signal. Using tools from the theory of empirical processes, we prove that the expected maximum deviation of this process from its mean decreases sharply as the number of measurements increases, and we also derive a probabilistic tail bound on the maximum deviation. Putting all of this together, we bound the minimum number of measurements required to guarantee that the empirical maximum of this random process occurs sufficiently close to the true peak of its mean function. We conclude that for broad classes of signals, this compressive matched filter will successfully estimate the unknown delay (with high probability, and within a prescribed tolerance) using a number of random frequency-domain samples that scales inversely with the signal-to-noise ratio and only logarithmically in the in the observation bandwidth and the possible range of delays.*Compressive System Identification: Theory and Applications in the Analysis of High-Dimensional Dynamical Systems*

Borhan Molazem Sanandaji (Colorado School of Mines)

The information content of many phenomena of practical interest is often much less than what is suggested by their actual size. As an inspiring example, one active research area in biology is to understand the relations between the genes. While the number of genes in a so-called gene network can be large, the number of contributing genes to each given gene in the network is usually small compared to the size of the network. In other words, the behavior of each gene can be expressed as a sparse combination of other genes. The purpose of this work is to develop new theory and algorithms for exploiting this type of simplicity in the analysis of high-dimensional dynamical systems with a particular focus on system identification and estimation.

We look at high-dimensional but sparse systems, time varying systems with few piecewise-constant parameter changes, and large-scale interconnected dynamical systems when the graph has a sparse flow. Inspired by the field of Compressive Sensing (CS), we call these problems Compressive System Identification (CSI), Compressive Topology Identification (CTI), etc.*Exact convex relaxation for the planted clique and clustering*

problems

Brendan Ames (University of Minnesota, Twin Cities)

We propose new convex

relaxations for the maximum clique, maximum edge biclique, and clique

partitioning problems. These relaxations are based on the observation

that there is a one-to-one correspondence between cliques or bicliques

in the input graph and rank-one blocks in its adjacency matrix. These

problems may be formulated as rank minimization or rank-constrained

optimization problems, which are subsequently relaxed to convex programs

using the nuclear norm. In the case that the input graph contains an

instance of the desired structure plus diversionary edges and nodes,our

relaxation is {\bf exact}: solving the relaxation recovers the optimal

solution for the original hard combinatorial problem. For each problem,

we provide theoretical bounds on the size of the planted subgraph and

amount of noise that our algorithm can tolerate and still recover the

exact solution. Our heuristic is applicable to any input graph, and we

provide numerical examples that empirically demonstrate the

effectiveness of our algorithm for graphs drawn from clustering and

image segmentation applications.*Joint Denoising of a signal ensemble: the power of veto*

Alejandro Weinstein (Colorado School of Mines)

This work presents a technique to denoise a signal ensemble by

exploiting sparsity both at the inter and intra-signal level. The

problem of signal denoising using thresholding estimators has received a

lot of attention in the literature, starting in the 1990s when Donoho

and Johnstone introduced the concept of wavelet shrinkage. In this

approach, the signal is represented in a basis where it is sparse, and

each noisy coefficient is thresholded by a parameter that depends on the

noise level. We are extending this concept to a set of signals, under

the assumption that the signals have a common support. Our approach is

based on a vetoing mechanism, where in addition to thresholding, the

inter-signal information is used to save a coefficient that otherwise

would be killed. Our method achieve a smaller risk than the

independent denoising, and we quantify the expected value of this

improvement. The results show a consistent improvement over the

independent denoising, achieving results close to the ones produced by

an oracle. We validate the technique using both synthetic and real world

signals*High Dimensional Data Re-parameterization*

Dan Kushnir (Yale University)

We present an efficient method for computing an

extendable re-parameterization of high dimensional stochastic data sets

generated by nonlinear mixing of independent physical parameters. Our method relies on the spectral decomposition of a particular diffusion kernel

constructed from observed data. According to Sturm-Liouville oscillation

theory, a subset of the kernel eigenvectors are monotonic functions of

the original parameters. Thus, a re-parameterization via these

eigenvectors provides an inverse map back to the space of physically

meaningful parameters. We further show how the inverse map can be

efficiently extended to newly observed data, and how the extension

enhances the performance of data classification.

We demonstrate the use of our method for solving empirically inverse problems such as the

reconstruction of geological formations from electro-magnetic measurements, and source localization

in Acoustics.*Finding Non-overlapping Clusters for Generalized Inference Over*

Graphical Models

Divyanshu Vats (University of Minnesota, Twin Cities)

Graphical models compactly capture stochastic dependencies amongst a

collection of random variables using a graph. Inference over graphical

models corresponds to finding marginal probability distributions given

joint probability distributions. Several inference algorithms rely on

iterative message passing between nodes. Although these algorithms can

be generalized so that the message passing occurs between clusters of

nodes, there are limited frameworks for finding such clusters.

Moreover, current frameworks rely on finding clusters that are

overlapping. This increases the computational complexity of finding

clusters since the edges over a graph with overlapping clusters must

be chosen carefully to avoid inconsistencies in the marginal

distribution computations. In this paper, we propose a framework for

finding clusters in a graph for generalized inference so that the

clusters are \emph{non-overlapping}. Given an undirected graph, we

first derive a linear time algorithm for constructing a block-tree, a

tree-structured graph over a set of non-overlapping clusters. We show

how the belief propagation (BP) algorithm can be applied to

block-trees to get exact inference algorithms. We then show how larger

clusters in a block-tree can be efficiently split into smaller

clusters so that the resulting graph over the smaller clusters, which

we call a block-graph, has lower number of cycles than the original

graph. We show how loopy BP (LBP) can be applied to block-graphs for

approximate inference algorithms. Numerical simulations show the

benefits of running LBP on block-graphs as opposed to running LBP on

the original graph. Our proposed framework for generalizing BP and LBP

can be applied to other inference algorithms.