Reception and Poster Session

Wednesday, September 28, 2011 - 4:15pm - 5:30pm
Lind 400
  • Entropy and Convex Geometry in High Dimensions
    Sergey Bobkov (University of Minnesota, Twin Cities) and Mokshay Madiman (Yale University)
    We develop an information-theoretic perspective on some questions in high-dimensional convex geometry. First, we describe a new equipartition property for log-concave probability measures, which is a precise formulation of the intuition that log-concave distributions are effectively uniformly distributed on compact sets. Second, we develop some Gaussian comparison results for log-concave measures, including some entropic formulations of the hyperplane conjecture. Third, we establish a new reverse entropy power inequality for convex measures analogous to V. Milman's reverse Brunn-Minkowski inequality.
  • Noise thresholds for spectral clustering and biclustering in high dimensions
    Aarti Singh (Carnegie-Mellon University)
    Despite the empirical success of spectral clustering, its performance under noise and in high dimensions is not well understood. We provide a precise characterization of the misclustering rate of spectral clustering under noisy similarities. Using minimax theory, we establish information theoretic lower bounds on the amount of noise any clustering algorithm can tolerate and demonstrate that spectral clustering is near-optimal. The gap relative to the optimal performance is the statistical price for computational efficiency gained by solving a relaxation of the minimum ratio-cut problem. Additionally, in high dimensional settings, the clusters can be very small and are often characterized by only a few relevant features. To enable simultaneous extraction of such a cluster and the relevant features, we consider a sparsity penalized spectral biclustering method and compare its performance to minimax limits.
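    The object of study is the standard spectral bipartitioning pipeline; a minimal numpy sketch of that pipeline (an illustration only, not the authors' implementation; the Gaussian-kernel similarities and two-cluster setup are assumptions):

```python
import numpy as np

def spectral_bipartition(W):
    """Two-way spectral clustering from a symmetric similarity matrix W:
    threshold the second eigenvector (Fiedler vector) of the normalized
    graph Laplacian at zero."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    return vecs[:, 1] > 0

# two noisy clusters on the line, similarities via a Gaussian kernel
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 0.3, 50), rng.normal(4, 0.3, 50)])
W = np.exp(-(x[:, None] - x[None, :]) ** 2)
labels = spectral_bipartition(W)
```

    The minimax analysis in the abstract quantifies how much noise such a relaxation-based partition can tolerate relative to the best possible clustering procedure.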
  • Extremal Lipschitz functions in the deviation from the mean inequalities
    Dainius Dzindzalieta (Vilnius State University)
    We consider the deviation-from-the-mean problem
    $$D(x)=\sup_{f\in\mathcal{F}}\mu\{f-\mathbb{E}_{\mu} f\geq x\},\qquad x\in\mathbb{R},$$
    for the class $\mathcal{F}$ of all integrable Lipschitz functions
    $f\in L_1(V,d,\mu)$ on a probability metric space $(V,d,\mu)$. Consider the
    following closely related isoperimetric problem: given $t>0$,
    $$\text{minimize}\ \mu(A^h)\ \text{over all}\ A\subset V\ \text{with}\ \mu(A)\geq t,$$
    where $A^h=\{u\in V: d(u,A)\leq h\}$ is the $h$-neighborhood of $A$. We solve
    the deviation problem for spaces $(V,d,\mu)$ such that for every $t\geq 0$
    there exists a solution of the isoperimetric problem which does not depend
    on $h$.
    As corollaries we obtain exact solutions of the deviation problem for the
    Euclidean unit sphere $S^{n-1}$ with a geodesic distance and a normalized
    Haar measure, for $\mathbb{R}^n$ equipped with a Gaussian measure, and for
    the $n$-dimensional cube, rectangles, powers of tori, and the diamond graph
    equipped with the uniform measure and Hamming distance. We also obtain a
    general result for probability metric spaces. Let
    $\mathcal{F}^{\ast}$ be the family of distance functions $f(u)=-d(A,u)$, with $A\subset V$
    measurable. For all probability metric spaces we prove that
    $$\sup_{f\in\mathcal{F}}\mu\{f-\mathbb{E}_{\mu} f\geq x\}=\sup_{f\in\mathcal{F}^{\ast}}\mu\{f-\mathbb{E}_{\mu} f\geq x\},\qquad x\in\mathbb{R}.$$
  • Multiscale Geometric and Spectral Analysis of Plane Arrangements
    Guangliang Chen (Duke University)
    Modeling high dimensional data by multiple low-dimensional planes
    is an important problem in many applications such
    as computer vision and pattern recognition. In
    the most general setting where only coordinates of
    the data are given, the problem asks to determine
    the optimal model parameters, estimate the model
    planes, and cluster the data accordingly. Though
    many algorithms have been proposed, most of them
    need to assume prior knowledge of the model parameters
    and thus address only the last two components of the problem.
    In this paper we propose an accurate and efficient algorithm
    based on multiscale SVD analysis and spectral methods to tackle
    the problem.
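    The multiscale SVD ingredient can be illustrated in a few lines (a toy sketch with made-up parameters, not the paper's algorithm): the singular values of a local neighborhood split into a few large "plane" values and a noise floor, and counting the large ones estimates the local plane dimension.

```python
import numpy as np

def local_singular_values(X, center, radius):
    """Singular values of the centered data inside a ball around `center`."""
    nbrs = X[np.linalg.norm(X - center, axis=1) <= radius]
    return np.linalg.svd(nbrs - nbrs.mean(axis=0), compute_uv=False)

# a 2-D plane in R^10 with small ambient noise
rng = np.random.default_rng(1)
basis = np.linalg.qr(rng.normal(size=(10, 2)))[0]
X = rng.uniform(-1, 1, (500, 2)) @ basis.T + 0.01 * rng.normal(size=(500, 10))

# at a moderate radius the spectrum separates into plane directions
# and a noise floor; a crude cutoff counts the plane's dimension
s = local_singular_values(X, X[0], radius=1.0)
dim = int(np.sum(s > 3.0 * s[-1]))       # 3x smallest value as noise cutoff
```

    Repeating this across radii and locations, and tracking how each singular value grows with the radius, is the multiscale refinement of this one-scale picture.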
  • Computable performance bounds on sparsity recovery
    Gongguo Tang (University of Wisconsin, Madison)
    In this work, we develop verifiable and computable performance bounds on sparsity recovery. We define a family of goodness measures for arbitrary sensing matrices as a set of optimization problems, and design algorithms based on fixed point theory to compute these goodness measures. The proposed algorithms solve a series of second-order cone programs, or linear programs. As a by-product, we implement an efficient algorithm to verify a sufficient condition for exact sparsity recovery in the noise-free case. We derive performance bounds on the recovery errors in terms of these goodness measures. We also analytically demonstrate that the developed goodness measures are non-degenerate for a large class of random sensing matrices, as long as the number of measurements is relatively large. Numerical experiments show that, compared with the restricted isometry based performance bounds, our error bounds apply to a wider range of problems and are tighter, when the sparsity levels of the signals are relatively low.
  • Alternating Direction Methods for Sparse and Low-Rank Optimization
    Shiqian Ma (University of Minnesota, Twin Cities)
    We show how alternating direction methods have recently been used in
    sparse and low-rank optimization problems. We also show iteration
    complexity bounds for two specific types of alternating direction
    methods, namely alternating linearization and multiple splitting methods.
    We prove that the basic versions of both methods require $O(1/\epsilon)$
    iterations to obtain an $\epsilon$-optimal solution, and the accelerated
    versions of these methods require $O(1/\sqrt{\epsilon})$ iterations, while
    the computational effort needed at each iteration is almost unchanged.
    Numerical results on problems with millions of variables and constraints
    arising from computer vision, machine learning and image processing are
    reported to demonstrate the efficiency of the proposed methods.
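    For readers unfamiliar with alternating direction methods, here is a minimal sketch of one standard variant, ADMM applied to the lasso problem (shown purely for illustration; the poster's alternating linearization and multiple splitting methods differ in their details):

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, iters=300):
    """Alternating-direction sketch for min_x 0.5*||Ax-b||^2 + lam*||x||_1.

    Alternates a ridge-like x-update, a soft-thresholding z-update,
    and a dual ascent step on the scaled multiplier u."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    M = np.linalg.inv(AtA + rho * np.eye(n))     # factor cached once
    for _ in range(iters):
        x = M @ (Atb + rho * (z - u))
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)
        u = u + x - z
    return z

rng = np.random.default_rng(2)
A = rng.normal(size=(60, 100))
x_true = np.zeros(100)
x_true[[3, 30, 70]] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.normal(size=60)
x_hat = admm_lasso(A, b, lam=0.5)
```

    Each iteration touches the data through cheap linear algebra and a separable shrinkage step, which is the property that lets such splitting schemes scale to problems with millions of variables.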
  • Multiscale Geometric Dictionaries for Point-cloud Data

    We develop a novel geometric multiresolution analysis for analyzing intrinsically low dimensional point clouds in high-dimensional spaces, modeled as samples from a d-dimensional set M (in particular, a manifold) embedded in D-dimensional Euclidean space, in the regime d<<D. This type of situation has been recognized as important in various applications, such as the analysis of sounds, images, and gene arrays. In this paper we construct data-dependent multiscale dictionaries that aim at efficient encoding and manipulation of the data. Unlike existing constructions, our construction is fast, and so are the algorithms that map data points to dictionary coefficients and vice versa. In addition, data points have a guaranteed sparsity in terms of the dictionary.
  • Feature Selection in Directed Acyclic Graphs with Structured Sparsity
    Julien Mairal (University of California, Berkeley)
    We consider supervised learning problems where the features are embedded in a directed graph, and whose solutions are known to be a priori sparse. We address the task of selecting variables corresponding to connected components of the graph, and leverage to that effect recent work on structured sparsity. We introduce convex and non-convex regularization functions achieving this goal, which involve an exponential number of groups of variables, and show that when the graph is acyclic the combinatorial problems related to these formulations can be solved with network flow optimization techniques. We present experiments on image and genomic data, demonstrating the benefits of these penalties over existing ones as well as the scalability of our approach for prediction tasks.

    This is a joint work with Bin Yu (UC Berkeley).
  • Sparse Measurement Matrices for Sequential Adaptive Compressive Sensing
    Jarvis Haupt (University of Minnesota, Twin Cities)
    The theory of Distilled Sensing (DS) establishes that sequential adaptive sensing provides significant quantifiable improvements in effective signal-to-noise ratio (relative to non-adaptive sensing) in noisy high dimensional testing problems. The DS idea was recently extended to highly undersampled settings typical in compressed sensing (CS), and it was shown that a Compressive Distilled Sensing (CDS) method can produce similar performance gains in noisy CS problems. However, the results obtained in the initial work on CDS were restricted to settings where the unknown sparse signals of interest have limited dynamic range (on the order of a constant). In this work we describe a new adaptive compressive sensing approach which utilizes sequentially designed sparse measurement matrices. Our analysis of this approach shows that the quantifiable benefits of adaptivity in noisy CS problems can be achieved without restrictive conditions on the dynamic range of the signal being acquired.

    This is joint work with Rich Baraniuk (Rice University), Rui Castro (Eindhoven University of Technology), and Rob Nowak (University of Wisconsin).
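    The distilled-sensing idea behind this line of work can be caricatured in a toy coordinate-wise model (all parameters below are invented for illustration; the poster's method designs sequential sparse measurement matrices rather than direct observations): split the budget over rounds, and after each round discard coordinates whose observation is non-positive, refocusing the remaining energy.

```python
import numpy as np

def distilled_sensing(x, rounds=3, rng=None):
    """Keep only coordinates whose noisy observation is positive, refocusing
    a fixed per-round budget on the shrinking active set each round."""
    rng = rng or np.random.default_rng()
    n = len(x)
    active = np.arange(n)
    for _ in range(rounds):
        energy = n / len(active)          # per-coordinate energy this round
        y = np.sqrt(energy) * x[active] + rng.normal(size=len(active))
        active = active[y > 0]
    return active

rng = np.random.default_rng(3)
x = np.zeros(10_000)
x[:20] = 5.0                              # 20 positive signal coordinates
kept = distilled_sensing(x, rng=rng)
```

    Each round roughly halves the null coordinates while retaining the signal ones, so the effective signal-to-noise ratio on the survivors grows; the sequential measurement designs in the poster achieve an analogous gain under compressive sampling.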
  • Automatic Modulation Recognition for Spectrum Sensing using Nonuniform Compressive Samples
    Chia Wei Lim (Colorado School of Mines)
    The theory of Compressive Sensing (CS) has enabled the efficient acquisition of high-bandwidth (but sparse) signals via nonuniform low-rate sampling protocols. While most work in CS
    has focused on reconstructing the high-bandwidth signals from nonuniform low-rate samples, in this work, we consider the task of inferring the modulation of a communications signal directly in
    the compressed domain, without requiring signal reconstruction. We show that the Nth power nonlinear features used for Automatic Modulation Recognition (AMR) are compressible in
    the Fourier domain, and hence, that AMR of M-ary Phase-Shift-Keying (MPSK) modulated signals is possible by applying the same nonlinear transformation on nonuniform compressive
    samples. We provide analytical support for the accurate approximation of AMR features from nonuniform samples, present practical rules for classification of modulation type using these
    samples, and validate our proposed rules on simulated data.
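    The Nth-power feature is easy to see on synthetic data. The sketch below uses uniform sampling for simplicity (the poster's point is that the same nonlinearity can be applied to nonuniform compressive samples): raising an M-PSK baseband signal to the Mth power strips the data modulation and leaves a spectral line at M times the carrier offset. The parameters are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
n, M, f0 = 1024, 4, 0.05   # samples, QPSK order, carrier offset (cycles/sample)

symbols = np.exp(2j * np.pi * rng.integers(0, M, n) / M)   # random QPSK phases
t = np.arange(n)
s = symbols * np.exp(2j * np.pi * f0 * t)                  # modulated signal
s += 0.1 * (rng.normal(size=n) + 1j * rng.normal(size=n))  # additive noise

# Raising to the M-th power collapses the data phases (symbols**M == 1),
# so s**M concentrates its energy on a spectral line near M*f0.
spectrum = np.abs(np.fft.fft(s ** M))
peak = np.argmax(spectrum) / n
```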
  • K-nearest neighbour graph (kNNG) entropy estimation in high dimensions
    Alfred Hero III (University of Michigan)
    Entropy estimation is a common approach to empirically determining the spread of non-spherical distributions and has been applied to areas such as anomaly detection, intrinsic dimension estimation, information fusion, and structure estimation in factor graphs. k-NN graph estimators are a popular choice for estimating entropy with desirable properties including consistency and ease of implementation. However, in the high dimensional setting, the bias of k-NN graph estimators dominates the mean square error (MSE) and decays very slowly in the sample size T at a rate of order T^(-1/d) where d is the dimension. We present a weighted k-NN estimator that improves the MSE convergence rate to T^(-1) independent of dimension. We illustrate the weighted estimator for an anomaly detection problem in wireless sensor networks.
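    The plain (unweighted) estimator underlying this work is the Kozachenko-Leonenko construction; a small sketch for intuition (this is the baseline estimator whose bias degrades in high dimension, not the poster's weighted, dimension-robust variant):

```python
import math
import numpy as np

EULER = 0.5772156649015329

def digamma_int(n):
    """psi(n) for a positive integer n: -gamma + sum_{i<n} 1/i."""
    return -EULER + sum(1.0 / i for i in range(1, n))

def knn_entropy(X, k=4):
    """Kozachenko-Leonenko k-NN differential entropy estimate (in nats)."""
    N, d = X.shape
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D.sort(axis=1)                            # column 0 is the point itself
    eps = D[:, k]                             # distance to the k-th neighbour
    log_Vd = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return digamma_int(N) - digamma_int(k) + log_Vd + d * np.mean(np.log(eps))

rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 1))       # N(0,1): true entropy is 0.5*log(2*pi*e)
h = knn_entropy(X)
```

    The weighted estimator of the poster combines several such estimates (over different k) with weights chosen to cancel the leading T^(-1/d) bias terms.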
  • Online Subspace Estimation and Tracking from Incomplete and Corrupted Data
    Laura Balzano (University of Wisconsin, Madison)
    Low-dimensional linear subspace approximations to high-dimensional data are a common approach to handling problems of estimation, detection and prediction, with applications such as network monitoring, collaborative filtering, object tracking in computer vision, and environmental sensing. Corrupt and missing data are the norm in many high-dimensional situations, not only because of errors and failures, but because it may be impossible to collect all the interesting measurements or impractical to compute in real-time using all the data available. Recently developed algorithms for Matrix Completion and Robust PCA have offered tools to find low-dimensional approximations to data even when data are missing and corrupt. However, these algorithms operate on all the available data at once and assume that the underlying low-dimensional subspace remains constant. This poster describes two alternatives, a subspace tracking algorithm called GROUSE (Grassmannian Rank-one Update Subspace Estimation) and its robust counterpart GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algorithm). Both are incremental algorithms that process one incomplete and/or corrupted measurement vector at a time. Thus GROUSE and GRASTA operate with considerably less computation than the other algorithms, while at the same time allowing flexibility for real-time applications such as tracking and anomaly detection. I will present these algorithms and show their application to subspace tracking, matrix completion, robust PCA, and background and foreground separation in video surveillance.
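    A stripped-down GROUSE-style update conveys the flavor of such incremental subspace tracking (the step size, the QR re-orthonormalization, and the test setup below are ad hoc illustration choices, not the authors' implementation):

```python
import numpy as np

def grouse_step(U, idx, v_obs, eta=0.1):
    """One rank-one, GROUSE-style update of an orthonormal basis U (n x d)
    from a vector observed only on the index set idx."""
    w, *_ = np.linalg.lstsq(U[idx], v_obs, rcond=None)  # fit observed entries
    p = U @ w                                           # prediction on all of R^n
    r = np.zeros(U.shape[0])
    r[idx] = v_obs - p[idx]                             # zero-filled residual
    rn, pn, wn = np.linalg.norm(r), np.linalg.norm(p), np.linalg.norm(w)
    if rn < 1e-12 or wn < 1e-12:
        return U
    t = eta * rn * pn                                   # rotation angle
    step = (np.cos(t) - 1.0) * p / pn + np.sin(t) * r / rn
    U = U + np.outer(step, w / wn)
    return np.linalg.qr(U)[0]                           # re-orthonormalize

rng = np.random.default_rng(6)
n, d = 60, 3
U_true = np.linalg.qr(rng.normal(size=(n, d)))[0]
U = np.linalg.qr(rng.normal(size=(n, d)))[0]
err0 = np.linalg.norm(U_true - U @ (U.T @ U_true))      # initial subspace error
for _ in range(600):
    idx = rng.choice(n, size=30, replace=False)         # observe half the entries
    v = U_true @ rng.normal(size=d)                     # vector from true subspace
    U = grouse_step(U, idx, v[idx])
err = np.linalg.norm(U_true - U @ (U.T @ U_true))
```

    Each update costs only a least-squares fit on the observed entries plus a rank-one correction, which is why such trackers are so much cheaper than batch matrix completion.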
  • Combinatorial selection and least absolute shrinkage via the CLASH algorithm
    Anastasios Kyrillidis (École Polytechnique Fédérale de Lausanne (EPFL))
    The least absolute shrinkage and selection operator (Lasso) for linear regression exploits the geometric interplay of the l2-data error objective and the l1-norm constraint to arbitrarily select sparse models. Guiding this uninformed selection process with sparsity models has been precisely the center of attention over the last decade in order to improve learning performance. To this end, we alter the selection process of Lasso in this paper to explicitly leverage combinatorial sparsity models (CSMs) via the combinatorial selection and least absolute shrinkage (CLASH) algorithm. A highlight is the introduction of a new algorithmic definition of CSMs, which we dub as the Polynomial time Modular ε-Approximation Property (PMAPε). PMAPε enables us to determine the impact of approximate combinatorial projections within CLASH. We then provide concrete guidelines how to leverage sets with PMAPε within CLASH, and characterize CLASH's estimation guarantees as a function of ε as well as the set restricted isometry constants of the regression matrix. Finally, we present experimental results using both simulated and real world data to demonstrate the effectiveness of CLASH.
  • On the Bernstein type inequality and approximation properties of splines and wavelets
    Susanna Spektor (University of Alberta)
    Almost every experimental science requires the analysis of data. Depending on the specific application, the collection
    of data may consist of measurements, signals, or images. In a mathematical framework, all of these objects are functions. One
    way to analyze them is to represent them in a wavelet decomposition. Wavelets provide a reconstruction (approximation) of the original
    function (the collection of data). In order to characterize the approximation class, one has to establish a Bernstein type
    inequality, which helps to investigate the magnitude of the coefficients in the wavelet decomposition of the
    function. The coefficients tell in what way the analyzing function needs to be modified in order to reconstruct the data.
    We analyze wavelet coefficients for both the family of Daubechies orthonormal wavelets and the
    family of semiorthogonal spline wavelets in L_p space. In particular, Bernstein type inequalities associated with wavelets
    are presented. We obtain a sharp estimate of the Bernstein type inequality for splines. We also study the asymptotic
    behavior of wavelet coefficients for both families mentioned above. Comparison of these two families is done by using a quantity C_{k,p}
    related to the wavelet generators. In addition we determine the exact rate of decay of the m-th order B-spline wavelet and thereby describe
    the asymptotic behavior of the wavelet. Finally we will use splines in order to get the asymptotically best constant in Ball's integral inequality.
  • New Wavelet Coefficient Raster Scannings for Deterministic Compressive Imaging
    Marco Duarte (University of Massachusetts)
    The Delsarte-Goethals frame has been proposed for deterministic compressive sensing of sparse and compressible signals. Its performance in compressive imaging applications falls short of that obtained for arbitrary sparse vectors. Prior work has proposed specially tailored signal recovery algorithms that partition the recovery of the input vector into clustered and unclustered portions. We present a formal analysis of the Delsarte-Goethals frame that shows that its hampered performance in compressive imaging is due to the presence of clustered sparse vectors in its null space. Such a clustered structure is characteristic in commonly-used raster scanning of 2-D wavelet representations. We also show that a simple randomized raster scanning of the wavelet coefficient vector can remedy these shortcomings, allowing the use of standard recovery algorithms. Additionally, we propose an alternative deterministic raster scanning that achieves similar recovery performance. Experimental results confirm the improvements in recovery performance for both the noiseless and noisy measurement regimes.

    This is joint work with Sina Jafarpour and Robert Calderbank.
  • Matched Filtering from Limited Frequency Samples
    Armin Eftekhari (Colorado School of Mines)
    In this paper, we study a simple correlation-based strategy for estimating the unknown delay and amplitude of a signal based on a small number of noisy, randomly chosen frequency-domain samples. We model the output of this compressive matched filter as a random process whose mean equals the scaled, shifted autocorrelation function of the template signal. Using tools from the theory of empirical processes, we prove that the expected maximum deviation of this process from its mean decreases sharply as the number of measurements increases, and we also derive a probabilistic tail bound on the maximum deviation. Putting all of this together, we bound the minimum number of measurements required to guarantee that the empirical maximum of this random process occurs sufficiently close to the true peak of its mean function. We conclude that for broad classes of signals, this compressive matched filter will successfully estimate the unknown delay (with high probability, and within a prescribed tolerance) using a number of random frequency-domain samples that scales inversely with the signal-to-noise ratio and only logarithmically in the observation bandwidth and the possible range of delays.
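    A toy numerical version of the compressive matched filter (a flat-spectrum template and made-up parameters, purely for illustration): correlate the random frequency samples against delayed copies of the template spectrum and take the peak.

```python
import numpy as np

rng = np.random.default_rng(7)
B, tau0, m = 64.0, 0.37, 40          # bandwidth, true delay, number of samples

freqs = rng.uniform(-B / 2, B / 2, m)            # random frequency samples
y = np.exp(-2j * np.pi * freqs * tau0)           # delayed flat-spectrum template
y += 0.2 * (rng.normal(size=m) + 1j * rng.normal(size=m))   # measurement noise

# correlate against candidate delays on a grid and take the peak
taus = np.linspace(0.0, 1.0, 2001)
corr = np.abs(np.exp(2j * np.pi * np.outer(taus, freqs)) @ y)
tau_hat = taus[np.argmax(corr)]
```

    The correlation statistic here is exactly the random process the abstract analyzes: its mean peaks at the true delay, and the paper's bounds control how far the empirical peak can stray from it.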
  • Compressive System Identification: Theory and Applications in the Analysis of High-Dimensional Dynamical Systems
    Borhan Molazem Sanandaji (Colorado School of Mines)
    The information content of many phenomena of practical interest is often much less than what is suggested by their actual size. As an inspiring example, one active research area in biology is to understand the relations between the genes. While the number of genes in a so-called gene network can be large, the number of contributing genes to each given gene in the network is usually small compared to the size of the network. In other words, the behavior of each gene can be expressed as a sparse combination of other genes. The purpose of this work is to develop new theory and algorithms for exploiting this type of simplicity in the analysis of high-dimensional dynamical systems with a particular focus on system identification and estimation.

    We look at high-dimensional but sparse systems, time varying systems with few piecewise-constant parameter changes, and large-scale interconnected dynamical systems when the graph has a sparse flow. Inspired by the field of Compressive Sensing (CS), we call these problems Compressive System Identification (CSI), Compressive Topology Identification (CTI), etc.
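    The sparse identification setting can be sketched with a generic l1 solver (iterative soft-thresholding, used here only for illustration; it is not one of the poster's CSI algorithms): recover a sparse FIR impulse response from a short input-output record.

```python
import numpy as np

def ista(Phi, y, lam=0.1, iters=1000):
    """Iterative soft-thresholding for min_h 0.5*||Phi h - y||^2 + lam*||h||_1."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the gradient
    h = np.zeros(Phi.shape[1])
    for _ in range(iters):
        g = h - Phi.T @ (Phi @ h - y) / L
        h = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    return h

rng = np.random.default_rng(8)
n_taps, n_obs = 64, 40
h_true = np.zeros(n_taps)
h_true[[2, 17, 40]] = [1.0, -0.8, 0.6]       # 3 active taps
u = rng.normal(size=n_obs + n_taps - 1)      # known input sequence
Phi = np.array([u[i:i + n_taps][::-1] for i in range(n_obs)])   # convolution rows
y = Phi @ h_true + 0.02 * rng.normal(size=n_obs)
h_hat = ista(Phi, y)
```

    Note that fewer observations than taps suffice: the Toeplitz structure of the convolution matrix, rather than an unstructured random matrix, is precisely what CSI theory must handle.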
  • Exact convex relaxation for the planted clique and clustering
    Brendan Ames (University of Minnesota, Twin Cities)
    We propose new convex
    relaxations for the maximum clique, maximum edge biclique, and clique
    partitioning problems. These relaxations are based on the observation
    that there is a one-to-one correspondence between cliques or bicliques
    in the input graph and rank-one blocks in its adjacency matrix. These
    problems may be formulated as rank minimization or rank-constrained
    optimization problems, which are subsequently relaxed to convex programs
    using the nuclear norm. In the case that the input graph contains an
    instance of the desired structure plus diversionary edges and nodes, our
    relaxation is {\bf exact}: solving the relaxation recovers the optimal
    solution for the original hard combinatorial problem. For each problem,
    we provide theoretical bounds on the size of the planted subgraph and
    amount of noise that our algorithm can tolerate and still recover the
    exact solution. Our heuristic is applicable to any input graph, and we
    provide numerical examples that empirically demonstrate the
    effectiveness of our algorithm for graphs drawn from clustering and
    image segmentation applications.
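    The rank-one-block observation can be illustrated without an SDP solver by the closely related spectral heuristic (a proxy for, not the poster's, nuclear-norm relaxation): the planted clique contributes a rank-one spike to the centered adjacency matrix, and its leading eigenvector localizes on the clique.

```python
import numpy as np

def planted_clique_spectral(A, k):
    """Locate a planted k-clique via the leading eigenvector of the centered
    adjacency matrix (a spectral proxy for the convex relaxation)."""
    n = len(A)
    M = A - 0.5 * (np.ones((n, n)) - np.eye(n))   # subtract the G(n, 1/2) mean
    vals, vecs = np.linalg.eigh(M)
    v = vecs[:, -1]                               # leading eigenvector
    return set(np.argsort(np.abs(v))[-k:].tolist())

rng = np.random.default_rng(9)
n, k = 200, 50
A = np.triu(rng.random((n, n)) < 0.5, 1).astype(float)
A = A + A.T                                       # G(n, 1/2) adjacency matrix
clique = rng.choice(n, size=k, replace=False)
A[np.ix_(clique, clique)] = 1.0                   # plant the clique
np.fill_diagonal(A, 0.0)
found = planted_clique_spectral(A, k)
```

    The convex relaxation in the poster replaces this eigenvector heuristic with nuclear-norm minimization and comes with exact-recovery guarantees under noise.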
  • Joint Denoising of a signal ensemble: the power of veto
    Alejandro Weinstein (Colorado School of Mines)
    This work presents a technique to denoise a signal ensemble by
    exploiting sparsity at both the inter- and intra-signal levels. The
    problem of signal denoising using thresholding estimators has received a
    lot of attention in the literature, starting in the 1990s when Donoho
    and Johnstone introduced the concept of wavelet shrinkage. In this
    approach, the signal is represented in a basis where it is sparse, and
    each noisy coefficient is thresholded by a parameter that depends on the
    noise level. We are extending this concept to a set of signals, under
    the assumption that the signals have a common support. Our approach is
    based on a vetoing mechanism, where in addition to thresholding, the
    inter-signal information is used to save a coefficient that otherwise
    would be killed. Our method achieves a smaller risk than
    independent denoising, and we quantify the expected value of this
    improvement. The results show a consistent improvement over
    independent denoising, achieving results close to the ones produced by
    an oracle. We validate the technique using both synthetic and real-world data.
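    A minimal version of the veto mechanism (the voting rule and all parameters below are illustrative assumptions, not the authors' exact estimator): threshold each signal's coefficients independently, but save a coefficient that would be killed whenever enough of the other signals keep the same coefficient.

```python
import numpy as np

def veto_denoise(Y, tau, votes=2):
    """Hard-threshold each signal's coefficients, but save a coefficient that
    would be killed if at least `votes` of the other signals keep it.

    Y : (J, n) array of noisy coefficients sharing a common support."""
    keep = np.abs(Y) > tau                      # independent thresholding
    tally = keep.sum(axis=0)                    # votes per coefficient index
    saved = (tally[None, :] - keep) >= votes    # votes excluding the signal itself
    return np.where(keep | saved, Y, 0.0)

rng = np.random.default_rng(10)
J, n = 8, 512
x = np.zeros(n)
x[:20] = rng.uniform(1.5, 3.0, 20)              # common support, modest amplitudes
Y = x[None, :] + 0.5 * rng.normal(size=(J, n))  # J noisy copies of the signal
tau = 0.5 * np.sqrt(2 * np.log(n))              # universal threshold

mse_indep = np.mean((np.where(np.abs(Y) > tau, Y, 0.0) - x) ** 2)
mse_veto = np.mean((veto_denoise(Y, tau) - x) ** 2)
```

    The gain comes from coefficients near the threshold: a single signal often misses them, but the shared support makes it unlikely that all signals do.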
  • High Dimensional Data Re-parameterization
    Dan Kushnir (Yale University)
    We present an efficient method for computing an
    extendable re-parameterization of high dimensional stochastic data sets
    generated by nonlinear mixing of independent physical parameters. Our method relies on the spectral decomposition of a particular diffusion kernel
    constructed from observed data. According to Sturm-Liouville oscillation
    theory, a subset of the kernel eigenvectors are monotonic functions of
    the original parameters. Thus, a re-parameterization via these
    eigenvectors provides an inverse map back to the space of physically
    meaningful parameters. We further show how the inverse map can be
    efficiently extended to newly observed data, and how the extension
    enhances the performance of data classification.

    We demonstrate the use of our method for empirically solving inverse problems such as the
    reconstruction of geological formations from electromagnetic measurements, and source localization
    in acoustics.
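    A small diffusion-maps sketch of the re-parameterization idea (the toy curve and kernel bandwidth are invented for illustration): the first nontrivial eigenvector of a diffusion kernel built from nonlinearly mapped data recovers a monotone function of the hidden parameter, as the Sturm-Liouville argument predicts.

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 200)               # hidden physical parameter
# nonlinear embedding of the parameter into R^3 (a toy "measurement" map)
X = np.stack([np.cos(2 * theta), np.sin(3 * theta), theta ** 2], axis=1)

# Gaussian diffusion kernel, normalized to a Markov transition matrix
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / (0.1 * np.median(d2)))
P = K / K.sum(axis=1, keepdims=True)

vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
psi1 = vecs[:, order[1]].real                    # first nontrivial eigenvector
```

    Because psi1 varies monotonically with theta, thresholds or regressions expressed in the embedding coordinates transfer directly back to the physical parameter.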
  • Finding Non-overlapping Clusters for Generalized Inference Over Graphical Models
    Divyanshu Vats (University of Minnesota, Twin Cities)
    Graphical models compactly capture stochastic dependencies amongst a
    collection of random variables using a graph. Inference over graphical
    models corresponds to finding marginal probability distributions given
    joint probability distributions. Several inference algorithms rely on
    iterative message passing between nodes. Although these algorithms can
    be generalized so that the message passing occurs between clusters of
    nodes, there are limited frameworks for finding such clusters.
    Moreover, current frameworks rely on finding clusters that are
    overlapping. This increases the computational complexity of finding
    clusters since the edges over a graph with overlapping clusters must
    be chosen carefully to avoid inconsistencies in the marginal
    distribution computations. In this paper, we propose a framework for
    finding clusters in a graph for generalized inference so that the
    clusters are \emph{non-overlapping}. Given an undirected graph, we
    first derive a linear time algorithm for constructing a block-tree, a
    tree-structured graph over a set of non-overlapping clusters. We show
    how the belief propagation (BP) algorithm can be applied to
    block-trees to get exact inference algorithms. We then show how larger
    clusters in a block-tree can be efficiently split into smaller
    clusters so that the resulting graph over the smaller clusters, which
    we call a block-graph, has fewer cycles than the original
    graph. We show how loopy BP (LBP) can be applied to block-graphs for
    approximate inference algorithms. Numerical simulations show the
    benefits of running LBP on block-graphs as opposed to running LBP on
    the original graph. Our proposed framework for generalizing BP and LBP
    can be applied to other inference algorithms.
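    Exact inference by message passing on a tree (a chain, the simplest block-tree with singleton clusters) can be sketched in a few lines and checked against brute-force enumeration; the potentials below are random illustrative choices.

```python
import numpy as np
from itertools import product

# pairwise MRF on a chain of n binary variables:
# p(x) ~ prod_i phi[i][x_i] * prod_i psi[i][x_i, x_{i+1}]
rng = np.random.default_rng(11)
n = 5
phi = rng.uniform(0.5, 1.5, size=(n, 2))         # node potentials
psi = rng.uniform(0.5, 1.5, size=(n - 1, 2, 2))  # edge potentials

# forward and backward messages (exact BP on a chain)
fwd = [np.ones(2)]                               # message into node i from the left
for i in range(n - 1):
    m = psi[i].T @ (phi[i] * fwd[i])
    fwd.append(m / m.sum())
bwd = [np.ones(2)]                               # message into node i from the right
for i in reversed(range(n - 1)):
    m = psi[i] @ (phi[i + 1] * bwd[0])
    bwd.insert(0, m / m.sum())
marg = np.array([phi[i] * fwd[i] * bwd[i] for i in range(n)])
marg /= marg.sum(axis=1, keepdims=True)

# brute-force check by enumerating all 2^n configurations
brute = np.zeros((n, 2))
for x in product([0, 1], repeat=n):
    p = np.prod([phi[i][x[i]] for i in range(n)])
    p *= np.prod([psi[i][x[i], x[i + 1]] for i in range(n - 1)])
    for i in range(n):
        brute[i, x[i]] += p
brute /= brute.sum(axis=1, keepdims=True)
```

    Grouping nodes into non-overlapping clusters replaces each binary variable here with a cluster variable over joint states, leaving the same two-sweep message schedule on the block-tree.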