HOME    »    SCIENTIFIC RESOURCES    »    Volumes
Abstracts and Talk Materials
Multi-Manifold Data Modeling and Applications
October 27-30, 2008


Approximate nearest subspace search with applications to pattern recognition (poster)
October 27, 2008 6:30 pm - 8:00 pm

Linear and affine subspaces are commonly used to describe the appearance of objects under different lighting, viewpoint, articulation, and even identity. A natural problem arising from their use is -- given a query image portion represented as a point in some high dimensional space -- find a subspace near to the query. This talk presents an efficient solution to the approximate nearest subspace problem for both linear and affine subspaces. Our method is based on a simple reduction to the problem of nearest point search, and can thus employ tree-based search or locality-sensitive hashing to find a near subspace. Further performance improvement is achieved by using random projections to lower the dimensionality of the problem. We provide theoretical proofs of correctness and error bounds of our construction, and demonstrate its capabilities on synthetic and real data. Our experiments demonstrate that an approximate nearest subspace can be located significantly faster than the exact nearest subspace, while at the same time it can find better matches compared to a similar search on points, in the presence of variations due to viewpoint, lighting, and so forth.


Supervised dictionary learning (poster)
October 27, 2008 6:30 pm - 8:00 pm

It is now well established that sparse signal models are well suited to restoration tasks and can effectively be learned from audio, image, and video data. Recent research has been aimed at learning discriminative sparse models instead of purely reconstructive ones. This work proposes a new step in that direction, with a novel sparse representation for signals belonging to different classes in terms of a shared dictionary and multiple class-decision functions. The linear variant of the proposed model admits a simple probabilistic interpretation, while its most general variant admits an interpretation in terms of kernels. An optimization framework for learning all the components of the proposed model is presented, along with experimental results on standard handwritten digit and texture classification tasks. This is a joint work with F. Bach (INRIA), J. Ponce (Ecole Normale Supérieure), G. Sapiro (University of Minnesota) and A. Zisserman (Oxford University).


Joint manifold models for collaborative inference (poster)
October 27, 2008 6:30 pm - 8:00 pm

We introduce a new framework for collaborative inference and efficient fusion of manifold-modeled data. We formulate the notion of a joint manifold model for signal ensembles, and using this framework we demonstrate the superior performance of joint processing techniques for a range of tasks including detection, classification, parameter estimation, and manifold learning. We then exploit recent results concerning random projections of low-dimensional manifolds to develop an efficient framework for distributed data fusion. As an example, we extend the smashed filter – a maximum likelihood, model-based estimation and classification algorithm that exploits random measurements – to a distributed setting. Bounds for the robustness of this scheme against measurement noise are derived. We demonstrate the utility of our framework in a variety of settings, including large scale camera networks, networks of acoustic sensors, and multi-modal sensors.

This is joint work with Richard Baraniuk, Marco Duarte, and Chinmay Hegde.

Bradley Alpert - National Institute of Standards and Technology

Compressive sampling reconstruction by subspace refinement (poster)
October 27, 2008 6:30 pm - 8:00 pm

Spurred by recent progress in compressive sampling methods, we develop a new reconstruction algorithm for the Fourier problem of recovering from noisy samples a linear combination of unknown frequencies embedded in a very large dimensional ambient space. The approach differs from both L1-norm minimization and orthogonal matching pursuit (and its variants) in that no basis for the ambient space is chosen a priori. The result is improved computational complexity. We provide numerical examples that demonstrate the method's robustness and efficiency.

Joint work with Yu Chen.

Richard Baraniuk - Rice University

Manifold models for signal acquisition, compression, and processing
October 27, 2008 3:00 pm - 3:50 pm

Joint work with Mark Davenport, Marco Duarte, Chinmay Hegde, and Michael Wakin.

Compressive sensing is a new approach to data acquisition in which sparse or compressible signals are digitized for processing not via uniform sampling but via measurements using more general, even random, test functions. In contrast with conventional wisdom, the new theory asserts that one can combine "low-rate sampling" (dimensionality reduction) with an optimization-based reconstruction process for efficient and stable signal acquisition. While the growing compressive sensing literature has focused on sparse or compressible signal models, in this talk, we will explore the use of manifold signal models. We will show that for signals that lie on a smooth manifold, the number of measurements required for a stable representation is proportional to the dimensionality of the manifold, and only logarithmic in the ambient dimension — just as for sparse signals. As an application, we consider learning and inference from manifold-modeled data, such as detecting tumors in medical images, classifying the type of vehicle in airborne surveillance, or estimating the trajectory of an object in a video sequence. Specific techniques we will overview include compressive approaches to the matched filter (dubbed the "smashed filter"), intrinsic dimension estimation for point clouds, and manifold learning algorithms. We will also present a new approach based on the joint articulation manifold (JAM) for compressive distributed learning, estimation, and classification.

Misha Belkin - The Ohio State University

Spectral and geometric methods in learning
October 28, 2008 3:55 pm - 4:45 pm

In recent years a variety of spectral and geometry-based methods have become popular for various tasks of machine learning, such as dimensionality reduction, clustering and semi-supervised learning. These methods use a model of data as a probability distribution on a manifold, or, more generally a mixture of manifolds. In the talk I will discuss some of these methods and recent theoretical results on their convergence. I will also talk about how spectral methods can be used to learn mixtures of Gaussian distributions, which may be considered the simplest case of multi-manifold data modeling.

Gunnar Carlsson - Stanford University

Topology and data
October 29, 2008 3:00 pm - 3:50 pm

The nature and quantity of data arising out of scientific applications requires novel methods, both for exploratory analysis as well as analysis of significance and validation. One set of new methods relies on ideas and methods from topology. The study of data sets requires an extension of standard methods, which we refer to as persistent topology. We will survey this area, and show that algebraic methods can be applied both to the exploratory and the validation side of investigations, and show some examples.

Frédéric Chazal - INRIA Saclay - Île-de-France

Analysis of scalar fields over point cloud data (poster)
October 27, 2008 6:30 pm - 8:00 pm

(Joint work with L. Guibas, S. Oudot and P. Skraba - to appear in proc. SODA'09).

Given a real-valued function f defined over some metric space X, is it possible to recover some structural information about f from the sole information of its values at a finite subset L of sample points, whose pairwise distances in X are given? We provide a positive answer to this question. More precisely, taking advantage of recent advances on the front of stability for persistence diagrams, we introduce a novel algebraic construction, based on a pair of nested families of simplicial complexes built on top of the point cloud L, from which the persistence diagram of f can be faithfully approximated. We derive from this construction a series of algorithms for the analysis of scalar fields from point cloud data. These algorithms are simple and easy to implement, have reasonable complexities, and come with theoretical guarantees. To illustrate the generality and practicality of the approach, we also present experimental results obtained in various applications, ranging from clustering to sensor networks.

Wolfgang Dahmen - RWTH Aachen

Instance optimal adaptive regression in high dimensions
October 28, 2008 3:00 pm - 3:50 pm

Joint work with Peter Binev, Ron DeVore, and Philipp Lamby. This talk addresses the recovery of functions of a large number of variables from point clouds in the context of supervised learning. Our estimator is based on two conceptional pillars. First, the notion of sparse occupancy trees is shown to warrant efficient computations even for a very large number of variables. Second, a properly adjusted adaptive tree-approximation scheme is shown to ensure instance optimal performance. By this we mean the rapid decay (with increasing sample size) of the probability that the estimator deviates from the regression function (in a certain natural norm) by more than the error of best n-term approximation in the sparse tree setting.

Ingrid Daubechies - Princeton University

Math matters public lecture: Surfing with wavelets
October 29, 2008 8:00 pm - 9:15 pm

Wavelets are used in the analysis of sounds and images, as well as in many other applications. The wavelet transform provides a mathematical analog to a music score: just as the score tells a musician which notes to play when, the wavelet analysis of a sound takes things apart into elementary units with a well defined frequency (which note?) and at a well defined time (when?). For images wavelets allow you to first describe the coarse features with a broad brush, and then later to fill in details. This is similar to zooming in with a camera: first you can see that the scene is one of shrubs in a garden, then you concentrate on one shrub and see that it bears berries, then, by zooming in on one branch, you find that this is a raspberry bush. Because wavelets allow you to do a similar thing in more mathematical terms, the wavelet transform is sometimes called a "mathematical microscope."

Wavelets are used by many scientists for many different applications. Outside science as well, wavelets are finding their uses: wavelet transforms are an intergral part of the image compression standard JPEG2000.

The talk will start by explaining the basic principles of wavelets, which are very simple. Then they will be illustrated with some examples, including an explanation of image compression.


Mathematical problems suggested by Analog-to-Digital conversion
October 30, 2008 12:15 pm - 1:05 pm

In Analog-to-Digital conversion, continuously varying functions (e.g. the output of a microphone) are transformed into digital sequences from which one then hopes to be able to reconstruct a close approximation to the original function. The functions under consideration are typically band-limited (i.e. their Fourier transform is zero for frequencies higher than some given value, called the bandwidth); such functions are completely determined by samples taken at a rate determined by their bandwidth. These samples then have to be approximated by a finite binary representation. Surprisingly, in many practical applications one does not just replace each sample by a truncated binary expansion; for various technical reasons, it is more attractive to sample much more often and to replace each sample by just 1 or -1, chosen judicously.

In this talk, we shall see what the attractions are of this quantization scheme, and discuss several interesting mathematical questions suggested by this approach. This will be a review of work by many others as well as myself. It is also a case study of how continuous interaction with engineers helped to shape and change the problems as we tried to make them more precise.

Ronald DeVore - Texas A & M University

Recovering sparsity in high dimensions
October 28, 2008 10:55 am - 11:45 am

We assume that we are in $R^N$ with $N$ large. The first problem we consider is that there is a function $f$ defined on $Omega:=[0,1]^N$ which is a function of just $k$ of the coordinate variables: $f(x_1,dots,x_N)=g(x_{j_1},dots,x_{j_k})$ where $j_1,dots,j_k$ are not known to us. We want to approximate $f$ from some of its point values. We first assume that we are allowed to choose a collection of points in $Omega$ and ask for the values of $f$ at these points. We are interested in what error we can achieve in terms of the number of points when we assume some smoothness of $g$ in the form of Lipschitz or higher smoothness conditions.

We shall consider two settings: adaptive and non-adaptive. In the adaptive setting, we are allowed to ask for a value of $f$ and then on the basis of the answer we get we can ask for another value. In the non-adaptive setting, we must prescribe the $m$ points in advance.

A second problem we shall consider is when $f$ is not necessarily only a function of $k$ variables but it can be approximated to some tolerance $epsilon$ by such a function. We seek again sets of points where the knowledge of the values of $f$ at such points will allow us to approximate $f$ well.

Our main consideration is to derive results which are not severely effected by the size of $N$, i.e. are not victim of the curse of dimensionality. We shall see that this is possible.

Marco Duarte - Rice University

The smashed filter for compressive classification(poster)
October 27, 2008 6:30 pm - 8:00 pm

We propose a framework for exploiting the same measurement techniques used in emerging compressive sensing systems in the new setting of compressive classification. The first part of the framework maps traditional maximum likelihood hypothesis testing into the compressive domain; we find that the number of measurements required for a given classification performance level does not depend on the sparsity or compressibility of the signal but only on the noise level. The second part of the framework applies the generalized maximum likelihood method to deal with unknown transformations such as the translation, scale, or viewing angle of a target object. Such a set of transformed signals forms a low-dimensional, nonlinear manifold in the high-dimensional image space. We exploit recent results that show that random projections of a smooth manifold result in a stable embedding of the manifold in the lower-dimensional space. Non-differential manifolds, prevalent in imaging applications, are handled through the use of multiscale random projections that perform implicit regularization. We find that the number of measurements required for a given classification performance level grows linearly in the dimensionality of the manifold but only logarithmically in the number of pixels/samples and image classes.

This is joint work with Mark Davenport, Michael Wakin and Richard Baraniuk.

Lars Eldén - Linköping University

The best low-rank Tucker approximation of a tensor
October 27, 2008 10:00 am - 10:50 am

The problem of computing the best multilinear low-rank approximation of a tensor can be formulated as an opimization problem on a product of Grassmann manifolds (by multilinear low-rank approximation we understand an approximation in the sense of the Tucker model). In the Grassmann approach we want to find (bases of) subspaces that represent the low-rank approximation. We have recently derived a Newton algorithm for this problem, where a quadratic model on the tangent space of the manifold is used. From the Grassmann Hessian we derive conditions for a local optimum. We also discuss the sensitivity of the subspaces to perturbations of the tensor elements.

Ehsan Elhamifar - Johns Hopkins University

3-D motion segmentation via robust subspace separation (poster)
October 27, 2008 6:30 pm - 8:00 pm

We consider the problem of segmenting multiple rigid-body motions in a video sequence from tracked feature point trajectories. Using the affine camera model, this motion segmentation problem can be cast as the problem of segmenting samples drawn from a union of linear subspaces of dimension two, three or four. However, due to limitations of the tracker, occlusions and the presence of nonrigid objects in the scene, the obtained motion trajectories may contain grossly mistracked features, missing entries, or not correspond to any valid motion model.

In this poster, we present a combination of robust subspace separation schemes that can deal with all of these practical issues in a unified framework. For complete uncorrupted trajectories, we examine approaches that try to harness the subspace structure of the data either globally (Generalized PCA) or by minimizing a lossy coding length (Agglomerative Lossy Coding). For incomplete or corrupted trajectories, we develop methods based on PowerFactorization or L1-minimization. The former method fills in missing entries using a linear projection onto a low-dimensional space. The latter method draws strong connections between lossy compression, rank minimization, and sparse representation. We compare our approach to other methods on a database of 167 motion sequences with full motions, independent motions, degenerate motions, partially dependent motions, missing data, outliers, etc. Our results are on par with state-of-the-art results, and in many cases exceed them.

Charles Fefferman - Princeton University

Interpolation of functions on Rn
October 29, 2008 10:00 am - 10:50 am

The talk explains joint work with Bo'az Klartag, solving the following problem and several variants. Let f be a real-valued function on an N-point set E in Rn. Compute efficiently a function F on the whole Rn that agrees with f on E and has Cm norm close to least possible.

Massimo Fornasier - Johann Radon Institute for Computational and Applied Mathematics

teratively re-weighted least squares and vector valued data restoration from lower dimensional samples (poster)
October 27, 2008 6:30 pm - 8:00 pm

We present the analysis of a superlinear convergent algorithm for L1-minimization based on an iterative reweighted least squares. We show improved performances in compressed sensing. A similar algorithm is then applied for the efficient solution of a system of singular PDEs for image recolorization in a relevant real-life problem of art restoration.

Alvina Goh - Johns Hopkins University
René Vidal - Johns Hopkins University

Clustering on Riemannian manifolds (poster)
October 27, 2008 6:30 pm - 8:00 pm

Over the past few years, various techniques have been developed for learning a low-dimensional representation of a nonlinear manifold embedded in a high-dimensional space. Unfortunately, most of these techniques are limited to the analysis of a single connected nonlinear submanifold of a Euclidean space and suffer from degeneracies when applied to linear manifolds (subspaces).

This work proposes a novel algorithm for clustering data sampled from multiple submanifolds of a Riemannian manifold. First, we learn a representation of the data using generalizations of local nonlinear dimensionality reduction algorithms from Euclidean to Riemannian spaces. Such generalizations exploit geometric properties of the Riemannian space, particularly its Riemannian metric. Then, assuming that the data points from different groups are separated, we show that the null space of a matrix built from the local representation gives the segmentation of the data. However, this method can fail when the data points are drawn from a union of linear manifolds, because M contains additional vectors in its null space. In this case, we propose an alternative method for computing M, which avoids the aforementioned degeneracies, thereby resulting in the correct segmentation. The final result is a simple linear algebraic algorithm for simultaneous nonlinear dimensionality reduction and clustering of data lying in multiple linear and nonlinear manifolds.

We present several applications of our algorithm to computer vision problems such as texture clustering, segmentation of rigid body motions, segmentation of dynamic textures, segmentation of diffusion MRI. Our experiments show that our algorithm performs on par with state-of-the-art techniques that are specifically designed for such segmentation problems.

Gloria Haro Ortega - Universitat Politecnica de Catalunya

Detecting mixed dimensionality and density in noisy point clouds
October 27, 2008 10:55 am - 11:45 am

We present a statistical model to learn mixed dimensionalities and densities present in stratifications, that is, mixture of manifolds representing different characteristics and complexities in the data set. The basic idea relies on modeling the high dimensional sample points as a process of translated Poisson mixtures, with regularizing restrictions, leading to a model which includes the presence of noise. The translated Poisson distribution is useful to model a noisy counting process, and it s derived from the noise-induced translation of a regular Poisson distribution. By maximizing the log-likelihood of the process counting the points falling into a local ball, we estimate the local dimension and density. We show that the sequence of all possible local countings in a point cloud formed by samples of a stratification can be modeled by a mixture of different Translated Poisson distributions, thus allowing the presence of mixed dimensionality and densities in the same data set. A partition of the points in different classes according to both dimensionality and density is obtained, together with an estimation of these quantities for each class.

Chinmay Hegde - Rice University

Random projections for manifold learning (poster)
October 27, 2008 6:30 pm - 8:00 pm

We propose a novel method for linear dimensionality reduction of manifold-modeled data. First, we show that given only a small number random projections of sample points in R^N belonging to an unknown K-dimensional Euclidean manifold, the intrinsic dimension (ID) of the sample set can be estimated to high accuracy. Second, we prove that using only this set of random projections, we can estimate the structure of the underlying manifold. In both cases, the number of random projections (M) required is linear in K and logarithmic in N, meaning that K < M

Michael Henderson - IBM

Representing and manipulating implicitly defined manifolds (poster)
October 27, 2008 6:30 pm - 8:00 pm

This poster illustrates an algorithm which was developed to compute implicitly defined manifolds in engineering applications, where the manifold is of low (1-4) dimension, but is embedded in a very high dimensional space (100 and up).

Though the computational details (finding a point and the tangent space of the manifold) are different than in manifold learning, and the embedding is explicit instead of one of the unknowns, there are significant issues in common when computing any manifold.

The representation used closely follows the definition of a manifold. A set of spherical balls (with differing radii)serve as the chart domains, with the embedding mapping limited to the embedded center and the tangent space. In addition, polyhedra are associated with each chart so that overlapping charts correspond to faces of the polyhedra (common to the polyhedra of the overlapping charts). Boundary charts are represented in the same manner, beginning with a polyhedral cone, and adding faces for each overlapping chart.

The polyhedra of interior charts are Laguerre-Voronoi cells, and so it is very easy to locate points on the boundary of a partially represented manifold (if all vertices are closer to the origin than the radius of the ball the chart is completely surrounded by other charts). This provides a basic "continuation" or "extension" algorithm for creating a set of covering charts on a manifold.

In terms of data structures, the atlas of charts is a simple list. The polyhedra are in the same simple list, but also form cell complex whose dual is a Laguerre-Delaunay "triangulation" of the manifold. Interestingly, the fine structure is a cell complex, but so is the gross structure of the manifold. Manifolds with boundary are represented by another cell complex. In this one the faces are manifolds which share boundaries which are the boundary cells of the face.

So far this approach has been applied to three different kinds of manifolds which are common in dynamical systems. I hope to find similar applications in manifold learning.

Tony Jebara - Columbia University

Visualization & matching for graphs and data
October 29, 2008 12:15 pm - 1:05 pm

Given a graph between N high-dimensional nodes, can we faithfully visualize it in just a few dimensions? We present an algorithm that improves the state-of-the art in dimensionality reduction by extending the Maximum Variance Unfolding method. Visualizations are shown for social networks, species trees, image datasets and human activity.

If we are only given a dataset of N samples, how should we link the samples to build a graph? The space to explore is daunting with 2^(N2) choices but two interesting subfamilies are tractable: matchings and b-matchings. We place distributions over these families and recover the optimal graph or perform Bayesian inference over graphs efficiently using belief propagation algorithms. Higher order distributions over matchings can also be handled efficiently via fast Fourier algorithms. Applications are shown in tracking, network reconstruction, classification, and clustering.


Tony Jebara is Associate Professor of Computer Science at Columbia University and director of the Columbia Machine Learning Laboratory. His research intersects computer science and statistics to develop new frameworks for learning from data with applications in vision, networks, spatio-temporal data, and text. Jebara is also co-founder and head of the advisory board at Sense Networks. He has published over 50 peer-reviewed papers in conferences and journals including NIPS, ICML, UAI, COLT, JMLR, CVPR, ICCV, and AISTAT. He is the author of the book Machine Learning: Discriminative and Generative (Kluwer). Jebara is the recipient of the Career award from the National Science Foundation and has also received honors for his papers from the International Conference on Machine Learning and from the Pattern Recognition Society. He has served as chair and program committee member for many learning conferences. Jebara's research has been featured on television (ABC, BBC, New York One, TechTV, etc.) as well as in the popular press (New York Times, Slash Dot, Wired, Scientific American, Newsweek, etc.). He obtained his PhD in 2002 from MIT. Jebara's lab is supported in part by the Central Intelligence Agency, the National Science Foundation, the Office of Naval Research, the National Security Agency, and Microsoft.

Tamara Kolda - Sandia National Laboratories

CPOPT: Optimization for fitting CANDECOMP/PARAFAC models
October 30, 2008 10:00 am - 10:50 am

Joint work with Evrim Acar, and Daniel M. Dunlavy (Sandia National Laboratories).

Tensor decompositions (e.g., higher-order analogues of matrix decompositions) are powerful tools for data analysis. In particular, the CANDECOMP/PARAFAC (CP) model has proved useful in many applications such as chemometrics, signal processing, and web analysis. The problem of computing the CP decomposition is typically solved using an alternating least squares (ALS) approach. We discuss the use of optimization-based algorithms for CP, including how to efficiently compute the derivatives necessary for the optimization methods. Numerical studies highlight the positive features of our CPOPT algorithms, as compared with ALS and Gauss-Newton approaches.

Dan Kushnir - Yale University

Fast multiscale clustering and manifold identification (poster)
October 27, 2008 6:30 pm - 8:00 pm

We present a novel multiscale clustering algorithm inspired by algebraic multigrid techniques. Our method begins with assembling data points according to local similarities. It uses an aggregation process to obtain reliable scale-dependent global properties, which arise from the local similarities. As the aggregation process proceeds, these global properties affect the formation of coherent clusters. The global features that can be utilized are for example density, shape, intrinsic dimensionality and orientation. The last three features are a part of the manifold identification process which is performed in parallel to the clustering process. The algorithm detects clusters that are distinguished by their multiscale nature, separates between clusters with different densities, and identifies and resolves intersections between clusters. The algorithm is tested on synthetic and real datasets, its running time complexity is linear in the size of the dataset.

Gilad Lerman - University of Minnesota, Twin Cities

Multi-manifold data modeling via spectral curvature clustering
October 29, 2008 10:55 am - 11:45 am

We propose a fast multi-way spectral clustering algorithm for multi-manifold data modeling. We describe the supporting theory as well as the practical choices guided by it. We emphasize the case of hybrid linear modeling, i.e., when the manifolds are affine subspaces in a Euclidean space, and then extend this setting to more general manifolds and other embedding metric spaces. We exemplify applications of the algorithm to several real-world problems while comparing it with other methods.

Yi Ma - University of Illinois at Urbana-Champaign

Dense error correction via L1 minimization
October 29, 2008 3:55 pm - 4:45 pm

It is know that face images of different people lie on multiple low-dimensional subspaces. In this talk, we will show that these subspaces are tightly bundled together as a "bouquet". Precisely due to this unique structure, it allows extremely robust reconstruction and recognition of faces despite severe corruption or occlusion. We will show that if the image resolution and the size of the face database grow in proportion to infinity, computer can correctly and efficiently recover or recognize a face image with almost 100% of its pixels randomly and arbitrarily corrupted, a truly magic ability of L1-minimization.

This is joint work with John Wright of UIUC.

Mauro Maggioni - Duke University

Harmonic and multiscale analysis on low-dimensional data sets in high-dimensions
October 27, 2008 3:55 pm - 4:45 pm

We discuss recent advances in harmonic analysis ideas and algorithms for analyzing data sets in high-dimensional spaces which are assumed to have lower-dimensional geometric structure. They enable the analysis of both the geometry of the data and of functions on the data, and they can be broadly subdivided into local, global and multiscale techniques, roughly corresponding to PDE techniques, Fourier and wavelet analysis ideas in low-dimensional Euclidean signal processing. We discuss applications to machine learning tasks, image processing, and discuss current research directions.

Tristan Nguyen - Office of Naval Research

Large group discussion on:

1. The representation of high-level information and low-level data

2. The symbiotic linkage between information and data

3. The need to transform qualitative information into quantitative data sets and vice versa

4. The need to think beyond the learning for classification.

5. How mathematics can be useful to the aforementioned domains of interest in conjunction with information integration and data fusion.
October 28, 2008 5:15 pm - 6:15 pm

Partha Niyogi - University of Chicago

A Geometric perspective on machine Learning
October 27, 2008 12:15 pm - 1:05 pm

Increasingly, we face machine learning problems in very high dimensional spaces. We proceed with the intuition that although natural data lives in very high dimensions, they have relatively few degrees of freedom. One way to formalize this intuition is to model the data as lying on or near a low dimensional manifold embedded in the high dimensional space. This point of view leads to a new class of algorithms that are "manifold motivated" and a new set of theoretical questions that surround their analysis. A central construction in these algorithms is a graph or simplicial complex that is data-derived and we will relate the geometry of these to the geometry of the underlying manifold. Applications to data analysis, machine learning, and numerical computation will be considered.


Large group discussion on What have we learned about manifold learning — what are its implications for machine learning and numerical analysis? What are open questions? What are successes? Where should we be optimistic and where should we be pessimistic?
October 27, 2008 5:30 pm - 6:30 pm

Irina Rish - IBM

A supervised dimensionality reduction framework for exponential-family variables (poster)
October 27, 2008 6:30 pm - 8:00 pm

Dimensionality reduction (DR) is a popular data-processing technique that serves the following two main purposes: it helps to provide a meaningful interpretation and visualization of the data, and it also helps to prevent overfitting when the number of dimensions greatly exceeds the number of samples, thus working as a form of regularization. However, dimensionality reduction should be driven by a particular data analysis goal. When our goal is prediction rather than an (unsupervised) exploratory data analysis, supervised dimensionality reduction (SDR) that combines DR with learning a predictor can significantly outperform a simple combination of unsupervised DR with a subsequent learning of a predictor on the resulting low-dimensional representation. The problem of supervised dimensionality reduction can be also viewed as finding a predictive low-dimensional representation, which captures the information about the class label contained in the high-dimensional feature vector while ignoring the high-dimenional noise.

We propose a general SDR framework that views both features and class labels as exponential-family random variables (PCA-like dimensionality reduction is included as a particular case of Gaussian data). We learn data- and class-appropriate generalized linear models (GLMs), thus handling both classification and regression, with both discrete and real-valued data. Our SDR-GLM optimization problem can be viewed as discriminative learning based on minimization of conditional probability of class given thehidden variables, while using as a regularizer the conditional probability of the features given the low-dimensional (hidden-variable) representation.

The main advantage of our approach, besides being more general, is using simple closed-form update rules when performing its alternate minimization procedure. This method yields a short Matlab code, fast performance, and is guaranteed to converge. The convergence property, as well as closed form update rules, result from using appropriate auxiliary functions bounding each part of the objective function (i.e., reconstruction and prediction losses). We exploit the additive property of auxiliary functions in order to combine bounds on multiple loss functions.

Promising empirical results are demonstrated on a variety of high-dimensional datasets. Particularly, our results on simulated datasets convincingly demonstrate that SDR-GLM approach can discover underlying low-dimensional structure in high-dimensional noisy data, while outperforming SVM and SVDM, often by far, and practically always beating the unsupervised DR followed by learning a predictor. On real-life datasets, SDR approaches outperfrom the unsupervised DR by far, while matching or sometimes outperforming SVM and SVDM.

Berkant Savas - Linköping University

Tensor approximation - structure and methods (poster)
October 27, 2008 6:30 pm - 8:00 pm

We consider the tensor approximation problem. Given a tensor we want to approximate it by another tensor of lower multilinear rank. It turns out this problem is defined on a product of Grassmann manifolds. We describe the structure of various parts the approximation problem and present convergence plots for Newton and quasi-Newton methods (with BFGS and Limited memory BFGS updates) that solve the problem. All algorithms are applicable to both general and symmetric tensors and incorporate the Grassmann manifold structure of the problem. The benefits of these algorithms compared to traditional alternating least squares approaches are higher convergence rates and the existence of rigorous theory establishing convergence to a stationary point.

This is joint work with Lars Eldén (Linköping University) and Lek-Heng Lim (University of California, Berkeley).

Yoel Shkolnisky - Yale University
Amit Singer - Princeton University

High order consistency relations for classification and de-noising of Cryo-EM images (poster)
October 27, 2008 6:30 pm - 8:00 pm

In order for biologists to exploit the full potential embodied in the Cryo-EM method, two major challenges must be overcome. The first challenge is the extremely low signal-to-noise ratio of the projection images. Improving the signal-to-noise by averaging same-view projections is an essential preprocessing step for all algorithms. The second challenge is the heterogeneity problem, in which the observed projections belong to two or more different molecules or different conformations. Traditional methods assume identical particles, therefore failing to distinguish the different particle species. This results in an inaccurate reconstruction of the molecular structure.

For the first problem, we present two different high order consistency relations between triplets of images. The inclusion of such high order information leads to improvement in the classification and the de-noising of the noisy images compared to existing methods that use only pairwise similarities. We also present a generalization of Laplacian eigenmaps to utilize such higher order affinities in a data set. This generalization is different from current tensor decomposition methods. For the second challenge, we describe a spectral method to establish two or more ab initio reconstructions from a single set of images.

Joint work with Ronald Coifman and Fred Sigworth.


Structure determination of proteins using cryo-electron microscopy (poster)
October 27, 2008 6:30 pm - 8:00 pm

The goal in Cryo-EM structure determination is to reconstruct 3D macromolecular structures from their noisy projections taken at unknown random orientations by an electron microscope. Resolving the Cryo-EM problem is of great scientific importance, as the method is applicable to essentially all macromolecules, as opposed to other existing methods such as crystallography. Since almost all large proteins have not yet been crystallized for 3D X-ray crystallography, Cryo-EM seems the most promising alternative, once its associated mathematical challenges are solved. We present an extremely efficient and robust solver for the Cryo-EM problem that successfully recovers the projection angles in a globally consistent manner. The simple algorithm combines ideas and principles from spectral graph theory, nonlinear dimensionality reduction, geometry and computed tomography. The heart of the algorithm is a unique construction of a sparse graph followed by a fast computation of its eigenvectors.

Joint work with Ronald Coifman and Fred Sigworth.

Arthur Szlam - University of California, Los Angeles

k-planes for classification (poster)
October 27, 2008 6:30 pm - 8:00 pm

The k-planes method is the generalization of k-means where the representatives of each cluster are affine linear sets. We will describe some possible modifications of this method for discriminative learning problems.

M. Alex O. Vasilescu - The State University of New York

Multilinear (tensor) manifold data modeling
October 28, 2008 10:00 am - 10:50 am

Most observable data such as images, videos, human motion capture data, and speech are the result of multiple factors (hidden variables) that are not directly measurable, but which are of interest in data analysis. In the context of computer vision and graphics, we deal with natural images, which are the consequence of multiple factors related to scene structure, illumination, and imaging. Multilinear algebra offers a potent mathematical framework for extracting and explicitly representing the multifactor structure of image datasets.

I will present two multilinear models for learning (nonlinear) manifold representations of image ensembles in which the multiple constituent factors (or modes) are disentangled and analyzed explicitly. Our nonlinear models are computed via a tensor decomposition, known as the M-mode SVD, which is an extension to tensors of the conventional matrix singular value decomposition (SVD), or through a generalization of conventional (linear) ICA called Multilinear Independent Components Analysis (MICA).

I will demonstrate the potency of our novel statistical learning approach in the context of facial image biometrics, where the relevant factors include different facial geometries, expressions, lighting conditions, and viewpoints. When applied to the difficult problem of automated face recognition, our multilinear representations, called TensorFaces (M-mode PCA) and Independent TensorFaces (MICA), yields significantly improved recognition rates relative to the standard PCA and ICA approaches. Recognition is achieved with a novel Multilinear Projection Operator.

Bio: M. Alex O. Vasilescu is an Assistant Professor of Computer Science at Stony Brook University (SUNY). She received her education at MIT and the University of Toronto. She was a research scientist at the MIT Media Lab from 2005-07 and at New York University's Courant Insitute from 2001-05. She has also done research at IBM, Intel, Compaq, and Schlumberger corporations, and at the MIT Artificial Intelligence Lab. She has published papers in computer vision and computer graphics, particularly in the areas of face recognition, human motion analysis/synthesis, image-based rendering, and physics-based modeling (deformable models). She has given several invited talks about her work and has several patents pending. Her face recognition research, known as TensorFaces, has been funded by the TSWG, the Department of Defense's Combating Terrorism Support Program. She was named by MIT's Technology Review Magazine to their 2003 TR100 List of Top 100 Young Innovators. http://www.cs.sunysb.edu/~maov

René Vidal - Johns Hopkins University

Clustering linear and nonlinear manifolds
October 28, 2008 12:15 pm - 1:05 pm

Over the past few years, various techniques have been developed for learning a low-dimensional representation of data lying in a nonlinear manifold embedded in a high-dimensional space. Unfortunately, most of these techniques are limited to the analysis of a single submanifold of a Euclidean space and suffer from degeneracies when applied to linear manifolds (subspaces). The simultaneous segmentation and estimation of a collection of submanifolds from sample data points is a challenging problem that is often thought of as "chicken-and-egg". Therefore, this problem is usually solved in two stages (1) data clustering and (2) model fitting, or else iteratively using, e.g. the Expectation Maximization (EM) algorithm.

The first part of this talk will show that for a wide class of segmentation problems (mixtures of subspaces, mixtures of fundamental matrices/trifocal tensors, mixtures of linear dynamical models), the "chicken-and-egg" dilemma can be tackled using an algebraic geometric technique called Generalized Principal Component Analysis (GPCA). In fact, it is possible to eliminate the data segmentation step algebraically and then use all the data to recover all the models without first segmenting the data. The solution can be obtained using linear algebraic techniques, and is a natural extension of classical PCA from one to multiple subspaces.

The second part of this talk will present a novel algorithm for clustering data sampled from multiple submanifolds of a Riemannian manifold, e.g. the space of probability density functions. The algorithm, called Locally Linear Manifold Clustering (LLMC) is based on clustering a low-dimensional representation of the data learned using generalizations of local nonlinear dimensionality reduction algorithms from Euclidean to Riemannian spaces.

The talk will also present a few motivating applications of manifold clustering to computer vision problems such as texture clustering, segmentation of rigid body motions, segmentation of dynamic textures, segmentation of diffusion MRI.


Large group discussion on Manifold Clustering
1) What have have been recent advances on manifold clustering?
a) Algebraic approaches
b) Spectral approaches
c) Probabilistic approaches

2) What have been successful applications of manifold clustering?

3) What is the role of topology, geometry, and statistics, in manifold learning, i.e.,
a) clustering based on the dimensions of the manifolds
b) clustering based on geometry
c) clustering based on statistics

3) What are the open problems in manifold clustering?
October 29, 2008 5:15 pm - 6:15 pm

Michael Wakin - Colorado School of Mines

Manifold models for single- and multi-signal recovery (poster)
October 27, 2008 6:30 pm - 8:00 pm

The emerging theory of Compressive Sensing states that a signal obeying a sparse model can be reconstructed from a small number of random linear measurements. In this poster, we will explore manifold-based models as a generalization of sparse representations, and we will discuss the theory and applications for use of these models in single- and multi-signal recovery from compressive measurements.

Brenton Walker - Laborartory For Telecommunications Sciences

Using persistent homology to recover spatial information from encounter traces (poster)
October 27, 2008 6:30 pm - 8:00 pm

In order to better understand human and animal mobility and its potential effcts on Mobile Ad-Hoc networks and Delay-Tolerant Networks, many researchers have conducted experiments which collect encounter data. Most analyses of these data have focused on isolated statistical properties such as the distribution of node inter-encounter times and the degree distribution of the connectivity graph.

On the other hand, new developments in computational topology, in particular persistent homology, have made it possible to compute topological invariants from noisy data. These homological methods provide a natural way to draw conclusions about global structure based on collections of local information.

We use persistent homology techniques to show that in some cases encounter traces can be used to deduce information about the topology of the physical space the experiment was conducted in, and detect certain changes in the space. We also show that one can distinguish between simulated encounter traces generated on a bounded rectangular grid from traces generated on a grid with the opposite edges wrapped (a toroidal grid). Finally, we have found that nontrivial topological features also appear in real experimental encounter traces.

John Wright - University of Illinois at Urbana-Champaign

Mixed data segmentation via lossy data compression (poster)
October 27, 2008 6:30 pm - 8:00 pm

We consider the problem of clustering mixed data drawn sampled from distributions of (possibly) varying intrinsic dimensionality, e.g., degenerate Gaussians or linear subspaces. We approach this problem from the perspective of lossy data compression, seeking a segmentation that minimizes the number of bits needed to code the data up to a prespecified distortion. The coding length is minimized via an agglomerative algorithm that merges the pair of groups such that the resulting coding length is minimal. This simple algorithm converges globally for a wide range of simulated examples. It also produces state-of-the-art experimental results on applications in image segmentation from texture features and motion segmentation from tracked point features.

Qiu Wu - University of Texas, Austin

Orthant-wise gradient projection method for sparse reconstruction (poster)
October 27, 2008 6:30 pm - 8:00 pm

Many problems in signal processing and statistics involve in finding sparse solutions by solving the l1 regularized least square problem.

The orthant-wise method exploits that fact that the l1 term is a linear function over any given orthant. Hence, the objective function is differentiable orthant-wise.

We implement two variants of the orthant-wise gradient projection method: one is based on steepest-descent search direction and the other one is based a quasi-Newton search direction. Experimental results with compressive sampling problems demonstrate that the performance of our method compares favorably with that of alternative methods in literatures.

Allen Yang - University of California, Berkeley

High-dimensional multi-model estimation – its Algebra, statistics, and sparse representation (poster)
October 27, 2008 6:30 pm - 8:00 pm

Recent advances in information technologies have led to unprecedented large amounts of high-dimensional data from many emerging applications. The need for more advanced techniques to analyze such complex data calls for shifting research paradigms. In this presentation, I will overview and highlight several results in the area of estimation of mixture models in high-dimensional data spaces. Applications will be presented in problems such as motion segmentation, image segmentation, face recognition, and human action categorization. Through this presentation, I intend to emphasize the confluence of algebra and statistics that may lead to more advanced solutions in analyzing complex singular data structures such as mixture linear subspaces and nonlinear manifolds.

Xiaojin (Jerry) Zhu - University of Wisconsin, Madison

Semi-supervised learning by multi-manifold separation
October 30, 2008 10:55 am - 11:45 am

Semi-supervised learning on a single manifold has been the subject of intense study. We consider the setting of multiple manifolds, in which it is assumed that the target function is smooth within each manifold, yet the manifolds can intersect and partly overlap. We discuss our recent work to separate these manifolds from unlabeled data, and perform a 'mild' form of semi-supervised learning which is hopefully robust to the model assumption.