Main navigation | Main content

HOME » SCIENTIFIC RESOURCES » Volumes

Abstracts and Talk Materials

October 27, 2008 6:30 pm - 8:00 pm

Linear and affine subspaces are commonly used to describe the
appearance of objects under different lighting, viewpoint,
articulation, and even identity. A natural problem arising from their
use is -- given a query image portion represented as a point in some
high dimensional space -- find a subspace near to the query. This talk
presents an efficient solution to the approximate nearest subspace
problem for both linear and affine subspaces. Our method is based on a
simple reduction to the problem of nearest point search, and can thus
employ tree-based search or locality-sensitive hashing to find a near
subspace. Further performance improvement is achieved by using random
projections to lower the dimensionality of the problem. We provide
theoretical proofs of correctness and error bounds of our construction,
and demonstrate its capabilities on synthetic and real data. Our
experiments demonstrate that an approximate nearest subspace can be
located significantly faster than the exact nearest subspace, while at
the same time it can find better matches compared to a similar search
on points, in the presence of variations due to viewpoint, lighting,
and so forth.

It is now well established that sparse signal models are well suited to restoration tasks and can effectively be learned from audio, image, and video data. Recent research has been aimed at learning discriminative sparse models instead of purely reconstructive ones. This work proposes a new step in that direction, with a novel sparse representation for signals belonging to different classes in terms of a shared dictionary and multiple class-decision functions. The linear variant of the proposed model admits a simple probabilistic interpretation, while its most general variant admits an interpretation in terms of kernels. An optimization framework for learning all the components of the proposed model is presented, along with experimental results on standard handwritten digit and texture classification tasks. This is a joint work with F. Bach (INRIA), J. Ponce (Ecole Normale Supérieure), G. Sapiro (University of Minnesota) and A. Zisserman (Oxford University).

October 27, 2008 6:30 pm - 8:00 pm

We introduce a new framework for collaborative inference and
efficient fusion of manifold-modeled data. We formulate the
notion of a joint manifold model for signal ensembles, and
using this framework we demonstrate the superior performance of
joint processing techniques for a range of tasks including
detection, classification, parameter estimation, and manifold
learning. We then exploit recent results concerning random
projections of low-dimensional manifolds to develop an
efficient framework for distributed data fusion.
As an example, we extend the smashed filter – a maximum
likelihood, model-based estimation and classification algorithm
that exploits random measurements – to a distributed setting.
Bounds for the robustness of this scheme against
measurement noise are derived. We demonstrate the utility of
our framework in a variety of settings, including large scale
camera networks, networks of acoustic sensors, and multi-modal
sensors.

This is joint work with Richard Baraniuk, Marco Duarte, and Chinmay Hegde.

This is joint work with Richard Baraniuk, Marco Duarte, and Chinmay Hegde.

October 27, 2008 6:30 pm - 8:00 pm

Spurred by recent progress in compressive sampling methods, we develop a new reconstruction algorithm for the Fourier problem of recovering from noisy samples a linear combination of unknown frequencies embedded in a very large dimensional ambient space. The approach differs from both L1-norm minimization and orthogonal matching pursuit (and its variants) in that no basis for the ambient space is chosen a priori. The result is improved computational complexity. We provide numerical examples that demonstrate the method's robustness and efficiency.

Joint work with Yu Chen.

Joint work with Yu Chen.

October 27, 2008 3:00 pm - 3:50 pm

Joint work with Mark Davenport, Marco Duarte, Chinmay Hegde,
and Michael Wakin.

Compressive sensing is a new approach to data acquisition in which sparse or compressible signals are digitized for processing not via uniform sampling but via measurements using more general, even random, test functions. In contrast with conventional wisdom, the new theory asserts that one can combine "low-rate sampling" (dimensionality reduction) with an optimization-based reconstruction process for efficient and stable signal acquisition. While the growing compressive sensing literature has focused on sparse or compressible signal models, in this talk, we will explore the use of manifold signal models. We will show that for signals that lie on a smooth manifold, the number of measurements required for a stable representation is proportional to the dimensionality of the manifold, and only logarithmic in the ambient dimension — just as for sparse signals. As an application, we consider learning and inference from manifold-modeled data, such as detecting tumors in medical images, classifying the type of vehicle in airborne surveillance, or estimating the trajectory of an object in a video sequence. Specific techniques we will overview include compressive approaches to the matched filter (dubbed the "smashed filter"), intrinsic dimension estimation for point clouds, and manifold learning algorithms. We will also present a new approach based on the joint articulation manifold (JAM) for compressive distributed learning, estimation, and classification.

Compressive sensing is a new approach to data acquisition in which sparse or compressible signals are digitized for processing not via uniform sampling but via measurements using more general, even random, test functions. In contrast with conventional wisdom, the new theory asserts that one can combine "low-rate sampling" (dimensionality reduction) with an optimization-based reconstruction process for efficient and stable signal acquisition. While the growing compressive sensing literature has focused on sparse or compressible signal models, in this talk, we will explore the use of manifold signal models. We will show that for signals that lie on a smooth manifold, the number of measurements required for a stable representation is proportional to the dimensionality of the manifold, and only logarithmic in the ambient dimension — just as for sparse signals. As an application, we consider learning and inference from manifold-modeled data, such as detecting tumors in medical images, classifying the type of vehicle in airborne surveillance, or estimating the trajectory of an object in a video sequence. Specific techniques we will overview include compressive approaches to the matched filter (dubbed the "smashed filter"), intrinsic dimension estimation for point clouds, and manifold learning algorithms. We will also present a new approach based on the joint articulation manifold (JAM) for compressive distributed learning, estimation, and classification.

In recent years a variety of spectral and geometry-based methods have become popular for various tasks of machine learning,
such as dimensionality reduction, clustering and semi-supervised learning. These methods use a model of data as a probability distribution on a manifold, or,
more generally a mixture of manifolds. In the talk I will discuss some of these methods and recent theoretical results on their convergence.
I will also talk about how spectral methods can be used to learn mixtures of Gaussian distributions, which may be considered the simplest case
of multi-manifold data modeling.

The nature and quantity of data arising out of scientific applications requires novel methods, both for exploratory analysis as well as analysis of significance and validation. One set of new methods relies on ideas and methods from topology. The study of data sets requires an extension of standard methods, which we refer to as persistent topology. We will survey this area, and show that algebraic methods can be applied both to the exploratory and the validation side of investigations, and show some examples.

October 27, 2008 6:30 pm - 8:00 pm

(Joint work with L. Guibas, S. Oudot and P. Skraba - to appear in proc.
SODA'09).

Given a real-valued function f defined over some metric space X, is it possible to recover some structural information about f from the sole information of its values at a finite subset L of sample points, whose pairwise distances in X are given? We provide a positive answer to this question. More precisely, taking advantage of recent advances on the front of stability for persistence diagrams, we introduce a novel algebraic construction, based on a pair of nested families of simplicial complexes built on top of the point cloud L, from which the persistence diagram of f can be faithfully approximated. We derive from this construction a series of algorithms for the analysis of scalar fields from point cloud data. These algorithms are simple and easy to implement, have reasonable complexities, and come with theoretical guarantees. To illustrate the generality and practicality of the approach, we also present experimental results obtained in various applications, ranging from clustering to sensor networks.

Given a real-valued function f defined over some metric space X, is it possible to recover some structural information about f from the sole information of its values at a finite subset L of sample points, whose pairwise distances in X are given? We provide a positive answer to this question. More precisely, taking advantage of recent advances on the front of stability for persistence diagrams, we introduce a novel algebraic construction, based on a pair of nested families of simplicial complexes built on top of the point cloud L, from which the persistence diagram of f can be faithfully approximated. We derive from this construction a series of algorithms for the analysis of scalar fields from point cloud data. These algorithms are simple and easy to implement, have reasonable complexities, and come with theoretical guarantees. To illustrate the generality and practicality of the approach, we also present experimental results obtained in various applications, ranging from clustering to sensor networks.

October 28, 2008 3:00 pm - 3:50 pm

Joint work with Peter Binev, Ron DeVore,
and Philipp Lamby.
This talk addresses the recovery of functions of a large number of variables from point clouds in the context of supervised learning. Our estimator is based on two conceptional pillars.
First, the notion of sparse occupancy
trees is shown to warrant efficient computations even for a very large number of variables. Second, a properly adjusted adaptive tree-approximation scheme is shown to ensure instance optimal performance.
By this we mean the rapid decay (with increasing sample size) of
the probability that the estimator deviates from
the regression function (in a certain natural norm) by more than
the error of best n-term approximation in the sparse tree setting.

October 29, 2008 8:00 pm - 9:15 pm

Wavelets are used in the analysis of sounds and images,
as well as in many other applications. The wavelet transform provides a
mathematical analog to a music score: just as the score tells a musician
which notes to play when, the wavelet analysis of a sound takes things
apart into elementary units with a well defined frequency (which note?)
and at a well defined time (when?). For images wavelets allow you to
first describe the coarse features with a broad brush, and then later to
fill in details. This is similar to zooming in with a camera: first you
can see that the scene is one of shrubs in a garden, then you
concentrate on one shrub and see that it bears berries, then, by zooming
in on one branch, you find that this is a raspberry bush. Because
wavelets allow you to do a similar thing in more mathematical terms, the
wavelet transform is sometimes called a "mathematical microscope."

Wavelets are used by many scientists for many different applications. Outside science as well, wavelets are finding their uses: wavelet transforms are an intergral part of the image compression standard JPEG2000.

The talk will start by explaining the basic principles of wavelets, which are very simple. Then they will be illustrated with some examples, including an explanation of image compression.

Wavelets are used by many scientists for many different applications. Outside science as well, wavelets are finding their uses: wavelet transforms are an intergral part of the image compression standard JPEG2000.

The talk will start by explaining the basic principles of wavelets, which are very simple. Then they will be illustrated with some examples, including an explanation of image compression.

October 30, 2008 12:15 pm - 1:05 pm

In Analog-to-Digital conversion, continuously varying functions (e.g.
the output of a microphone) are transformed into digital sequences from
which one then hopes to be able to reconstruct a close approximation to
the original function. The functions under consideration are typically
band-limited (i.e. their Fourier transform is zero for frequencies
higher than some given value, called the bandwidth); such functions
are completely determined by samples taken at a rate determined
by their bandwidth. These samples then have to be approximated by
a finite binary representation. Surprisingly, in many practical applications
one does not just replace each sample by a truncated binary expansion;
for various technical reasons, it is more attractive to sample much more
often and to replace each sample by just 1 or -1, chosen judicously.

In this talk, we shall see what the attractions are of this quantization scheme, and discuss several interesting mathematical questions suggested by this approach. This will be a review of work by many others as well as myself. It is also a case study of how continuous interaction with engineers helped to shape and change the problems as we tried to make them more precise.

In this talk, we shall see what the attractions are of this quantization scheme, and discuss several interesting mathematical questions suggested by this approach. This will be a review of work by many others as well as myself. It is also a case study of how continuous interaction with engineers helped to shape and change the problems as we tried to make them more precise.

October 28, 2008 10:55 am - 11:45 am

We assume that we are in $R^N$ with $N$ large. The first problem we consider is that there is a function $f$ defined on $Omega:=[0,1]^N$ which is a function of just $k$ of the coordinate variables: $f(x_1,dots,x_N)=g(x_{j_1},dots,x_{j_k})$ where $j_1,dots,j_k$ are not known to us. We want to approximate $f$ from some of its point values. We first assume that we are allowed to choose a collection of points in $Omega$ and ask for the values of $f$ at these points. We are interested in what error we can achieve in terms of the number of points when we assume some smoothness of $g$ in the form of Lipschitz or higher smoothness conditions.

We shall consider two settings: adaptive and non-adaptive. In the adaptive setting, we are allowed to ask for a value of $f$ and then on the basis of the answer we get we can ask for another value. In the non-adaptive setting, we must prescribe the $m$ points in advance.

A second problem we shall consider is when $f$ is not necessarily only a function of $k$ variables but it can be approximated to some tolerance $epsilon$ by such a function. We seek again sets of points where the knowledge of the values of $f$ at such points will allow us to approximate $f$ well.

Our main consideration is to derive results which are not severely effected by the size of $N$, i.e. are not victim of the curse of dimensionality. We shall see that this is possible.

We shall consider two settings: adaptive and non-adaptive. In the adaptive setting, we are allowed to ask for a value of $f$ and then on the basis of the answer we get we can ask for another value. In the non-adaptive setting, we must prescribe the $m$ points in advance.

A second problem we shall consider is when $f$ is not necessarily only a function of $k$ variables but it can be approximated to some tolerance $epsilon$ by such a function. We seek again sets of points where the knowledge of the values of $f$ at such points will allow us to approximate $f$ well.

Our main consideration is to derive results which are not severely effected by the size of $N$, i.e. are not victim of the curse of dimensionality. We shall see that this is possible.

October 27, 2008 6:30 pm - 8:00 pm

We propose a framework for exploiting the same measurement techniques used in emerging compressive sensing systems in the new setting of compressive classification. The first part of the framework maps traditional maximum likelihood hypothesis testing into the compressive domain; we find that the number of measurements required for a given classification performance level does not depend on the sparsity or compressibility of the signal but only on the noise level. The second part of the framework applies the generalized maximum likelihood method to deal with unknown transformations such as the translation, scale, or viewing angle of a target object. Such a set of transformed signals forms a low-dimensional, nonlinear manifold in the high-dimensional image space. We exploit recent results that show that random projections of a smooth manifold result in a stable embedding of the manifold in the lower-dimensional space. Non-differential manifolds, prevalent in imaging applications, are handled through the use of multiscale random projections that perform implicit regularization. We find that the number of measurements required for a given classification performance level grows linearly in the dimensionality of the manifold but only logarithmically in the number of pixels/samples and image classes.

This is joint work with Mark Davenport, Michael Wakin and Richard Baraniuk.

This is joint work with Mark Davenport, Michael Wakin and Richard Baraniuk.

The problem of computing the best multilinear low-rank approximation of a tensor can be formulated as an opimization problem on a product of Grassmann manifolds (by multilinear low-rank approximation we understand an approximation in the sense of the Tucker model). In the Grassmann approach we want to find (bases of) subspaces that represent the low-rank approximation. We have recently derived a Newton algorithm for this problem, where a quadratic model on the tangent space of the manifold is used. From the Grassmann Hessian we derive conditions for a local optimum. We also discuss the sensitivity of the subspaces to perturbations of the tensor elements.

October 27, 2008 6:30 pm - 8:00 pm

We consider the problem of segmenting multiple rigid-body motions in a video sequence from tracked feature point trajectories. Using the affine camera model, this motion segmentation problem can be cast as the problem of segmenting samples drawn from a union of linear subspaces of dimension two, three or four. However, due to limitations of the tracker, occlusions and the presence of nonrigid objects in the scene, the obtained motion trajectories may contain grossly mistracked features, missing entries, or not correspond to any valid motion model.

In this poster, we present a combination of robust subspace separation schemes that can deal with all of these practical issues in a unified framework. For complete uncorrupted trajectories, we examine approaches that try to harness the subspace structure of the data either globally (Generalized PCA) or by minimizing a lossy coding length (Agglomerative Lossy Coding). For incomplete or corrupted trajectories, we develop methods based on PowerFactorization or L1-minimization. The former method fills in missing entries using a linear projection onto a low-dimensional space. The latter method draws strong connections between lossy compression, rank minimization, and sparse representation. We compare our approach to other methods on a database of 167 motion sequences with full motions, independent motions, degenerate motions, partially dependent motions, missing data, outliers, etc. Our results are on par with state-of-the-art results, and in many cases exceed them.

In this poster, we present a combination of robust subspace separation schemes that can deal with all of these practical issues in a unified framework. For complete uncorrupted trajectories, we examine approaches that try to harness the subspace structure of the data either globally (Generalized PCA) or by minimizing a lossy coding length (Agglomerative Lossy Coding). For incomplete or corrupted trajectories, we develop methods based on PowerFactorization or L1-minimization. The former method fills in missing entries using a linear projection onto a low-dimensional space. The latter method draws strong connections between lossy compression, rank minimization, and sparse representation. We compare our approach to other methods on a database of 167 motion sequences with full motions, independent motions, degenerate motions, partially dependent motions, missing data, outliers, etc. Our results are on par with state-of-the-art results, and in many cases exceed them.

The talk explains joint work with Bo'az Klartag, solving the following problem and several variants.
Let f be a real-valued function on an N-point set E in R^{n}. Compute efficiently a function F on the whole R^{n} that agrees with f on E and has C^{m} norm close to least possible.

http://www.ricam.oeaw.ac.at/people/page/fornasier/

October 27, 2008 6:30 pm - 8:00 pm

We present the analysis of a superlinear convergent algorithm for
L1-minimization based on an iterative reweighted least squares. We show
improved performances in compressed sensing.
A similar algorithm is then applied for the efficient solution of a
system of singular PDEs for image recolorization in a relevant real-life
problem of art restoration.

http://cis.jhu.edu/~alvina/

http://www.cis.jhu.edu/~rvidal/

October 27, 2008 6:30 pm - 8:00 pm

Over the past few years, various techniques have been developed for learning a low-dimensional representation of a nonlinear manifold embedded in a high-dimensional space. Unfortunately, most of these techniques are limited to the analysis of a single connected nonlinear submanifold of a Euclidean space and suffer from degeneracies when applied to linear manifolds (subspaces).

This work proposes a novel algorithm for clustering data sampled from multiple submanifolds of a Riemannian manifold. First, we learn a representation of the data using generalizations of local nonlinear dimensionality reduction algorithms from Euclidean to Riemannian spaces. Such generalizations exploit geometric properties of the Riemannian space, particularly its Riemannian metric. Then, assuming that the data points from different groups are separated, we show that the null space of a matrix built from the local representation gives the segmentation of the data. However, this method can fail when the data points are drawn from a union of linear manifolds, because M contains additional vectors in its null space. In this case, we propose an alternative method for computing M, which avoids the aforementioned degeneracies, thereby resulting in the correct segmentation. The final result is a simple linear algebraic algorithm for simultaneous nonlinear dimensionality reduction and clustering of data lying in multiple linear and nonlinear manifolds.

We present several applications of our algorithm to computer vision problems such as texture clustering, segmentation of rigid body motions, segmentation of dynamic textures, segmentation of diffusion MRI. Our experiments show that our algorithm performs on par with state-of-the-art techniques that are specifically designed for such segmentation problems.

This work proposes a novel algorithm for clustering data sampled from multiple submanifolds of a Riemannian manifold. First, we learn a representation of the data using generalizations of local nonlinear dimensionality reduction algorithms from Euclidean to Riemannian spaces. Such generalizations exploit geometric properties of the Riemannian space, particularly its Riemannian metric. Then, assuming that the data points from different groups are separated, we show that the null space of a matrix built from the local representation gives the segmentation of the data. However, this method can fail when the data points are drawn from a union of linear manifolds, because M contains additional vectors in its null space. In this case, we propose an alternative method for computing M, which avoids the aforementioned degeneracies, thereby resulting in the correct segmentation. The final result is a simple linear algebraic algorithm for simultaneous nonlinear dimensionality reduction and clustering of data lying in multiple linear and nonlinear manifolds.

We present several applications of our algorithm to computer vision problems such as texture clustering, segmentation of rigid body motions, segmentation of dynamic textures, segmentation of diffusion MRI. Our experiments show that our algorithm performs on par with state-of-the-art techniques that are specifically designed for such segmentation problems.

October 27, 2008 10:55 am - 11:45 am

We present a statistical model to learn mixed dimensionalities and densities present in stratifications, that is, mixture of manifolds representing different characteristics and complexities in the data set. The basic idea relies on modeling the high dimensional sample points as a process of translated Poisson mixtures, with regularizing restrictions, leading to a model which includes the presence of noise. The translated Poisson distribution is useful to model a noisy counting process, and it s derived from the noise-induced translation of a regular Poisson distribution. By maximizing the log-likelihood of the process counting the points falling into a local ball, we estimate the local dimension and density. We show that the sequence of all possible local countings in a point cloud formed by samples of a stratification can be modeled by a mixture of different Translated Poisson distributions, thus allowing the presence of mixed dimensionality and densities in the same data set. A partition of the points in different classes according to both dimensionality and density is obtained, together with an estimation of these quantities for each class.

We propose a novel method for linear dimensionality reduction of manifold-modeled data. First, we show that given only a small number random projections of
sample points in R^N belonging to an unknown K-dimensional Euclidean manifold, the intrinsic dimension (ID) of the sample set can be estimated to high accuracy.
Second, we prove that using only this set of random projections, we can estimate the structure of the underlying manifold. In both cases, the number of random projections (M) required is linear in K and logarithmic in N, meaning that K < M

October 27, 2008 6:30 pm - 8:00 pm

This poster illustrates an algorithm which was developed to compute implicitly defined manifolds in engineering applications, where the manifold is of low (1-4) dimension, but is embedded in a very high dimensional space (100 and up).

Though the computational details (finding a point and the tangent space of the manifold) are different than in manifold learning, and the embedding is explicit instead of one of the unknowns, there are significant issues in common when computing any manifold.

The representation used closely follows the definition of a manifold. A set of spherical balls (with differing radii)serve as the chart domains, with the embedding mapping limited to the embedded center and the tangent space. In addition, polyhedra are associated with each chart so that overlapping charts correspond to faces of the polyhedra (common to the polyhedra of the overlapping charts). Boundary charts are represented in the same manner, beginning with a polyhedral cone, and adding faces for each overlapping chart.

The polyhedra of interior charts are Laguerre-Voronoi cells, and so it is very easy to locate points on the boundary of a partially represented manifold (if all vertices are closer to the origin than the radius of the ball the chart is completely surrounded by other charts). This provides a basic "continuation" or "extension" algorithm for creating a set of covering charts on a manifold.

In terms of data structures, the atlas of charts is a simple list. The polyhedra are in the same simple list, but also form cell complex whose dual is a Laguerre-Delaunay "triangulation" of the manifold. Interestingly, the fine structure is a cell complex, but so is the gross structure of the manifold. Manifolds with boundary are represented by another cell complex. In this one the faces are manifolds which share boundaries which are the boundary cells of the face.

So far this approach has been applied to three different kinds of manifolds which are common in dynamical systems. I hope to find similar applications in manifold learning.

Though the computational details (finding a point and the tangent space of the manifold) are different than in manifold learning, and the embedding is explicit instead of one of the unknowns, there are significant issues in common when computing any manifold.

The representation used closely follows the definition of a manifold. A set of spherical balls (with differing radii)serve as the chart domains, with the embedding mapping limited to the embedded center and the tangent space. In addition, polyhedra are associated with each chart so that overlapping charts correspond to faces of the polyhedra (common to the polyhedra of the overlapping charts). Boundary charts are represented in the same manner, beginning with a polyhedral cone, and adding faces for each overlapping chart.

The polyhedra of interior charts are Laguerre-Voronoi cells, and so it is very easy to locate points on the boundary of a partially represented manifold (if all vertices are closer to the origin than the radius of the ball the chart is completely surrounded by other charts). This provides a basic "continuation" or "extension" algorithm for creating a set of covering charts on a manifold.

In terms of data structures, the atlas of charts is a simple list. The polyhedra are in the same simple list, but also form cell complex whose dual is a Laguerre-Delaunay "triangulation" of the manifold. Interestingly, the fine structure is a cell complex, but so is the gross structure of the manifold. Manifolds with boundary are represented by another cell complex. In this one the faces are manifolds which share boundaries which are the boundary cells of the face.

So far this approach has been applied to three different kinds of manifolds which are common in dynamical systems. I hope to find similar applications in manifold learning.

Given a graph between N high-dimensional nodes, can we faithfully visualize it in just a few dimensions? We present an algorithm that improves the state-of-the art in dimensionality reduction by extending the Maximum Variance Unfolding method. Visualizations are shown for social networks, species trees, image datasets and human activity.

If we are only given a dataset of N samples, how should we link the samples to build a graph? The space to explore is daunting with 2^(N2) choices but two interesting subfamilies are tractable: matchings and b-matchings. We place distributions over these families and recover the optimal graph or perform Bayesian inference over graphs efficiently using belief propagation algorithms. Higher order distributions over matchings can also be handled efficiently via fast Fourier algorithms. Applications are shown in tracking, network reconstruction, classification, and clustering.

Biography

Tony Jebara is Associate Professor of Computer Science at Columbia University and director of the Columbia Machine Learning Laboratory. His research intersects computer science and statistics to develop new frameworks for learning from data with applications in vision, networks, spatio-temporal data, and text. Jebara is also co-founder and head of the advisory board at Sense Networks. He has published over 50 peer-reviewed papers in conferences and journals including NIPS, ICML, UAI, COLT, JMLR, CVPR, ICCV, and AISTAT. He is the author of the book Machine Learning: Discriminative and Generative (Kluwer). Jebara is the recipient of the Career award from the National Science Foundation and has also received honors for his papers from the International Conference on Machine Learning and from the Pattern Recognition Society. He has served as chair and program committee member for many learning conferences. Jebara's research has been featured on television (ABC, BBC, New York One, TechTV, etc.) as well as in the popular press (New York Times, Slash Dot, Wired, Scientific American, Newsweek, etc.). He obtained his PhD in 2002 from MIT. Jebara's lab is supported in part by the Central Intelligence Agency, the National Science Foundation, the Office of Naval Research, the National Security Agency, and Microsoft.

If we are only given a dataset of N samples, how should we link the samples to build a graph? The space to explore is daunting with 2^(N2) choices but two interesting subfamilies are tractable: matchings and b-matchings. We place distributions over these families and recover the optimal graph or perform Bayesian inference over graphs efficiently using belief propagation algorithms. Higher order distributions over matchings can also be handled efficiently via fast Fourier algorithms. Applications are shown in tracking, network reconstruction, classification, and clustering.

Biography

Tony Jebara is Associate Professor of Computer Science at Columbia University and director of the Columbia Machine Learning Laboratory. His research intersects computer science and statistics to develop new frameworks for learning from data with applications in vision, networks, spatio-temporal data, and text. Jebara is also co-founder and head of the advisory board at Sense Networks. He has published over 50 peer-reviewed papers in conferences and journals including NIPS, ICML, UAI, COLT, JMLR, CVPR, ICCV, and AISTAT. He is the author of the book Machine Learning: Discriminative and Generative (Kluwer). Jebara is the recipient of the Career award from the National Science Foundation and has also received honors for his papers from the International Conference on Machine Learning and from the Pattern Recognition Society. He has served as chair and program committee member for many learning conferences. Jebara's research has been featured on television (ABC, BBC, New York One, TechTV, etc.) as well as in the popular press (New York Times, Slash Dot, Wired, Scientific American, Newsweek, etc.). He obtained his PhD in 2002 from MIT. Jebara's lab is supported in part by the Central Intelligence Agency, the National Science Foundation, the Office of Naval Research, the National Security Agency, and Microsoft.

October 30, 2008 10:00 am - 10:50 am

Joint work with Evrim Acar, and Daniel M. Dunlavy
(Sandia National Laboratories).

Tensor decompositions (e.g., higher-order analogues of matrix decompositions) are powerful tools for data analysis. In particular, the CANDECOMP/PARAFAC (CP) model has proved useful in many applications such as chemometrics, signal processing, and web analysis. The problem of computing the CP decomposition is typically solved using an alternating least squares (ALS) approach. We discuss the use of optimization-based algorithms for CP, including how to efficiently compute the derivatives necessary for the optimization methods. Numerical studies highlight the positive features of our CPOPT algorithms, as compared with ALS and Gauss-Newton approaches.

Tensor decompositions (e.g., higher-order analogues of matrix decompositions) are powerful tools for data analysis. In particular, the CANDECOMP/PARAFAC (CP) model has proved useful in many applications such as chemometrics, signal processing, and web analysis. The problem of computing the CP decomposition is typically solved using an alternating least squares (ALS) approach. We discuss the use of optimization-based algorithms for CP, including how to efficiently compute the derivatives necessary for the optimization methods. Numerical studies highlight the positive features of our CPOPT algorithms, as compared with ALS and Gauss-Newton approaches.

October 27, 2008 6:30 pm - 8:00 pm

We present a novel multiscale clustering algorithm inspired by algebraic
multigrid techniques. Our method begins with assembling data points according
to local similarities. It uses an aggregation process to obtain reliable
scale-dependent global properties, which arise from the local similarities.
As the aggregation process proceeds, these global properties affect the formation
of coherent clusters. The global features that can be utilized are for
example density, shape, intrinsic dimensionality and orientation. The last
three features are a part of the manifold identification process which is performed
in parallel to the clustering process. The algorithm detects clusters
that are distinguished by their multiscale nature, separates between clusters
with different densities, and identifies and resolves intersections between
clusters. The algorithm is tested on synthetic and real datasets, its running
time complexity is linear in the size of the dataset.

October 29, 2008 10:55 am - 11:45 am

We propose a fast multi-way spectral clustering algorithm for multi-manifold data modeling. We describe the supporting theory as well as the practical choices guided by it. We emphasize the case of hybrid linear modeling, i.e., when the manifolds are affine subspaces in a Euclidean space, and then extend this setting to more general manifolds and other embedding metric spaces. We exemplify applications of the algorithm to several real-world problems while comparing it with other methods.

It is know that face images of different people lie on multiple low-dimensional subspaces. In this talk,
we will show that these subspaces are tightly bundled together as a "bouquet". Precisely due to this
unique structure, it allows extremely robust reconstruction and recognition of faces despite severe
corruption or occlusion. We will show that if the image resolution and the size of the face database
grow in proportion to infinity, computer can correctly and efficiently recover or recognize a face image
with almost 100% of its pixels randomly and arbitrarily corrupted, a truly magic ability of L1-minimization.

This is joint work with John Wright of UIUC.

This is joint work with John Wright of UIUC.

October 27, 2008 3:55 pm - 4:45 pm

We discuss recent advances in harmonic analysis ideas and algorithms for analyzing data sets in high-dimensional spaces which
are assumed to have lower-dimensional geometric structure. They enable the analysis of both the geometry of the data and of functions on the data, and they can be broadly subdivided into local, global and multiscale techniques, roughly corresponding to PDE techniques, Fourier and wavelet analysis ideas in low-dimensional Euclidean signal processing. We discuss applications to machine learning tasks, image processing, and discuss current research directions.

1. The representation of high-level information and low-level data

2. The symbiotic linkage between information and data

3. The need to transform qualitative information into quantitative data sets and vice versa

4. The need to think beyond the learning for classification.

**
5. How mathematics can be useful to the aforementioned domains
of interest in conjunction with information integration and
data fusion.
**

October 28, 2008 5:15 pm - 6:15 pm

Increasingly, we face machine learning problems in very high
dimensional spaces. We proceed with the intuition that although
natural data lives in very high dimensions, they have relatively few
degrees of freedom. One way to formalize this intuition is to model
the data as lying on or near a low dimensional manifold embedded in
the high dimensional space. This point of view leads to a new class of
algorithms that are "manifold motivated" and a new set of theoretical
questions that surround their analysis. A central construction in
these algorithms is a graph or simplicial complex that is data-derived
and we will relate the geometry of these to the geometry of the
underlying manifold. Applications to data analysis, machine
learning, and numerical computation will be considered.

October 27, 2008 5:30 pm - 6:30 pm

October 27, 2008 6:30 pm - 8:00 pm

Dimensionality reduction (DR) is a popular data-processing technique that serves the
following two main purposes: it helps to provide a meaningful interpretation and
visualization of the data, and it also helps to prevent overfitting when the number of
dimensions greatly exceeds the number of samples, thus working as a form of regularization.
However, dimensionality reduction should be driven by a particular data analysis goal.
When our goal is prediction rather than an (unsupervised) exploratory data analysis,
supervised dimensionality reduction (SDR) that combines DR with learning a predictor can
significantly outperform a simple combination of unsupervised DR with a subsequent learning
of a predictor on the resulting low-dimensional representation.
The problem of supervised dimensionality reduction can be also viewed as finding
a predictive low-dimensional representation, which captures the information about the class label
contained in the high-dimensional feature vector while ignoring the high-dimenional noise.

We propose a general SDR framework that views both features and class labels as exponential-family random variables (PCA-like dimensionality reduction is included as a particular case of Gaussian data). We learn data- and class-appropriate generalized linear models (GLMs), thus handling both classification and regression, with both discrete and real-valued data. Our SDR-GLM optimization problem can be viewed as discriminative learning based on minimization of conditional probability of class given thehidden variables, while using as a regularizer the conditional probability of the features given the low-dimensional (hidden-variable) representation.

The main advantage of our approach, besides being more general, is using simple closed-form update rules when performing its alternate minimization procedure. This method yields a short Matlab code, fast performance, and is guaranteed to converge. The convergence property, as well as closed form update rules, result from using appropriate auxiliary functions bounding each part of the objective function (i.e., reconstruction and prediction losses). We exploit the additive property of auxiliary functions in order to combine bounds on multiple loss functions.

Promising empirical results are demonstrated on a variety of high-dimensional datasets. Particularly, our results on simulated datasets convincingly demonstrate that SDR-GLM approach can discover underlying low-dimensional structure in high-dimensional noisy data, while outperforming SVM and SVDM, often by far, and practically always beating the unsupervised DR followed by learning a predictor. On real-life datasets, SDR approaches outperfrom the unsupervised DR by far, while matching or sometimes outperforming SVM and SVDM.

We propose a general SDR framework that views both features and class labels as exponential-family random variables (PCA-like dimensionality reduction is included as a particular case of Gaussian data). We learn data- and class-appropriate generalized linear models (GLMs), thus handling both classification and regression, with both discrete and real-valued data. Our SDR-GLM optimization problem can be viewed as discriminative learning based on minimization of conditional probability of class given thehidden variables, while using as a regularizer the conditional probability of the features given the low-dimensional (hidden-variable) representation.

The main advantage of our approach, besides being more general, is using simple closed-form update rules when performing its alternate minimization procedure. This method yields a short Matlab code, fast performance, and is guaranteed to converge. The convergence property, as well as closed form update rules, result from using appropriate auxiliary functions bounding each part of the objective function (i.e., reconstruction and prediction losses). We exploit the additive property of auxiliary functions in order to combine bounds on multiple loss functions.

Promising empirical results are demonstrated on a variety of high-dimensional datasets. Particularly, our results on simulated datasets convincingly demonstrate that SDR-GLM approach can discover underlying low-dimensional structure in high-dimensional noisy data, while outperforming SVM and SVDM, often by far, and practically always beating the unsupervised DR followed by learning a predictor. On real-life datasets, SDR approaches outperfrom the unsupervised DR by far, while matching or sometimes outperforming SVM and SVDM.

October 27, 2008 6:30 pm - 8:00 pm

We consider the tensor approximation problem. Given a tensor we want to approximate it by another tensor of lower multilinear rank. It turns out this problem is defined on a product of Grassmann manifolds. We describe the structure of various parts the approximation problem and present convergence plots for Newton and quasi-Newton methods (with BFGS and Limited memory BFGS updates) that solve the problem. All algorithms are applicable to both general and symmetric tensors and incorporate the Grassmann manifold structure of the problem. The benefits of these algorithms compared to traditional alternating least squares approaches are higher convergence rates and the existence of rigorous theory establishing convergence to a stationary point.

This is joint work with Lars Eldén (Linköping University) and Lek-Heng Lim (University of California, Berkeley).

This is joint work with Lars Eldén (Linköping University) and Lek-Heng Lim (University of California, Berkeley).

http://www.cs.yale.edu/homes/ys264/

http://www.math.princeton.edu/directory/amit-singer

October 27, 2008 6:30 pm - 8:00 pm

In order for biologists to exploit the full potential embodied in the
Cryo-EM method, two major challenges must be overcome. The first challenge
is the extremely low signal-to-noise ratio of the projection images.
Improving the signal-to-noise by averaging same-view projections is an
essential preprocessing step for all algorithms. The second challenge is the
heterogeneity problem, in which the observed projections belong to two or
more different molecules or different conformations. Traditional methods
assume identical particles, therefore failing to distinguish the different
particle species. This results in an inaccurate reconstruction of the
molecular structure.

For the first problem, we present two different high order consistency relations between triplets of images. The inclusion of such high order information leads to improvement in the classification and the de-noising of the noisy images compared to existing methods that use only pairwise similarities. We also present a generalization of Laplacian eigenmaps to utilize such higher order affinities in a data set. This generalization is different from current tensor decomposition methods. For the second challenge, we describe a spectral method to establish two or more ab initio reconstructions from a single set of images.

Joint work with Ronald Coifman and Fred Sigworth.

For the first problem, we present two different high order consistency relations between triplets of images. The inclusion of such high order information leads to improvement in the classification and the de-noising of the noisy images compared to existing methods that use only pairwise similarities. We also present a generalization of Laplacian eigenmaps to utilize such higher order affinities in a data set. This generalization is different from current tensor decomposition methods. For the second challenge, we describe a spectral method to establish two or more ab initio reconstructions from a single set of images.

Joint work with Ronald Coifman and Fred Sigworth.

October 27, 2008 6:30 pm - 8:00 pm

The goal in Cryo-EM structure determination is to reconstruct 3D
macromolecular structures from their noisy projections taken at unknown
random orientations by an electron microscope. Resolving the Cryo-EM problem
is of great scientific importance, as the method is applicable to
essentially all macromolecules, as opposed to other existing methods such as
crystallography. Since almost all large proteins have not yet been
crystallized for 3D X-ray crystallography, Cryo-EM seems the most promising
alternative, once its associated mathematical challenges are solved. We
present an extremely efficient and robust solver for the Cryo-EM problem
that successfully recovers the projection angles in a globally consistent
manner. The simple algorithm combines ideas and principles from spectral
graph theory, nonlinear dimensionality reduction, geometry and computed
tomography. The heart of the algorithm is a unique construction of a sparse
graph followed by a fast computation of its eigenvectors.

Joint work with Ronald Coifman and Fred Sigworth.

Joint work with Ronald Coifman and Fred Sigworth.

October 27, 2008 6:30 pm - 8:00 pm

The k-planes method is the generalization of k-means where the
representatives of each cluster are affine linear sets. We
will describe some possible modifications of this method for
discriminative learning problems.

http://www.cs.sunysb.edu/people/faculty/AlexVasilescu.html

Most observable data such as images, videos, human motion capture
data, and speech are the result of multiple factors (hidden
variables) that are not directly measurable, but which are of
interest in data analysis. In the context of computer vision and
graphics, we deal with natural images, which are the consequence
of multiple factors related to scene structure, illumination, and
imaging. Multilinear algebra offers a potent mathematical
framework for extracting and explicitly representing the
multifactor structure of image datasets.

I will present two multilinear models for learning (nonlinear) manifold representations of image ensembles in which the multiple constituent factors (or modes) are disentangled and analyzed explicitly. Our nonlinear models are computed via a tensor decomposition, known as the M-mode SVD, which is an extension to tensors of the conventional matrix singular value decomposition (SVD), or through a generalization of conventional (linear) ICA called Multilinear Independent Components Analysis (MICA).

I will demonstrate the potency of our novel statistical learning approach in the context of facial image biometrics, where the relevant factors include different facial geometries, expressions, lighting conditions, and viewpoints. When applied to the difficult problem of automated face recognition, our multilinear representations, called TensorFaces (M-mode PCA) and Independent TensorFaces (MICA), yields significantly improved recognition rates relative to the standard PCA and ICA approaches. Recognition is achieved with a novel Multilinear Projection Operator.

Bio: M. Alex O. Vasilescu is an Assistant Professor of Computer Science at Stony Brook University (SUNY). She received her education at MIT and the University of Toronto. She was a research scientist at the MIT Media Lab from 2005-07 and at New York University's Courant Insitute from 2001-05. She has also done research at IBM, Intel, Compaq, and Schlumberger corporations, and at the MIT Artificial Intelligence Lab. She has published papers in computer vision and computer graphics, particularly in the areas of face recognition, human motion analysis/synthesis, image-based rendering, and physics-based modeling (deformable models). She has given several invited talks about her work and has several patents pending. Her face recognition research, known as TensorFaces, has been funded by the TSWG, the Department of Defense's Combating Terrorism Support Program. She was named by MIT's Technology Review Magazine to their 2003 TR100 List of Top 100 Young Innovators. http://www.cs.sunysb.edu/~maov

I will present two multilinear models for learning (nonlinear) manifold representations of image ensembles in which the multiple constituent factors (or modes) are disentangled and analyzed explicitly. Our nonlinear models are computed via a tensor decomposition, known as the M-mode SVD, which is an extension to tensors of the conventional matrix singular value decomposition (SVD), or through a generalization of conventional (linear) ICA called Multilinear Independent Components Analysis (MICA).

I will demonstrate the potency of our novel statistical learning approach in the context of facial image biometrics, where the relevant factors include different facial geometries, expressions, lighting conditions, and viewpoints. When applied to the difficult problem of automated face recognition, our multilinear representations, called TensorFaces (M-mode PCA) and Independent TensorFaces (MICA), yields significantly improved recognition rates relative to the standard PCA and ICA approaches. Recognition is achieved with a novel Multilinear Projection Operator.

Bio: M. Alex O. Vasilescu is an Assistant Professor of Computer Science at Stony Brook University (SUNY). She received her education at MIT and the University of Toronto. She was a research scientist at the MIT Media Lab from 2005-07 and at New York University's Courant Insitute from 2001-05. She has also done research at IBM, Intel, Compaq, and Schlumberger corporations, and at the MIT Artificial Intelligence Lab. She has published papers in computer vision and computer graphics, particularly in the areas of face recognition, human motion analysis/synthesis, image-based rendering, and physics-based modeling (deformable models). She has given several invited talks about her work and has several patents pending. Her face recognition research, known as TensorFaces, has been funded by the TSWG, the Department of Defense's Combating Terrorism Support Program. She was named by MIT's Technology Review Magazine to their 2003 TR100 List of Top 100 Young Innovators. http://www.cs.sunysb.edu/~maov

Over the past few years, various techniques have been developed for learning a low-dimensional representation of data lying in a nonlinear manifold embedded in a high-dimensional space. Unfortunately, most of these techniques are limited to the analysis of a single submanifold of a Euclidean space and suffer from degeneracies when applied to linear manifolds (subspaces). The simultaneous segmentation and estimation of a collection of submanifolds from sample data points is a challenging problem that is often thought of as "chicken-and-egg". Therefore, this problem is usually solved in two stages (1) data clustering and (2) model fitting, or else iteratively using, e.g. the Expectation Maximization (EM) algorithm.

The first part of this talk will show that for a wide class of segmentation problems (mixtures of subspaces, mixtures of fundamental matrices/trifocal tensors, mixtures of linear dynamical models), the "chicken-and-egg" dilemma can be tackled using an algebraic geometric technique called Generalized Principal Component Analysis (GPCA). In fact, it is possible to eliminate the data segmentation step algebraically and then use all the data to recover all the models without first segmenting the data. The solution can be obtained using linear algebraic techniques, and is a natural extension of classical PCA from one to multiple subspaces.

The second part of this talk will present a novel algorithm for clustering data sampled from multiple submanifolds of a Riemannian manifold, e.g. the space of probability density functions. The algorithm, called Locally Linear Manifold Clustering (LLMC) is based on clustering a low-dimensional representation of the data learned using generalizations of local nonlinear dimensionality reduction algorithms from Euclidean to Riemannian spaces.

The talk will also present a few motivating applications of manifold clustering to computer vision problems such as texture clustering, segmentation of rigid body motions, segmentation of dynamic textures, segmentation of diffusion MRI.

The first part of this talk will show that for a wide class of segmentation problems (mixtures of subspaces, mixtures of fundamental matrices/trifocal tensors, mixtures of linear dynamical models), the "chicken-and-egg" dilemma can be tackled using an algebraic geometric technique called Generalized Principal Component Analysis (GPCA). In fact, it is possible to eliminate the data segmentation step algebraically and then use all the data to recover all the models without first segmenting the data. The solution can be obtained using linear algebraic techniques, and is a natural extension of classical PCA from one to multiple subspaces.

The second part of this talk will present a novel algorithm for clustering data sampled from multiple submanifolds of a Riemannian manifold, e.g. the space of probability density functions. The algorithm, called Locally Linear Manifold Clustering (LLMC) is based on clustering a low-dimensional representation of the data learned using generalizations of local nonlinear dimensionality reduction algorithms from Euclidean to Riemannian spaces.

The talk will also present a few motivating applications of manifold clustering to computer vision problems such as texture clustering, segmentation of rigid body motions, segmentation of dynamic textures, segmentation of diffusion MRI.

1) What have have been recent advances on manifold clustering?

a) Algebraic approaches

b) Spectral approaches

c) Probabilistic approaches

2) What have been successful applications of manifold clustering?

3) What is the role of topology, geometry, and statistics, in
manifold learning, i.e.,

a) clustering based on the dimensions of the manifolds

b) clustering based on geometry

c) clustering based on statistics

**
3) What are the open problems in manifold clustering?
**

October 29, 2008 5:15 pm - 6:15 pm

October 27, 2008 6:30 pm - 8:00 pm

The emerging theory of Compressive Sensing states that a signal obeying a sparse model can be reconstructed from a small number of random linear measurements. In this poster, we will explore manifold-based models as a generalization of sparse representations, and we will discuss the theory and applications for use of these models in single- and multi-signal recovery from compressive measurements.

October 27, 2008 6:30 pm - 8:00 pm

In order to better understand human and animal mobility and its potential effcts on Mobile Ad-Hoc networks and Delay-Tolerant Networks, many researchers have conducted experiments which collect encounter data. Most analyses of these data have focused on isolated statistical properties such as the distribution of node inter-encounter times and the degree distribution of the connectivity graph.

On the other hand, new developments in computational topology, in particular persistent homology, have made it possible to compute topological invariants from noisy data. These homological methods provide a natural way to draw conclusions about global structure based on collections of local information.

We use persistent homology techniques to show that in some cases encounter traces can be used to deduce information about the topology of the physical space the experiment was conducted in, and detect certain changes in the space. We also show that one can distinguish between simulated encounter traces generated on a bounded rectangular grid from traces generated on a grid with the opposite edges wrapped (a toroidal grid). Finally, we have found that nontrivial topological features also appear in real experimental encounter traces.

On the other hand, new developments in computational topology, in particular persistent homology, have made it possible to compute topological invariants from noisy data. These homological methods provide a natural way to draw conclusions about global structure based on collections of local information.

We use persistent homology techniques to show that in some cases encounter traces can be used to deduce information about the topology of the physical space the experiment was conducted in, and detect certain changes in the space. We also show that one can distinguish between simulated encounter traces generated on a bounded rectangular grid from traces generated on a grid with the opposite edges wrapped (a toroidal grid). Finally, we have found that nontrivial topological features also appear in real experimental encounter traces.

October 27, 2008 6:30 pm - 8:00 pm

We consider the problem of clustering mixed data drawn sampled from distributions of (possibly) varying intrinsic dimensionality, e.g., degenerate Gaussians or linear subspaces. We approach this problem from the perspective of lossy data compression, seeking a segmentation that minimizes the number of bits needed to code the data up to a prespecified distortion. The coding length is minimized via an agglomerative algorithm that merges the pair of groups such that the resulting coding length is minimal. This simple algorithm converges globally for a wide range of simulated examples. It also produces state-of-the-art experimental results on applications in image segmentation from texture features and motion segmentation from tracked point features.

October 27, 2008 6:30 pm - 8:00 pm

Many problems in signal processing and statistics involve
in finding sparse solutions by solving the l_{1} regularized least square problem.

The orthant-wise method exploits that fact that the l_{1} term is a linear function over any given orthant. Hence, the objective function is differentiable orthant-wise.

We implement two variants of the orthant-wise gradient projection method: one is based on steepest-descent search direction and the other one is based a quasi-Newton search direction. Experimental results with compressive sampling problems demonstrate that the performance of our method compares favorably with that of alternative methods in literatures.

The orthant-wise method exploits that fact that the l

We implement two variants of the orthant-wise gradient projection method: one is based on steepest-descent search direction and the other one is based a quasi-Newton search direction. Experimental results with compressive sampling problems demonstrate that the performance of our method compares favorably with that of alternative methods in literatures.

October 27, 2008 6:30 pm - 8:00 pm

Recent advances in information technologies have led to unprecedented large amounts of high-dimensional data from many emerging applications. The need for more advanced techniques to analyze such complex data calls for shifting research paradigms. In this presentation, I will overview and highlight several results in the area of estimation of mixture models in high-dimensional data spaces. Applications will be presented in problems such as motion segmentation, image segmentation, face recognition, and human action categorization. Through this presentation, I intend to emphasize the confluence of algebra and statistics that may lead to more advanced solutions in analyzing complex singular data structures such as mixture linear subspaces and nonlinear manifolds.

October 30, 2008 10:55 am - 11:45 am

Semi-supervised learning on a single manifold has been the subject of intense study. We consider the setting of multiple manifolds, in which it is assumed that the target function is smooth within each manifold, yet the manifolds can intersect and partly overlap. We discuss our recent work to separate these manifolds from unlabeled data, and perform a 'mild' form of semi-supervised learning which is hopefully robust to the model assumption.