Data in high dimensions is becoming ubiquitous, from image
analysis and finance to computational biology and neuroscience.
This data is often given or represented as samples embedded in a
high dimensional Euclidean space, that is, as point cloud data, though
it is assumed to lie on lower dimensional manifolds. Thus, in
recent years, there have been significant efforts in the
development of methods to analyze these point clouds and their
underlying manifolds. These include numerous techniques for
estimating the intrinsic dimension of the data and for
projecting it onto lower dimensional representations. These
areas are often called manifold learning and
dimensionality reduction.
The vast majority of the techniques developed in the literature assume, either
explicitly or implicitly, that the given point cloud consists of samples
of a single manifold. It is easy to see, however, that a
significant part of the interesting data has mixed dimensionality
and complexity.
That is, we have samples not of a manifold but of a
stratification.
In these cases it is useful to cluster the data according to
the complexity (dimensionality) of the possibly multiple
underlying manifolds (see example in figure above). Such clustering can be used both to better
understand the varying dimensionality and complexity of the data,
e.g., states in neural recordings or different human activities
in video analysis, and as a pre-processing step for
manifold learning and dimensionality reduction
techniques.
IMA postdoc Gloria Haro, together with IMA long-term visitors Gregory Randall
and Guillermo Sapiro, has proposed a technique for stratification learning.
The method is based on a mixture of Poisson distributions that locally model the
counting process of points in the different manifolds. Their technique
automatically gives a soft clustering of the point cloud according to dimensionality
and density, with an estimation of both quantities for each class.
The figure below illustrates two applications of the technique in computer vision, first to
the recognition of digits, and second to the classification of
activities recorded in a video.
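
To give a rough feel for how point counts in small neighborhoods can reveal dimensionality, the following minimal sketch estimates a local intrinsic dimension at every point of a toy point cloud. It does not implement the Poisson mixture model of Haro, Randall and Sapiro. It uses the simpler Levina-Bickel maximum-likelihood estimator, which is also derived from a Poisson model of neighbor counts, followed by a hard threshold instead of a soft clustering. The function name `local_dimension`, the neighborhood size `k=10`, the toy data (a segment and a cube in 3-D) and the threshold of 2.0 are all assumptions made for illustration.

```python
import numpy as np


def local_dimension(points, k=10):
    """Levina-Bickel style local intrinsic dimension estimate at each point.

    Assumes the points are distinct, so that all neighbor distances are > 0.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances (O(n^2) memory, acceptable for a small sketch).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Sort each row; column 0 is the point itself (distance 0), so skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    # Log ratios of the k-th neighbor distance to each closer neighbor distance.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    # Inverse of the mean log ratio over the k-1 closer neighbors.
    return (k - 1) / log_ratios.sum(axis=1)


rng = np.random.default_rng(0)

# Hypothetical toy stratification: a 1-D segment and a 3-D cube, both in R^3.
segment = np.column_stack([
    rng.uniform(2.0, 3.0, 400),
    np.full(400, 2.0),
    np.full(400, 2.0),
])
cube = rng.uniform(0.0, 1.0, size=(800, 3))
cloud = np.vstack([segment, cube])

dims = local_dimension(cloud, k=10)
segment_dim = np.median(dims[:400])  # typical estimate on the segment
cube_dim = np.median(dims[400:])     # typical estimate inside the cube

# Hard split by thresholding the local estimate (0 = low, 1 = high dimension).
labels = (dims > 2.0).astype(int)
```

On this toy data the per-point estimates are noisy, which is one reason a model that pools information across points, such as the mixture approach described above, is attractive compared with thresholding individual estimates.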