Talk Abstracts:
Material from Talks
Yali Amit (The University of Chicago) amit@galton.uchicago.edu
A Neural Architecture for Learning, Detecting and Recognizing Objects
I will present a neural architecture based on simple binary
neurons which uses field dependent Hebbian learning to train
object models and classifiers. The models are used to drive
detection, and the classifiers for recognition; all are integrated
into one architecture.
The object models as well as the classifiers are based on a
family of binary local features with hard wired invariance to
contrast changes and local geometric deformations. Recognition
among several object classes is obtained through a vote among
a large number of randomized perceptrons based on these binary
features. When a particular object model is evoked in a central
module, detection in the entire visual scene, and at a range
of poses, is obtained through top-down priming of particular
retinotopic detectors of the features. Some analogies to well
known experiments on object detection and recognition in primates
will be discussed.
This is joint work with Massimo Mascaro.
Peter N. Belhumeur (Center for Computational Vision and Control, Yale University) belhumeur@yale.edu
Shedding Light on Illumination
In this talk we consider a number of interrelated issues in the
appearance and modeling of objects over changes in illumination:
1. We show that for an object with Lambertian reflectance
there are no discriminative functions that are invariant to
illumination. 2. We develop a generative method for modeling
objects under variable illumination and pose from a small number
of exemplars. 3. We point out implicit ambiguities and historical
missteps in determining shape from shadows and shading. 4. We
develop an exhaustive phenomenological method for shape reconstruction
and image-based rendering that differs from item 2 above in
that it makes no assumptions about an object's shape or BRDF.
5. We present a low-dimensional, data-driven representation
for BRDFs in an attempt to bridge the gap in the above-mentioned
methods. Throughout the talk we present supporting results and
demonstrations using large databases of objects and faces under
variable illumination.
Rama Chellappa (Department of Electrical and Computer Engineering and Center for Automation Research, University of Maryland) rama@cfar.umd.edu
Face Recognition and Verification in Still and Video Images
Face recognition and verification from still and video images
has important applications in surveillance, HCI design and access
control. In this talk we will present our recent work on a number
of problems in this area. Specifically we will discuss the following:
algorithms for face detection in still images using optimal
shape operators, use of symmetry and shape from shading for
designing algorithms that are robust to changes in illumination
and pose, face detection from video, simultaneous tracking and
verification of humans using sequential importance sampling
techniques, shape-encoded tracking of human heads using branching-particle
methods, gait-based recognition of humans using continuous HMM
models and the Maryland real-time tracking system for tracking
humans.
Ernst D. Dickmanns (Universität der Bundeswehr München) Ernst.Dickmanns@UniBw-Muenchen.de
Expectation-based, Multi-focal, Saccadic (EMS-) Vision.
(A System for Understanding Dynamic Scenes Observed from a Moving
Platform)
Expectation-based, Multi-focal, Saccadic (EMS-) Vision has been designed to cope
with many different aspects of mission performance for a variety
of vehicles. A wide field of view (f.o.v., > ~100°) for the near range
makes it possible to avoid moving obstacles at slow speed and to negotiate
tight curves. Trinocular stereo in a small central f.o.v. yields
good depth estimates in the near range from a single, well
recognizable feature. Active gaze control makes it possible to shift this
f.o.v. to where it is needed, and to inertially stabilize the
viewing direction to eliminate motion blur. Ego-motion under
strong perturbations is determined by inertial/visual data
fusion, taking advantage of spatio-temporal models on differential
and integral scales.
Scene representation is done in a dynamic scene tree exploiting homogeneous
coordinate transformations as in computer graphics; however,
in computer vision many of the entries into the transformation
matrices and the generic object models are the unknowns of the
problem. In the 4-D approach, developed on the basis of
extended Kalman filtering, these unknowns are determined by
an initial (daring) guess and consecutive recursive improvements
by prediction error feedback exploiting rich first order approximations
(in 3-D space including perspective mapping) through Jacobian
matrices for each object/sensor-pair. The Dynamic Object dataBase
(DOB) [containing the scene tree representation among other
knowledge components about the vehicle status] is the central
layer for separating the 'Systems-Engineering' lower part of
the overall cognitive system from the more 'Artificial Intelligence'-oriented
upper part with state chart representations. On the higher levels,
the situation comprising several objects and the system's own intentions
(goals) is assessed and behavioral decisions are taken in the
mission context. Here, knowledge about the effects of maneuvers
and of the application of feedback control laws is available.
Actual maneuver performance and control computations are done
on the lower levels with dedicated processors in the distributed
overall system (about a dozen processors). Experimental results
in fully autonomous road vehicle guidance with the test vehicles
'VaMoRs' (a 5-ton van, maneuvering on minor road networks)
and 'VaMP' (a Mercedes 500 SEL, displaying hybrid adaptive cruise
control on highways) will be shown.
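
The recursive improvement by prediction error feedback described above can be illustrated with a scalar extended Kalman filter. This is only a minimal sketch under assumed values, not the EMS-Vision implementation: the camera model (`FOCAL_PX`), the object width (`WIDTH_M`), the noise variances and all function names are hypothetical. It estimates the depth of an object of known width from its measured image size, starting from a deliberately poor initial guess.

```python
# Assumed pinhole camera and object parameters (illustrative only).
FOCAL_PX = 500.0   # focal length in pixels
WIDTH_M = 2.0      # known object width in metres


def project(depth):
    """Perspective mapping: image width in pixels of the object at `depth`."""
    return FOCAL_PX * WIDTH_M / depth


def project_jacobian(depth):
    """First-order approximation (derivative) of `project` w.r.t. depth."""
    return -FOCAL_PX * WIDTH_M / depth ** 2


def ekf_step(x, p, z, h, h_jacobian, q, r):
    """One predict/update cycle of a scalar extended Kalman filter."""
    # Prediction with a static state model: only the uncertainty grows.
    p_pred = p + q
    # Linearize the measurement model around the current estimate.
    jac = h_jacobian(x)
    # Prediction error between the measured and the expected feature.
    innovation = z - h(x)
    # Kalman gain, then feedback of the prediction error.
    gain = p_pred * jac / (jac * p_pred * jac + r)
    return x + gain * innovation, (1.0 - gain * jac) * p_pred


def estimate_depth(measured_px, guess=5.0, variance=25.0, steps=30):
    """Refine a rough depth guess from repeated image-size measurements."""
    x, p = guess, variance
    for _ in range(steps):
        x, p = ekf_step(x, p, measured_px, project, project_jacobian,
                        q=0.01, r=1.0)
    return x
```

Calling `estimate_depth(100.0)` starts from a guess of 5 m and moves toward the depth at which the projected width matches the 100-pixel measurement. In a full system the state would be a vector and the Jacobian a matrix per object/sensor pair.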
Ian L. Dryden (School of Mathematical Sciences, University of Nottingham) ild@maths.nott.ac.uk http://www.maths.nott.ac.uk/personal/ild
Statistical Shape Analysis in High-Level Vision
Shape is an essential ingredient of high-level image analysis. The
geometrical description of an object can be separated into two
parts: the registration information and the 'shape' (which is
invariant under registration transformations). A common choice
of registration is the group of Euclidean similarity transformations
and the geometrical properties that are invariant under this
group of transformations are known as 'similarity shape'. In
a Bayesian approach to object recognition shape information
is usually specified as part of the prior distribution. The
prior is then combined with the likelihood, or image model,
leading to posterior inference about the object. The statistical
theory of shape began with the independent work of David Kendall,
Fred Bookstein and Herbert Ziezold in the 1970s. Subsequent
developments have led to a deep differential geometric theory
of shape spaces, as well as practical statistical approaches
to analysing objects using probability distributions of shape
and likelihood based inference. A summary of the field is given
by Dryden and Mardia (1998, Wiley), where the main emphasis
is on the shapes of labeled point set configurations. In the
image analysis literature there are numerous works on the notion
of shape, many of which are directly related to the work in
Kendall's shape spaces. A common feature of the approaches is
some form of shape metric, and many of the shape representations
and metrics in common use are related through approximate affine
transformations of the particular shape coordinates being used.
In the talk I shall discuss some of the main aspects of statistical
shape analysis, making comparisons with alternative approaches,
which are often based on collections of angles or ratios of
distances. Some applications of shape analysis in image analysis
will be described. Finally, one of the major advantages of using
statistical shape analysis is that statistical inference can
be carried out when the images consist of a sample of objects,
and we consider an example where it is of interest to test whether
or not two populations have different mean shapes.
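
As an illustration of a shape metric that is invariant under similarity transformations, the sketch below computes the full Procrustes distance between two planar landmark configurations, using complex numbers to represent 2D points. It is a minimal example only; the function names are chosen here for illustration, and the landmarks in both configurations are assumed to be labeled in corresponding order.

```python
import math


def preshape(points):
    """Remove location and scale: center the landmarks and scale to unit size."""
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))
    if size == 0:
        raise ValueError("all landmarks coincide")
    return [p / size for p in z]


def full_procrustes_distance(a, b):
    """Full Procrustes distance between two labeled 2D configurations.

    The modulus of the complex inner product discards the remaining
    rotation, so the result depends only on similarity shape.
    """
    if len(a) != len(b):
        raise ValueError("configurations need the same number of landmarks")
    za, zb = preshape(a), preshape(b)
    inner = sum(p.conjugate() * q for p, q in zip(za, zb))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))
```

A triangle and a translated, rotated and rescaled copy of it are at distance zero (up to rounding), while a triangle with different proportions is at a positive distance.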

Davi Geiger (Courant Institute, NYU) geiger@cs.nyu.edu
Measuring the Convexity of Shapes
Many recognition tasks require shape understanding. One of the most
important measures of shapes is convexity. Psychophysics experiments
show that the human visual system prefers convexity over symmetry
to select figure from background. While we have today a good
understanding of symmetries of shapes (e.g., skeletons), we
have not yet devoted (much) attention to convexity. In particular,
today shapes can be classified either as convex or not convex (concave),
but the psychophysics experiments refer to shapes that are not
perfectly convex.
We propose a continuous measure of convexity for shapes. We investigate
a Markov random field model for extracting convexity (we can't
leave home without them, or, there are things that science
can't explain but for everything else there are MRFs). In our
approach convexity becomes an emergent property from a sum of
local interactions (where each local term does not contain convexity).
We analyse our approach and extensively experiment with it.
This is work in collaboration with Nava Rubin and Hsing-Kuo Pao.
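
To make the idea of a continuous (rather than yes/no) convexity measure concrete, here is a sketch of a simple and well-known alternative: the ratio of a polygon's area to the area of its convex hull. This is explicitly not the Markov random field approach of the talk; it is a swapped-in baseline for illustration, and the function names are chosen here. The ratio is 1 for a convex polygon and decreases as indentations grow.

```python
def polygon_area(vertices):
    """Absolute area of a simple polygon (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def convexity(vertices):
    """Area of the polygon divided by the area of its convex hull."""
    hull_area = polygon_area(convex_hull(vertices))
    if hull_area == 0:
        raise ValueError("degenerate polygon")
    return polygon_area(vertices) / hull_area
```

For example, a unit square scores 1, while an L-shaped polygon made of three unit squares scores 3 / 3.5, about 0.857.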

Donald Geman (University of Massachusetts) geman@cmla.ens-cachan.fr
Coarse-to-Fine Object Detection
Object recognition is one of the primary goals of high-level
computer vision, especially for real greyscale scenes and with
the speed and precision of human vision. I will talk about a
simpler but still vexing problem: detect and roughly localize
all highly visible instances of a small set of generic object classes,
such as faces and cars - or even from only one class, measuring
performance in terms of computation and false alarms. The approach,
motivated by efficient computation, is sequential testing which
is highly coarse-to-fine with respect to the representation
of objects and the exploration of object classes and poses.
At the beginning, the tests are universal, accommodating many
objects and poses simultaneously, but the false alarm rate is
relatively high. Eventually, the tests are more discriminating,
but also more complex and dedicated to specific objects and
poses. One result is that the spatial distribution of processing
is highly skewed and detection is very rapid, but at the expense
of (isolated) confusions. Presumably these could be eliminated
with localized, more intensive, processing, perhaps involving
global optimization.
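
The computational benefit of sequential coarse-to-fine testing can be sketched with a toy cascade. This is a minimal illustration, not the detector from the talk: the tests, the 1D "image" and the function name are hypothetical. Cheap, permissive tests run on every candidate location, and the more discriminating tests run only on the survivors.

```python
def coarse_to_fine_detect(candidates, tests):
    """Apply tests in order from coarse to fine.

    Returns the surviving candidates and the number of test
    evaluations performed, as a rough measure of computation.
    """
    survivors = list(candidates)
    evaluations = 0
    for test in tests:
        remaining = []
        for candidate in survivors:
            evaluations += 1
            if test(candidate):
                remaining.append(candidate)
        survivors = remaining
    return survivors, evaluations


# Toy example: locate the pattern [1, 1, 1] in a binary signal.
signal = [0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
target = [1, 1, 1]
positions = range(len(signal) - len(target) + 1)


def coarse(i):
    """Cheap test with many false alarms: the first sample is on."""
    return signal[i] == 1


def fine(i):
    """Expensive, dedicated test: the full window matches."""
    return signal[i:i + len(target)] == target


detections, cost = coarse_to_fine_detect(positions, [coarse, fine])
```

Here the fine test is evaluated only at the positions that pass the coarse test, so processing concentrates on a small part of the signal.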

Benjamin B. Kimia (Brown University) kimia@lems25.lems.brown.edu
Symmetry Maps and Transforms for Perceptual Organization and Object Recognition
Traditionally, symmetry set representations have been defined for segmented
shape. However, the difficulties in obtaining shape from gray-level
images have led us to consider the direct acquisition of symmetry
maps from gray-level images. In this talk, we propose that the
symmetry map of an edge map is an appropriate intermediate level
representation between low-level edge maps and high-level object
models, and that transformations of it are canonical building
blocks for perceptual grouping and object recognition. First,
we review our approach for computing the symmetries (skeletons)
of an edge map (and shape) consisting of a collection of curve
segments. This approach is a combination of analytic computations
in the style of computational geometry and discrete propagations
on a grid in the style of the numerical solutions of PDEs as
in curve evolution. This framework results in (i) analytically
exact solutions, (ii) near optimal computational complexity,
(iii) local computations, and (iv) a graph representation which
can be used in applications such as object recognition. Second,
we present symmetry transformations on the symmetry map as a
language for perceptual organization. Specifically, it is proposed
that (i) a symmetry map can fully represent the initial edge
map so that both boundary and regional continuities can be represented
via skeletal/shock continuity; (ii) a re-organization of the
edge map in the form of completing gaps, discarding spurious
elements, smoothing, and partitioning a contour (grouped set
of edge elements) can be represented by transformations on the
symmetry map; (iii) perceptual grouping and object recognition
can be cast as finding the least action path in the space of
sequences of symmetry transforms.

Tai Sing Lee (Carnegie Mellon University)
The Influence of High Level Vision on Early Visual Processing in the Brain
In this talk, I will describe a series of single-unit neurophysiological
experiments on awake, behaving monkeys that investigate the influence
of higher-level vision, such as object recognition and attention,
on processing mechanisms in the early visual cortex.
The results of these experiments challenge many classical views
on the role of the early visual cortex in visual processing and on the
nature of information flow in the visual cortex. They lead to
a view in which the neural machinery in the early visual
cortex is highly adaptive and highly interactive, coupled tightly
with higher order processes through massive feedforward and
feedback connections between the cortical areas.

David Mumford (Brown University) mumford@nemo.dam.brown.edu
What is the space of shapes and what can we do with it?
Computer vision needs a quantitative representation of "shape" for object
recognition. Subjectively, we have a clear idea of what "shape"
means. But what is the right mathematical theory of shape? We
propose that there is a hierarchy of nested shape spaces like
the hierarchy of Sobolev/C^k spaces of functions which
define subsets S in R^2 or R^3 (think of
closed subsets bounded by piecewise smooth curves) with varying
degrees of complexity. The advantage of having such spaces is
that they give a setting for numerous questions:
a) What are the natural metrics and norms on these spaces, and are
they Banach manifolds?
b) Define a natural cell decomposition of this non-linear space into
local linear charts.
c) Define a tangent space and Riemannian metric and find its geodesics
and its curvature.
d) Define a set of probability measures on these spaces, and find
their supports and relations.
Of course, all these questions have been partly addressed already.
Thus the medial axis is a natural construction for constructing
local linear charts, Riemannian metrics have been studied in
the related question of the space of diffeomorphisms and probability
measures have been introduced using stochastic differential
equations or polygonal approximation. I will try to pull these
ideas together and point out where work needs to be done. The
case of R^2 doesn't seem too hard but R^3
is much more difficult.

John Oliensis (NEC Research Institute Inc.) oliensis@research.nj.nec.com
From Movies to Geometric 3D Models: the Structure-from-Motion Problem
I review my recent research on structure from motion (SFM). The
problem is as follows. Given a sequence of photographic images
of a fixed 3D scene, taken by a camera at several unknown positions
and orientations, the goal is to recover: 1) a 3D geometric
model of the scene, 2) the camera's position and orientation
for each of the images. The apparent locations of the 3D points
in each image provide the information used to achieve these
goals.
My recent results include:
1) A fast, accurate technique for deriving a scene model from
two images. As a byproduct, the technique yields upper and lower
bounds on the error surface for structure from motion.
2) The explanation of an important two-fold ambiguity in interpreting
image sequences. The artist Patrick Hughes has created several
visual illustrations of this ambiguity, and I will demonstrate
it in my talk.
3) An approximate analytic model of the SFM error surface, which
makes explicit the effects of the two-fold ambiguity associated
with planar 3D scenes. This leads to an improved understanding
of the local minima of the error surface.
4) Multi-image algorithms that reconstruct directly from the intensity
data. Previous "direct methods" iteratively minimize a complex
error function and depend on an initial guess for the unknowns,
while other approaches require tracking data as input (i.e.,
they assume that distinctive image features have been pre-identified
and tracked across the sequence by some other approach).
5) A fast, accurate technique for determining the calibration
parameters of the camera (e.g., its focal length) from an image
sequence.
6) A convergence proof for the Sturm-Triggs algorithm.

Xavier Pennec (INRIA Sophia - Project Epidaure) xpennec@sophia.inria.fr http://www.inria.fr/epidaure/personnel/pennec/pennec.html
Probabilities and Statistics on Riemannian Manifolds: Basic Tools for Geometric Measurements
Measurements of geometric primitives, such as rotations or rigid transformations,
are often noisy and we need to use statistics either to reduce
the uncertainty or to compare measurements. Unfortunately, geometric
primitives often belong to manifolds and not vector spaces.
We have already shown that generalizing too quickly even simple
statistical notions could lead to paradoxes. Here, we develop
some basic probabilistic tools to work on Riemannian manifolds:
the notion of mean value, covariance matrix, normal law, Mahalanobis
distance and χ² test. We also present an efficient
algorithm to compute the mean value and tractable approximations
of the normal and χ² laws for small variances. Finally,
we present some applications in medical image analysis, mainly
on the computation of the uncertainty of the registration of
3D images.
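
To illustrate what a mean value on a manifold involves, the sketch below computes an intrinsic (Fréchet or Karcher) mean of points on the unit sphere by repeatedly averaging in the tangent space and mapping back. It is a minimal example in the spirit of the abstract, not the author's algorithm: the choice of the sphere, the function names and the stopping rule are assumptions made here, and the points are assumed to lie within a small region of the sphere (no antipodal pairs).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def log_map(p, q):
    """Tangent vector at p pointing toward q, of length the geodesic distance."""
    c = max(-1.0, min(1.0, dot(p, q)))
    v = [qi - c * pi for pi, qi in zip(p, q)]
    n = norm(v)
    if n < 1e-12:
        # q coincides with p (antipodal points are outside the assumptions).
        return [0.0] * len(p)
    theta = math.acos(c)
    return [theta * vi / n for vi in v]


def exp_map(p, v):
    """Point reached by following the geodesic from p along tangent vector v."""
    n = norm(v)
    if n < 1e-12:
        return list(p)
    return [math.cos(n) * pi + math.sin(n) * vi / n for pi, vi in zip(p, v)]


def karcher_mean(points, iterations=100, tol=1e-12):
    """Iteratively move the estimate along the average tangent vector."""
    mean = list(points[0])
    for _ in range(iterations):
        logs = [log_map(mean, q) for q in points]
        step = [sum(col) / len(points) for col in zip(*logs)]
        if norm(step) < tol:
            break
        mean = exp_map(mean, step)
        # Guard against drift off the sphere from rounding errors.
        length = norm(mean)
        mean = [m / length for m in mean]
    return mean
```

For four points placed symmetrically around the north pole, the result is the north pole itself. Averaging the coordinates in the embedding space would instead give a point inside the sphere, which is one simple way in which naive generalizations of the mean break down.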

Pietro Perona (Caltech) perona@its.caltech.edu
Unsupervised Learning of Models for Object Recognition
Recognizing objects in images is one of the most important functions
of our visual system. Not only can we recognize individual objects,
such as the Eiffel Tower or our grandmother's face, but also
categories of objects, such as shoes, automobiles and frogs.
Considerable attention has been devoted to formulating models
and algorithms that may explain visual recognition; however,
no theory is yet available for how these models may be trained
automatically in realistic conditions: Can a child, or a machine,
learn to recognize 'faces' and 'cars' only by looking? This is
at best a difficult task: everyday images are cluttered and
may not contain explicit information on the presence, location
and structure of new objects. I will present a computational
theory of how object models may be learned from such data. Object
categories are modelled as collections of parts that appear
in a characteristic spatial arrangement. Both part appearance
and constellation shape are modelled probabilistically. Model
training is achieved by maximum likelihood.

Jayant M. Shah (Northeastern University) shah@neu.edu
Local Symmetries and Segmentation of Shapes
The set of local symmetry axes is found by analyzing the level
curves of a function which is the solution of an elliptic PDE.
These level curves may be thought of as successive smoothings
of the shape boundary. A point on a level curve is a point of
local symmetry if the level curve is symmetric about the gradient
vector at that point up to second order. The local symmetry axes
may also be described as the ridges and the valleys of the graph
of this function. The rationale underlying this approach is
that if a shape has certain symmetries, the solution of the
PDE ought to reflect these symmetries.
The set of local symmetry axes includes loci which are analogous
to the more commonly used medial axes. If a 2D shape is viewed
as a collection of ribbons glued together, then the local symmetry
axis of each ribbon along its length may be viewed as its medial
axis. Alternatively, if the shape is viewed as a distorted circle,
distorted by protrusions and indentations, then the local symmetry
axis of each protrusion along its length is its medial axis.
There are two main advantages of this approach. It is possible
to calculate the necessary properties of the level curves from
the differential properties of the function without having to
locate the level curves themselves, and the use of an elliptic
PDE makes it unnecessary to presmooth the shape boundary. Moreover,
it is straightforward to extend the definition of local symmetry
to higher dimensions.
Unlike the shape skeletons found by the Blum transform or by
decomposing the shape into a set of ribbons, the set of local
symmetries is usually not connected. One can obtain a connected
set by extending the local symmetry axes to join up with the
nearby axes. However, it is more natural to use the local symmetry
axes to segment the shape boundary, thus preserving all shape
information. The segmentation obtained in this way has the structure
of a graph.

Shimon Ullman (The Weizmann Institute of Science)
Object Recognition and Classification
The tasks of visual object recognition and classification are
natural and effortless for biological visual systems, but exceedingly
difficult to replicate in computer vision systems. The major
difficulty comes from the fact that the same object can have
many different retinal projections, depending on such factors
as the viewing direction, illumination conditions, partial occlusion
by other objects, and shape variability.
In this talk I will describe the basic issues in two related
problems in recognition -- specific object identification, and
general object classification. I will describe an approach to
object classification where objects are represented in terms
of common image fragments that are used as building blocks
for representing a large variety of objects within a class.
The talk will describe how optimal fragments are extracted,
and how they are used in the classification task.

Laurent Younes (CNRS) younes@cmla.ens-cachan.fr
Metrics, Shapes and Deformations
Using a Riemannian point of view to design comparison methods
and evaluate variations within high dimensional spaces such
as shapes is a conceptually simple and quite generic approach.
Adding robustness or invariance with respect to a group action
in this framework leads, by projecting into orbits, to interesting
theoretical issues and nice results, especially when groups
of diffeomorphisms come into the picture. After a quick review
on how this approach may relate to known examples in the literature
(Kendall's shape space, Grenander's deformable templates,...)
we show how to design geodesic distances, and estimate diffeomorphisms
between configurations of points in space, or between grey-colored
images.

Alan L. Yuille (Smith-Kettlewell Eye Research Institute) yuille@ski.org
in collaboration with J. Coughlan, S.C. Zhu (Ohio State), Y.N. Wu (UCLA).
Order Parameters for Detecting Target Curves in Images: When Does High Level Knowledge Help?
Many problems in vision can be formulated as Bayesian inference.
It is important to determine the accuracy of these inferences
and how they depend on the problem domain. In this paper, we
provide a theoretical framework based on Bayesian decision theory
which involves evaluating performance based on an ensemble of
problem instances. We pay special attention to the task of detecting
a target in the presence of background clutter. This framework
is then used to analyze the detectability of curves in images.
We restrict ourselves to the case where the probability models
are ergodic (both for the geometry of the curve and for the
imaging). These restrictions enable us to use techniques from
large deviation theory to simplify the analysis. We show that
the detectability of curves depends on a parameter K which is
a function of the probability distributions characterizing the
problem. At critical values of K the target becomes impossible
to detect on average. Our framework also enables us to determine
whether a simpler approximate model is sufficient to detect
the target curve and hence clarify how much information is required
to perform specific tasks. These results generalize our previous
work by placing it in a Bayesian decision theory framework,
by extending the class of probability models which can be analyzed,
and by analysing the case where approximate models are used
for inference.

Image Analysis and High Level Vision Modeling
2000-2001
Program: Mathematics in Multimedia