Material from Talks

Yali Amit (The University of Chicago)  amit@galton.uchicago.edu

A Neural Architecture for Learning, Detecting and Recognizing Objects

I will present a neural architecture based on simple binary neurons which uses field dependent Hebbian learning to train object models and classifiers. The models are used to drive detection, and the classifiers for recognition; all are integrated into one architecture.

The object models as well as the classifiers are based on a family of binary local features with hard-wired invariance to contrast changes and local geometric deformations. Recognition among several object classes is obtained through a vote among a large number of randomized perceptrons based on these binary features. When a particular object model is evoked in a central module, detection in the entire visual scene, and at a range of poses, is obtained through top-down priming of particular retinotopic detectors of the features. Some analogies to well-known experiments on object detection and recognition in primates will be discussed.
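
As a rough illustration of the voting idea only, the sketch below trains several perceptrons, each on a random subset of binary features, and classifies by majority vote. It uses the classic perceptron update rather than the field dependent Hebbian learning of the talk, and the function names, toy data, ensemble size and subset size are all assumptions made for the example.

```python
import random

def train_perceptron(samples, labels, feature_idx, epochs=20):
    """Classic perceptron rule restricted to the features in feature_idx."""
    w = [0.0] * len(feature_idx)
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # y is +1 or -1
            s = b + sum(wi * x[j] for wi, j in zip(w, feature_idx))
            if y * s <= 0:  # misclassified (or on the boundary): update
                for k, j in enumerate(feature_idx):
                    w[k] += y * x[j]
                b += y
    return w, b

def train_ensemble(samples, labels, n_perceptrons=25, subset_size=4, seed=0):
    """Train n_perceptrons, each on its own random feature subset."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    ensemble = []
    for _ in range(n_perceptrons):
        idx = rng.sample(range(n_features), subset_size)
        w, b = train_perceptron(samples, labels, idx)
        ensemble.append((idx, w, b))
    return ensemble

def vote(ensemble, x):
    """Majority vote of the individual perceptron decisions."""
    total = 0
    for idx, w, b in ensemble:
        s = b + sum(wi * x[j] for wi, j in zip(w, idx))
        total += 1 if s > 0 else -1
    return 1 if total > 0 else -1

# Hypothetical toy data: two classes with complementary binary feature patterns.
samples = [[1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]]
labels = [1, -1]
ensemble = train_ensemble(samples, labels)
print(vote(ensemble, samples[0]), vote(ensemble, samples[1]))
```

Using an odd number of perceptrons avoids ties in the two-class vote.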

This is joint work with Massimo Mascaro.

Peter N. Belhumeur (Center for Computational Vision and Control, Yale University)   belhumeur@yale.edu

Shedding Light on Illumination

In this talk we consider a number of interrelated issues in the appearance and modeling of objects over changes in illumination: 1. We show that for an object with Lambertian reflectance there are no discriminative functions that are invariant to illumination. 2. We develop a generative method for modeling objects under variable illumination and pose from a small number of exemplars. 3. We point out implicit ambiguities and historical missteps in determining shape from shadows and shading. 4. We develop an exhaustive phenomenological method for shape reconstruction and image-based rendering that differs from item 2 above in that it makes no assumptions about an object's shape or BRDF. 5. We present a low-dimensional, data-driven representation for BRDFs in an attempt to bridge the gap in the above-mentioned methods. Throughout the talk we present supporting results and demonstrations using large databases of objects and faces under variable illumination.

Rama Chellappa (Department of Electrical and Computer Engineering and Center for Automation Research, University of Maryland)  rama@cfar.umd.edu

Face Recognition and Verification in Still and Video Images

Face recognition and verification from still and video images have important applications in surveillance, HCI design and access control. In this talk we will present our recent work on a number of problems in this area. Specifically we will discuss the following: algorithms for face detection in still images using optimal shape operators, use of symmetry and shape from shading for designing algorithms that are robust to changes in illumination and pose, face detection from video, simultaneous tracking and verification of humans using sequential importance sampling techniques, shape-encoded tracking of human heads using branching-particle methods, gait-based recognition of humans using continuous HMM models and the Maryland real-time tracking system for tracking humans.

Ernst D. Dickmanns  (Universität der Bundeswehr München)   Ernst.Dickmanns@UniBw-Muenchen.de

Expectation-based, Multi-focal, Saccadic (EMS-) Vision.

(A System for Understanding Dynamic Scenes Observed from a Moving Platform)

Expectation-based, Multi-focal, Saccadic (EMS-) Vision has been designed to cope with many different aspects of mission performance for a variety of vehicles. A wide field of view (f.o.v., > ~100°) nearby makes it possible to avoid moving obstacles at slow speed and to negotiate tight curves. Trinocular stereo in a small central f.o.v. yields good depth estimations in the near range with one single well recognizable feature. Active gaze control makes it possible to shift this f.o.v. to where it is needed, and to inertially stabilize the viewing direction to eliminate motion blur. Ego-motion under strong perturbations is determined by inertial/visual data fusion taking advantage of spatio-temporal models on differential and integral scales.

Scene representation is done in a dynamic scene tree exploiting homogeneous coordinate transformations as in computer graphics; however, in computer vision many of the entries into the transformation matrices and the generic object models are the unknowns of the problem. In the 4-D approach developed on the basis of the extended Kalman filtering, these unknowns are determined by an initial (daring) guess and consecutive recursive improvements by prediction error feedback exploiting rich first order approximations (in 3-D space including perspective mapping) through Jacobian matrices for each object/sensor-pair. The Dynamic Object dataBase (DOB) [containing the scene tree representation among other knowledge components about the vehicle status] is the central layer for separating the 'Systems-Engineering' lower part of the overall cognitive system from the more 'Artificial Intelligence'-oriented upper part with state chart representations. On the higher levels, the situation comprising several objects and the own intentions (goals) is assessed and behavioral decisions are taken in the mission context. Here, knowledge about the effects of maneuvers and of the application of feedback control laws is available. Actual maneuver performance and control computations are done on the lower levels with dedicated processors in the distributed overall system (about a dozen processors). Experimental results in fully autonomous road vehicle guidance with the test vehicles 'VaMoRs' (a 5-ton van, maneuvering on minor road networks) and 'VaMP' (a Mercedes 500 SEL, displaying hybrid adaptive cruise control on highways) will be shown.
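
The recursive scheme described above (an initial guess improved by prediction error feedback through a Jacobian) can be illustrated with a deliberately tiny extended Kalman filter. The sketch below is not the 4-D approach itself: it assumes a single static unknown (the range to an object of known height), a pinhole perspective measurement, and hypothetical parameter names and noise values.

```python
def ekf_range_estimate(measurements, z0, p0, focal_px, height_m, r=1.0, q=0.01):
    """Scalar EKF: estimate range z from image heights u = focal_px * height_m / z.

    z0, p0 : initial (possibly poor) range guess and its variance
    r, q   : assumed measurement and process noise variances
    """
    z, p = z0, p0
    for u in measurements:
        p = p + q                               # predict: static state, inflate variance
        h = focal_px * height_m / z             # predicted measurement (perspective mapping)
        jac = -focal_px * height_m / (z * z)    # Jacobian dh/dz at the current estimate
        s = jac * p * jac + r                   # innovation variance
        k = p * jac / s                         # Kalman gain
        z = z + k * (u - h)                     # prediction error feedback
        p = (1.0 - k * jac) * p                 # updated variance
    return z, p

# Hypothetical example: true range 20 m, object height 1.5 m, focal length 1000 px,
# noise-free measurements, and a daring initial guess of 30 m.
true_u = 1000.0 * 1.5 / 20.0
z_est, p_est = ekf_range_estimate([true_u] * 50, z0=30.0, p0=100.0,
                                  focal_px=1000.0, height_m=1.5)
print(z_est, p_est)
```

In a full system the state would be a vector per object and the Jacobian a matrix per object/sensor pair; the scalar case only shows the structure of one update.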

Ian L. Dryden (School of Mathematical Sciences, University of Nottingham) ild@maths.nott.ac.uk   http://www.maths.nott.ac.uk/personal/ild

Statistical Shape Analysis in High-Level Vision

Shape is an essential ingredient of high-level image analysis. The geometrical description of an object can be separated into two parts: the registration information and the `shape' (which is invariant under registration transformations). A common choice of registration is the group of Euclidean similarity transformations and the geometrical properties that are invariant under this group of transformations are known as `similarity shape'. In a Bayesian approach to object recognition shape information is usually specified as part of the prior distribution. The prior is then combined with the likelihood, or image model, leading to posterior inference about the object. The statistical theory of shape began with the independent work of David Kendall, Fred Bookstein and Herbert Ziezold in the 1970s. Subsequent developments have led to a deep differential geometric theory of shape spaces, as well as practical statistical approaches to analysing objects using probability distributions of shape and likelihood based inference. A summary of the field is given by Dryden and Mardia (1998, Wiley), where the main emphasis is on the shapes of labeled point set configurations. In the image analysis literature there are numerous works on the notion of shape, many of which are directly related to the work in Kendall's shape spaces. A common feature of the approaches is some form of shape metric, and many of the shape representations and metrics in common use are related through approximate affine transformations of the particular shape coordinates being used. In the talk I shall discuss some of the main aspects of statistical shape analysis, making comparisons with alternative approaches, which are often based on collections of angles or ratios of distances. Some applications of shape analysis in image analysis will be described. 
Finally, one of the major advantages of using statistical shape analysis is that statistical inference can be carried out when the images consist of a sample of objects, and we consider an example where it is of interest to test whether or not two populations have different mean shapes.
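
To make the notion of similarity shape concrete, here is a minimal sketch for labeled 2D landmarks. Each configuration is written as a complex vector, location and scale are removed by centring and normalising, and rotation is removed by taking the modulus of the complex inner product, which gives the full Procrustes distance. The function names and example triangles are assumptions for illustration; reflections are not factored out.

```python
import cmath
import math

def preshape(points):
    """Remove location and scale: centre the landmarks and scale to unit norm."""
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / norm for p in z]

def full_procrustes_distance(a, b):
    """Full Procrustes distance between two labeled 2D landmark configurations."""
    za, zb = preshape(a), preshape(b)
    inner = sum(p.conjugate() * q for p, q in zip(za, zb))
    # max() guards against tiny negative values caused by rounding
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))

def similarity_transform(points, scale, angle, shift):
    """Apply a rotation, isotropic scaling and translation to 2D points."""
    rot = scale * cmath.exp(1j * angle)
    out = []
    for x, y in points:
        w = rot * complex(x, y) + complex(*shift)
        out.append((w.real, w.imag))
    return out

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
moved = similarity_transform(triangle, scale=2.5, angle=0.7, shift=(3.0, -1.0))
other = [(0.0, 0.0), (1.0, 0.0), (0.5, 3.0)]

print(full_procrustes_distance(triangle, moved))  # close to zero: same shape
print(full_procrustes_distance(triangle, other))  # clearly positive: different shape
```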

Davi Geiger (Courant Institute, NYU)   geiger@cs.nyu.edu

Measuring the Convexity of Shapes

Many recognition tasks require shape understanding. One of the most important measures of shapes is convexity. Psychophysics experiments show that the human visual system prefers convexity over symmetry to select figure from background. While we have today a good understanding of symmetries of shapes (e.g., skeletons), we have not yet devoted (much) attention to convexity. In particular, today shapes can be classified either as convex or not convex (concave), but the psychophysics experiments refer to shapes that are not perfectly convex.

We propose a continuous measure of convexity for shapes. We investigate a Markov random field model for extracting convexity (we can't leave home without them, or, there are things that science can't explain but for everything else there are MRFs.) In our approach convexity becomes an emergent property from a sum of local interactions (where each local term does not contain convexity). We analyse our approach and extensively experiment with it.
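
For readers who want a feel for what a continuous convexity score looks like, the sketch below computes a simple and widely used baseline: the ratio of a polygon's area to the area of its convex hull. This is not the Markov random field measure of the talk, only a stand-in that equals 1 for convex shapes and decreases as indentations grow. It assumes a simple (non-self-intersecting) polygon given as a list of vertex tuples.

```python
def polygon_area(poly):
    """Area of a simple polygon via the shoelace formula."""
    n = len(poly)
    s = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def convex_hull(points):
    """Convex hull by Andrew's monotone chain, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def convexity(poly):
    """Area of the shape divided by the area of its convex hull, in (0, 1]."""
    return polygon_area(poly) / polygon_area(convex_hull(poly))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(convexity(square))   # 1.0
print(convexity(l_shape))  # 6/7, about 0.857
```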

This is work in collaboration with Nava Rubin and Hsing-Kuo Pao.

Donald Geman (University of Massachusetts)  geman@cmla.ens-cachan.fr

Coarse-to-Fine Object Detection

Object recognition is one of the primary goals of high-level computer vision, especially for real greyscale scenes and with the speed and precision of human vision. I will talk about a simpler but still vexing problem: detect and roughly localize all highly visible instances of a small set of generic object classes, such as faces and cars - or even from only one class, measuring performance in terms of computation and false alarms. The approach, motivated by efficient computation, is sequential testing which is highly coarse-to-fine with respect to the representation of objects and the exploration of object classes and poses. At the beginning, the tests are universal, accommodating many objects and poses simultaneously, but the false alarm rate is relatively high. Eventually, the tests are more discriminating, but also more complex and dedicated to specific objects and poses. One result is that the spatial distribution of processing is highly skewed and detection is very rapid, but at the expense of (isolated) confusions. Presumably these could be eliminated with localized, more intensive, processing, perhaps involving global optimization.

Benjamin B. Kimia (Brown University)  kimia@lems25.lems.brown.edu

Symmetry Maps and Transforms for Perceptual Organization and Object Recognition

Traditionally, symmetry set representations have been defined for segmented shape. However, the difficulties in obtaining shape from gray-level images have led us to consider the direct acquisition of symmetry maps from gray-level images. In this talk, we propose that the symmetry map of an edge map is an appropriate intermediate level representation between low-level edge maps and high-level object models, and that transformations of it are canonical building blocks for perceptual grouping and object recognition. First, we review our approach for computing the symmetries (skeletons) of an edge map (and shape) consisting of a collection of curve segments. This approach is a combination of analytic computations in the style of computational geometry and discrete propagations on a grid in the style of the numerical solutions of PDE's as in curve evolution. This framework results in (i) analytically exact solutions, (ii) near optimal computational complexity, (iii) local computations, and (iv) a graph representation which can be used in applications such as object recognition. Second, we present symmetry transformations on the symmetry map as a language for perceptual organization. Specifically, it is proposed that (i) a symmetry map can fully represent the initial edge map so that both boundary and regional continuities can be represented via skeletal/shock continuity; (ii) a re-organization of the edge map in the form of completing gaps, discarding spurious elements, smoothing, and partitioning a contour (grouped set of edge elements) can be represented by transformations on the symmetry map; (iii) perceptual grouping and object recognition can be cast as finding the least action path in the space of sequences of symmetry transforms.

Tai Sing Lee (Carnegie Mellon University)

The Influence of High Level Vision on Early Visual Processing in the Brain

In this talk, I will describe a series of single-unit neurophysiological experiments on awake behaving monkeys that investigate the role of higher-level vision, such as object recognition and attention, in the processing mechanisms of the early visual cortex. The results of these experiments challenge many classical views on the role of the early visual cortex in visual processing and on the nature of information flow in the visual cortex. They lead to a view in which the neural machinery in the early visual cortex is highly adaptive and highly interactive, coupled tightly with higher-order processes through massive feedforward and feedback connections between the cortical areas.

David Mumford (Brown University)   mumford@nemo.dam.brown.edu

What is the space of shapes and what can we do with it?

Computer vision needs a quantitative representation of "shape" for object recognition. Subjectively, we have a clear idea of what "shape" means. But what is the right mathematical theory of shape? We propose that there is a hierarchy of nested shape spaces like the hierarchy of Sobolev/C^k spaces of functions which define subsets S in R^2 or R^3 (think of closed subsets bounded by piecewise smooth curves) with varying degrees of complexity. The advantage of having such spaces is that they give a setting for numerous questions:

a) What are the natural metrics and norms on these spaces, and are they Banach manifolds? b) Define a natural cell decomposition of this non-linear space into local linear charts. c) Define a tangent space and Riemannian metric and find its geodesics and its curvature. d) Define a set of probability measures on these spaces, find their supports and relations.

Of course, all these questions have been partly addressed already. Thus the medial axis is a natural construction for constructing local linear charts, Riemannian metrics have been studied in the related question of the space of diffeomorphisms and probability measures have been introduced using stochastic differential equations or polygonal approximation. I will try to pull these ideas together and point out where work needs to be done. The case of R^2 doesn't seem too hard but R^3 is much more difficult.

John Oliensis (NEC Research Institute Inc.)  oliensis@research.nj.nec.com

From Movies to Geometric 3D Models: the Structure-from-Motion Problem

I review my recent research on structure from motion (SFM). The problem is as follows. Given a sequence of photographic images of a fixed 3D scene, taken by a camera at several unknown positions and orientations, the goal is to recover: 1) a 3D geometric model of the scene, 2) the camera's position and orientation for each of the images. The apparent locations of the 3D points in each image provide the information used to achieve these goals.

My recent results include:

1) A fast, accurate technique for deriving a scene model from two images. As a byproduct, the technique yields upper and lower bounds on the error surface for structure from motion.

2) The explanation of an important two-fold ambiguity in interpreting image sequences. The artist Patrick Hughes has created several visual illustrations of this ambiguity, and I will demonstrate it in my talk.

3) An approximate analytic model of the SFM error surface, which makes explicit the effects of the two-fold ambiguity associated with planar 3D scenes. This leads to an improved understanding of the local minima of the error surface.

4) Multi-image algorithms that reconstruct directly from the intensity data. Previous "direct methods" iteratively minimize a complex error function and depend on an initial guess for the unknowns, while other approaches require tracking data as input (i.e., they assume that distinctive image features have been pre-identified and tracked across the sequence by some other approach).

5) A fast, accurate technique for determining the calibration parameters of the camera (e.g., its focal length) from an image sequence.

6) A convergence proof for the Sturm-Triggs algorithm.

Xavier Pennec (INRIA Sophia - Project Epidaure)  xpennec@sophia.inria.fr  http://www.inria.fr/epidaure/personnel/pennec/pennec.html

Probabilities and Statistics on Riemannian Manifolds: Basic Tools for Geometric measurements

Measurements of geometric primitives, such as rotations or rigid transformations, are often noisy and we need to use statistics either to reduce the uncertainty or to compare measurements. Unfortunately, geometric primitives often belong to manifolds and not vector spaces. We have already shown that generalizing too quickly even simple statistical notions could lead to paradoxes. Here, we develop some basic probabilistic tools to work on Riemannian manifolds: the notion of mean value, covariance matrix, normal law, Mahalanobis distance and χ² test. We also present an efficient algorithm to compute the mean value and tractable approximations of the normal and χ² laws for small variances. Finally, we present some applications in medical image analysis, mainly on the computation of the uncertainty of the registration of 3D images.
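
A small example shows why the vector-space mean can mislead on a manifold. On the circle, the angles 3.0 and -3.0 radians lie close together on either side of π, yet their arithmetic mean is 0, the opposite point. The sketch below computes an intrinsic mean instead, using the generic fixed-point iteration that averages the data in the tangent space at the current estimate and maps the result back. It is specialised to the circle and is not necessarily the algorithm presented in the talk; the function names and tolerances are assumptions.

```python
import math

def wrap(angle):
    """Map an angle to the interval [-pi, pi]; on the circle this plays the
    role of the log map at the origin."""
    return math.atan2(math.sin(angle), math.cos(angle))

def intrinsic_mean(angles, tol=1e-12, max_iter=100):
    """Intrinsic (Frechet) mean of angles by fixed-point iteration.

    Each step averages the signed shortest rotations from the current
    estimate mu to the data and moves mu by that average.
    """
    mu = angles[0]
    for _ in range(max_iter):
        step = sum(wrap(a - mu) for a in angles) / len(angles)
        mu = wrap(mu + step)
        if abs(step) < tol:
            break
    return mu

angles = [3.0, -3.0]
naive = sum(angles) / len(angles)  # 0.0, the point opposite to the data
mean = intrinsic_mean(angles)      # pi or -pi (the same point on the circle)
print(naive, mean)
```

The iteration converges to a local minimiser of the sum of squared geodesic distances, so the starting point matters when the data are widely spread.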

Pietro Perona (Caltech)   perona@its.caltech.edu

Unsupervised Learning of Models for Object Recognition

Recognizing objects in images is one of the most important functions of our visual system. Not only can we recognize individual objects, such as the Eiffel Tower or our grandmother's face, but also categories of objects, such as shoes, automobiles and frogs. Considerable attention has been devoted to formulating models and algorithms that may explain visual recognition; however, no theory is yet available for how these models may be trained automatically in realistic conditions: Can a child, or a machine, learn to recognize 'faces' and 'cars' only by looking? This is at best a difficult task: everyday images are cluttered and may not contain explicit information on the presence, location and structure of new objects. I will present a computational theory of how object models may be learned from such data. Object categories are modelled as collections of parts that appear in a characteristic spatial arrangement. Both part appearance and constellation shape are modelled probabilistically. Model training is achieved by maximum likelihood.

Jayant M. Shah (Northeastern University)  shah@neu.edu

Local Symmetries and Segmentation of Shapes

The set of local symmetry axes is found by analyzing the level curves of a function which is the solution of an elliptic PDE. These level curves may be thought of as successive smoothings of the shape boundary. A point on a level curve is a point of local symmetry if the level curve is symmetric about the gradient vector at that point up to second order. The local symmetry axes may also be described as the ridges and the valleys of the graph of this function. The rationale underlying this approach is that if a shape has certain symmetries, the solution of the PDE ought to reflect these symmetries.

The set of local symmetry axes includes loci which are analogous to the more commonly used medial axes. If a 2D shape is viewed as a collection of ribbons glued together, then the local symmetry axis of each ribbon along its length may be viewed as its medial axis. Alternatively, if the shape is viewed as a distorted circle, distorted by protrusions and indentations, then the local symmetry axis of each protrusion along its length is its medial axis.

There are two main advantages of this approach. It is possible to calculate the necessary properties of the level curves from the differential properties of the function without having to locate the level curves themselves, and the use of an elliptic PDE makes it unnecessary to presmooth the shape boundary. Moreover, it is straightforward to extend the definition of local symmetry to higher dimensions.

Unlike the shape skeletons found by the Blum transform or by decomposing the shape into a set of ribbons, the set of local symmetries is usually not connected. One can obtain a connected set by extending the local symmetry axes to join up with the nearby axes. However, it is more natural to use the local symmetry axes to segment the shape boundary, thus preserving all shape information. The segmentation obtained in this way has the structure of a graph.

Shimon Ullman (The Weizmann Institute of Science)

Object Recognition and Classification

The tasks of visual object recognition and classification are natural and effortless for biological visual systems, but exceedingly difficult to replicate in computer vision systems. The major difficulty comes from the fact that the same object can have many different retinal projections, depending on such factors as the viewing direction, illumination conditions, partial occlusion by other objects, and shape variability.

In this talk I will describe the basic issues in two related problems in recognition -- specific object identification and general object classification. I will describe an approach to object classification where objects are represented in terms of common image fragments that are used as building blocks for representing a large variety of objects within a class. The talk will describe how optimal fragments are extracted, and how they are used in the classification task.

Laurent Younes (CNRS)   younes@cmla.ens-cachan.fr

Metrics, Shapes and Deformations

Using a Riemannian point of view to design comparison methods and evaluate variations within high dimensional spaces such as shapes is a conceptually simple and quite generic approach. Adding robustness or invariance with respect to a group action in this framework leads, by projecting into orbits, to interesting theoretical issues and nice results, especially when groups of diffeomorphisms come into the picture. After a quick review on how this approach may relate to known examples in the literature (Kendall's shape space, Grenander's deformable templates, ...) we show how to design geodesic distances, and estimate diffeomorphisms between configurations of points in space, or between grey-colored images.

Alan L. Yuille (Smith-Kettlewell Eye Research Institute)    yuille@ski.org

in collaboration with J. Coughlan, S.C. Zhu (Ohio State), Y.N. Wu (UCLA).

Order Parameters for Detecting Target Curves in Images: When Does High Level Knowledge Help?

Many problems in vision can be formulated as Bayesian inference. It is important to determine the accuracy of these inferences and how they depend on the problem domain. In this paper, we provide a theoretical framework based on Bayesian decision theory which involves evaluating performance based on an ensemble of problem instances. We pay special attention to the task of detecting a target in the presence of background clutter. This framework is then used to analyze the detectability of curves in images. We restrict ourselves to the case where the probability models are ergodic (both for the geometry of the curve and for the imaging). These restrictions enable us to use techniques from large deviation theory to simplify the analysis. We show that the detectability of curves depends on a parameter K which is a function of the probability distributions characterizing the problem. At critical values of K the target becomes impossible to detect on average. Our framework also enables us to determine whether a simpler approximate model is sufficient to detect the target curve and hence clarify how much information is required to perform specific tasks. These results generalize our previous work by placing it in a Bayesian decision theory framework, by extending the class of probability models which can be analyzed, and by analysing the case where approximate models are used for inference.

Image Analysis and High Level Vision Modeling

2000-2001 Program: Mathematics in Multimedia
