# Probabilistic and Statistical Methods

Monday, September 11, 2000 - 2:00pm - 3:30pm

Keller 3-180

Basilis Gidas (Brown University)

In this tutorial we will present a solid introduction to some of the main stochastic, model-based paradigms for image analysis and interpretation; their relevance to speech recognition, expert systems, coding theory, and linguistics will also be pointed out. The paradigms are based on rigorous mathematical principles from Bayesian statistics, information theory, signal processing, and other disciplines; they support Monte Carlo and dynamic programming computational algorithms, as well as powerful (parametric and nonparametric) parameter estimation techniques. The tutorial will emphasize both methodology and applications, and will focus on three main topics:

(1) Stochastic Graphical Models and Applications. Here we will describe Markov Random Fields (MRF) and their Gibbs representations, with dependency graphs that include linear graphs (relevant to speech, filtering, convolutional codes, and other applications), regular lattices (relevant to low-level vision tasks), and tree-structured graphs (relevant to high-level vision tasks, linguistics, error-correcting codes, etc.). Dynamic Monte Carlo and dynamic programming algorithms for sampling, optimization, and mean estimation will be presented, together with a summary of the EM algorithm and variants of maximum-likelihood (ML) parameter estimation procedures. The main application treated will be texture segmentation and identification, but speech recognition, image enhancement, and tomographic reconstruction will also be touched on.
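Dynamic programming on a linear dependency graph can be illustrated with a minimal sketch. It assumes a first-order chain MRF with K discrete labels, arbitrary per-site data costs U_i, and a Potts smoothing penalty beta; the function names are hypothetical and this is not code from the tutorial. The Gibbs distribution is P(x) proportional to exp(-E(x)), so the most probable (MAP) labeling minimizes the energy E, and on a chain that minimization reduces to a Viterbi-style recursion costing O(nK^2).

```python
import itertools
from typing import List, Sequence


def chain_energy(x: Sequence[int], unary: Sequence[Sequence[float]], beta: float) -> float:
    """E(x) = sum_i U_i(x_i) + beta * #{i : x_i != x_{i+1}} for a chain Potts MRF."""
    data = sum(unary[i][xi] for i, xi in enumerate(x))
    smooth = beta * sum(1 for i in range(len(x) - 1) if x[i] != x[i + 1])
    return data + smooth


def chain_map_dp(unary: Sequence[Sequence[float]], beta: float) -> List[int]:
    """Minimize chain_energy exactly by dynamic programming (Viterbi recursion)."""
    n, k = len(unary), len(unary[0])
    cost = list(unary[0])  # cost[b]: min energy of x_0..x_i given x_i = b
    back: List[List[int]] = []  # back[i-1][b]: best label at site i-1 given x_i = b
    for i in range(1, n):
        new_cost, ptr = [], []
        for b in range(k):
            # Cheapest way to arrive at label b from any label a at the previous site.
            a_best = min(range(k), key=lambda a: cost[a] + (beta if a != b else 0.0))
            new_cost.append(cost[a_best] + (beta if a_best != b else 0.0) + unary[i][b])
            ptr.append(a_best)
        cost = new_cost
        back.append(ptr)
    # Backtrack from the best final label.
    x = [min(range(k), key=lambda b: cost[b])]
    for ptr in reversed(back):
        x.append(ptr[x[-1]])
    return x[::-1]


def brute_force_map(unary: Sequence[Sequence[float]], beta: float) -> tuple:
    """Exhaustive search over all K^n labelings (tiny problems only, for checking)."""
    k = len(unary[0])
    return min(itertools.product(range(k), repeat=len(unary)),
               key=lambda x: chain_energy(x, unary, beta))


# Binary signal with one isolated "noise" flip at position 2.
y = [0, 0, 1, 0, 0, 1, 1, 1]
unary = [[float((yi - lab) ** 2) for lab in (0, 1)] for yi in y]
print(chain_map_dp(unary, beta=2.0))  # [0, 0, 0, 0, 0, 1, 1, 1]
```

With beta = 2 the isolated flip costs more to keep (two label changes) than to smooth away (one data mismatch), so the MAP labeling removes it. On a 2-D lattice an exact recursion of this kind is in general no longer tractable, which is where sampling and annealing methods of the Monte Carlo type are used instead.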

(2) Object Recognition. After a brief discussion of the main issues (e.g., invariance, contextual and global constraints) and some methodologies (especially template-based and compositional/syntactic approaches), we will focus on the decision-tree approach to object recognition. We will begin with the classical Huffman code and the constrained 20 questions problem, both special cases of the statistical decision-tree approach to object recognition. The basic building blocks of these trees are the queries (or tests, or experiments), i.e., a family of image data features. The choice of queries is critical. Most real-world recognition problems require a nearly infinite family of queries, so standard decision-tree construction based on a fixed-length feature vector is not feasible. We will present a procedure (developed by Y. Amit and D. Geman) that simultaneously selects features and builds trees by inductive learning; the recognition algorithm employs multiple decision trees. We will describe the procedure primarily using the recognition of handwritten digits in binary images as the running example.
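The link between the Huffman code and the 20 questions problem can be made concrete: each codeword is a sequence of yes/no answers, so a Huffman code is an optimal questioning strategy when any subset question ("is the object in set S?") is allowed, and its expected number of questions lies between the entropy H and H + 1. The sketch below assumes a known, finite distribution over symbols; the function names are hypothetical, and it does not implement the Amit and Geman tree-growing procedure, where queries are restricted to image features.

```python
import heapq
import itertools
import math
from typing import Dict


def huffman_code(probs: Dict[str, float]) -> Dict[str, str]:
    """Build a binary Huffman code by repeatedly merging the two least likely groups."""
    if len(probs) == 1:
        return {sym: "0" for sym in probs}
    tie = itertools.count()  # tie-breaker so heapq never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        # Answer "0" routes to group0, "1" routes to group1.
        merged = {s: "0" + w for s, w in group0.items()}
        merged.update({s: "1" + w for s, w in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def expected_length(probs: Dict[str, float], code: Dict[str, str]) -> float:
    """Expected number of yes/no questions needed to identify the symbol."""
    return sum(p * len(code[s]) for s, p in probs.items())


def entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy in bits, the lower bound on expected_length."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
print(code)
print(expected_length(probs, code), entropy(probs))
```

For this dyadic distribution the codeword lengths are 1, 2, 3, 3, and the expected number of questions equals the entropy (1.75 bits); for non-dyadic distributions it is strictly larger, but always less than H + 1.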

(3) Simultaneous Tracking and Recognition. Here we describe a coherent framework for tracking and recognition from video image sequences that comprises three basic models: (a) an object model that articulates the overall shape architecture of an object, together with the shape's random variability (position, orientation, non-rigid elastic deformations); (b) a dynamic model that describes the object's motion; and (c) a data (or observation) model that relates the image gray-level data (or functions thereof) to the object and dynamic models, and articulates the random variability of the image data due to sources of uncertainty such as clutter, occlusion, noise, and blur. Combining these models leads to a (typically) nonlinear filtering problem, which is equivalent to a hidden Markov model (HMM). Solving this filtering problem requires a numerical filtering algorithm. The framework will be demonstrated using deformable templates for object representation, dynamical equations derived from Lagrangian mechanics, and a data model based on nonparametric (rank-type) statistics. We will describe a Monte Carlo filtering algorithm (first employed by Blake and Isard) and compare it with the classical extended Kalman filter. Variations of the procedure based on compositional/syntactic models for object representation will also be described. The performance of the procedure will be demonstrated with a video showing the tracking of objects moving in highly cluttered environments.
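The Monte Carlo filtering idea can be sketched on a toy problem. The example below assumes a scalar random-walk state observed in Gaussian noise, a hypothetical stand-in for the deformable-template state and rank-based data model described above; all names and parameters are illustrative. Because this toy model is linear and Gaussian, the (extended) Kalman filter is exact, which makes it a convenient reference: a bootstrap particle filter (propagate, weight by likelihood, resample), in the spirit of the algorithm of Blake and Isard, should track the Kalman posterior mean up to Monte Carlo error.

```python
import math
import random
from typing import List, Tuple


def simulate(T: int, q: float, r: float, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Hypothetical toy model: x_t = x_{t-1} + N(0, q), y_t = x_t + N(0, r), x_0 = 0."""
    x, xs, ys = 0.0, [], []
    for _ in range(T):
        x += rng.gauss(0.0, math.sqrt(q))
        xs.append(x)
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return xs, ys


def kalman_filter(ys: List[float], q: float, r: float) -> List[float]:
    """Exact posterior means for the linear-Gaussian toy model (x_0 = 0 known)."""
    m, p, means = 0.0, 0.0, []
    for y in ys:
        p_pred = p + q              # predict: the random walk adds variance q
        k = p_pred / (p_pred + r)   # Kalman gain
        m = m + k * (y - m)         # update the mean with the innovation
        p = (1.0 - k) * p_pred      # update the variance
        means.append(m)
    return means


def particle_filter(ys: List[float], q: float, r: float, n: int,
                    rng: random.Random) -> List[float]:
    """Bootstrap particle filter: propagate, weight by likelihood, resample."""
    particles = [0.0] * n           # x_0 = 0 known exactly, as in the Kalman filter
    means = []
    for y in ys:
        # 1. Propagate each particle through the dynamic model.
        particles = [x + rng.gauss(0.0, math.sqrt(q)) for x in particles]
        # 2. Weight by the observation likelihood p(y | x), computed in log space.
        logw = [-(y - x) ** 2 / (2.0 * r) for x in particles]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        # 3. Resample (multinomial) to concentrate particles in likely regions.
        particles = rng.choices(particles, weights=w, k=n)
    return means


rng = random.Random(0)
xs, ys = simulate(20, q=1.0, r=1.0, rng=rng)
kf = kalman_filter(ys, 1.0, 1.0)
pf = particle_filter(ys, 1.0, 1.0, n=2000, rng=rng)
print(max(abs(a - b) for a, b in zip(kf, pf)))  # small: Monte Carlo error only
```

In the nonlinear, cluttered settings the tutorial targets, the extended Kalman filter linearizes the model and carries a single Gaussian hypothesis, whereas the particle set can represent multimodal posteriors, for example when clutter produces several plausible object positions.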