Various mathematical frameworks have been explored for visual learning and visual recognition. Special cases of Bayesian modeling and inference, such as deformable templates and compositional vision, lead to the construction of probability measures on complex structures, such as grammars, graphs, and spaces of transformations. Difficult mathematical questions arise in learning these and other representations from training data consisting of labeled examples. Recently, tools from statistical inference and information theory have led to new bounds on generalization error as well as new learning algorithms for classification. Most methods for interpreting complex scenes raise formidable computational challenges, inspiring new strategies for visual search.
Despite stronger theoretical foundations, and despite progress in special cases of learning and recognition (e.g., in finding instances of a single object class and in analyzing the limits of inductive learning), the full scene interpretation problem remains out of reach. The purpose of this workshop is to present and contrast differing proposals about central issues such as computation, tradeoffs between learning and modeling, connections with natural vision, and the roles of classification, bottom-up processing (e.g., segmentation), and hierarchies of parts.