University of Minnesota

Abstracts and Talk Materials:

Natural Images

March 6-10, 2006

Event ID: #1847, Published: September 11, 2006 00:00:05

Second Chances

Wednesday, March 8

Edward H. Adelson (Massachusetts Institute of Technology)

Image statistics and surface perception

It is mathematically impossible to tell whether a surface is white, gray, or black, by looking at it in isolation, since the luminance is the product of two unknown variables, illumination and reflectance (albedo). Nonetheless people can do it pretty well, proving that the human visual system is smarter than the people who study it. Real surfaces, such as paper, cloth, or stucco, have visual textures that depend on interreflections and specular reflections, and some of the resultant image statistics are correlated with surface properties such as albedo and gloss. By manipulating these statistics, we can make the surface look lighter or darker (and duller or shinier) without changing the mean luminance. In a related project, we are exploring how local statistics can be used to separate shading and albedo in natural images. Working in the derivative domain (as in Retinex), we train on images with ground truth "intrinsic images" of shading and albedo, and learn to estimate the derivatives based on local image patches. We then do a pseudoinverse to retrieve the images. The results are good: we can separate an image into its shading and albedo components better than previous methods, including our own previous methods that relied on classification rather than estimation.
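As a rough illustration of the derivative-domain step, the sketch below works in one dimension on log luminance, where the product of illumination and reflectance becomes a sum. The derivatives are taken from ground truth here, standing in for the learned patch-based estimator, which the abstract does not specify. The function name and the test signal are assumptions made for illustration only.

```python
import numpy as np

def integrate_derivatives(d, first_value):
    """Recover a 1-D signal from its forward differences via the pseudoinverse."""
    n = d.size + 1
    # Forward-difference operator D of shape (n-1, n): (D s)[i] = s[i+1] - s[i].
    D = np.diff(np.eye(n), axis=0)
    # The pseudoinverse returns the minimum-norm (zero-mean) solution; the
    # constant offset is lost under differentiation and must be supplied.
    s = np.linalg.pinv(D) @ d
    return s - s[0] + first_value

# Hypothetical scene: smooth illumination times piecewise-constant albedo.
x = np.linspace(0.0, 1.0, 50)
illumination = 1.0 + 0.5 * x
albedo = np.where(x < 0.5, 0.2, 0.8)
log_lum = np.log(illumination * albedo)  # log luminance = log shading + log albedo

# Stand-in for the learned estimator: the true derivatives of each component.
d_shading = np.diff(np.log(illumination))
d_albedo = np.diff(np.log(albedo))

log_shading = integrate_derivatives(d_shading, np.log(illumination[0]))
log_albedo = integrate_derivatives(d_albedo, np.log(albedo[0]))
```

In two dimensions the same idea applies with horizontal and vertical derivative operators stacked into one linear system.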

Richard Baraniuk (Rice University)

Natural images, multiscale manifold models, and compressive imaging

The images generated by varying the underlying articulation parameters of an object (pose, attitude, light source position, and so on) can be viewed as points on a low-dimensional "image appearance manifold" (IAM) in a high-dimensional ambient space. In this talk, we will expand on the observation that typical IAMs are not differentiable, in particular if the images contain sharp edges. However, all is not lost, since IAMs have an intrinsic multiscale geometric structure. In fact, each IAM has a family of approximate tangent spaces, each one good at a certain resolution. In the first part of the talk, we will focus on the particular inverse problem of estimating, from a given image on or near an IAM, the underlying parameters that produced it. Putting the multiscale structural aspect to work, we develop a new algorithm for high-accuracy parameter estimation based on a coarse-to-fine Newton iteration through the family of approximate tangent spaces. This algorithm is reminiscent of recently proposed algorithms for multiscale image registration and super-resolution. In the second part of the talk, we will explore IAMs in the context of "Compressive Imaging" (CI), where we attempt to recover an image from a small number of (potentially random) projections. To date, CI has focused on sparsity-based image models; we will discuss how IAM models could offer better performance for geometry-rich images.

This is joint work with Michael Wakin, Hyeokho Choi, and David Donoho.

Mike J. Chantler (Heriot-Watt University)

Perception and classification of surface texture

I will present a simple first order model of how variation in illumination affects the output of Filter Response Filters (FRF).

FRF are of interest because:
(a) they are commonly used as texture features in automated texture classification systems, and
(b) they are typically proposed as the "back pocket model" of the first stage of the human visual system.

I'll show how naïve classifiers built using these simple features can fail, and how the model can be used to produce a classifier that is robust to illumination variation.

What this will show is that single still images are often not sufficient for the purposes of surface classification - either for human or automated systems.

I'll conclude by describing some of our recent research that is investigating our perceptions of surface texture.

Aaron Clarke (York University)

Efficient Coding Schemes for Natural Image Statistics

Joint work with James Elder.

The statistics of natural scenes useful for contour grouping are examined from an information theoretic point of view. We focus on two particularly important grouping cues: proximity and good continuation. Advances on previous studies of contour grouping statistics include: 1. measurements based upon more accurately localized edges, 2. an analysis of grouping statistics as a function of arc-length separation along the contour, 3. a comparison of competing methods for efficient representation of good continuation cues, 4. a comparison of contour statistics for natural and human-made scenes. Our results reveal proximity to follow a power law model, and parallelism and co-circularity to form an intuitive and efficient coding scheme for the angular relationships between edges.

James Damon (University of North Carolina)

Characterizing local features of illuminated objects

The perceived shapes of objects in images result from a collection of visual clues. These clues follow from the interplay of geometric features such as perceived boundaries, edges and corners, delineating curves on object surfaces, and features resulting from illumination such as shadow curves and specularity. Furthermore, a viewer gains such information not just from static images but also from perceived changes resulting from change in viewing direction.

In this talk, we explain how it is possible to determine a catalog of possible local models for the generic interplay between geometric features and shadow curves. This catalog can be expanded to include the expected changes in such models under movement in viewing direction.

Such a catalog is constructed through the use of singularity theory, which is a mathematical theory that allows construction of such classifications based on stability and possible perturbations. We explain the general features of the classification and indicate how it is obtained.

This is the result of joint work carried out with Peter Giblin and Gareth Haslinger.

Marco F. Duarte (Rice University)

Compressive imaging for image and video acquisition

Compressive Sensing is an emerging field based on the revelation that a small group of non-adaptive linear projections of a compressible signal contains enough information for reconstruction and processing. We propose algorithms and hardware to support a new theory of Compressive Imaging. Our approach is based on a new digital image/video camera that directly acquires random projections of the light field without first collecting the pixels/voxels. Our camera architecture employs a digital micromirror array to perform optical calculations of linear projections of an image onto pseudorandom binary patterns. Its hallmarks include the ability to obtain an image/video snapshot with a single detection element while measuring the image/video fewer times than the number of pixels/voxels; this can significantly reduce the computation required for image/video acquisition and encoding. Since our system relies on a single photon detector, it can also be adapted to image at wavelengths that are currently impossible with conventional CCD and CMOS imagers. We are currently testing a prototype design for the camera and present experimental results.
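The measurement model described above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' reconstruction method: the image is assumed sparse in the pixel basis, the patterns are taken as +/-1 rather than the 0/1 states of a mirror array, and recovery uses a plain orthogonal matching pursuit written here for illustration. All names and sizes are hypothetical.

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedily pick k columns of Phi to explain y."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Choose the column most correlated with the current residual.
        scores = np.abs(Phi.T @ residual)
        if support:
            scores[support] = -1.0  # never pick the same column twice
        support.append(int(np.argmax(scores)))
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n_pixels, n_measurements, sparsity = 64, 32, 3

# Hypothetical sparse "image": a few bright pixels on a dark background.
x = np.zeros(n_pixels)
x[rng.choice(n_pixels, size=sparsity, replace=False)] = rng.uniform(1.0, 2.0, sparsity)

# Pseudorandom +/-1 patterns, one row per measurement (assumption: a +/-1
# reading can be formed from a 0/1 pattern and its complement).
Phi = rng.choice([-1.0, 1.0], size=(n_measurements, n_pixels))

# Each measurement is one detector reading: the inner product of pattern and image.
y = Phi @ x
x_hat = omp(Phi, y, sparsity)
```

Here 32 readings stand in for 64 pixels. Whether `x_hat` matches `x` depends on the sparsity, the number of measurements, and the random draw.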

This is joint work with Michael Wakin, Jason Laska, Dror Baron, Shriram Sarvotham, Dharmpal Takhar, Kevin Kelly and Richard Baraniuk.

James Elder (York University)

Natural image contours

The important role of contours in visual perception has been recognized for many years (e.g., Wertheimer 1923/1938). While early Gestalt insights derive from observation of highly idealized images, decades of computer vision research have demonstrated the computational complexity of inferring and exploiting contours in natural images. Physiological data, while generating some intriguing clues, are often too local (single unit recording) or too global (imaging) to provide the data needed to constrain existing models or inspire new ones.

In this talk I will discuss recent work that attempts to bring together psychophysical, computational and physiological approaches to understanding contour processing in natural images. A unifying foundation for this effort is a continuing project to measure and model the statistics of natural image contours. These ecological results lead to new computer vision algorithms for natural contour grouping, normative models for contour processing that may be evaluated psychophysically, and to new models for neural selectivity to natural image contours that may be tested against physiological data.

William T. Freeman (Massachusetts Institute of Technology)

Removing photographic blur caused by camera motion: How can you identify when an image looks blurred?

Camera shake during exposure leads to objectionable image blur and ruins many photographs. Conventional blind deconvolution methods typically assume frequency domain constraints on images, or overly simplified parametric forms for the motion path during camera shake. Real camera motions can follow convoluted paths, and a spatial domain prior can better maintain visually salient image characteristics. We introduce a multi-scale method to remove the effects of camera shake from seriously blurred images, by estimating the most probable blur and original image using a variational approximation to the posterior probability, and assuming a heavy-tailed distribution for bandpassed image statistics. Our method assumes a uniform camera blur over the image, negligible in-plane camera rotation, and no blur caused by moving objects in the scene. The algorithm operator specifies an image region without saturation effects within which to estimate the blur kernel. I'll discuss issues in this blind deconvolution problem, and show results for a variety of digital photographs.
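Under the assumptions listed above (uniform blur, negligible in-plane rotation, static scene), the problem can be written schematically as below. The symbols are introduced here for illustration and are not taken from the abstract: B is the observed blurred image, K the blur kernel, L the unknown sharp image, and N the sensor noise.

```latex
B = K \otimes L + N, \qquad
p(K, L \mid B) \;\propto\; p(B \mid K, L)\, p(L)\, p(K)
```

The heavy-tailed assumption on bandpassed image statistics would enter through the prior p(L), and the blur estimate is sought by approximating the posterior on the left.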

Invitation to submit examples: I invite audience members to submit examples of motion-blurred photographs to me a few days ahead of time. I'll show the images you submit, and the result of our algorithm applied to them. If you have a favorite blind deconvolution or restoration algorithm, please apply it to your image and send it and I'll show that, too.

Joint work with: Rob Fergus, Barun Singh, both from MIT CSAIL, and Aaron Hertzmann and Sam Roweis, both from the University of Toronto.

Haleh Hagh-Shenas (University of Minnesota Twin Cities), Victoria Interrante (University of Minnesota Twin Cities)

A closer look at texture metrics for visualization

Joint work with Haleh Hagh-Shenas.

An ongoing goal of research in multivariate visualization is to determine how to most effectively use visual features, such as color and texture, to efficiently and accurately convey information about multiple scalar-valued data distributions defined over a common domain. While there currently exists an extensive knowledge base in issues related to color perception and the effective use of color for uni-variate and bi-variate data visualization, research into the effective use of texture for data visualization is considerably less mature. In this poster we present the findings from three pilot experiments with natural texture images intended to provide insight into issues in texture perception that have the potential to inform our efforts to more effectively harness the full potential of texture as a visual variable capable of simultaneously conveying information about multiple data distributions.

Daniel Kersten (University of Minnesota Twin Cities)

Natural images, natural percepts and primary visual cortex

The traditional model of primary visual cortex (V1) is in terms of a retinotopically organized set of spatio-temporal filters. This model has been extraordinarily fruitful, providing explanations of a considerable body of psychophysical and neurophysiological results. It has also produced compelling linkages between natural image statistics, efficient coding theory, and neural responses. However, there is increasing evidence that V1 is doing a whole lot more. We can get insight into early cortical processing by studying not only the relationship between image input and neural activity, but also between human visual percepts and early cortical activity. Natural percepts (in the sense of tapping into natural modes of processing) are as important as understanding natural images when trying to find out what primary visual cortex is doing. I will describe several results from functional magnetic resonance imaging (fMRI) studies which show that human V1 blood oxygenation level dependent (BOLD) response to patterns perceived as well-organized is less than to patterns perceived as less organized, V1 response to natural image contrast is correlated with perceived contrast, and apparent size modulates the spatial extent of V1 activity.

Gisela Klette (University of Auckland)

Branch voxels and junctions in 3D skeletons of confocal microscope images of human brain tissue

Concepts for describing curve points in a continuous space have been well known in mathematics for a long time. We apply those concepts to discrete space with the aim of analysing curve-like structures in digital images. For the characterization of 3D skeletons we distinguish between different types of voxels. We discuss approaches to define those elements of skeletons and their properties. We use the distribution and complexity of junctions to extract features for 3D medical images.

Reinhard Klette (University of Auckland)

Panoramic imaging and laser range finders for 3D scene visualization

Joint work with Karsten Scheibe, DLR, Berlin-Adlershof.

The talk gives a general overview of new architectures of panoramic cameras (as designed and produced at DLR, the German Air and Space Institute at Berlin), their use for stereo imaging based on studies at CITR, and the combination of those high-resolution images (about 350 Megapixel each) with range data generated by a laser range finder. Results are illustrated for different objects such as the castle "Neuschwanstein" in Bavaria/Germany.

Jan J. Koenderink (Rijksuniversiteit te Utrecht)

Geometry of panoramic visual space

Jan J. Koenderink and Pietro Perona Friday Short presentations

Jan J. Koenderink (Rijksuniversiteit te Utrecht)

Image texture and the "flow of light"

The "Shading Cue" is conventionally framed in the context of perfectly smooth surfaces. "Shading" has ancient roots in the visual arts, and became canonized in the late 20th century as the "Shape From Shading (SFS) Problem". I reconsider the problem as conventionally posed, presenting a novel analysis of its "observational basis". When rough surfaces are considered the image structure is augmented (from mere contrast gradient in the former case) with the image illuminance flow structure revealed by texture. The direction and two differential invariants of this flow can be estimated robustly via the structure tensor. This changes the nature of the "shading cue" qualitatively. Shading alone does not specify surface curvature orthogonal to the illumination direction, a lack of data that has to be made up for by the surface integrability conditions. Hence conventional SFS algorithms are based on partial differential equations with global boundary conditions. Allowing illuminance flow as an additional observable alleviates this problem and purely local, algebraic approaches to SFS become feasible. Algorithms can be shown to exist that derive surface curvature from shading and flow observations through a linear operator applied to the observables, the operator being a function of surface attitude and beam direction. Such an approach neatly reveals the remaining group of ambiguity transformations in an intuitive way. I propose novel ways to deal with the intrinsic ambiguities of photomorphometrics. Instead of attempting to find the full class of equivalent solutions I look for specific solutions given certain a priori guesses. Such methods are much more similar to likely mechanisms of human psychogenesis, in particular visual perception, than the conventional "Marrian" approach. I present methods that boil down to linear, local computation, thus very robust and possibly implementable in neural wetware.
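The structure tensor step mentioned above can be sketched as follows. This is the textbook construction pooled over a whole image, offered as an assumption about what such an estimate might look like; the abstract does not say how the tensor is built or pooled, and the function name and test texture are hypothetical.

```python
import numpy as np

def dominant_gradient_orientation(image):
    """Orientation (radians) of the dominant gradient axis, pooled over the image."""
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    gy, gx = np.gradient(image.astype(float))
    # Entries of the 2x2 structure tensor, averaged over all pixels.
    jxx = np.mean(gx * gx)
    jyy = np.mean(gy * gy)
    jxy = np.mean(gx * gy)
    # Angle of the principal eigenvector of [[jxx, jxy], [jxy, jyy]].
    return 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)

# Hypothetical texture: a grating whose intensity varies only along x.
yy, xx = np.mgrid[0:64, 0:64]
grating = np.sin(0.3 * xx)
theta = dominant_gradient_orientation(grating)
```

For this grating the gradient lies along x, so the estimated orientation is zero; transposing the image turns it by a right angle. A local estimate would replace the global means with averages over a window around each pixel.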

Triet Minh Le (University of California)

Modeling different scales of oscillations in images using generalized functions

1) John B. Garnett (UCLA)
2) Peter W. Jones (Yale)
3) Luminita A. Vese (UCLA)

Natural images have many different scales of oscillations. Texture can be seen as oscillations at smaller scales. Here, we present variational image decomposition models, which decompose different scales of an image f into the sum of two scales u+v. Here u is a piecewise smooth image at a larger scale and v is texture or oscillations at finer scale. We use different spaces of functions or of generalized functions to model different scales in images. The use of generalized functions is motivated by work of Y. Meyer, D. Mumford and B. Gidas. For the piecewise smooth component u, we use the space BV (Bounded Variation), and the generalized homogeneous Besov and Sobolev spaces (B(s,p,q), W(s,p) with s<0, p=1, q=infinity) for the oscillatory component v at finer scale. Thus, to model the v component, we simply impose that K*v belongs to L1. For the homogeneous Besov case, K is the Poisson or the heat kernel, and for the homogeneous Sobolev case, K is the kernel which corresponds to the Riesz potential. Once this is realized, implementations of these models can be easily computed. Experimental results applied to natural images will be presented.
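One way to write the decomposition described above is as the minimization below. This is a sketch under assumptions: the weight λ and the exact form of the penalty are not stated in the abstract, and the symbol Ω for the image domain is introduced here for illustration.

```latex
f = u + v, \qquad
\inf_{u \in BV(\Omega)} \Big\{ |u|_{BV(\Omega)}
  + \lambda \, \| K * (f - u) \|_{L^{1}(\Omega)} \Big\}, \qquad \lambda > 0
```

The first term keeps u piecewise smooth, and the second asks that the residual v = f - u be small after smoothing by K, which is how oscillatory texture is separated from larger-scale structure.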

Jitendra Malik (University of California)

Natural image statistics enable us to quantitatively model visual grouping and figure-ground cues

Visual grouping and figure-ground discrimination were first studied by the Gestalt school of visual perception nearly a century ago. By the use of cleverly constructed examples, they were able to demonstrate the role of factors such as proximity, similarity, curvilinear continuity and common fate in visual grouping and factors such as convexity, size, and symmetry in figure-ground discrimination. However, this left open (at least) three major problems:
(1) there wasn't a precise operationalization of these factors for general images,
(2) the interaction of these cues was ill understood, and
(3) there was no justification for why these factors might be helpful to an observer interacting with the visual world.

Over the last few years, we have been pursuing these problems in the following paradigm:
(1) We start with a set of natural images and use human observers to mark the perceptual groups and assign figure-ground labels to the various boundary contours.
(2) We construct computational models of various grouping and figure-ground factors.
(3) We calibrate and optimally combine the grouping and figure-ground factors by using the principle that vision evolved to be adaptive to the statistics of objects in the natural world.

In my talk I will report on two recent results in this paradigm. One is on understanding the power of the figure-ground cues, specifically size, lower-region and convexity. We compared the predictions of such a model with psychophysics and found a pleasing agreement. The second is an attempt at a unified probabilistic framework for mid-level vision using conditional random fields defined on constrained Delaunay triangulations of image edges.

This talk draws on joint work with Charless Fowlkes, David Martin and Xiaofeng Ren; various papers can be found on the web site

Laurence T. Maloney (New York University)

Surface color perception in three-dimensional scenes: Estimating, representing and discounting the illuminant

Joint work with Katja Doerschner, Huseyin Boyaci.

Researchers studying surface color perception have typically used stimuli that consist of a small number of matte patches (real or simulated) embedded in a plane perpendicular to the line of sight (a Mondrian, Land & McCann, 1971). Reliable estimation of surface properties analogous to color is a difficult if not impossible computational problem in such limited scenes (Maloney, 1999). In more realistic, three-dimensional scenes the problem is not intractable, in part because considerable information about the spatial and spectral distribution of the illumination is usually available. We describe a series of experiments that explore (1) how the human visual system discounts the spatial and spectral distribution of the illumination (SSDI) in judging matte surface color and (2) what cues the visual system uses in estimating the SSDI of a scene. We find that the human visual system uses information from cast shadows and specular reflections in estimating the SSDI and, when more than one cue type is present, combines these cues effectively. The SSDI can be very complex in scenes with many different light sources. We examine (3) the limits of human visual representation of the SSDI, reporting an experiment intended to test these limits. Our results indicate that the human visual representation of the SSDI in a scene is well-matched to the task of matte surface color perception.

Land, E. H. & McCann, J. J. (1971), Lightness and retinex theory. Journal of the Optical Society of America, 61, 1-11.
Maloney, L. T. (1999), Physics-based approaches to modeling surface color perception. In Gegenfurtner, K. R., & Sharpe, L. T. [Eds] (1999), Color Vision: From Genes to Perception. Cambridge, UK: Cambridge University Press, pp. 387-422.

Pietro Perona (California Institute of Technology)

What you can see at a blink

Jan J. Koenderink and Pietro Perona Friday Short presentations

Pietro Perona (California Institute of Technology)

How many categories can you recognize?

How many categories can you recognize? Currently the best estimate is due to Irv Biederman: 3000 entry-level categories and perhaps 3 × 10^4 categories overall. This estimate was obtained indirectly, by counting words in a dictionary. I will present a method to obtain a direct estimate. Alongside the estimate one gets frequencies of objects and categories for free. I will discuss the implications for visual recognition and other visual problems.

Walter Richardson Jr. (University of Texas)

Modeling superconductivity via Ginzburg-Landau

The Time Dependent Ginzburg-Landau equations describe the evolution of the order parameter and vector magnetic potential in a superconductor, giving the density of electrons in the superconducting phase. There are connections between TDGL and the variational/PDE equations of image processing. We present an overview of numerical modeling of a superconductor via TDGL with a focus on how the mesh for the spatial discretization can have a profound effect on simulation results. An inadequate mesh resolution will often give rise to spurious solutions which seem physically correct, but are false. The effects of different physical parameters and boundary conditions in 2D and 3D are also presented.

Guillermo R. Sapiro (University of Minnesota Twin Cities)

Texture mixing via universal simulation

Joint work with G. Brown (UofM and HP Labs) and G. Seroussi (MSRI).

A framework for studying texture in general, and for texture mixing in particular, is presented in this work. The work follows concepts from universal type classes and universal simulation. Based on the well-known Lempel and Ziv (LZ) universal compression scheme, the universal type class of a one dimensional sequence is defined as the set of possible sequences of the same length which produce the same dictionary (or parsing tree) with the classical LZ incremental parsing algorithm. Universal simulation is realized by sampling uniformly from the universal type class, which can be efficiently implemented. Starting with a source texture image, we use universal simulation to synthesize new textures that have, asymptotically, the same statistics of any order as the source texture, yet have as much uncertainty as possible, in the sense that they are sampled from the broadest pool of possible sequences that comply with the statistical constraint. When considering two or more textures, a parsing tree is constructed for each one, and samples from the trees are randomly interleaved according to pre-defined proportions, thus obtaining a mixed texture. As with single texture synthesis, the k-th order statistics of this mixture, for any k, asymptotically approach the weighted mixture of the k-th order statistics of each individual texture used in the mixing. We present the underlying principles of universal types, universal simulation, and their extensions and application to mixing two or more textures with pre-defined proportions.
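The incremental parsing that defines the dictionary can be sketched as follows for a one-dimensional string. This covers only the parsing step; sampling from the universal type class and the interleaving used for mixing are not shown. The function name and the example input are assumptions made for illustration.

```python
def lz_incremental_parse(sequence):
    """Split a sequence into LZ78 phrases: each is the shortest prefix not seen before."""
    seen = set()
    phrases = []
    current = ""
    for symbol in sequence:
        current += symbol
        if current not in seen:
            # A new phrase: record it and start the next one.
            seen.add(current)
            phrases.append(current)
            current = ""
    # `current` holds an incomplete final phrase (possibly empty).
    return phrases, current

phrases, tail = lz_incremental_parse("abababab")
# phrases == ["a", "b", "ab", "aba"], tail == "b"
```

Two sequences of the same length that yield the same set of phrases belong to the same universal type class in the sense described above.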

Mao-Pei Tsui (University of Toledo) www.math.utoledo/~mtsui

Physics-motivated features for distinguishing photographic images and computer graphics

Joint work with Tian-Tsong Ng, Shih-Fu Chang, Jessie Hsu, Lexing Xie (Columbia University).

The increasing photorealism of computer graphics has made computer graphics a convincing form of image forgery. Therefore, classifying photographic images and photorealistic computer graphics has become an important problem for image forgery detection. We propose a new geometry based image model, motivated by the physical image generation process, to tackle the above-mentioned problem. The proposed model reveals certain physical differences between the two image categories, such as the gamma correction in photographic images and the sharp structures in computer graphics. For the problem of image forgery detection, we propose two levels of image authenticity definition, i.e., imaging-process authenticity and scene authenticity, and analyze our technique against these definitions. Such definitions are important for making the concept of image authenticity computable. Apart from offering physical insights, our technique with a classification accuracy of 83.5% outperforms those in the prior work, i.e., wavelet features at 80.3% and cartoon features at 71.0%. We also consider a recapturing attack scenario and propose a counter-attack measure.

Michael Wakin (Rice University)

Manifold-based models for image processing

The information contained in an image ("What does the image represent?") also has a geometric interpretation ("Where does the image reside in the ambient signal space?"). It is often enlightening to consider this geometry in order to better understand the processes governing the specification, discrimination, or understanding of an image. We discuss manifold-based models for image processing imposed, for example, by the geometric regularity of objects in images. We present an application in image compression, where we see sharper images coded at lower bitrates thanks to an atomic dictionary designed to capture the low-dimensional geometry. We also discuss applications in computer vision, where we face a surprising barrier -- the image manifolds arising in many interesting situations are in fact nondifferentiable. Although this appears to complicate the process of parameter estimation, we identify a multiscale tangent structure to these manifolds that permits a coarse-to-fine Newton method. Finally, we discuss applications in the emerging field of Compressed Sensing, where in certain cases a manifold model can supplant sparsity as the key for image recovery from incomplete information.

This is joint work with Justin Romberg, David Donoho, Hyeokho Choi, and Richard Baraniuk.

Todd Wittman (University of Minnesota Twin Cities)

A variational approach to image and video super-resolution

Super-resolution seeks to produce a high-resolution image from a set of low-resolution, possibly noisy, images such as in a video sequence. We present a method for combining data from multiple images using the Total Variation (TV) and Mumford-Shah functionals. We discuss the problem of sub-pixel image registration and its effect on the final result.