Scalable Extrinsic Registration of Omni-Directional Image Networks

Wednesday, May 16, 2001 - 3:30pm - 4:30pm
Keller 3-180
Matthew Antone (Massachusetts Institute of Technology)
We describe linear-time algorithms for recovering scene-relative camera orientations and positions in networks of thousands of terrestrial images spanning hundreds of meters, in outdoor urban scenes, under uncontrolled lighting. Accurate registration of such image networks is currently infeasible by any other means, manual or algorithmic. Our system requires no human input or interaction, and recovers 6-DOF camera pose that is globally consistent on average to roughly $0.1^{\circ}$ and five centimeters, or about four pixels of epipolar alignment---sufficiently accurate for applications such as 3D reconstruction and image-based rendering.

The 6-DOF registration problem is decoupled into pure rotation and translation components, which take accurate intrinsic parameters, approximate extrinsic pose, and a connected camera adjacency graph as input. The algorithms estimate a local coordinate frame at each camera by classifying and combining thousands of low-level image features (lines and points) into a few robust aggregate features (vanishing points and motion baselines). These local frames are then propagated and registered through the adjacency graph. As output, the algorithms produce an accurate assignment of absolute orientation and position, and associated uncertainty estimates, to every camera.
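The propagation of local frames through the adjacency graph can be illustrated with a minimal sketch. The fragment below assumes pairwise relative rotations between adjacent cameras have already been estimated (e.g., from aggregate features such as vanishing points) and simply chains them outward from a reference camera by breadth-first traversal; the function name, the composition convention $R_j = R_{ij} R_i$, and the graph representation are illustrative choices, not the talk's actual implementation.

```python
# Hypothetical sketch: propagate absolute camera orientations through a
# connected adjacency graph given pairwise relative rotations.
from collections import deque
import numpy as np

def propagate_rotations(num_cams, rel_rots, root=0):
    """Chain relative rotations over the camera graph by BFS.

    rel_rots: dict mapping edge (i, j) -> 3x3 rotation R_ij with the
              convention R_j = R_ij @ R_i (an assumed convention).
    The root camera defines the global coordinate frame.
    """
    # Build an undirected adjacency list; the reverse direction of an
    # edge uses the inverse rotation (transpose, for rotation matrices).
    adj = {i: [] for i in range(num_cams)}
    for (i, j), R in rel_rots.items():
        adj[i].append((j, R))
        adj[j].append((i, R.T))

    abs_rots = {root: np.eye(3)}
    queue = deque([root])
    while queue:
        i = queue.popleft()
        for j, R_ij in adj[i]:
            if j not in abs_rots:
                abs_rots[j] = R_ij @ abs_rots[i]
                queue.append(j)
    return abs_rots
```

In the full system each edge estimate carries uncertainty, so a single spanning-tree chaining like this would serve only as an initialization before a global, probabilistically weighted refinement.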

Our principal contributions include multi-camera probabilistic extensions of classical two-camera alignment methods; new uses of the Hough transform for initialization of iterative numerical techniques; and formulation of expectation maximization algorithms for the recovery of camera pose without explicit feature correspondence. We also extend existing stochastic frameworks to handle unknown numbers of 3-D feature points, unknown occlusion, large scale (thousands of images and hundreds of thousands of features), and large dimensional extent (inter-camera baselines of tens of meters spanning areas hundreds of meters across). Finally, we introduce principled methods for the estimation and propagation of projective uncertainty, and present strong quantitative evidence of the superior utility of wide-FOV images in extrinsic calibration.
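To make the Hough-initialization idea concrete, here is a minimal one-dimensional sketch: noisy line-segment orientations vote into an angular accumulator, and the peak bin yields a coarse dominant direction that could seed an iterative refinement such as EM. The bin count, peak rule, and test data are illustrative assumptions; the actual system votes on the sphere of directions for full 3-D vanishing points.

```python
# Minimal 1-D Hough sketch (assumed parameters): vote orientations
# modulo pi into bins and return the center of the strongest bin.
import numpy as np

def hough_orientations(angles, n_bins=180):
    """Accumulate angles (radians, mod pi); return the peak bin center."""
    bins = np.zeros(n_bins)
    idx = (np.mod(angles, np.pi) / np.pi * n_bins).astype(int) % n_bins
    for k in idx:
        bins[k] += 1
    peak = np.argmax(bins)
    return (peak + 0.5) * np.pi / n_bins

# Usage: 200 noisy votes near 30 degrees plus 50 uniform outliers.
rng = np.random.default_rng(0)
meas = np.concatenate([
    np.deg2rad(30) + rng.normal(0.0, 0.01, 200),
    rng.uniform(0.0, np.pi, 50),
])
estimate = hough_orientations(meas)
```

Because voting needs no correspondence or initial guess, a peak like this gives the iterative numerical stage a basin-of-attraction start that is robust to the outlier clutter shown above.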

We assess the system's performance on synthetic and real data, and draw several conclusions. First, by fusing thousands of gradient-based image features into a few ensemble projective features, the algorithms achieve accurate registration even in the face of significant lighting variations, low-level feature noise, and error in initial pose estimates. Second, we show that registration of wide-FOV images is fundamentally more robust against failure, and more accurate, than is registration of ordinary imagery. Finally, the system surmounts the usual tradeoff between speed and precision: it is both faster and more accurate than manual bundle adjustment.
(Joint work with Seth Teller).