Retrieving Video of People and Places
The goal is to access shots by their visual content directly, as textual
annotations may not be available. In the first part of this talk I
will describe an approach to searching for and localizing all the
occurrences of an object in a video. The object is represented by a
set of viewpoint invariant fragments that enable recognition to
proceed successfully despite changes in viewpoint, illumination and
partial occlusion. The fragments act as visual words for describing
the scene, and by pushing this analogy efficient methods from text
retrieval can be employed to retrieve shots in the manner of a Google
search of the web.
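To make the text-retrieval analogy concrete, here is a minimal sketch of how quantized descriptors ("visual words") can be indexed and ranked with a standard tf-idf inverted index. The shot names, toy word ids, and scoring details are illustrative assumptions, not the talk's exact implementation:

```python
from collections import Counter, defaultdict
import math

def build_index(shots):
    """Build an inverted index: visual-word id -> {shot id: term frequency}."""
    index = defaultdict(dict)
    for shot_id, words in shots.items():
        for word, count in Counter(words).items():
            index[word][shot_id] = count
    return index

def retrieve(index, query_words, n_shots):
    """Rank shots by a tf-idf score against the query's visual words."""
    scores = Counter()
    for word in set(query_words):
        postings = index.get(word, {})
        if not postings:
            continue
        idf = math.log(n_shots / len(postings))  # rare words weigh more
        for shot_id, tf in postings.items():
            scores[shot_id] += tf * idf
    return scores.most_common()

# Toy corpus: each shot is a bag of quantized descriptor ids.
shots = {
    "shot1": [1, 2, 2, 2, 3],
    "shot2": [2, 3, 3, 4],
    "shot3": [5, 6, 7],
}
index = build_index(shots)
results = retrieve(index, [2, 3], len(shots))
print(results)  # shots sharing the query's visual words, best match first
```

As in document retrieval, only shots containing at least one query word are touched, which is what makes the search efficient over a large corpus.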
In the second part of the talk I'll describe progress in searching for
people in videos by matching their faces. Face recognition is a
challenging problem because changes in pose, illumination and
expression can exceed those due to identity. Fortunately, video
enables multiple examples of each person to be associated
automatically using straightforward visual tracking. We demonstrate how these multiple examples can be harnessed to reduce the ambiguity.
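One simple way multiple examples from a track can reduce ambiguity is to pool the per-frame face descriptors into a single, less noisy exemplar before matching. The averaging strategy, descriptor values, and gallery names below are illustrative assumptions, not necessarily the method used in the talk:

```python
import math

def mean_descriptor(track):
    """Average the per-frame face descriptors in a track into one exemplar."""
    dim = len(track[0])
    return [sum(d[i] for d in track) / len(track) for i in range(dim)]

def distance(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(track, gallery):
    """Match a track's pooled descriptor to the nearest labelled gallery face."""
    probe = mean_descriptor(track)
    return min(gallery, key=lambda name: distance(probe, gallery[name]))

# Hypothetical 2-D descriptors: noisy frames of one person in a track,
# and a gallery with one exemplar per identity.
track = [[0.4, -0.3], [-0.2, 0.5], [0.1, 0.0]]
gallery = {"actor_a": [0.0, 0.0], "actor_b": [5.0, 5.0]}
print(identify(track, gallery))
```

Individual frames may be poorly posed or lit, but the mean over a track tends to sit closer to the true identity than any single frame, which is the intuition behind exploiting tracking.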
The methods will be demonstrated on several feature length films.
This is joint work with Josef Sivic and Mark Everingham.