February 6 - 10, 2006
Stereo correlation techniques used to automatically compute DEMs
(Digital Elevation Models) by photogrammetry typically use stereo pairs
with a relatively high baseline/height ratio, in order to diminish the
relative importance of the adhesion phenomenon, which is a distortion of
the model that appears near strong discontinuities or borders of the
image. This phenomenon is directly related to the correlation process,
and the magnitudes of the artifacts cannot be neglected when trying to
obtain sub-pixel accuracies.
Correctly modelling and correcting this bias will allow the use of
stereo pairs with much lower b/h ratios, which has the great advantage
of avoiding many problems due to the occluded parts in the image.
The work by Delon and Rougé characterizes this phenomenon, giving a link
between measured and true disparities, and allowing detection of
uncorrelatable regions (or regions providing no useful information for
correlation). Since this leads to a very ill-posed system of equations,
many simplifying assumptions have been adopted in order to solve it
easily, leading to the so-called barycentric correction of the adhesion
phenomenon. Even though the result is highly improved with respect to
the raw correlation disparities, one still observes a slightly blurred
disparity map, which is especially annoying in urban areas.
In this work we propose more precise and natural assumptions to solve
this system, namely to regularize the solution by a modified minimal
surface term. Such an approach is naturally expected to allow less
blurred edges while still filling in empty areas (without meaningful
correlation information) in a reasonable manner.
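The source of the adhesion bias can be seen in a minimal sketch of the raw correlation step: windowed normalized cross-correlation over integer disparities (function and parameter names here are illustrative, not from the paper). Because each pixel's disparity is decided by a whole window, strong edges contaminate their neighbourhoods:

```python
import numpy as np

def ncc_disparity(left, right, half_win=2, max_disp=4):
    """Raw integer disparities by windowed normalized cross-correlation.

    Adhesion arises because each disparity is supported by a whole
    (2*half_win+1)^2 window, so discontinuities 'leak' into flat areas.
    """
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    r = half_win
    for y in range(r, h - r):
        for x in range(r, w - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1].astype(float)
            ref = ref - ref.mean()
            best, best_d = -np.inf, 0
            for d in range(0, max_disp + 1):
                if x - r - d < 0:
                    break
                cand = right[y - r:y + r + 1,
                             x - r - d:x + r + 1 - d].astype(float)
                cand = cand - cand.mean()
                denom = np.sqrt((ref ** 2).sum() * (cand ** 2).sum())
                score = (ref * cand).sum() / denom if denom > 0 else 0.0
                if score > best:
                    best, best_d = score, d
            disp[y, x] = best_d
    return disp
```

On a textured pair related by a constant shift this recovers the true disparity; it is near depth discontinuities that the window support blurs the map, which is what the barycentric and minimal-surface corrections address.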
Our future research will explore the extension of these techniques to
motion estimation in image sequences, and its application (in
conjunction with our irregular sampling work) to multi-frame
We develop an object classification method that can learn
a novel class from a single training example. In this method,
experience with already learned classes is used to
facilitate the learning of novel classes. Our classification
scheme employs features that discriminate between class and non-class
images. For a novel class, new features are derived by
selecting features that proved useful for already learned
classification tasks, and adapting these features to the new
classification task. This adaptation is performed by replacing
the features from already learned classes with similar features
taken from the novel class. A single
example of a novel class is sufficient to perform feature
adaptation and achieve useful classification performance.
Experiments demonstrate that the proposed algorithm can learn a
novel class from a single training example, using 10 additional
familiar classes. The performance is significantly improved
compared to using no feature adaptation. The robustness of the
proposed feature adaptation concept is demonstrated by similar
performance gains across 107 widely varying object categories.
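The adaptation step described above can be sketched in a toy form, with image fragments standing in for the paper's class features (all names and the patch representation are illustrative assumptions): each feature that proved useful for a familiar class is replaced by the most similar patch extracted from the single novel-class example.

```python
import numpy as np

def extract_patches(img, size=3):
    """All size x size patches of an image, flattened to vectors."""
    h, w = img.shape
    return np.array([img[y:y + size, x:x + size].ravel()
                     for y in range(h - size + 1)
                     for x in range(w - size + 1)])

def adapt_features(familiar_features, novel_example, size=3):
    """Replace each familiar-class feature by its most similar patch
    drawn from the single novel-class training image (a hypothetical
    stand-in for the fragment-based features in the talk)."""
    candidates = extract_patches(novel_example, size)
    adapted = []
    for f in familiar_features:
        dists = ((candidates - f) ** 2).sum(axis=1)
        adapted.append(candidates[dists.argmin()])
    return np.array(adapted)
```

The adapted features then plug into the same discriminative classifier used for the familiar classes; only the feature appearance changes, not the classification scheme.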
Antique documents such as photographic prints and books can be
affected by several kinds of artefacts: foxing/yellowing, water
blotches, fragmented glass plate, screening, etc. Each specific
"problem" can be attacked by using advanced algorithms able to recover
the original appearance. In this work a brief review of our solutions
for virtual restoration is reported. Some visual examples are also
shown to illustrate the effectiveness of the proposed approaches.
Starting in 2002, Universitat Pompeu Fabra (Barcelona, Spain) has been a
partner in several Digital Cinema projects for the European Union,
involving major European companies. Within this framework, our Image
Processing Group has developed several novel algorithms for digital
cinema postproduction and exhibition. These works include:
-A Day for Night algorithm that accurately models human visual
perception regarding color and contrast modification, but also loss of
acuity through a novel anisotropic diffusion Partial Differential Equation.
-A Depth of Field algorithm that performs real-time, accurate depth of
field simulation by running an anisotropic diffusion equation on a
programmable graphics card.
-A robust tracking algorithm that improves the Geodesic Active Regions approach.
-A fast and robust segmentation algorithm based on the Tree of Shapes
-An Interlaced to Progressive Conversion algorithm that achieves
real-time, state of the art results on a regular PC by implementing a
variational energy minimization approach on a graphics card.
A suitable detection and tracking approach is proposed for line scratch
removal in a digital film restoration process. Unlike impulsive distortions
such as dirt spots, which appear randomly in an image, line scratch
artifacts persist across several frames. Hence, motion compensated methods
will fail for persistent line scratches. Single-frame based methods will
also fail if scratches are unsteady or fragmented. The proposed method uses
as input a composite image built up from projections of each image of the
original sequence. First, a simple 1D-extrema detector provides line scratch
candidates for both bright and dark scratches. Next, a MHT (Multiple
Hypothesis Tracker) stage uses these candidates to create and keep
multiple hypotheses. As the tracking goes further through the sequence,
each hypothesis gains or loses evidence. To avoid a combinatorial
explosion, the hypothesis tree is sequentially pruned. As hypotheses are
set up at each iteration, even if no information is available, a tracked
path can cross gaps (missed detections or speckled scratches). The
tracking stage then feeds
the correction process with valid scratch trajectories.
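The first stage can be sketched as a simple 1D extrema detector on the vertical projection of a frame (parameter names are illustrative): a thin vertical scratch survives the row-wise averaging while most scene content cancels out, and candidates are columns whose projection deviates strongly from a local median baseline, catching both bright and dark scratches.

```python
import numpy as np

def scratch_candidates(frame, width=3, thresh=10.0):
    """1D extrema detector on the column-wise projection of a frame.

    Returns the column indices whose projection deviates from a local
    median baseline by more than `thresh`, for bright and dark
    scratches alike.
    """
    proj = frame.mean(axis=0)                     # vertical projection
    pad = width
    baseline = np.array([np.median(proj[max(0, i - pad):i + pad + 1])
                         for i in range(len(proj))])
    deviation = proj - baseline
    return np.flatnonzero(np.abs(deviation) > thresh)
```

These per-frame candidates are exactly what the MHT stage consumes to build and prune scratch trajectory hypotheses over the sequence.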
Bayesian denoising of archival film requires a likelihood model that
captures the image noise and a spatial prior that captures the
statistics of natural scenes. For the former we learn a statistical
model of film noise that varies as a function of image brightness. For
the latter we use the recently proposed Field-of-Experts framework to
learn a generic image prior that captures the statistics of natural
scenes. The approach extends traditional Markov Random Field (MRF)
models by learning potential functions over extended pixel
neighborhoods. Field potentials are modeled using a Products-of-Experts
framework that exploits non-linear functions of many linear filter
responses. In contrast to previous MRF approaches all parameters,
including the linear filters themselves, are learned from training data.
The prior model alone can be used to inpaint missing image structures
and the data noise model can be used to simulate realistic film grain.
Additionally we demonstrate how the learned likelihood and prior models
can be used to denoise archival film footage.
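Concretely, in Roth and Black's formulation the Field-of-Experts prior writes the image probability as a product, over all overlapping patches \(\mathbf{x}_c\), of Student-t experts applied to learned linear filter responses:

```latex
p(\mathbf{x}) \;=\; \frac{1}{Z(\Theta)} \prod_{c \in \mathcal{C}} \prod_{i=1}^{N}
\phi\!\left(\mathbf{J}_i^{\top}\mathbf{x}_{c};\, \alpha_i\right),
\qquad
\phi(y;\alpha) \;=\; \left(1 + \tfrac{1}{2}\,y^{2}\right)^{-\alpha}
```

where the filters \(\mathbf{J}_i\) and the expert parameters \(\alpha_i\) are all learned from training data, which is the departure from traditional hand-designed MRF potentials.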
Joint work with Stefan Roth and Teodor Moldovan.
State-of-the-art movie restoration methods compensate motion using an optical flow estimate and then filter the compensated
movie. The motion estimation problem, however, is fundamentally ill-posed. This fact is known as the aperture problem: trajectories are
ambiguous, since they could coincide with any promenade on the space-time isophote surface.
In this talk, we show that the aperture problem can be turned to advantage. This observation leads us to apply to movies the recently
introduced NL-means algorithm. This static 3D algorithm involves the whole movie isophote surface and not just a trajectory.
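The idea can be sketched for a single pixel (a toy version; in particular, restricting the search to the whole sequence rather than a window, and the parameter names, are illustrative): every spatio-temporal patch contributes to the average, weighted by its similarity to the patch around the pixel being denoised. No trajectory is ever estimated, which is how the aperture ambiguity becomes extra averaging candidates.

```python
import numpy as np

def nlmeans_pixel(video, t, y, x, half=1, h=0.1):
    """Denoise one pixel of a (T, H, W) video with NL-means.

    Weights depend only on patch similarity, never on motion estimates.
    """
    T, H, W = video.shape

    def patch(tt, yy, xx):
        return video[tt, yy - half:yy + half + 1, xx - half:xx + half + 1]

    ref = patch(t, y, x)
    num, den = 0.0, 0.0
    for tt in range(T):
        for yy in range(half, H - half):
            for xx in range(half, W - half):
                d2 = float(((patch(tt, yy, xx) - ref) ** 2).mean())
                w = np.exp(-d2 / (h * h))     # similarity weight
                num += w * video[tt, yy, xx]
                den += w
    return num / den
```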
The twin technologies of camera tracking and motion capture are key
components in the modern movie production pipeline, without which such
effects-laden productions as "Revenge of the Sith" and "The Lord of
the Rings" simply would not be possible. Recent advances have been
driven by the successful application of algorithms developed in the
computer vision research community to these real-world problems. The
resulting highly automated, robust software solutions have greatly
reduced the time and level of specialist skill required of the
operator, hence reducing the overall costs of camera tracking and
motion capture. Consequently, these technologies are now commonly
being used in much lower budget productions such as television
advertising, music promos and video games. I will talk about the
underlying problems involved in camera tracking and motion capture,
and will illustrate the modern approaches to them using 2d3's "boujou"
camera tracker and ViconPeak's "IQ" motion capture software.
An art form barely 100 years old has seen the evolution of film
grammar as new technologies emerge. Just as paintings changed when
oil paint was overtaken by acrylic paint, motion pictures have seen
three systemic technological changes: silent film to sound, black
and white film to color, photochemical to digital. Each of these
changes affected how filmmakers told stories.
This talk - with many film clip samples - will attempt to give an
overview of post-production technology with special emphasis on how
the movement from photochemical to digital has affected the film
editing process. In addition, the speaker, a film editor currently
working in Hollywood, will describe his fifteen year adventure
developing digital tools for film restoration.
Image flicker is a common film artifact, observed in low-sampled videos as well as in old films, and consists of fast
variations of frame contrast and brightness. Reducing the flicker of a sequence improves its visual quality and can be an essential first
treatment before further processing. An axiomatic analysis of the problem leads to a global and generic method of "de-flicker",
based on scale-space theory. As a global process, this correction is robust to global noise, shaking and small motion. The scale-time
framework leads to simple results of stability, ensures the robustness of the method to blotches or impulse noise, and guarantees that no
bias or deviation can appear in time. In cases of a flicker mixing both very local changes and global oscillations, this process can
still be used as a first step of deflickering before a more local treatment.
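The global nature of the correction can be illustrated with a much cruder stand-in for the scale-space formulation (this sketch and its parameter names are assumptions, not the talk's method): remap each frame affinely so that its mean and standard deviation follow a temporal moving average, which removes fast brightness/contrast oscillations while leaving scene content untouched.

```python
import numpy as np

def deflicker(frames, radius=2):
    """Simplified global de-flicker: per-frame affine remapping so that
    frame mean and std follow a temporal moving average."""
    means = np.array([f.mean() for f in frames])
    stds = np.array([f.std() for f in frames])
    out = []
    for i, f in enumerate(frames):
        lo, hi = max(0, i - radius), min(len(frames), i + radius + 1)
        m_t, s_t = means[lo:hi].mean(), stds[lo:hi].mean()
        gain = s_t / stds[i] if stds[i] > 0 else 1.0
        out.append((f - means[i]) * gain + m_t)
    return out
```

Being global, such a correction is insensitive to blotches, impulse noise, shaking and small motion, which is the robustness property the scale-space analysis makes precise.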
I shall talk about building 3D models from image sequences, and in
particular about rendering new views of existing sequences in order
to create stereoscopic 3D from monocular footage. I shall show how
existing strategies for image-based rendering can be augmented using
image-based priors to create realistic 3D views. In addition I will
talk about the difficult problem of creating 3D when there is no
Currently lots of old celluloid movies are digitized to save them from
decay. Most of the footage has already suffered from aging or from
abrasion and should be processed to improve its quality. In our work we
focus on removing scratches and blotches resulting from mechanical
damage to the celluloid layer. Bearing in mind that even movies of
moderate length consist of several thousands of frames we face two main
problems. First, manually highlighting the corrupted pixels is not
feasible; they have to be detected automatically. Second, processing
time should be kept low.
We employ a method based on the optical flow for detection and removal
of scratches. Where this method fails we apply a hybrid still image
inpainting technique, utilizing PDE inpainting and texture synthesis
methods. Due to the use of efficient numerical algorithms, an optimized
implementation can achieve a processing time of a few tens of seconds per
frame. Furthermore, the algorithm is highly parallelizable except for a
We propose an algorithm to solve a problem in image restoration
which considers several different aspects of it, namely: irregular
sampling, denoising, deconvolution, and zooming. Our algorithm is
based on an extension of a previous image denoising algorithm
proposed by A. Chambolle using total variation,
combined with irregular to regular sampling algorithms proposed by
H.G. Feichtinger, K. Gröchenig, M. Rauth and T. Strohmer. Finally we
present some experimental
results and we compare them with those obtained with the algorithm
proposed by K. Gröchenig et al.
Joint work with A. Almansa, V. Caselles and B. Rouge.
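The total variation building block referred to above is Chambolle's dual projection algorithm for the ROF model; a minimal 2D sketch (the discrete gradient/divergence conventions and parameter names are one common choice, not necessarily the authors') is:

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary (last row/col zero)."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad above."""
    d = px.copy()
    d[:, 1:] -= px[:, :-1]
    d += py
    d[1:, :] -= py[:-1, :]
    return d

def tv_denoise(f, lam=0.1, tau=0.125, n_iter=100):
    """Chambolle's (2004) projection algorithm for the ROF model
    min_u TV(u) + (1/(2*lam)) ||u - f||^2; tau <= 1/8 ensures
    convergence of the dual fixed-point iteration."""
    px, py = np.zeros_like(f), np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - f / lam)
        norm = 1.0 + tau * np.sqrt(gx ** 2 + gy ** 2)
        px = (px + tau * gx) / norm
        py = (py + tau * gy) / norm
    return f - lam * div(px, py)
```

The extension in the talk couples this denoising step with the Feichtinger-Gröchenig-style irregular-to-regular sampling iterations, plus deconvolution and zooming terms.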
This is a moderated session. The purpose is to structure a few fast
problem demos on really specific things, providing a context to bridge
discussions between commercial tools and research: in some way, to
present, in five minutes or less, a real, specific problem to a room of
experts. If anyone has media they can share in advance with other
participants, please do so, so that they can try things ahead of time.
Additional information and samples available at
The European project PrestoSpace started in February 2004. The project aims to provide a complete solution for preserving the audiovisual material found in archives (e.g. BBC, RAI or INA). We present a general overview of this project and focus on the restoration task, presenting each partner's research topics. Finally, we will present in more depth the activities of Joanneum Research within this project.
Joint with Bill Collis (The Foundry).
The need for the design of new image manipulation tools for both consumer and
professional postproduction has substantially widened the breadth of research in
image/video/vision processing. While machine vision tools have been used
successfully by industry for many years, it is only with the success of Digital
Television and Digital Media Streaming that more sophisticated moving image
processing has shown mainstream success in the worldwide community. This talk
tries to chart a course showing how tools in restoration that have been
considered for over a decade now, have migrated into the post-production
community where they have metamorphosed into other applications. It highlights
some emerging trends and tries to explore why researchers and post-production
industrialists have become friends.
Joint work with Sorin Tilie (INA) and
Isabelle Bloch (ENST).
This paper proposes a method based on the
Dempster-Shafer evidence theory for the detection of
blotches in digitized archive film sequences. The
detection scheme relies on the fusion of two
uncorrelated, fast but inaccurate spatio-temporal detectors.
The imprecision and uncertainty of both detectors
are modeled using Dempster-Shafer evidence theory,
which improves the decision, by taking into account
the ignorance and the conflict between detectors.
We found that this combination scheme improves the
performance of single blotch detectors, and compares
favorably to more complex and time consuming blotch
detection methods, for real archive film sequences.
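Dempster's rule of combination, the core of the fusion step above, can be sketched directly (the frame of discernment {blotch, clean} and the mass values are illustrative): each detector's ignorance is encoded as mass on the full frame, and conflicting evidence is renormalized away.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Masses are dicts mapping frozenset focal elements to belief mass;
    mass assigned to the full frame encodes a detector's ignorance.
    """
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb            # mass on empty intersections
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}
```

For instance, two detectors that each put partial mass on "blotch" and the rest on ignorance combine into a sharper belief in "blotch" than either alone, which is exactly the improvement over single detectors reported above.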
We present a new model for image blending based on warping. The model
is represented by partial differential equations (PDEs) and gives a sequence of images,
which has the properties of both blending of image intensities and warping of image
shapes. We modified the energy functional in the work by Liao et al. (2002) in order to
adapt the idea of the shape warping to the image blending. The PDEs from the proposed
energy functional cover not only overlapped images but also non-overlapped ones.
We consider the problem of detecting and removing line scratches from
digital image sequences.
In particular, we present an approach based on data fusion techniques for
combining relatively well settled distinct techniques. Moreover, focusing on
blue scratches, we describe a detection method and a removal method that
strongly rely on the specific features of such scratches.
An evaluation of the proposed methods and numerical experiments on real images
are presented.
Motion picture restoration spans the gamut from archival preservation of
historically and culturally significant works to pragmatic treatment of
low budget titles to extensive polishing of today's blockbusters. Each
restoration project has its own idiosyncrasies, including
original storage technology, type of damage, and final delivery requirements.
Each project needs to strike its own balance between speed and accuracy of
processing. Restoration must be approached in a way that addresses the
peculiarities and considerations unique to the material.
This talk presents some of the prototypical challenges encountered in
motion picture restoration. The evolution and consequences of several
prevalent storage technologies are discussed. Various sources of image
degradation and damage---both common and unusual---are demonstrated.
State-of-the-art restoration techniques are presented and critiqued. The
requirements and properties of modern delivery mechanisms are explained.
Techniques employed in film editing have evolved rapidly with
increasingly sophisticated and complex methods being used to enhance
storytelling. This talk will examine the relationship between scene and
shot, picture and sound with a discussion of how an understanding of
editing technique can be leveraged to enhance the automated analysis of
About the Speaker:
John Mateer joined the University of York in 2001 specifically to design
and develop the media production and analysis components of an
innovative new teaching and research initiative in Media Technology.
His expertise lies in the integration and application of new media
technologies in different traditional media production contexts. Prior
to this appointment, he worked for over 15 years as a producer, director
and production consultant. He is a graduate of NYU's Tisch School of
the Arts and AFI's Center for Advanced Film and Television Studies, and
is an active member of the Directors Guild of Great Britain.
Imagineer Systems Ltd was founded in 2000 with the aim of building
innovative products based around computer vision technology. Our first
product, mokey, has helped to automate various important tasks in film
and video post-production, including wire and rig removal,
stabilisation, lens distortion correction and matte creation. Two new
products, monet and motor, are specialised to compositing and
rotoscoping applications. Our core technology is a fast and accurate
tracker for affine and projective 2D motion.
In my seminar I shall relate some of the history of the company and
summarise the algorithms and software we have developed, in particular
our "Gandalf" computer vision library (see gandalf-library.sf.net).
There will be extensive demonstrations of mokey and monet. If time
permits, I will present a mathematical conundrum in the area of the
normalisation of projective quantities.
In this work a software framework for processing video data integrating
existing open source libraries and a set of applications for video content
analysis is presented. These are partial results of an ongoing project.
Due to the huge amount of data contained in video sequences, a set of
constraints were considered while designing the system to keep the
computational cost and memory requirements as low as possible, which led
to a simple and effective system architecture. The system also includes an
MPEG7-like description module that allows extracting and storing a video
description.
Based on this system a set of applications for object extraction, shot
detection, and content analysis were developed. These applications were
used to test the developed software and to develop new solutions. Novel
results for object extraction, shot detection and content description are
presented.
Film-based media become unstable over time unless they are stored at low
temperatures under controlled humidity. Some defects, such as bleaching,
are difficult to solve using photochemical restoration methods; in such cases,
digital restoration can be an alternative solution. The basic idea of the
proposed work is to mimic the robust capabilities of the human vision system
(HVS) to set up a tool to filter damaged frames in a partially automated way.
In fact, film colour cast, caused by ageing, can be considered as generic
chromatic noise, thus a colour constancy method can be suitable for restoring
it. Moreover a colour constancy method inspired by the HVS behaviour does not
need any a-priori information about the colour cast and its magnitude. Another
advantage of HVS inspired algorithms is their local effect since film chemical
deterioration is usually non-uniform. Several tests have been performed with an
algorithm called ACE (Automatic Colour Equalization). The technique, presented
here, is not just an application of ACE on movie images, but also an
enhancement of ACE principles to meet the requirements of digital film
restoration practice. The basic ACE computation extracts autonomously the
visual content of the frame, correcting colour cast if present and expanding
its dynamic range. This behaviour is not always a good restoration solution:
there are cases in which the cast has to be maintained (e.g. underwater shots)
or the dynamic range should not be expanded (e.g. sunset or night shots). To
this aim, new functions have been added to preserve the natural histogram
shape, adding new efficacy in the restoration process. Examples are presented
to discuss characteristics, advantages and limits of the use of perceptual
models in digital movie colour restoration.
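A heavily simplified, single-channel caricature of the ACE computation (this sketch and its parameter names are an assumption based on published descriptions of ACE, not the enhanced variant of the talk) shows the two stages: each pixel is pushed up or down by a distance-weighted vote of its differences to every other pixel, and the result is then linearly rescaled.

```python
import numpy as np

def ace_gray(img, slope=5.0):
    """Toy single-channel ACE-like equalization.

    Stage 1: spatially weighted sum of saturated pixel differences.
    Stage 2: linear rescaling of the result to [0, 1].
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    flat, py, px = img.ravel(), ys.ravel(), xs.ravel()
    out = np.empty_like(flat)
    for i in range(flat.size):
        d = np.hypot(py - py[i], px - px[i])
        d[i] = 1.0                     # self term is zero anyway
        r = np.clip(slope * (flat[i] - flat), -1.0, 1.0)
        r[i] = 0.0
        out[i] = (r / d).sum()
    lo, hi = out.min(), out.max()
    if hi <= lo:
        return np.full((h, w), 0.5)
    return ((out - lo) / (hi - lo)).reshape(h, w)
```

Because the weighting is spatial, the correction is local, which is what makes this family of HVS-inspired methods suitable for the non-uniform chemical deterioration described above; the talk's additional functions then restrain exactly this cast removal and range expansion when they are undesirable.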
For all the discussion of right brain/left brain conflict, mathematics and
the arts actually have a very healthy relationship, one which can perhaps be
traced to their common goal of finding a way to give expression to the grand
truths of experience. Mathematician Dan Rockmore will take us on a tour of
modern art history and point out some of the surprising ways in which
mathematical ideas have been and continue to be an enabler as well as
inspiration for some of the big ideas in the visual arts.
Dan Rockmore is a Professor of Mathematics and Computer Science at Dartmouth
College, where he has taught since 1991. He recently published Stalking the
Riemann Hypothesis : The Quest to Find the Hidden Law of Prime Numbers.
The world of motion picture production and restoration is populated by artists and business people, not mathematicians. Yet, as more and more digital processes, many of which implement non-deterministic algorithms, are introduced into the industry, the probability of error at any stage has increased while the probability of identification and correction has decreased. In addition, many people who came into the industry when silver halide film and photochemistry were the only tools lack the understanding of digital tools that people newer to the industry have, and vice versa.
I will discuss in detail the mathematical and social solutions developed to solve problems in principal photography and post production related to digital capture defect correction, film image quality control and work flow. I will also detail how a thorough understanding of traditional photochemical film processes allows for the creation of processes with an optimal combination of mathematics, art, analog and digital while, at the same time, educating cinematographers, directors and executives in how to recognize and overcome the complexities of these processes.
In this talk we will review basic techniques for image inpainting
and present new ones for video inpainting under
constrained camera motion.
Image-based rendering has been one of the hottest areas in computer
graphics in recent years. Instead of using CAD and painting tools to
construct graphics models by hand, IBR uses real-world imagery to
rapidly create extremely photorealistic shape and appearance
models. However, IBR results to date have mostly been restricted to
static objects and scenes.
Video-based rendering brings the same kind of realism to computer
animation, using video instead of still images as the source
material. Examples of VBR include facial animation from sample video,
repetitive video textures that can be used to animate still scenes and
photos, 3D environment walkthroughs built from panoramic video, and 3D
video constructed from multiple synchronized cameras. In this talk, I
survey a number of such systems developed by our group and by others,
and suggest how this kind of approach has the potential to fundamentally
transform the production (and consumption) of interactive visual media.
About the Speaker
Richard Szeliski leads the Interactive Visual Media Group at Microsoft
Research, which does research in digital and computational photography,
video scene analysis, 3-D computer vision, and image-based rendering. He
received a Ph.D. degree in Computer Science from Carnegie Mellon
University, Pittsburgh, in 1988. He joined Microsoft Research in
1995. Prior to Microsoft, he worked at Bell-Northern Research,
Schlumberger Palo Alto Research, the Artificial Intelligence Center of
SRI International, and the Cambridge Research Lab of Digital Equipment
Corporation.
Dr. Szeliski has published over 100 research papers in computer vision,
computer graphics, medical imaging, and neural nets, as well as the book
Bayesian Modeling of Uncertainty in Low-Level Vision. He was a Program
Committee Chair for ICCV'2001 and the 1999 Vision Algorithms Workshop,
served as an Associate Editor of the IEEE Transactions on Pattern
Analysis and Machine Intelligence and on the Editorial Board of the
International Journal of Computer Vision, and is a Founding Editor of
Foundations and Trends in Computer Graphics and Vision.
I shall talk about some practical problems of special effects /
animation production that could be broadly defined as Inverse Problems:
data is recovered from observed images rather than generated on user
specification. Most computer vision tasks in movie production, as
opposed to most computer graphics tasks, fall into the inverse problems
category. The problems I will discuss are: surveyless rigid-body
tracking; articulated-body tracking (as a hierarchy of rigid objects);
face and flexible-surface tracking (marker-based, markerless,
featureless); and photomodelling, including what is practically
important for our industry in algorithm development and what our
expectations are. Automatic 3D terrain generation from a monocular
camera image sequence will be presented as an example of the problems
and solutions discussed.
Super-resolution seeks to produce a high-resolution image from a set of
low-resolution, possibly noisy, images such as in a video sequence. We
present a method for combining data from multiple images using the Total
Variation (TV) and Mumford-Shah functionals. We discuss the problem of
sub-pixel image registration and its effect on the final result.
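The structure of the reconstruction can be sketched with a deliberately reduced version (assumptions: perfect registration, a box point-spread function, and a quadratic gradient penalty standing in for the TV/Mumford-Shah terms; all names are illustrative):

```python
import numpy as np

def downsample(u, s=2):
    """Average-pool by factor s (the forward imaging operator)."""
    h, w = u.shape
    return u.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample_adjoint(f, s=2):
    """Adjoint of downsample: spread each LR pixel over its s*s block."""
    return np.kron(f, np.ones((s, s))) / (s * s)

def sr_reconstruct(lr_frames, s=2, lam=0.01, step=1.0, n_iter=100):
    """Gradient descent on 0.5 * sum_k ||D u - f_k||^2 + smoothness.

    The smoothness term here is Tikhonov (a Laplacian penalty); the
    method in the talk uses TV / Mumford-Shah regularizers instead,
    which preserve edges where this quadratic version blurs them.
    """
    u = np.kron(np.mean(lr_frames, axis=0), np.ones((s, s)))
    for _ in range(n_iter):
        g = np.zeros_like(u)
        for f in lr_frames:
            g += upsample_adjoint(downsample(u, s) - f, s)
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
        u -= step * (g - lam * lap)
    return u
```

With real data the forward operator also encodes the estimated sub-pixel shifts, which is why registration accuracy directly limits the final result.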
In film editing, or in exploring large video archives, there is a need
to access shots by their visual content directly, as textual
annotations may not be available. In the first part of this talk I
will describe an approach to searching for and localizing all the
occurrences of an object in a video. The object is represented by a
set of viewpoint invariant fragments that enable recognition to
proceed successfully despite changes in viewpoint, illumination and
partial occlusion. The fragments act as "visual words" for describing
the scene, and by pushing this analogy efficient methods from text
retrieval can be employed to retrieve shots in the manner of a Google
search of the web.
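The text-retrieval analogy can be made concrete with a toy tf-idf scorer over "visual words" (here shots are simply lists of quantized descriptor ids; the quantization itself, and all names, are illustrative):

```python
import math
from collections import Counter

def tfidf_index(shots):
    """tf-idf vectors for shots given as lists of visual-word ids.

    Standard text-retrieval weighting: term frequency times log
    inverse document frequency, with shots playing the role of
    documents.
    """
    n = len(shots)
    df = Counter()
    for words in shots:
        df.update(set(words))
    idf = {w: math.log(n / df[w]) for w in df}
    vecs = []
    for words in shots:
        tf = Counter(words)
        total = len(words)
        vecs.append({w: (c / total) * idf[w] for w, c in tf.items()})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse tf-idf vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Ranking shots by cosine similarity to a query vector is then an inverted-file lookup, which is what makes Google-style search over feature films tractable.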
In the second part of the talk I'll describe progress in searching for
people in videos by matching their face. Face recognition is a
challenging problem because changes in pose, illumination and
expression can exceed those due to identity. Fortunately, video
enables multiple examples of each person to be associated
automatically using straightforward visual tracking. We demonstrate how
these multiple examples can be harnessed to reduce the ambiguity.
The methods will be demonstrated on several feature length films.
This is joint work with Josef Sivic and Mark Everingham.