Talk Abstracts: Material from Talks
Jont Allen (AT&T Labs-Research)
The Intensity JND Comes From Poisson Neural Noise: Implications for Image Coding
pdf (236KB), postscript.gz (147KB)
Shi-Kuo Chang (University of Pittsburgh) chang@cs.pitt.edu
Sentient Map - A Novel Interface for Digital Libraries
Slides
The sentient map is a new paradigm for visual information retrieval.
It enables the user to view data as maps, so that gestures,
more specifically c-gestures, can be used for the interaction
between the user and the multimedia information system. Different
c-gestures are then dynamically transformed into spatial/temporal
queries, or Sigma-queries, for multimedia information sources
and databases. An e-learning environment involving many academic
institutions serves as a test bed to evaluate this approach.
Application to digital libraries is discussed.
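To make the gesture-to-query step concrete, here is a minimal Python sketch assuming the simplest possible c-gesture: a box drawn on the map combined with a time range. The `Record` and `LassoGesture` types and the `gesture_to_query` and `run_gesture` functions are hypothetical illustrations of a spatial/temporal filter; they are not the Sigma-query language from the talk.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    name: str
    x: float  # map coordinates
    y: float
    t: float  # timestamp, e.g. a year


@dataclass
class LassoGesture:
    """Hypothetical c-gesture: two opposite corners of a box on the map, plus a time range."""
    x0: float
    y0: float
    x1: float
    y1: float
    t0: float
    t1: float


def gesture_to_query(g: LassoGesture) -> Callable[[Record], bool]:
    """Translate the gesture into a combined spatial/temporal predicate."""
    xmin, xmax = sorted((g.x0, g.x1))
    ymin, ymax = sorted((g.y0, g.y1))

    def predicate(r: Record) -> bool:
        return xmin <= r.x <= xmax and ymin <= r.y <= ymax and g.t0 <= r.t <= g.t1

    return predicate


def run_gesture(records: List[Record], g: LassoGesture) -> List[str]:
    """Evaluate the gesture's query against a list of records."""
    query = gesture_to_query(g)
    return [r.name for r in records if query(r)]
```

Because the corner coordinates are sorted before the predicate is built, the box can be drawn starting from any corner.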
Nicholas Coult (Department of Mathematics, Augsburg College) coult@augsburg.edu
Compression and Region-of-Interest Extraction for Large Incomplete Data Sets
In this talk I will present an algorithm for the compression of
incomplete volumetric data sets. As is often the case with scientific
or other real-world data, one must handle three-dimensional
volumes of data in which part of the data is missing or invalid;
I refer to such data as incomplete. Standard methods for compressing
real data would normally perform poorly in this situation.
The algorithm I present introduces an extra processing step
to improve compression performance. The algorithm also
has the property that, after compression, one may extract a small
subset of the data from the compressed file without having to
decompress the whole file at once, an obvious advantage
for data sets that are too large to fit in main memory.
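The abstract does not spell out the extra processing step, so the following Python sketch only illustrates the two ideas it names, under our own assumptions. Missing samples (`None`) are replaced by the mean of the valid samples in their slice and tracked in a validity mask, and each z-slice is compressed independently with `zlib` so that one slice can be decoded without touching the rest of the file. The slice-mean fill and per-slice blocking are hypothetical choices, not the speaker's algorithm.

```python
import struct
import zlib
from typing import List, Optional, Tuple

# One z-slice of the volume: rows x cols samples; None marks missing/invalid data.
Slice = List[List[Optional[float]]]


def _fill_missing(sl: Slice) -> Tuple[List[float], bytes]:
    """Hypothetical pre-processing: fill gaps with the slice mean, keep a validity mask."""
    flat = [v for row in sl for v in row]
    valid = [v for v in flat if v is not None]
    fill = sum(valid) / len(valid) if valid else 0.0
    values = [fill if v is None else float(v) for v in flat]
    mask = bytes(0 if v is None else 1 for v in flat)
    return values, mask


def compress_volume(volume: List[Slice]) -> List[bytes]:
    """Compress each z-slice on its own so any slice can later be decoded alone."""
    blocks = []
    for sl in volume:
        values, mask = _fill_missing(sl)
        payload = struct.pack(f"<{len(values)}d", *values) + mask
        blocks.append(zlib.compress(payload))
    return blocks


def extract_slice(blocks: List[bytes], z: int, rows: int, cols: int) -> Slice:
    """Decompress only block z; samples flagged invalid come back as None."""
    raw = zlib.decompress(blocks[z])
    n = rows * cols
    values = struct.unpack(f"<{n}d", raw[: 8 * n])
    mask = raw[8 * n:]
    flat = [v if m else None for v, m in zip(values, mask)]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]
```

A smooth fill matters most for transform-based coders, where abrupt jumps at missing samples are expensive to code; with `zlib` the example mainly shows the block layout that makes region-of-interest extraction possible.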
Michelle Effros (California Institute of Technology) effros@mite.z.caltech.edu
Network Source Codes in Digital Libraries
Traditional data compression algorithms or source codes were developed for
"point-to-point" communications environments, where a single
transmitter sends information through space (data communication)
or time (data storage) to a single receiver. While direct application
of these algorithms to digital libraries is possible and, in
fact, pervasive, this approach ignores the multi-user, multi-application
nature of many types of datastores. As a result, applying traditional
source coding techniques in multi-user digital libraries can
limit system functionality and lead to extreme inefficiencies
in bandwidth and storage space. Network source codes generalize
traditional data compression methods from the point-to-point
communication scenario to more general "multi-point" network
environments. This talk includes a brief introduction to the
theory and practice of network source coding, focusing on potential
applications for network source codes in digital libraries.
Special cases of network source codes include multi-resolution,
multiple description, multiple access, and broadcast system
source codes. The proposed methods may be applied to achieve
greater versatility, robustness, and efficiency in digital libraries.
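Multi-resolution codes, one of the special cases listed above, can be illustrated with an embedded scalar code: every prefix of the bit string is itself a valid, coarser description, so a low-bandwidth user can stop after a few bits while another user reads the whole stream. The `encode`/`decode` functions below are a toy Python sketch for a single value in [0, 1), not a network source code from the talk.

```python
def encode(x: float, n_bits: int) -> str:
    """Embedded (successively refinable) binary code for x in [0, 1)."""
    lo, hi = 0.0, 1.0
    bits = []
    for _ in range(n_bits):
        mid = (lo + hi) / 2
        if x >= mid:  # each bit halves the interval known to contain x
            bits.append("1")
            lo = mid
        else:
            bits.append("0")
            hi = mid
    return "".join(bits)


def decode(bits: str) -> float:
    """Reconstruct from any prefix: the midpoint of the interval it selects."""
    lo, hi = 0.0, 1.0
    for b in bits:
        mid = (lo + hi) / 2
        if b == "1":
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


stream = encode(0.7, 8)
coarse = decode(stream[:2])  # low-rate user: 2 bits give 0.625
fine = decode(stream)        # full-rate user: 8 bits, error at most 2**-9
```

After k bits the reconstruction error is at most 2^-(k+1), so rate and fidelity can be traded per user from a single stored stream.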
Arif Ghafoor (Purdue University)
Multimedia Database Management Systems: A Perspective
Emerging multimedia information technologies will allow users to store,
retrieve, share, and manipulate complex information, which is
expected to be used to build exciting new applications. Developing
such applications to a mature and fruitful stage will require
substantial changes in our approach to the design of operating
systems, databases, and storage systems. Most multimedia applications
will use some form of pre-orchestrated stored information. Such
applications include telemedicine, digital libraries, virtual
reality, online education and training, CAD/CAM, etc. The stored
nature of the information poses a number of challenges but also
enables novel techniques for the management and representation
of multimedia data and knowledge. The key challenges are related
to data organization and integration, indexing and retrieval
mechanisms, intelligent searching techniques, information browsing,
content-based query processing, handling the heterogeneity and
variety of multimedia data, and so forth.
In this talk, we highlight these challenges and provide an assessment
of the state of the art in developing large-scale multimedia
database systems. Several case studies of industrial and research
projects are presented to evaluate the efficacy of the theoretical
foundations of this area.
Robert Gray (Stanford University)
Gauss Mixture Vector Quantization for Compression, Classification, and Modeling
The "worst case" property of Gaussian vectors for data compression
(source coding), originally established by Sakrison and Lapidoth using
Shannon rate-distortion theory, is derived here using the high-rate
quantization theory of Bennett, Zador, and Gersho and extended to
Gauss mixtures, providing an approach to robust data compression for
non-Gaussian sources such as images. The analysis provides several
interesting side results, including a new interpretation of the minimum
discrimination information (MDI) distortion measure and its
application to clustering models and to constructing Gauss mixture
models from training data. High-rate quantization theory
provides a mathematical connection between the distortion and
the performance of a classified vector quantizer for non-Gaussian
data designed using Gaussian distributions. Although the primary
applications are compression and classification, several ideas
relating to maximum entropy estimation of probability densities,
the MAXDET problem, and Markov mesh random fields arise in the
analysis. The theory also provides a hindsight explanation for why
CELP speech coders work as well as they do.
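As a small illustration of clustering with a Gaussian-model distortion, the Python sketch below fits a one-dimensional Gaussian to each data block and assigns the block to the codebook Gaussian with the smallest Kullback-Leibler divergence, assuming the MDI distortion is taken to be that divergence between the fitted and codeword Gaussians. It shows only the assignment step of a Lloyd-style design; the 1-D setting, the variance floor, and the function names are simplifying assumptions, not the classified vector quantizer from the talk.

```python
import math
from typing import List, Sequence, Tuple

Gauss = Tuple[float, float]  # (mean, variance) of a 1-D Gaussian


def kl_gauss(p: Gauss, q: Gauss) -> float:
    """KL divergence D(p || q) between 1-D Gaussians p = (m1, v1), q = (m2, v2)."""
    (m1, v1), (m2, v2) = p, q
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def fit(block: Sequence[float], var_floor: float = 1e-6) -> Gauss:
    """Maximum-likelihood Gaussian fit to one data block (variance floored)."""
    m = sum(block) / len(block)
    v = sum((x - m) ** 2 for x in block) / len(block)
    return m, max(v, var_floor)


def classify(blocks: List[Sequence[float]], codebook: List[Gauss]) -> List[int]:
    """Assignment step: index of the codebook Gaussian nearest to each block's fit."""
    return [min(range(len(codebook)), key=lambda i: kl_gauss(fit(b), codebook[i]))
            for b in blocks]
```

A full design would alternate this step with re-estimating each codebook Gaussian from the blocks assigned to it.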
Sheila S. Hemami (Visual Communications Lab, Cornell University School of Electrical Engineering) hemami@CS.Cornell.EDU
Perception of Extremely Low-Rate Images and Video: Psychophysical Evaluations and Analyses
Despite the seemingly constant increases in available bandwidth, images
and video encoded at low rates are commonly used for visual
communication today and will continue to be. Time and expense
are two motivations for the use of low-rate encoded content,
and for some access modes, such as 3G wireless systems, only
low-rate encoded images and video can be transmitted. To maximize
access to digitally archived information, low-rate encoded
versions of the content should be made available. At such rates,
compression artifacts are clearly visible, so to maximize perceived
quality, characteristics of the human visual system (HVS) should
be taken into account. Perceptually motivated image compression research
has focused on producing visually lossless images, and very
little work has been done on understanding the perception of low-rate
video. This talk will describe our recent work on characterizing
human perception of low-rate encoded images and video, and will
discuss applications to compression.
Alfred Hero (University of Michigan)
Divergence Matching Criteria for Registration, Indexing and Retrieval
We review, motivate, and apply the Renyi divergence measure for
classification and detection tasks arising in database matching.
The Renyi divergence is a generalization of the Kullback-Leibler
divergence and the Hellinger distance for measuring differences
between multivariate probability densities. This divergence
measure is motivated directly by Chernoff's theorem on the
asymptotic decay rate of the error probability of the optimal discriminant.
To be applied to image registration, indexing, or retrieval,
the Renyi divergence must be estimated from the data. There
are two ways to accomplish this: 1) non-parametric density estimation;
2) minimal graph matching via minimal spanning trees or other
quasi-additive optimal graph structures. We will present both
theoretical results and implementations of Renyi
matching criteria for a range of problems relevant to searching
databases.
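For discrete distributions such as normalized feature histograms, the Renyi divergence of order α is D_α(p‖q) = log(Σ_i p_i^α q_i^(1−α)) / (α − 1), which tends to the Kullback-Leibler divergence as α → 1. The Python sketch below is the simplest plug-in estimate (histogram frequencies in place of densities); it is not the minimal-spanning-tree estimator from the talk, and it assumes q is positive wherever p is.

```python
import math
from typing import List, Sequence


def normalize(counts: Sequence[float]) -> List[float]:
    """Turn raw histogram counts into a probability vector."""
    total = float(sum(counts))
    return [c / total for c in counts]


def renyi_divergence(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """Renyi divergence D_alpha(p || q) for discrete distributions (natural log).

    Assumes alpha > 0 and q[i] > 0 wherever p[i] > 0.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if math.isclose(alpha, 1.0):
        # Limit alpha -> 1: the Kullback-Leibler divergence.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1.0)
```

When comparing two images by their normalized histograms, a smaller divergence indicates a closer match; α = 1/2 gives a symmetric measure that is a monotone function of the Hellinger distance.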
James Johnston (AT&T Labs)
Perceptual Coding - Interactions Between Models and Coders
This talk will provide a tutorial on perceptual audio coders, as
well as introduce the basics of some perceptual image coders.
In the course of explaining the basics behind the perceptual
coder, a number of problems arise in fitting the lossy,
nonlinear perceptual process together with the generally linear, LMS
coding process and its attached information-theoretic and
computer-science-style entropy coding and solution methods. A
number of these interactions will be pointed out, from filterbank
fitting vs. perceptual models to quantization strategy vs.
error loudness. Some ad hoc solutions will be shown, and some
open questions will be mentioned. To date, there has been very
little work on actually determining an optimum for such problems;
rather, the work has focused on finding something that works and
approximates the optimum. Perhaps this
work will be mature when, given a perceptual model and a (set
of) filterbanks, quantizers, etc., one can produce evidence of
a true optimum.
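The quantization-strategy side of the model/coder interaction can be sketched in a few lines of Python: a uniform quantizer with step Δ adds noise of power roughly Δ²/12, so choosing Δ = √(12·T) keeps the noise at a band's masking threshold T. The masking model here (a threshold a fixed `smr_db` below the band energy, default 13 dB) is a crude hypothetical placeholder, not a real psychoacoustic model.

```python
import math
from typing import List


def step_sizes(band_energies: List[float], smr_db: float = 13.0) -> List[float]:
    """Per-band quantizer step whose noise power sits at a toy masking threshold.

    Hypothetical masking model: threshold = band energy lowered by smr_db decibels.
    A uniform quantizer with step d adds noise power of roughly d**2 / 12.
    """
    steps = []
    for energy in band_energies:
        threshold = energy * 10.0 ** (-smr_db / 10.0)
        steps.append(math.sqrt(12.0 * threshold))
    return steps


def quantize(coeffs: List[float], step: float) -> List[int]:
    """Uniform mid-tread quantization to integer indices for entropy coding."""
    return [round(c / step) for c in coeffs]


def dequantize(indices: List[int], step: float) -> List[float]:
    """Map quantizer indices back to reconstruction levels."""
    return [i * step for i in indices]
```

Louder bands tolerate coarser steps, which is where the bit savings come from; whether a locally sensible rule like this is anywhere near optimal once entropy coding is included is exactly the kind of open question the talk raises.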
Don H. Johnson (Computer & Information Technology Institute, Department of Electrical & Computer Engineering, Rice University, Houston, Texas) dhj@rice.edu
A Theory of Information Processing
Signal processing concerns designing algorithms for manipulating
signals to best achieve some signal-based criterion. Information
theory focuses on how best to communicate and represent digital
streams. What is missing from both is a consideration of the
information represented by signals, no matter what their form.
In particular, multimedia data streams are quite complex, varied,
and highly interdependent. By melding aspects of signal processing
and information theory, information processing theory quantifies
how well signals represent information regardless of their form,
and how well systems process information, extracting some
information components at the expense of others.
Sharad Mehrotra (University of California at Irvine)
Clustering and Indexing of Multimedia Objects in the MARS System
The goal of the MARS project is the design and development of
next-generation information systems that provide seamless access
to multimedia information based on its rich internal content.
Because of fundamental limitations in retrieving multimedia
information based solely on textual annotations, we have adopted
a vision-centric approach in which objects are represented and
retrieved based on low-level visual features (e.g., color, texture,
and layout). These visual properties can be extracted automatically
from images and video, making the approach scalable to large as well
as heterogeneous multimedia collections.
Supporting content-based queries over visual feature representations
poses many significant challenges to existing database management
system (DBMS) and information retrieval (IR) practice. Existing IR
techniques, which deal primarily with textual information, need
to be generalized to support content-based retrieval over multimedia.
Furthermore, since visual feature representations define complex
non-Euclidean vector spaces, techniques need to be developed
to support such complex multidimensional information in DBMSs.
Another challenge is to integrate multimedia IR techniques with
DBMSs. The problem arises because existing DBMSs have no native
support for storing and processing imprecise information,
while content-based retrieval is inherently imprecise.
In this talk, I will provide an overview of the progress we have
made in addressing some of the above challenges in supporting
multimedia information in DBMSs. The focus of the talk will
be on the indexing and efficient retrieval of multimedia
objects (viz., the curse of dimensionality, support for
arbitrary distance metrics, and support for novel types of queries,
including refined queries, in databases) and the solutions developed
in the context of MARS.
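Two of the issues listed above, arbitrary distance metrics and refined queries, can be illustrated with a brute-force Python sketch: a k-nearest-neighbour search over feature vectors with a pluggable (here weighted Euclidean) distance, and a Rocchio-style refinement that moves the query toward objects the user marked relevant. The names `weighted_l2`, `knn`, and `refine` and the refinement rule are our own illustrative choices; a real system would replace the linear scan with a multidimensional index.

```python
import heapq
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def weighted_l2(weights: Vector) -> Distance:
    """Build a weighted Euclidean metric; any other distance function can be swapped in."""
    def dist(a: Vector, b: Vector) -> float:
        return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5
    return dist


def knn(query: Vector, collection: Dict[str, Vector], k: int, dist: Distance) -> List[str]:
    """Linear-scan k-nearest-neighbour search over object ids."""
    return heapq.nsmallest(k, collection, key=lambda oid: dist(query, collection[oid]))


def refine(query: Vector, relevant: List[Vector], beta: float = 0.5) -> List[float]:
    """Rocchio-style refinement: move the query toward the centroid of relevant objects."""
    centroid = [sum(col) / len(relevant) for col in zip(*relevant)]
    return [(1 - beta) * q + beta * c for q, c in zip(query, centroid)]
```

Because the metric is passed in as a function, feature re-weighting after user feedback amounts to building a new `weighted_l2` and re-running the search.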
Michael T. Orchard (Department of Electrical and Computer Engineering, Rice University; on leave from Princeton University) morchard@ima.umn.edu
On Modelling Location Uncertainty in Images: A Coding Perspective
Virtually all image processing algorithms assume an underlying probability
distribution on the space of images, which guides them in transforming
an original (e.g., uncoded, noisy, or degraded) image into
the target (e.g., coded, denoised, or restored) image. The
unknown locations of events in the scene are perhaps the most
important form of uncertainty characterized by this underlying
probability distribution. The importance of location uncertainty
can be recognized both in the processes by which natural images
are generated and in the way humans perceive those images.
This talk uses the image coding perspective to explore how to accurately
model image probability in light of the importance of location
uncertainty. Through examples and thought experiments we show
that current image coding algorithms lack tools to efficiently
characterize location uncertainty. We show that location uncertainty
requires that the probability of natural images be aligned to
nonlinear manifolds in image space. This talk describes work
in progress, and audience discussion and participation are encouraged.
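A tiny Python thought experiment shows why location uncertainty points to nonlinear structure: 1-D step edges at different locations all belong to one family of images, but the pixel-wise average of two such edges is a two-step staircase that belongs to none of them. A linear model built from edge examples therefore places probability on images that are not edges at all. The `step_edge` family is our own toy example, not one from the talk.

```python
from typing import List


def step_edge(n: int, t: int) -> List[float]:
    """1-D 'image' of n pixels: 0 left of location t, 1 from t onward."""
    return [0.0 if i < t else 1.0 for i in range(n)]


def is_step_edge(x: List[float]) -> bool:
    """True if x equals step_edge(len(x), t) for some edge location t."""
    return any(x == step_edge(len(x), t) for t in range(len(x) + 1))


a = step_edge(8, 2)
b = step_edge(8, 6)
midpoint = [(u + v) / 2 for u, v in zip(a, b)]
print(midpoint)  # [0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0]
print(is_step_edge(a), is_step_edge(b), is_step_edge(midpoint))  # True True False
```

The set of edges, indexed by a single location parameter, traces a curve through pixel space rather than a flat subspace, which is the kind of nonlinear manifold the abstract refers to.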
Amy R. Reibman (AT&T Bell Labs)
Source Coding Alternatives for Video Transport Over Networks
We consider scalable video coding and multiple description video
coding as alternatives for compressing video that is to be
transported over a network. Each type of network and application
places different requirements on the video system;
hence each of the possible source coding methods is advantageous
in different scenarios. The goal of this talk is threefold:
first, to present a conceptual methodology for joint network
and video optimization; second, to familiarize the audience
with source coding alternatives for video, with a clear understanding
of the advantages and disadvantages associated with each; and
third, to begin to identify scenarios in which one alternative
may be better than the others.
We begin by identifying a set of common parameters that begin to
characterize how a generic network affects the video system.
We then introduce the different options available for compressing
video for networks, and indicate the trade-offs associated with
each algorithm. Next, using some of the identified common parameters,
we begin to identify scenarios in which each video coding algorithm
may be advantageous.
Multiple description video coding may be advantageous when the
packet loss rate is high and the network has little knowledge of
the instantaneous loss rate, or when the loss pattern closely
approximates the ideal multiple description channels.
Scalable video coding has a range of attractive features, including
easy rate adaptation and easy prioritization. However, existing
algorithms may be too inefficient to provide advantages over
one-layer coding in some applications.
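One of the simplest multiple description schemes, odd/even temporal splitting, makes the trade-off concrete: each description is a half-frame-rate video that is decodable on its own, and a lost description is concealed by interpolating from the frames that did arrive. In the Python sketch below, frames are stand-in scalar values and neighbour averaging is a hypothetical concealment rule; it illustrates the idea, not any coder evaluated in the talk.

```python
from typing import List, Optional, Tuple


def split_descriptions(frames: List[float]) -> Tuple[List[float], List[float]]:
    """Two descriptions: even-indexed frames and odd-indexed frames."""
    return frames[0::2], frames[1::2]


def reconstruct(even: Optional[List[float]], odd: Optional[List[float]],
                n: int) -> List[float]:
    """Merge whichever descriptions arrived and conceal the missing frames.

    Assumes at least one description arrives and n >= 2.
    """
    out: List[Optional[float]] = [None] * n
    if even is not None:
        out[0::2] = even
    if odd is not None:
        out[1::2] = odd
    for i in range(n):
        if out[i] is None:
            # Hypothetical concealment: average the received neighbouring frames.
            neighbours = [out[j] for j in (i - 1, i + 1) if 0 <= j < n and out[j] is not None]
            out[i] = sum(neighbours) / len(neighbours)
    return out
```

The price of this robustness is efficiency: each description is coded without exploiting its correlation with the other, which is one reason multiple description coding pays off mainly when losses are frequent.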
Hanan Samet (University of Maryland - College Park)
Spatial Databases and Geographic Information Systems
An introduction is given to the spatial database issues involved
in the design of geographic information systems (GIS) from the
perspective of a computer scientist. Some of the topics to be
discussed include the nature of a GIS and the functionalities
that are desired in such systems. Representation issues will
also be reviewed. The emphasis will be on indexing methods as
well as the integration of spatial and nonspatial data. Demos
of the SAND Spatial Browser and of the VASCO Java applet
(http://www.cs.umd.edu/~hjs/quadtree/index.html) will be shown
to illustrate these ideas.
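Since the VASCO link above points to quadtree material, a minimal point-region (PR) quadtree may help show how such an index answers a window query: each block holds at most `capacity` points and splits into four equal quadrants when it overflows, and a range query skips every block that does not intersect the query rectangle. This Python sketch is our own illustration, not code from SAND or VASCO, and it assumes distinct points (duplicates would split forever).

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class QuadTree:
    """PR quadtree over the square block [x0, x0 + size) x [y0, y0 + size)."""

    def __init__(self, x0: float, y0: float, size: float, capacity: int = 1):
        self.x0, self.y0, self.size, self.capacity = x0, y0, size, capacity
        self.points: List[Point] = []
        self.children: Optional[List["QuadTree"]] = None

    def _contains(self, p: Point) -> bool:
        return (self.x0 <= p[0] < self.x0 + self.size
                and self.y0 <= p[1] < self.y0 + self.size)

    def insert(self, p: Point) -> bool:
        """Insert p; returns False if p lies outside this block."""
        if not self._contains(p):
            return False
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append(p)
                return True
            self._split()
        return any(c.insert(p) for c in self.children)

    def _split(self) -> None:
        """Replace a full leaf by four quadrants and push its points down."""
        h = self.size / 2
        self.children = [QuadTree(self.x0 + dx, self.y0 + dy, h, self.capacity)
                         for dx in (0.0, h) for dy in (0.0, h)]
        for q in self.points:
            for c in self.children:
                if c.insert(q):
                    break
        self.points = []

    def query(self, xmin: float, ymin: float, xmax: float, ymax: float) -> List[Point]:
        """All stored points inside the closed rectangle [xmin, xmax] x [ymin, ymax]."""
        if (xmax < self.x0 or xmin >= self.x0 + self.size
                or ymax < self.y0 or ymin >= self.y0 + self.size):
            return []  # the rectangle misses this block entirely: prune
        found = [p for p in self.points
                 if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
        for c in self.children or []:
            found.extend(c.query(xmin, ymin, xmax, ymax))
        return found
```

The decomposition is regular (always halving the block), so the tree's shape depends only on the point set, not on insertion order.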
Shashi Shekhar (Computer Science Department, University of Minnesota) http://www.cs.umn.edu/research/shashi-group/
shekhar@cs.umn.edu
An Overview of Spatial Databases for Digital Library
Spatial databases have been an active area of research for over
two decades, addressing the growing data management and analysis
needs of spatial applications such as Geographic Information
Systems. This research has produced a taxonomy of models for
space, spatial data types and operators, spatial query languages
and processing strategies, as well as spatial indexes and clustering
techniques. However, more research is needed to improve support
for network and field data, as well as query processing (e.g.,
cost models, bulk loading). Another important need is to apply
the spatial data management accomplishments to newer applications
such as data warehouses and multimedia information systems.
The objective of this paper is to identify recent accomplishments
and the research needs in the near term.
Julius Smith (Stanford University)
Musical Signal Models for Audio Rendering
This talk will summarize several lines of research going on in the
field of music/audio signal processing that are applicable to
audio compression and data reduction. While the techniques were
originally motivated by the desire for realistic "virtual musical
instruments" (including the human voice), the resulting rendering
models may be efficiently transmitted to a receiver as a "specialized
decoder" in software form, which is then "played" by a very
sparse data stream. In most cases, there is also a straightforward
tradeoff between rendering quality and computational expense
at the receiver. Since all models are built from well-behaved
audio signal processing components, the distortion at low complexity
levels tends to be of a high-level character, sounding more
like a different instrument or performance than a distorted
waveform.
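The Karplus-Strong plucked-string algorithm is a classic example of the kind of model described above: the "decoder" is a short delay-line loop, and the data stream needs only a few numbers per note. The Python sketch below uses it to render a sparse note list; the choice of Karplus-Strong, the sample rate, the decay factor, and the `(frequency, start, duration)` note format are our own illustrative assumptions, not the specific models from the talk.

```python
import random
from typing import List, Tuple


def pluck(freq: float, duration: float, sr: int = 8000,
          decay: float = 0.996, seed: int = 0) -> List[float]:
    """Karplus-Strong string: a noise burst recirculating through a delay line.

    The loop filter averages adjacent samples (a gentle lowpass) and scales by
    `decay`, so high partials die out first, as on a real plucked string.
    """
    rng = random.Random(seed)
    period = max(2, int(round(sr / freq)))  # delay-line length sets the pitch
    line = [rng.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    for n in range(int(duration * sr)):
        i = n % period
        current, nxt = line[i], line[(i + 1) % period]
        out.append(current)
        line[i] = decay * 0.5 * (current + nxt)  # feed back the filtered sample
    return out


def render(notes: List[Tuple[float, float, float]], sr: int = 8000) -> List[float]:
    """'Play' the model from a sparse stream of (frequency Hz, start s, duration s)."""
    length = max(int(s * sr) + int(d * sr) for _, s, d in notes)
    mix = [0.0] * length
    for k, (f, s, d) in enumerate(notes):
        start = int(s * sr)
        for j, v in enumerate(pluck(f, d, sr, seed=k)):
            mix[start + j] += v
    return mix
```

Here two notes, six numbers in total, expand into 6000 output samples, and a cheaper loop filter or lower sample rate would trade rendering quality for computation at the receiver.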
Nuno Vasconcelos (Cambridge Research Laboratory, Compaq Computer Corporation) nuno@crl.dec.com
http://www.media.mit.edu/~nuno
A Decision-Theoretic View of Image Retrieval
The design of an effective architecture for image retrieval requires
careful consideration of the interplay between feature selection,
feature representation, and similarity function. We introduce
a decision-theoretic formulation of the retrieval problem that
establishes guidelines for the joint design of all these components
leading to a Bayesian architecture with minimum probability
of retrieval error. This architecture is shown to generalize
a significant number of previous approaches, solving some of
the most challenging problems faced by these: joint modeling
of color and texture, explicit control of the trade-off between
feature transformation and feature representation, good invariance
properties, and unified support for local and global queries
without image segmentation. Extensive experimental results show
that Bayesian retrieval performs well on color, texture, and
generic image databases in terms of both retrieval accuracy
and perceptual relevance of similarity judgments.
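Under a 0/1 loss, minimum probability of retrieval error means ranking each database image (treated as a class) by the posterior probability of the query's features. The Python sketch below makes that concrete with a diagonal Gaussian per image and query feature vectors treated as independent; the architecture in the talk is more general, so the Gaussian model, variance floor, and function names are simplifying assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Model = Tuple[List[float], List[float]]  # (per-dimension means, per-dimension variances)


def fit_diag_gauss(feats: Sequence[Sequence[float]], var_floor: float = 1e-3) -> Model:
    """Fit a diagonal Gaussian to one image's feature vectors."""
    d, n = len(feats[0]), len(feats)
    mean = [sum(f[k] for f in feats) / n for k in range(d)]
    var = [max(sum((f[k] - mean[k]) ** 2 for f in feats) / n, var_floor) for k in range(d)]
    return mean, var


def log_likelihood(x: Sequence[float], model: Model) -> float:
    """Log density of one feature vector under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def bayesian_rank(query: Sequence[Sequence[float]], models: Dict[str, Model],
                  priors: Optional[Dict[str, float]] = None) -> List[str]:
    """Rank images by log posterior of the query features (up to a shared constant)."""
    def log_posterior(name: str) -> float:
        prior = priors[name] if priors else 1.0 / len(models)
        return math.log(prior) + sum(log_likelihood(x, models[name]) for x in query)
    return sorted(models, key=log_posterior, reverse=True)
```

Passing only the feature vectors from part of an image gives a local query scored by the same rule, which hints at how such a formulation can treat local and global queries uniformly without segmentation.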
Clement Yu (Department of EECS, University of Illinois at Chicago) yu@eecs.uic.edu
Retrieval of Images of People from the Web
We have built an image search engine that retrieves images of a
person given his or her name. The search engine relies on two
types of evidence: visual and textual. A face detection module
determines whether a given image contains a human face. A face recognition
module determines the degree of similarity between a Web image
and a database of images of persons. A text analysis module
determines the degree of similarity between the text surrounding
the image and the name of the person. The outputs of these three
modules are combined in different ways to determine the likelihood
that a Web image contains the person. Experimental
results are provided to demonstrate that our system is superior
to existing commercial systems and research prototypes. A feedback
mechanism is being added to our system.
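One simple way to combine the three modules' outputs is sketched in Python below: face detection acts as a soft gate, and face-recognition and text similarity are mixed linearly. The abstract says only that the outputs are combined "in different ways", so the weights, the token-overlap text measure, and the `Evidence` fields are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Evidence:
    face_prob: float  # face detector confidence that the image contains a face, in [0, 1]
    face_sim: float   # face recognizer similarity to known images of the person, in [0, 1]
    text_sim: float   # similarity of the surrounding text to the person's name, in [0, 1]


def text_similarity(name: str, context: str) -> float:
    """Toy text module: fraction of the name's tokens found in the surrounding text."""
    name_tokens = re.findall(r"[a-z]+", name.lower())
    context_tokens = set(re.findall(r"[a-z]+", context.lower()))
    if not name_tokens:
        return 0.0
    return sum(t in context_tokens for t in name_tokens) / len(name_tokens)


def combined_score(e: Evidence, w_face: float = 0.6, w_text: float = 0.4) -> float:
    """Hypothetical fusion rule: gate by face detection, then mix visual and textual evidence."""
    return e.face_prob * (w_face * e.face_sim + w_text * e.text_sim)
```

Gating by the detector means an image with no detectable face scores zero regardless of its caption; other fusion rules (for example, a product of the three scores) would weigh the evidence differently.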
Digital Libraries: Data Modeling and Representation
2000-2001 Program: Mathematics in Multimedia