Talk Abstracts:
Material from Talks

P. Anandan (Vision Technology Group, Microsoft Research) anandan@microsoft.com
Layered Representations of Visual Scene Information in Video Sequences
Images and video provide simple, inexpensive, and easily accessible
ways of creating visual records of everything around us. However,
the scene information is implicitly buried inside the raw video
data, and comes at the cost of very high temporal redundancy.
While the standard sequential form of video storage is adequate
for viewing in a "movie mode", it fails to support
rapid access to information of interest that is required in
many of the emerging multimedia applications.
In this talk, I will describe a layered approach to decomposing
the input video as a potential intermediate level representation
of visual information. The layers capture regions of the image/video
that show coherence in their spatio-temporal visual structure.
They can either be 2D image layers together with their 2D image
motions or 3D scene layers with associated 3D geometry/shape
and motion. The layers can also effectively separate visual
information that is superimposed due to reflections and transparencies
in the scene.
The layered representation provides a basis for a compact reorganization
of the video data and can enable non-linear browsing
and efficient indexing directly to information of interest.
It also has the potential to serve as the basis for further
semantic level analysis of the scene elements.
Biography:
P. Anandan is a Senior Researcher at Microsoft Research where
he leads the Interactive Visual Media group. He obtained his
PhD from University of Massachusetts, Amherst in 1987. His research
career is marked by an excessive focus on the problem and applications
of visual motion analysis and the analysis of motion parallax.
Like his research peers, he is trying to make the most of the moderate
success of this research area in interesting applications involving
visual scene modeling and video based scene understanding. When
he is not working on motion, or trying to survive in the research
game, he ponders about the nature of intelligent behavior and
the philosophy of the mind.

Charles A. Bouman (Purdue University)
Image Database Search and Browsing
The advent of inexpensive digital storage and high speed digital
communications has made large databases of images and video
quite common. For example, it is not unusual for image databases
to contain over 100,000 images, and a single hour of digital
video stored in MPEG 2 format contains over 600 MBytes of data.
Simple linear browsing of such large databases quickly becomes
impossible, so that these databases will demand new tools for
users to organize and manage them.
This talk focuses on the use of hierarchical data-structures for
the organization, search and browsing of large image databases
by their content. The first part of the talk discusses the problem
of fast image search in a large database. Interestingly, this
problem is quite similar to the problem of vector quantization
in a metric space. We show that the nearest neighbor search
problem may be efficiently solved by using a best-first branch
and bound search strategy. Perhaps more importantly, we also
demonstrate that computation can be dramatically reduced by accepting
some approximation in the search accuracy.
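The search strategy described above can be illustrated with a minimal sketch. It assumes Euclidean feature vectors and a simple binary tree of bounding balls; the names `Node`, `nearest`, and the `eps` tolerance are hypothetical and are not taken from the talk. Nodes are expanded in order of a lower bound on their distance to the query, and a node is pruned once that bound, inflated by `(1 + eps)`, can no longer beat the best match found so far. With `eps = 0` the search is exact; a larger `eps` trades accuracy for fewer distance computations.

```python
import heapq
import math
from itertools import count


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Node:
    """Tree node holding a bounding ball (center, radius) over its points."""

    def __init__(self, points, leaf_size=4):
        dim = len(points[0])
        n = len(points)
        self.center = tuple(sum(p[i] for p in points) / n for i in range(dim))
        self.radius = max(dist(self.center, p) for p in points)
        self.points = points
        self.children = []
        if n > leaf_size:
            # Split on the coordinate with the largest spread.
            axis = max(
                range(dim),
                key=lambda i: max(p[i] for p in points) - min(p[i] for p in points),
            )
            ordered = sorted(points, key=lambda p: p[axis])
            mid = n // 2
            self.children = [
                Node(ordered[:mid], leaf_size),
                Node(ordered[mid:], leaf_size),
            ]


def lower_bound(query, node):
    """Triangle-inequality bound: no point in the ball can be closer than this."""
    return max(0.0, dist(query, node.center) - node.radius)


def nearest(root, query, eps=0.0):
    """Best-first branch and bound; returns (point, distance, points_examined)."""
    best_d, best_p, examined = math.inf, None, 0
    tie = count()  # tie-breaker so the heap never compares Node objects
    heap = [(lower_bound(query, root), next(tie), root)]
    while heap:
        bound, _, node = heapq.heappop(heap)
        if bound * (1.0 + eps) >= best_d:
            break  # every remaining node has a bound at least this large
        if node.children:
            for child in node.children:
                b = lower_bound(query, child)
                if b * (1.0 + eps) < best_d:
                    heapq.heappush(heap, (b, next(tie), child))
        else:
            for p in node.points:
                examined += 1
                d = dist(query, p)
                if d < best_d:
                    best_d, best_p = d, p
    return best_p, best_d, examined
```

Calling `nearest(root, query, eps=0.5)` returns a match whose distance is at most 1.5 times the true nearest distance, which is one simple way to realize the accuracy-for-speed trade mentioned in the abstract.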
In the second part of the talk, we present a method for designing
a hierarchical browsing environment which we call a similarity
pyramid. The similarity pyramid groups similar images together
while allowing users to view the database at varying levels
of resolution. We also show how a user can provide feedback
to this browsing environment through the selection of relevant
images. The similarity pyramid can then be dynamically reorganized
to suit the user's task.

Ed H. Chi (User Interface Research, Xerox Palo Alto
Research Center PARC) chi@acm.org
Scent of the Web
PARC has several research teams that are studying the connections between how
people use the World Wide Web and the implications of those
patterns of usage. One area of research, called the Information Scent
Project, provides a way to understand how differences in user
goals affect the usage of the Web. The underlying idea is that
users navigate a path through the web guided by the strength
of "information scent." Here, "information scent"
refers to the fact that a number of subtle cues provide users
with a "whiff" of what lies at the next link. This
technology can be used to personalize the user experience on
the Web.
This research has resulted in a new way to model how users will move
through a Web site, based on the users' goals and the content
and structure of the Web site. Also, a web site is able to
infer user information needs based on an individual user's surfing
patterns.
In addition, advanced visualization tools have been developed to
mine patterns in this complex data. By examining how these cues
relate to predicted and actual behaviors, researchers expect
to develop advanced techniques for testing web site usability.
Biography:
Ed H. Chi is a research scientist at Xerox Palo Alto Research Center's
User Interface Research Group. He has been working on Information
Visualization and User Interfaces since 1993. His area of research
and expertise is software systems for computer-human interaction
and 2D/3D user interfaces. His most current project is the study
of information ecology --- understanding how users navigate and
understand information environments, including the Web. His
Ph.D. project was "Spreadsheet for Visualization"
--- a data exploratory tool using a 'spreadsheet metaphor' that
allows each cell to hold an entire data set with a full-fledged
visualization. In the past, he has also worked on computational
molecular biology, and recommendation systems. Ed received his
Ph.D. 1996-1999, M.S. 1994-1996, and B. Comp Sci. 1992-1994
in Computer Science from University of Minnesota. He has won
awards for both teaching and research. In his spare time, Ed
is an avid TaeKwonDo martial artist, motorcyclist, photographer,
potter, and a person who writes poems.

Wesley W. Chu (Computer
Science Department, University of California Los Angeles)
wwc@cs.ucla.edu
Medical Digital Library to Support Scenario Specific Information Retrieval
Slides: pdf (11.9MB), html
Current large-scale information sources are designed to support general
queries and lack the ability to support scenario-specific information
navigation, gathering, and presentation. As a result, users
are often unable to obtain desired specific information within
a well-defined subject area. Today's information systems do
not provide efficient content navigation, incremental approximate
matching, or content correlation. We are developing the following
innovative technologies to remedy these problems: 1) scenario-based
proxies, enabling the gathering and filtering of information
customized for users within a pre-defined domain; 2) context-sensitive
navigation and matching, providing approximate matching and
similarity links when an exact match to a user's request is
unavailable; 3) content correlation of documents, creating semantic
links between documents and information sources; 4) user models
for customizing retrieved information and result presentation;
and 5) phrase indexing for document retrieval and summarization.
A digital medical library is currently being constructed using
these technologies to provide customized information for the
user. The technologies are general in nature and can provide
custom and scenario-specific information in many other domains
(e.g., crisis management).

Neil Day (Digital Garage Inc. Strategic Research
& Development Department) neil@garage.co.jp Digital
Garage: http://www.garage.co.jp
WebNation: http://www.webnation.co.jp
MPEG-7 Applications: Multimedia Content Retrieval
MPEG-7 is a standard for describing features of multimedia content.
The standard aims to provide the world's most comprehensive
set of audiovisual descriptions and is intended to provide better
solutions for indexing and searching the ever-expanding world
of multimedia content. These descriptions are based on the catalog
(title, creator, rights), semantic (the who, what, when, where
information), and structural (the color histogram - measurement
of the amount of color - associated with an image or the timbre
of a recorded instrument) features of the AV content, and leverage
the AV data representation defined by MPEG-1, 2, and 4. MPEG-7
also uses XML Schema as the language of choice for content description.
MPEG-7 will be interoperable with other leading standards such
as the SMPTE Metadata Dictionary, Dublin Core, EBU P/Meta, and
TV Anytime. For example, MPEG-7 will allow finding information
by rich-spoken queries, hand-drawn images, and humming. This
presentation will provide an overview of typical MPEG-7 applications
and will present the latest state-of-the-art MPEG-7 solutions
for search and retrieval of multimedia content.

Edward J. Delp (Purdue University) ace@ecn.purdue.edu
Image and Video Databases: Who Cares?
In this talk we will discuss the research issues in the deployment
and management of large video databases. In particular we will
describe a video database system we are developing at Purdue
known as ViBE (Video Browsing Environment). ViBE is a unique
browseable/searchable paradigm for organizing video databases
containing a large number of sequences. The system first segments
video sequences into shots by using the Generalized Trace obtained
from the DC-sequence of the compressed data stream. Each video
shot is then represented by a hierarchical tree structure of
key frames, and the shots are automatically classified into
predetermined pseudo-semantic classes. Finally, the results
are presented to the user in an active browsing environment
using a similarity pyramid data structure. The similarity pyramid
allows the user to view the video database at various levels
of detail. The user can also define semantic classes and reorganize
the browsing environment based on relevance feedback.
We will further speculate on who will use these types of databases
in the future. Our feeling is that most of the current application
scenarios are ill-posed at best.

Peter Enser (School of Information Management, University of Brighton, U.K.)
In Quest of Visual Imagery
In the commercial use of image collections a heavy dependency
continues to be exhibited on a concept-based image retrieval
paradigm in which the query is verbalised by the client and
resolved as a composite operation involving text matching of
the query with collection metadata, subject conceptualisation
and visualisation, and browsing.
The practical and intellectual challenges posed by this paradigm
are significant and frequently expressed. Nevertheless, studies
of user need clearly indicate that a heavy dependency must continue
to be placed on concept-based rather than content-based image
retrieval techniques within the working practices of still and
moving image collections. In support of this contention, the
paper offers evidence gathered from surveys of user needs which
have been conducted in collaboration with a number of image
archives.
The paper concludes with a consideration of hybrid image retrieval
processes in which the concept-based and content-based image
retrieval paradigms are combined, and appraises their current
and potential contributions to real clients' needs for visual
image material.

David Forsyth (EECS, University of California - Berkeley) dafmmf@gte.net
Clustering, Object Recognition and Picture Retrieval
It would be pleasing to have a program that could search for images
using a description of what is in the picture. This is difficult,
because it is not known how to recognize objects. I will give
a broad overview of the current state of knowledge about building
object recognition systems. A combination of template matchers
and spatial reasoning algorithms can be used to find faces,
people, and some animals. Other categories are currently very
difficult; there has been some success at finding isolated objects
of stereotypical appearance.
We have built a system that can cluster collections of images using
both textual information and image information. The resulting
clusters appear to be semantically coherent --- or, at least,
coherent enough to be useful. I will show how to use this system
to browse collections, search for images, and generate possible
illustrations automatically given a caption. The system can
annotate images with plausible words, too. Our system is opportunistic,
in that it can use the output of various detectors as well.
For example, if one has a face detector, our system can use
the response of that detector in clustering the image. Ideally,
it would use the output of many different object recognition
programs.

Ramesh Jain (University of California-San Diego)
Navigational Search
Almost every week I hear about a new search engine. At one time, I
used to get excited that finally there might be a solution to
informationitis. Informationitis is the disease resulting from
progress made by information technology. Information on the
Web is exponentially increasing, thus we think that we have
all information available to us when we need it. Unfortunately,
the technology to find what you need is improving sub-linearly
and thus the gap between what information you think you can
get, and what you can really get is increasing almost exponentially.
Which is better: not to have information, or to have information
but not have access to it?
In the good old days, we had computers to compute with data. Many techniques
were developed to structure data. Databases were designed to
provide flexible access to large volumes of data. In databases,
techniques were developed to structure the data and then organize
it to find meaningful information. The progress in technology
resulted in too much data on one hand and inclusion of text,
graphics, video, audio, and other sensory data also in the scope
of computing. Many techniques began to be developed to manage
unstructured data by organizing it. To add to the explosion,
came the World Wide Web. The library science approach to organizing
all the documents on the Web resulted in the "virtual libraries"
commonly known as portals. The web grew organically and the
growth was exponential. That is its beauty and that is its strength.
But, how does one find meaningful information in an unstructured
organic multimedia domain as large as the Web? Powerful processing
or simple reorganization or classification of the data will
not solve the problem. We will have to take a fresh look at
the problem.
The problem with the current approach to search is that the context for the search
is removed by abstracting the search level to keywords. The
context for the results is removed by presenting the results
of a query in a listing. We have neither a Gestalt view of query
space nor that of the results.
I believe that by developing what-you-see-is-what-you-get (WYSIWYG)
like techniques the search approaches can be converted to navigation
like approaches. As we all know, word processing became usable
by the masses because of WYSIWYG approaches. Search can also be
converted by making the query and presentation environment the
same. Thus, the system displays the results and then looking
at the results one can refine the query and, if necessary, generate
a new query. My discussion will focus on this paradigm of navigational
search.

Lucy Nowell (Battelle/ Pacific Northwest National Laboratory)
Lucy.Nowell@pnl.gov
Visualization for Digital Libraries: Usability Is Not Utility
The power of information visualization lies in its ability to
convey information at the high bandwidth of the human perceptual
system, facilitating recognition of patterns in an information
space and supporting navigation in large collections. Available
visualization systems offer a variety of approaches to presenting
information about collections of text, from conceptual maps
(e.g., Lin's work with Kohonen maps and Battelle's SPIRE ThemeView
visualization) to tools that base their layout on metadata
(e.g., Envision and FilmFinder) or similarity to query terms
(e.g., VIBE). Other systems show query term occurrence within
individual documents (e.g., TileBars), the conceptual structure
of individual documents (e.g., Topic Islands), thematic trends
over time within a collection (e.g., ThemeRiver), and so forth.
The usability of these systems for typical user populations
has not been well established, and we know of no studies that
confirm their utility.
Furthermore, digital libraries offer far more than text. In
the realm of the World Wide Web, a single "document" may include
video and audio, animations, and links to executable programs,
in addition to the images, maps, tables, and other non-text
features that occur in more-traditional documents. There are
existing collections that focus primarily on music, movies,
images of various kinds, dance notation and records of performance,
scientific data, architectural drawings, etc.
This talk will focus on issues in assessing the utility of visual
analysis tools, particularly in the context of extending these
tools to heterogeneous collections that span the full range
of digital media.

Francois Pachet (SONY CSL-Paris) pachet@csl.sony.fr
Music Content Management for Electronic Music Distribution: What is
"Interesting Music"?
The number of musical titles, considering only Western music,
amounts to several million. Besides issues related to copy
protection and copyright management, the possibility of transporting
these millions of music titles easily and efficiently raises
the issue of content management: how to design efficient means
of accessing, retrieving and exploring music titles? This talk
will describe research conducted at Sony CSL in three related
areas: 1) feature extraction from the music signal, 2) musical
data mining from large information sources, and 3) exploitation
of similarity functions and music descriptors for music search
and retrieval. I will illustrate the talk with research conducted
and systems developed by our team, and in particular rhythm
extraction from the signal, distance function and radio program
analysis, and sequence generation from music descriptors. In
all these cases I will raise the question: what is "interesting
music?" and propose various possible answers.

Peter Pirolli (Xerox PARC) pirolli@parc.xerox.com
Information Foraging and Information Scent: Theory, Models, and Applications
Information foraging theory concerns human strategies and technologies for
information seeking, gathering, and consumption. It is founded
on the assumption that such strategies and technologies adapt
to the flux of information in the environment. The development
of the theory has, to a large extent, been inspired by optimal
foraging theory in biology and anthropology, which analyzes
the adaptive value of food-foraging strategies, as well as current
theories of cognition and perception. At the level of individuals,
we have been studying users working with novel information retrieval
technologies, new user interface techniques, and, of course,
the World Wide Web (WWW). One of the key concepts to emerge
from this work is the notion of 'information scent', which refers
to the cues by which users assess their environment and make
navigational choices. In earlier work we developed computational
models of foraging and information scent that provided detailed
predictions of user behavior with novel technologies. We expect
that these models will lead to improvements in usability and
new user interface techniques, and we have been developing new
technologies along these lines, ranging from new ways of presenting
WWW pages, to methods of predicting traffic flow at a WWW site,
to data mining techniques for inferring the information goals
of users based on WWW server logs.

Shashi Shekhar (Computer Science Faculty, CTS and ITS Inst.,
Univ. of Minnesota) shekhar@cs.umn.edu
Clustering oriented Storage and Access Methods
Current Spatial Database Management Systems (SDBMS) provide
efficient access methods and operators for point and range queries
over collections of spatial points, line segments, and polygons.
However, it is not clear if existing spatial access methods
can either efficiently support graph computations based on connectivity
or scale up to data embedded in high-dimensional space. The
expected I/O cost for many graph operations can be reduced by
maximizing the Weighted Connectivity Residue Ratio (WCRR), i.e.,
the chance that a pair of connected nodes that are more likely
to be accessed together are allocated to a common page of the
file. CCAM is an access method for general graphs that uses
connectivity clustering. CCAM supports the operations of insert(),
delete(), create(), and find() as well as the new operations,
get-A-successor() and get-successors(), which retrieve one or
all successors of a node to facilitate aggregate computations
on graphs. The nodes of the graph are assigned to disk pages
via a graph partitioning approach to maximize the WCRR. CCAM
includes methods for static clustering, as well as dynamic incremental
reclustering, to maintain high WCRR in the face of updates,
without incurring high overheads. We also describe possible
modifications to improve the WCRR that can be achieved by existing
spatial access methods. Experiments with graph computations
show that CCAM outperforms existing access methods, even
though the proposed modifications also substantially improve
the performance of existing spatial access methods.
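As a rough illustration of the quantity being maximized, the sketch below computes a weighted connectivity residue ratio for a given assignment of nodes to disk pages. It assumes the WCRR can be read as the fraction of total edge weight carried by edges whose two endpoints share a page, with the weight standing in for how likely the pair is to be accessed together. That reading, the function name `wcrr`, and the toy graph are assumptions for illustration and are not CCAM's actual implementation.

```python
def wcrr(edges, page_of):
    """Fraction of total edge weight whose endpoints are stored on the same page.

    edges   -- iterable of (u, v, weight) tuples
    page_of -- dict mapping each node to its disk page id
    """
    total = 0.0
    unsplit = 0.0
    for u, v, weight in edges:
        total += weight
        if page_of[u] == page_of[v]:
            unsplit += weight  # this edge can be followed without another page read
    return unsplit / total if total else 0.0


# Toy 4-node cycle: the heavy edges are a-b and c-d.
edges = [("a", "b", 3.0), ("b", "c", 1.0), ("c", "d", 3.0), ("d", "a", 1.0)]

# Two ways of packing the nodes onto two pages of capacity two.
heavy_pairs_together = {"a": 0, "b": 0, "c": 1, "d": 1}
light_pairs_together = {"a": 0, "d": 0, "b": 1, "c": 1}

good = wcrr(edges, heavy_pairs_together)
poor = wcrr(edges, light_pairs_together)
```

Under this reading, the first assignment keeps the frequently co-accessed pairs on the same page and so scores higher, which is the effect a graph-partitioning step would aim for.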

Michael Smith (AVA Media Inc.) msmithava@yahoo.com
Video Characterization for Browsing and Summarization
As video becomes more prevalent in digital media, the potential
for visualization technology exceeds traditional analog methods
such as Fast-Forward playback. Advances in characterization
and meta-data acquisition have led to content extraction systems
for video browsing and summarization. Content information is
used to analyze and shorten image and video viewing time without
apparent loss in content. This presentation will describe visualization
technology for browsing and summarization, characterization
and meta-data acquisition, and user-studies to validate specific
methodology. This includes a description of traditional static
presentations, such as text abstracts and thumbnails, motion
presentations, such as slide shows and skims, and current research
in application specific browsing paradigms. Methodology for
content extraction will be discussed, as well as compressed
domain features for faster analysis of meta-data. Evaluations
from user-studies will be presented along with feedback
from commercial and research applications.

Clement Yu (University of Illinois at Chicago)
An Efficient and Effective Metasearch Engine
It is now a common practice for ordinary people to utilize search
engines to retrieve information from the Web. Although quite
a few existing engines are very powerful, none of them is capable
of covering most materials from the Web. Thus, it may be desirable
to have a metasearch engine which invokes multiple underlying
search engines for a given query so that a higher coverage of
the Web is achieved. For the metasearch engine to be
efficient, only a small number of search engines should be invoked
for a given query and only promising documents should be retrieved
from each invoked search engine. An effective metasearch engine
needs to take into consideration not only similarities of documents
but also usefulness of documents which may be exhibited by linkages
among documents. In this talk, we present techniques to construct
an efficient and effective metasearch engine.
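The two efficiency ideas above (invoke only a few engines, retrieve only a few documents from each) can be sketched as follows. This is a toy illustration under stated assumptions, not the techniques presented in the talk: each engine is assumed to have a hypothetical profile of per-term usefulness weights, engines are hypothetical callables returning `(document, score)` pairs sorted by descending score, and scores are assumed to be comparable across engines. Link-based usefulness is not modeled.

```python
def select_engines(query_terms, engine_profiles, k):
    """Rank engines by a crude usefulness estimate and keep the top k names."""

    def estimate(name):
        profile = engine_profiles[name]
        return sum(profile.get(term, 0.0) for term in query_terms)

    return sorted(engine_profiles, key=estimate, reverse=True)[:k]


def metasearch(query_terms, engine_profiles, engines, k=2, per_engine=3):
    """Query only the k most promising engines and merge their top results."""
    merged = {}
    for name in select_engines(query_terms, engine_profiles, k):
        # Only the first per_engine results of each invoked engine are used.
        for doc, score in engines[name](query_terms)[:per_engine]:
            merged[doc] = max(score, merged.get(doc, 0.0))
    # Highest score first; document id breaks ties deterministically.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))
```

In practice the scores returned by different engines are rarely comparable, so a real merging step would need some form of normalization; the `max` rule here is only a placeholder.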

Xiang Sean Zhou (Beckman Institute of Advanced Sciences
& Technologies, University of Illinois at Urbana Champaign)
xzhou2@ifp.uiuc.edu
Relevance Feedback Techniques in Content-based Image Retrieval
Joint work with Thomas S. Huang huang@ifp.uiuc.edu
Various relevance feedback techniques have been applied in content-based
image retrieval. In this paper we first try to address the uniqueness
and challenges of this problem. A brief review of the current
relevance feedback techniques is presented. We also propose
biased discriminant analysis as a new relevance feedback scheme.
By varying parameters, BDA provides a trade-off between discriminant
transform and regression. Toy problems are designed to show
the theoretical advantages of the proposed scheme over traditional
discriminant analysis. It is implemented in real-time image
retrieval for large databases and experimental results are presented
to show the improvement achieved by the new scheme.
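For orientation, one way to write a biased discriminant objective is given below. The abstract does not state the formula, so this is a sketch under assumptions: the notation is hypothetical, with x_i the positive (relevant) examples, y_j the negative examples, and m_x the mean of the positives. The "bias" is that both scatter matrices are measured around the positive mean, so the transform clusters the positives while pushing negatives away from them, rather than treating the two classes symmetrically as traditional discriminant analysis does. The trade-off parameters mentioned in the abstract are not shown because the abstract does not specify them.

```latex
W_{\mathrm{opt}} = \arg\max_{W}
  \frac{\left| W^{T} S_{y} W \right|}{\left| W^{T} S_{x} W \right|},
\qquad
S_{x} = \sum_{i=1}^{N_x} (x_i - m_x)(x_i - m_x)^{T},
\qquad
S_{y} = \sum_{j=1}^{N_y} (y_j - m_x)(y_j - m_x)^{T}
```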
Digital Libraries - Classification, Retrieval and Visualization
2000-2001
Program: Mathematics in Multimedia