
Material from Talks

P. Anandan (Vision Technology Group, Microsoft Research)  anandan@microsoft.com

Layered Representations of Visual Scene Information in Video Sequences

Images and video provide simple, inexpensive, and easily accessible ways of creating visual records of everything around us. However, the scene information is implicitly buried inside the raw video data and comes at the cost of very high temporal redundancy. While the standard sequential form of video storage is adequate for viewing in a "movie mode", it fails to support the rapid access to information of interest that many emerging multimedia applications require.

In this talk, I will describe a layered approach to decomposing the input video as a potential intermediate level representation of visual information. The layers capture regions of the image/video that show coherence in their spatio-temporal visual structure. They can either be 2D image layers together with their 2D image motions or 3D scene layers with associated 3D geometry/shape and motion. The layers can also effectively separate visual information that is superimposed due to reflections and transparencies in the scene.
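
As a loose illustration of the 2D-layer idea (and not the specific method of the talk), the sketch below clusters a precomputed dense optical-flow field into a few translational motion layers with k-means; real layered-motion estimators typically fit affine or projective motion per layer, so treat this only as a toy stand-in.

    import numpy as np

    def cluster_flow_into_layers(flow, k=3, iters=20, seed=0):
        """Assign each pixel to one of k motion layers by clustering its 2D flow
        vector; a crude stand-in for layered motion estimation."""
        h, w, _ = flow.shape
        vectors = flow.reshape(-1, 2).astype(float)
        rng = np.random.default_rng(seed)
        centers = vectors[rng.choice(len(vectors), k, replace=False)].copy()
        for _ in range(iters):
            # distance of every flow vector to every layer's mean motion
            d = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = vectors[labels == j].mean(axis=0)
        return labels.reshape(h, w), centers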

The layered representation provides a basis for a compact reorganization of the video data and can enable non-linear browsing and efficient indexing directly to information of interest. It also has the potential to serve as the basis for further semantic-level analysis of the scene elements.

Biography:

P. Anandan is a Senior Researcher at Microsoft Research, where he leads the Interactive Visual Media group. He obtained his PhD from the University of Massachusetts, Amherst in 1987. His research career is marked by an excessive focus on the problem and applications of visual motion analysis and the analysis of motion parallax. Like his research peers, he is trying to make the most of the moderate success of this research area in interesting applications involving visual scene modeling and video-based scene understanding. When he is not working on motion, or trying to survive in the research game, he ponders the nature of intelligent behavior and the philosophy of the mind.

Charles A. Bouman (Purdue University)  

Image Database Search and Browsing

The advent of inexpensive digital storage and high-speed digital communications has made large databases of images and video quite common. For example, it is not unusual for image databases to contain over 100,000 images, and a single hour of digital video stored in MPEG-2 format contains over 600 MBytes of data. Simple linear browsing of such large databases quickly becomes impossible, so these databases demand new tools for users to organize and manage them.

This talk focuses on the use of hierarchical data structures for the organization, search, and browsing of large image databases by their content. The first part of the talk discusses the problem of fast image search in a large database. Interestingly, this problem is quite similar to the problem of vector quantization in a metric space. We show that the nearest neighbor search problem may be efficiently solved by using a best-first branch and bound search strategy. Perhaps more importantly, we also demonstrate that computation can be dramatically reduced by accepting some approximation in the search accuracy.
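
A minimal sketch of the best-first branch-and-bound idea, under assumptions of my own: a simple ball-tree-like index over feature vectors and an eps parameter that trades accuracy for speed. The index and pruning rule used in the actual system may differ.

    import heapq
    import itertools
    import numpy as np

    class Node:
        def __init__(self, points, leaf_size=8):
            self.points = points
            self.center = points.mean(axis=0)
            self.radius = np.linalg.norm(points - self.center, axis=1).max()
            self.children = []
            if len(points) > leaf_size:
                dim = points.var(axis=0).argmax()          # split on the widest dimension
                mask = points[:, dim] <= np.median(points[:, dim])
                if mask.any() and (~mask).any():
                    self.children = [Node(points[mask], leaf_size),
                                     Node(points[~mask], leaf_size)]

    def approx_nearest(root, q, eps=0.0):
        """Best-first branch and bound; eps > 0 accepts an approximate answer
        in exchange for visiting fewer nodes."""
        tie = itertools.count()
        best_d, best_p = np.inf, None
        heap = [(max(np.linalg.norm(q - root.center) - root.radius, 0.0), next(tie), root)]
        while heap:
            bound, _, node = heapq.heappop(heap)
            if bound * (1.0 + eps) >= best_d:              # nothing left can do better
                break
            if not node.children:                          # leaf: scan its points
                d = np.linalg.norm(node.points - q, axis=1)
                i = d.argmin()
                if d[i] < best_d:
                    best_d, best_p = d[i], node.points[i]
            else:
                for child in node.children:
                    b = max(np.linalg.norm(q - child.center) - child.radius, 0.0)
                    heapq.heappush(heap, (b, next(tie), child))
        return best_p, best_d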

In the second part of the talk, we present a method for designing a hierarchical browsing environment which we call a similarity pyramid. The similarity pyramid groups similar images together while allowing users to view the database at varying levels of resolution. We also show how a user can provide feedback to this browsing environment through the selection of relevant images. The similarity pyramid can then be dynamically reorganized to suit the user's task.
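
A rough sketch of the pyramid idea under simplifying assumptions of my own: here the hierarchy is built top-down by recursive k-means with a small branching factor, and each node keeps the image closest to its group mean as a representative icon. The similarity pyramid described in the talk is constructed differently (and arranges images on a 2D grid), so this is only an illustration.

    import numpy as np

    def build_similarity_pyramid(features, ids, branch=4, depth=0, max_leaf=4, seed=0):
        """Recursively group similar images; each node keeps a representative
        image so coarse levels of the pyramid can be browsed quickly."""
        ids = np.asarray(ids)
        center = features.mean(axis=0)
        rep = ids[np.linalg.norm(features - center, axis=1).argmin()]
        node = {"level": depth, "representative": rep, "children": []}
        if len(ids) <= max_leaf:
            node["items"] = list(ids)
            return node
        rng = np.random.default_rng(seed + depth)
        centers = features[rng.choice(len(features), branch, replace=False)].copy()
        for _ in range(15):                                 # crude k-means split
            labels = np.linalg.norm(features[:, None] - centers[None], axis=2).argmin(axis=1)
            for j in range(branch):
                if np.any(labels == j):
                    centers[j] = features[labels == j].mean(axis=0)
        nonempty = [j for j in range(branch) if np.any(labels == j)]
        if len(nonempty) <= 1:                              # no progress; stop splitting
            node["items"] = list(ids)
            return node
        for j in nonempty:
            mask = labels == j
            node["children"].append(build_similarity_pyramid(
                features[mask], ids[mask], branch, depth + 1, max_leaf, seed))
        return node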

Ed H. Chi (User Interface Research, Xerox Palo Alto Research Center PARC)    chi@acm.org

Scent of the Web

PARC has several research teams studying the connections between how people use the World Wide Web and the implications of those patterns of usage. One area of research, the Information Scent Project, provides a way to understand how differences in user goals affect the usage of the Web. The underlying idea is that users navigate a path through the web guided by the strength of "information scent." Here, "information scent" refers to the fact that a number of subtle cues provide users with a "whiff" of what lies at the next link. This technology can be used to personalize the user experience on the Web.

This research has resulted in a new way to model how users will move through a Web site, based on the users' goals and the content and structure of the Web site. A web site can also infer a user's information needs from that individual's surfing patterns.

In addition, advanced visualization tools have been developed to mine patterns in this complex data. By examining how these cues relate to predicted and actual behaviors, researchers expect to develop advanced techniques for testing web site usability.

Biography:

Ed H. Chi is a research scientist at Xerox Palo Alto Research Center's User Interface Research Group. He has been working on Information Visualization and User Interfaces since 1993. His area of research and expertise is software systems for computer-human interaction and 2D/3D user interfaces. His current project is the study of information ecology --- understanding how users navigate and understand information environments, including the Web. His Ph.D. project was "Spreadsheet for Visualization" --- a data exploration tool using a 'spreadsheet metaphor' that allows each cell to hold an entire data set with a full-fledged visualization. In the past, he has also worked on computational molecular biology and recommendation systems. Ed received his Ph.D. (1996-1999), M.S. (1994-1996), and B. Comp Sci. (1992-1994) in Computer Science from the University of Minnesota. He has won awards for both teaching and research. In his spare time, Ed is an avid TaeKwonDo martial artist, motorcyclist, photographer, potter, and a person who writes poems.

Wesley W. Chu (Computer Science Department, University of California Los Angeles)  wwc@cs.ucla.edu

Medical Digital Library to Support Scenario Specific Information Retrieval   slides  pdf (11.9MB)   html

Current large-scale information sources are designed to support general queries and lack the ability to support scenario-specific information navigation, gathering, and presentation. As a result, users are often unable to obtain the desired specific information within a well-defined subject area. Today's information systems do not provide efficient content navigation, incremental approximate matching, or content correlation. We are developing the following innovative technologies to remedy these problems: 1) scenario-based proxies, enabling the gathering and filtering of information customized for users within a pre-defined domain; 2) context-sensitive navigation and matching, providing approximate matching and similarity links when an exact match to a user's request is unavailable; 3) content correlation of documents, creating semantic links between documents and information sources; 4) user models for customizing retrieved information and result presentation; and 5) phrase indexing for document retrieval and summarization. A digital medical library is currently being constructed using these technologies to provide customized information for the user. The technologies are general in nature and can provide custom and scenario-specific information in many other domains (e.g., crisis management).

Neil Day (Digital Garage Inc. Strategic Research & Development Department)  neil@garage.co.jp  Digital Garage: http://www.garage.co.jp   WebNation: http://www.webnation.co.jp

MPEG-7 Applications: Multimedia Content Retrieval

MPEG-7 is a standard for describing features of multimedia content. The standard aims to provide the world's most comprehensive set of audiovisual descriptions and is intended to provide better solutions for indexing and searching the ever-expanding world of multimedia content. These descriptions are based on the catalog (title, creator, rights), semantic (the who, what, when, where information), and structural (the color histogram - a measurement of the amount of color - associated with an image, or the timbre of a recorded instrument) features of the AV content, and leverage the AV data representation defined by MPEG-1, -2, and -4. MPEG-7 also uses XML Schema as the language of choice for content description. MPEG-7 will be interoperable with other leading standards such as the SMPTE Metadata Dictionary, Dublin Core, EBU P/Meta, and TV Anytime. For example, MPEG-7 will allow finding information by rich spoken queries, hand-drawn images, and humming. This presentation will provide an overview of typical MPEG-7 applications and will present the latest state-of-the-art MPEG-7 solutions for search and retrieval of multimedia content.
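
To make the "structural" kind of description concrete, here is a small sketch of a quantized color histogram and a distance between two such histograms. This only illustrates the kind of low-level feature that MPEG-7 descriptions cover; it is not the standard's actual descriptor or XML syntax.

    import numpy as np

    def color_histogram_descriptor(image, bins_per_channel=8):
        """Quantize an RGB image (H x W x 3 array of 0-255 values) into a
        normalized color histogram, a simple structural descriptor of color content."""
        q = (image.astype(int) // (256 // bins_per_channel)).reshape(-1, 3)
        idx = q[:, 0] * bins_per_channel**2 + q[:, 1] * bins_per_channel + q[:, 2]
        hist = np.bincount(idx, minlength=bins_per_channel**3).astype(float)
        return hist / hist.sum()

    def histogram_distance(h1, h2):
        """L1 distance between two normalized histograms; smaller means more similar."""
        return float(np.abs(h1 - h2).sum())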

Edward J. Delp (Purdue University)   ace@ecn.purdue.edu

Image and Video Databases: Who Cares?

In this talk we will discuss the research issues in the deployment and management of large video databases. In particular, we will describe a video database system we are developing at Purdue known as ViBE (Video Browsing Environment). ViBE is a unique browseable/searchable paradigm for organizing video databases containing a large number of sequences. The system first segments video sequences into shots by using the Generalized Trace obtained from the DC-sequence of the compressed data stream. Each video shot is then represented by a hierarchical tree structure of key frames, and the shots are automatically classified into predetermined pseudo-semantic classes. Finally, the results are presented to the user in an active browsing environment using a similarity pyramid data structure. The similarity pyramid allows the user to view the video database at various levels of detail. The user can also define semantic classes and reorganize the browsing environment based on relevance feedback.
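
As a simplified sketch of the shot-segmentation step (ViBE's Generalized Trace combines several features and a classification tree, so this single threshold on histogram differences is only a stand-in), assuming the DC-sequence has already been decoded into small grayscale frames:

    import numpy as np

    def detect_shot_boundaries(dc_frames, threshold=0.35, bins=32):
        """Flag a shot boundary wherever the luminance histogram of consecutive
        DC frames changes sharply; returns the frame indices that start new shots."""
        boundaries, prev_hist = [], None
        for i, frame in enumerate(dc_frames):
            hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
            hist = hist / max(hist.sum(), 1)
            if prev_hist is not None:
                diff = 0.5 * np.abs(hist - prev_hist).sum()   # lies in [0, 1]
                if diff > threshold:
                    boundaries.append(i)
            prev_hist = hist
        return boundaries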

We will further speculate on who will use these types of databases in the future. Our feeling is that most of the current application scenarios are ill-posed at best.

Peter Enser (School of Information Management University of Brighton, U.K.)

In Quest of Visual Imagery

Commercial use of image collections continues to exhibit a heavy dependency on a concept-based image retrieval paradigm, in which the query is verbalised by the client and resolved as a composite operation involving text matching of the query against collection metadata, subject conceptualisation and visualisation, and browsing.

The practical and intellectual challenges posed by this paradigm are significant and frequently expressed. Nevertheless, studies of user need clearly indicate that a heavy dependency must continue to be placed on concept-based rather than content-based image retrieval techniques within the working practices of still and moving image collections. In support of this contention, the paper offers evidence gathered from surveys of user needs which have been conducted in collaboration with a number of image archives.

The paper concludes with a consideration of hybrid image retrieval processes in which the concept-based and content-based image retrieval paradigms are combined, and appraises their current and potential contributions to real clients' needs for visual image material.

David Forsyth (EECS, University of California - Berkeley) dafmmf@gte.net

Clustering, Object Recognition and Picture Retrieval

It would be pleasing to have a program that could search for images using a description of what is in the picture. This is difficult, because it is not known how to recognize objects. I will give a broad overview of the current state of knowledge about building object recognition systems. A combination of template matchers and spatial reasoning algorithms can be used to find faces, people, and some animals. Other categories are currently very difficult; there has been some success at finding isolated objects of stereotypical appearance.

We have built a system that can cluster collections of images using both textual information and image information. The resulting clusters appear to be semantically coherent --- or, at least, coherent enough to be useful. I will show how to use this system to browse collections, search for images, and generate possible illustrations automatically given a caption. The system can annotate images with plausible words, too. Our system is opportunistic, in that it can use the output of various detectors as well. For example, if one has a face detector, our system can use the response of that detector in clustering the image. Ideally, it would use the output of many different object recognition programs.
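
A minimal sketch of clustering on joint text-and-image evidence, under assumptions of my own: precomputed caption vectors and image feature vectors are combined by weighted concatenation and clustered with plain k-means. The system described in the talk uses a considerably richer statistical model, so this only illustrates the general idea.

    import numpy as np

    def joint_features(text_vectors, image_vectors, text_weight=0.5):
        """Concatenate L2-normalized text and image features so both kinds of
        evidence influence the clustering."""
        t = text_vectors / (np.linalg.norm(text_vectors, axis=1, keepdims=True) + 1e-12)
        v = image_vectors / (np.linalg.norm(image_vectors, axis=1, keepdims=True) + 1e-12)
        return np.hstack([text_weight * t, (1.0 - text_weight) * v])

    def kmeans(X, k=10, iters=25, seed=0):
        """Plain k-means; each cluster groups pictures whose captions and
        visual features are jointly similar."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)].copy()
        for _ in range(iters):
            labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        return labels, centers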

Ramesh Jain (University of California-San Diego)

Navigational Search

Almost every week I hear about a new search engine. At one time, I used to get excited that finally there might be a solution to informationitis. Informationitis is the disease resulting from the progress of information technology. Information on the Web is increasing exponentially, so we assume that all information is available to us whenever we need it. Unfortunately, the technology for finding what you need is improving only sub-linearly, and thus the gap between the information you think you can get and what you can really get is growing almost exponentially. Which is better: not to have information at all, or to have it but not have access to it?

In the good old days, we had computers to compute with data. Many techniques were developed to structure data. Databases were designed to provide flexible access to large volumes of data. In databases, techniques were developed to structure the data and then organize it to find meaningful information. Progress in technology resulted in too much data on the one hand, and on the other brought text, graphics, video, audio, and other sensory data into the scope of computing. Many techniques were developed to manage unstructured data by organizing it. To add to the explosion came the World Wide Web. The library-science approach to organizing all the documents on the Web resulted in the "virtual libraries" commonly known as portals. The web grew organically and the growth was exponential. That is its beauty and that is its strength. But how does one find meaningful information in an unstructured, organic, multimedia domain as large as the Web? Powerful processing or simple reorganization or classification of the data will not solve the problem. We will have to take a fresh look at the problem.

The problem with the keyword-based search approach is that the context for the search is removed by abstracting it to keywords. The context for the results is removed by presenting them as a flat listing. We have neither a Gestalt view of the query space nor of the results.

I believe that by developing what-you-see-is-what-you-get (WYSIWYG) style techniques, search approaches can be converted into navigation-like approaches. As we all know, word processing became usable by the masses because of WYSIWYG approaches. Search can likewise be converted by making the query and presentation environments the same. The system displays the results; looking at them, one can refine the query or, if necessary, generate a new one. My discussion will focus on this paradigm of navigational search.

Lucy Nowell (Battelle/ Pacific Northwest National Laboratory)  Lucy.Nowell@pnl.gov

Visualization for Digital Libraries: Usability Is Not Utility

The power of information visualization lies in its ability to convey information at the high bandwidth of the human perceptual system, facilitating recognition of patterns in an information space and supporting navigation in large collections. Available visualization systems offer a variety of approaches to presenting information about collections of text, from conceptual maps (e.g., Lin's work with Kohonen maps, Battelle's SPIRE ThemeView visualization) to tools that base their layout on metadata (e.g., Envision and FilmFinder) or similarity to query terms (e.g., VIBE). Other systems show query-term occurrence within individual documents (e.g., TileBars), the conceptual structure of individual documents (e.g., Topic Islands), thematic trends over time within a collection (e.g., ThemeRiver), and so forth. The usability of these systems for typical user populations has not been well established, and we know of no studies that confirm their utility.

Furthermore, digital libraries offer far more than text. In the realm of the World Wide Web, a single "document" may include video and audio, animations, and links to executable programs, in addition to the images, maps, tables, and other non-text features that occur in more-traditional documents. There are existing collections that focus primarily on music, movies, images of various kinds, dance notation and records of performance, scientific data, architectural drawings, etc.

This talk will focus on issues in assessing the utility of visual analysis tools, particularly in the context of extending these tools to heterogeneous collections that span the full range of digital media.

Francois Pachet (SONY CSL-Paris)  pachet@csl.sony.fr

Music Content Management for Electronic Music Distribution: What is "Interesting Music"?

The number of musical titles, considering only Western music, amounts to several million. Besides issues related to copy protection and copyright management, the possibility of transporting these millions of music titles easily and efficiently raises the issue of content management: how do we design efficient means of accessing, retrieving, and exploring music titles? This talk will describe research conducted at Sony CSL in three related areas: 1) feature extraction from the music signal, 2) musical data mining from large information sources, and 3) exploitation of similarity functions and music descriptors for music search and retrieval. I will illustrate the talk with research conducted and systems developed in our team, in particular rhythm extraction from the signal, distance functions and radio program analysis, and sequence generation from music descriptors. In all these cases I will raise the question "what is interesting music?" and propose various possible answers.

Peter Pirolli (Xerox PARC)  pirolli@parc.xerox.com

Information Foraging and Information Scent: Theory, Models, and Applications

Information foraging theory concerns human strategies and technologies for information seeking, gathering, and consumption. It is founded on the assumption that such strategies and technologies adapt to the flux of information in the environment. The development of the theory has, to a large extent, been inspired by optimal foraging theory in biology and anthropology, which analyzes the adaptive value of food-foraging strategies, as well as current theories of cognition and perception. At the level of individuals, we have been studying users working with novel information retrieval technologies, new user interface techniques, and, of course, the World Wide Web (WWW). One of the key concepts to emerge from this work is the notion of 'information scent', which refers to the cues by which users assess their environment and make navigational choices. In earlier work we developed computational models of foraging and information scent that provided detailed predictions of user behavior with novel technologies. We expect that these models will lead to improvements in usability and new user interface techniques, and we have been developing new technologies along these lines, ranging from new ways of presenting WWW pages, to methods of predicting traffic flow at a WWW site, to data mining techniques for inferring the information goals of users based on WWW server logs.
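
A toy sketch of the scent idea, with assumptions of my own: scent is scored as word overlap between the user's goal and the text around each link, and link-choice probabilities follow a softmax over those scores. The PARC models are built on spreading activation over much richer statistics, so this only illustrates the concept.

    import numpy as np

    def scent(goal_terms, link_cue_terms):
        """Crude scent score: overlap between the user's goal words and the
        words surrounding a link (its proximal cues)."""
        goal, cues = set(goal_terms), set(link_cue_terms)
        return len(goal & cues) / (len(goal) or 1)

    def choice_probabilities(goal_terms, links, temperature=0.2):
        """Softmax over per-link scent; a lower temperature means the simulated
        user follows the strongest scent more deterministically."""
        scores = np.array([scent(goal_terms, cues) for cues in links])
        z = np.exp((scores - scores.max()) / temperature)
        return z / z.sum()

    # example: a user seeking "used car prices" on a page with three links
    links = [["new", "car", "reviews"],
             ["used", "car", "prices", "listings"],
             ["contact", "us"]]
    print(choice_probabilities(["used", "car", "prices"], links))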

Shashi Shekhar (Computer Science Faculty, CTS and ITS Inst., Univ. of Minnesota)  shekhar@cs.umn.edu

Clustering-oriented Storage and Access Methods

Current Spatial Database Management Systems (SDBMS) provide efficient access methods and operators for point and range queries over collections of spatial points, line segments, and polygons. However, it is not clear whether existing spatial access methods can efficiently support graph computations based on connectivity, or scale up to data embedded in high-dimensional spaces. The expected I/O cost of many graph operations can be reduced by maximizing the Weighted Connectivity Residue Ratio (WCRR), i.e., the chance that a pair of connected nodes that are more likely to be accessed together are allocated to a common page of the file. CCAM is an access method for general graphs that uses connectivity clustering. CCAM supports the operations insert(), delete(), create(), and find(), as well as the new operations get-A-successor() and get-successors(), which retrieve one or all successors of a node to facilitate aggregate computations on graphs. The nodes of the graph are assigned to disk pages via a graph-partitioning approach to maximize the WCRR. CCAM includes methods for static clustering, as well as dynamic incremental reclustering, to maintain a high WCRR in the face of updates without incurring high overheads. We also describe possible modifications to improve the WCRR that can be achieved by existing spatial access methods. Experiments with graph computations show that CCAM outperforms existing access methods, even though the proposed modifications also substantially improve the performance of existing spatial access methods.
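
A small sketch of the clustering objective, under assumptions of my own: nodes are packed onto fixed-capacity pages by a greedy heuristic that keeps strongly connected nodes together, which is the effect a high WCRR measures. CCAM itself uses graph partitioning and dynamic incremental reclustering, so this illustrates the goal rather than the method.

    def assign_nodes_to_pages(adj, page_capacity):
        """Greedily grow each disk page by adding the unassigned node with the
        most edge weight into the page, so connected nodes tend to share a page."""
        unassigned = set(adj)
        pages = []
        while unassigned:
            seed = max(unassigned, key=lambda n: sum(adj[n].values()))
            page = [seed]
            unassigned.remove(seed)
            while len(page) < page_capacity:
                best, best_w = None, 0.0
                for n in unassigned:
                    w = sum(adj[n].get(m, 0.0) for m in page)
                    if w > best_w:
                        best, best_w = n, w
                if best is None:            # nothing left connects to this page
                    break
                page.append(best)
                unassigned.remove(best)
            pages.append(page)
        return pages

    # example: adj[u][v] is the weight (access affinity) of edge u-v
    adj = {"a": {"b": 3, "c": 1}, "b": {"a": 3, "c": 2},
           "c": {"a": 1, "b": 2, "d": 4}, "d": {"c": 4}}
    print(assign_nodes_to_pages(adj, page_capacity=2))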

Michael Smith (AVA Media Inc.)   msmithava@yahoo.com

Video Characterization for Browsing and Summarization

As video becomes more prevalent in digital media, the potential for visualization technology exceeds traditional analog methods such as fast-forward playback. Advances in characterization and meta-data acquisition have led to content extraction systems for video browsing and summarization. Content information is used to analyze and shorten image and video viewing time without apparent loss in content. This presentation will describe visualization technology for browsing and summarization, characterization and meta-data acquisition, and user studies to validate specific methodologies. This includes a description of traditional static presentations, such as text abstracts and thumbnails; motion presentations, such as slide shows and skims; and current research in application-specific browsing paradigms. Methodology for content extraction will be discussed, as well as compressed-domain features for faster analysis of meta-data. Evaluations from user studies will be presented along with feedback from commercial and research applications.
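
A minimal sketch of the skim-selection step, under assumptions of my own: each shot already carries an importance score from the characterization stage, and the skim simply keeps the highest-scoring shots that fit a target duration, played back in their original order.

    def select_skim(shots, target_duration):
        """shots: list of (start_time, duration, importance) tuples.
        Keep the highest-importance shots that fit within the target duration."""
        ranked = sorted(range(len(shots)), key=lambda i: shots[i][2], reverse=True)
        chosen, total = [], 0.0
        for i in ranked:
            if total + shots[i][1] <= target_duration:
                chosen.append(i)
                total += shots[i][1]
        return [shots[i] for i in sorted(chosen)]       # restore temporal order

    # example: summarize four scored shots into a 30-second skim
    shots = [(0, 12, 0.9), (12, 20, 0.3), (32, 8, 0.8), (40, 15, 0.6)]
    print(select_skim(shots, target_duration=30))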

Clement Yu (University of Illinois at Chicago)

An Efficient and Effective Metasearch Engine

It is now common practice for ordinary people to use search engines to retrieve information from the Web. Although quite a few existing engines are very powerful, none of them is capable of covering most material on the Web. Thus, it may be desirable to have a metasearch engine that invokes multiple underlying search engines for a given query so that a higher coverage of the Web is achieved. For the metasearch engine to be efficient, a small number of search engines should be invoked for a given query, and only promising documents should be retrieved from each invoked search engine. An effective metasearch engine needs to take into consideration not only the similarities of documents but also the usefulness of documents, which may be exhibited by linkages among them. In this talk, we present techniques for constructing an efficient and effective metasearch engine.
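
A minimal sketch of one piece of the problem, result merging, under assumptions of my own: each engine returns (document, score) pairs on its own scale, and per-engine weights stand in for estimated usefulness. The techniques in the talk also cover search engine selection and the use of linkage information, which this sketch omits.

    def merge_results(result_lists, weights=None, top_k=10):
        """Normalize each engine's scores to [0, 1], combine them with optional
        per-engine weights, and re-rank documents by the combined score."""
        weights = weights or [1.0] * len(result_lists)
        combined = {}
        for w, results in zip(weights, result_lists):
            if not results:
                continue
            scores = [s for _, s in results]
            lo, hi = min(scores), max(scores)
            for doc, s in results:
                norm = (s - lo) / (hi - lo) if hi > lo else 1.0
                combined[doc] = combined.get(doc, 0.0) + w * norm
        return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # example: two engines return (url, score) pairs on different scales
    engine_a = [("u1", 0.92), ("u2", 0.85), ("u3", 0.40)]
    engine_b = [("u2", 310.0), ("u4", 250.0)]
    print(merge_results([engine_a, engine_b], weights=[1.0, 0.8]))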

Xiang Sean Zhou (Beckman Institute of Advanced Sciences & Technologies, University of Illinois at Urbana Champaign) xzhou2@ifp.uiuc.edu

Relevance Feedback Techniques in Content-based Image Retrieval

Joint work with Thomas S. Huang huang@ifp.uiuc.edu

Various relevance feedback techniques have been applied in content-based image retrieval. In this talk we first try to address the uniqueness and challenges of this problem. A brief review of current relevance feedback techniques is presented. We also propose biased discriminant analysis (BDA) as a new relevance feedback scheme. By varying its parameters, BDA provides a trade-off between a discriminant transform and regression. Toy problems are designed to show the theoretical advantages of the proposed scheme over traditional discriminant analysis. It is implemented in real-time image retrieval for large databases, and experimental results are presented to show the improvement achieved by the new scheme.
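
A compact sketch of one common formulation of BDA, assuming numpy/scipy and feature vectors for the user-marked relevant (positive) and non-relevant (negative) examples; the regularization constant and the re-ranking helper below are illustrative choices, not the exact scheme evaluated in the talk.

    import numpy as np
    from scipy.linalg import eigh

    def biased_discriminant_transform(pos, neg, n_components=2, reg=1e-3):
        """Find directions that keep the positive examples tight around their
        centroid while scattering the negatives away from it: solve the
        generalized eigenproblem S_N w = lambda S_P w and keep the top vectors."""
        m = pos.mean(axis=0)
        S_P = (pos - m).T @ (pos - m)            # scatter of positives about their mean
        S_N = (neg - m).T @ (neg - m)            # scatter of negatives about the positive mean
        S_P = S_P + reg * np.eye(S_P.shape[0])   # regularize for small feedback sets
        vals, vecs = eigh(S_N, S_P)              # symmetric generalized eigenproblem
        order = np.argsort(vals)[::-1]           # largest ratio first
        return vecs[:, order[:n_components]]

    def rerank(X, pos, neg):
        """Rank all database features X by distance to the positive centroid
        in the learned biased-discriminant subspace."""
        W = biased_discriminant_transform(pos, neg)
        center = pos.mean(axis=0) @ W
        return np.argsort(np.linalg.norm(X @ W - center, axis=1))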
