
Material from Talks


Jont Allen (AT&T Labs-Research)

The Intensity JND Comes From Poisson Neural Noise: Implications for Image Coding      pdf (236KB)    postscript.gz (147KB)

Shi-Kuo Chang (University of Pittsburgh)  chang@cs.pitt.edu

Sentient Map - A Novel Interface for Digital Libraries   Slides

The sentient map is a new paradigm for visual information retrieval. It enables the user to view data as maps, so that gestures, more specifically c-gestures, can be used for interaction between the user and the multimedia information system. Different c-gestures are then dynamically transformed into spatial/temporal queries, or Sigma-queries, for multimedia information sources and databases. An e-learning environment involving many academic institutions serves as a test bed to evaluate this approach. Application to digital libraries is discussed.

Nicholas Coult (Department of Mathematics, Augsburg College)  coult@augsburg.edu

Compression and Region-of-Interest Extraction for Large Incomplete Data Sets

In this talk I will present an algorithm for the compression of incomplete volumetric data sets. As is often the case with scientific or other real-world data, one must handle three-dimensional volumes of data in which part of the data is missing or invalid; I refer to such data as incomplete. Standard methods for compression of real data would normally perform poorly in this situation. The algorithm I present introduces an extra step of processing to improve the compression performance. This algorithm also has the property that after compression, one may extract a small subset of the data from the compressed file without having to uncompress the whole file, an obvious advantage for data sets that are too large to fit in main memory.
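The abstract does not give implementation details; as a rough, hypothetical illustration of the two ideas mentioned (handling invalid data and extracting a region of interest without decompressing everything), the Python sketch below compresses a volume block by block, stores each block independently, and fills invalid voxels with the mean of the valid voxels in the same block before encoding. The block size, the fill rule, and the use of zlib are assumptions for illustration, not the speaker's algorithm.

    import zlib
    import numpy as np

    def compress_blocks(volume, valid_mask, block=8):
        """Compress a volume block-wise; invalid voxels are replaced by the
        block mean of the valid voxels before encoding (hypothetical fill rule)."""
        packets = {}
        nz, ny, nx = volume.shape
        for z in range(0, nz, block):
            for y in range(0, ny, block):
                for x in range(0, nx, block):
                    v = volume[z:z+block, y:y+block, x:x+block].astype(np.float32)
                    m = valid_mask[z:z+block, y:y+block, x:x+block]
                    fill = v[m].mean() if m.any() else 0.0
                    v[~m] = fill                       # mask-aware preprocessing step
                    packets[(z, y, x)] = zlib.compress(v.tobytes())
        return packets

    def extract_block(packets, key, block=8):
        """Decode a single block (region of interest) without touching the rest."""
        data = zlib.decompress(packets[key])
        return np.frombuffer(data, dtype=np.float32).reshape(block, block, block)

    # Example: a 32^3 volume with a missing corner
    vol = np.random.rand(32, 32, 32)
    mask = np.ones_like(vol, dtype=bool); mask[:8, :8, :8] = False
    packets = compress_blocks(vol, mask)
    roi = extract_block(packets, (16, 16, 16))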

Michelle Effros (California Institute of Technology)   effros@mite.z.caltech.edu

Network Source Codes in Digital Libraries

Traditional data compression algorithms or source codes were developed for "point-to-point" communications environments, where a single transmitter sends information through space (data communication) or time (data storage) to a single receiver. While direct application of these algorithms to digital libraries is possible and, in fact, pervasive, this approach ignores the multi-user, multi-application nature of many types of datastores. As a result, applying traditional source coding techniques in multi-user digital libraries can limit system functionality and lead to extreme inefficiencies in bandwidth and storage space. Network source codes generalize traditional data compression methods from the point-to-point communication scenario to more general "multi-point" network environments. This talk includes a brief introduction to the theory and practice of network source coding, focusing on potential applications for network source codes in digital libraries. Special cases of network source codes include multi-resolution, multiple description, multiple access, and broadcast system source codes. The proposed methods may be applied to achieve greater versatility, robustness, and efficiency in digital libraries.
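A classic toy example of a multiple description code (a textbook illustration, not taken from the talk) is to split a signal into even and odd samples and code the two halves separately: either description alone yields a usable low-quality reconstruction by interpolation, while both together restore the full signal. A minimal Python sketch:

    import numpy as np

    def mdc_encode(signal):
        """Toy multiple description code: two descriptions from even/odd samples."""
        return signal[0::2].copy(), signal[1::2].copy()

    def mdc_decode(desc_even=None, desc_odd=None):
        """Reconstruct from whichever descriptions arrive."""
        if desc_even is not None and desc_odd is not None:
            out = np.empty(desc_even.size + desc_odd.size)
            out[0::2], out[1::2] = desc_even, desc_odd
            return out                              # full quality
        got = desc_even if desc_even is not None else desc_odd
        return np.repeat(got, 2)                    # crude interpolation from one description

    x = np.sin(np.linspace(0, 6.28, 64))
    d1, d2 = mdc_encode(x)
    print(mdc_decode(d1, d2).shape, mdc_decode(d1).shape)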

Arif Ghafoor (Purdue University)

Multimedia Database Management Systems: A Perspective

Emerging multimedia information technologies will allow users to store, retrieve, share, and manipulate complex information, which is expected to be used to build exciting new applications. Developing such applications to a mature and fruitful stage will require substantial changes in our approach to the design of operating systems, databases, and storage systems. Most multimedia applications will use some form of pre-orchestrated stored information. Such applications include telemedicine, digital libraries, virtual reality, on-line education and training, CAD/CAM, and so on. The stored nature of the information poses a number of challenges and can allow novel techniques in the management and representation of multimedia data and knowledge. The key challenges are related to data organization and integration, indexing and retrieval mechanisms, intelligent searching techniques, information browsing, content-based query processing, handling of the heterogeneity and variety of multimedia data, and so forth.

In this talk, we highlight these challenges and provide an assessment of the state-of-the-art in developing large scale multimedia database systems. Several case studies of industrial and research projects are presented to evaluate the efficacy of the theoretical foundation of this area.

Robert Gray (Stanford University)

Gauss Mixture Vector Quantization for Compression, Classification, and Modeling

The "worst case'' attribute of Gaussian vectors for data compression/source coding originally developed by Sakrison and Lapidoth using Shannon rate-distortion theory is developed using the high rate quantization theory of Bennett, Zador, and Gersho and extended to Gauss mixtures, providing an approach to robust data compression for nonGaussian sources such as images. The analysis provides several interesting side results, including a new interpretation of the minimum discrimination information distortion (MDI) measure and its application to clustering models and constructing Gauss mixture models based on training data. High rate quantization theory provides a mathematical connection between the distortion and the performance of a classified vector quantizer for nonGaussian data designed using Gaussian distributions. Although the primary application is compression and classification, several ideas relating maximum entropy estimation of probability densities, the MAXDET problem, and Markov mesh random fields arise in the analysis. The theory provides a hindsight explanation for why CELP speech coders work as well as they do.

Sheila S. Hemami (Visual Communications Lab, Cornell University School of Electrical Engineering)   hemami@CS.Cornell.EDU

Perception of Extremely Low-Rate Images and Video: Psychophysical Evaluations and Analyses

Despite the seemingly constant increases in available bandwidth, images and video encoded at low rates are, and will continue to be, commonly used for visual communication. Time and expense are two motivations for the use of low-rate encoded content, and for some access modes, such as 3G wireless systems, only low-rate encoded images and video can be transmitted. To maximize access to digitally archived information, low-rate-encoded versions of the content should be made available. At such rates, compression artifacts are clearly visible. To maximize perceived quality, characteristics of the human visual system (HVS) should be considered. Perceptually motivated image compression research has focused on producing visually lossless images, and very little work has been done on understanding perception of low-rate video. This talk will describe our recent work on characterizing human perception of low-rate encoded images and video, and will discuss applications to compression.

Alfred Hero (University of Michigan)

Divergence Matching Criteria for Registration, Indexing and Retrieval

We review, motivate, and apply the Renyi divergence measure for classification and detection tasks arising in database matching. The Renyi divergence is a generalization of the Kullback-Leibler divergence and the Hellinger distance for measuring differences between multivariate probability densities. This divergence measure is motivated directly by Chernoff's theorem on the asymptotic probability of error of the optimal discriminant. To be applied to image registration, indexing, or retrieval, the Renyi divergence must be estimated from the data. There are two ways to accomplish this: 1) non-parametric density estimation; 2) minimal graph matching via minimal spanning trees or other quasi-additive optimal graph structures. We will present both theoretical results and implementations of Renyi matching criteria for a range of problems relevant to searching databases.
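For reference, the standard definition (not specific to the talk) of the Renyi divergence of order alpha between densities p and q is

\[
D_\alpha(p \,\|\, q) \;=\; \frac{1}{\alpha - 1}\,\log \int p^{\alpha}(x)\, q^{1-\alpha}(x)\, dx ,
\]

which converges to the Kullback-Leibler divergence as alpha approaches 1 and is a monotone function of the Hellinger distance at alpha = 1/2.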

James Johnston (AT&T Labs)

Perceptual Coding - Interactions Between Models and Coders

This talk will provide a tutorial on perceptual audio coders, as well as introduce the basics of some perceptual image coders. In the course of explaining the basics behind a perceptual coder, a number of problems arise in fitting together the lossy, non-linear perceptual process with the generally linear, LMS coding process, with its attached information-theoretic and computer-science-style entropy coding and solution methods. A number of these interactions will be pointed out, from filterbank fitting vs. perceptual models to quantization strategy vs. error loudness. Some ad-hoc solutions will be shown, and some open questions will be mentioned. To date, there has been very little work on actually determining an "optimum" for such problems; rather, the work has focused on finding something that works and can approximate the optimum. Perhaps this work will be mature when, given a perceptual model and a (set of) filterbanks, quantizers, etc., one can produce evidence of a true optimum.
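As a rough illustration of the quantization-strategy side of this interaction (a hypothetical sketch, not the coder discussed in the talk), the code below quantizes subband coefficients with step sizes tied to per-band masking thresholds supplied by a perceptual model, so that the uniform-quantizer noise power stays near the threshold in each band.

    import numpy as np

    def quantize_subbands(coeffs, mask_thresholds):
        """Quantize each subband with a step size derived from its masking
        threshold (hypothetical rule: step = sqrt(12 * threshold) keeps the
        uniform-quantizer noise power, step^2 / 12, near the threshold)."""
        steps = np.sqrt(12.0 * np.asarray(mask_thresholds))
        indices = np.round(coeffs / steps).astype(int)   # entropy-code these in a real coder
        return indices, steps

    def dequantize(indices, steps):
        return indices * steps

    coeffs = np.random.randn(8) * 10.0
    thresholds = np.full(8, 0.05)                        # from a perceptual model (assumed)
    idx, steps = quantize_subbands(coeffs, thresholds)
    recon = dequantize(idx, steps)
    print(np.max(np.abs(recon - coeffs)), steps[0] / 2)  # max error stays below step/2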

Don H. Johnson (Computer & Information Technology Institute, Department of Electrical & Computer Engineering, Rice University, Houston, Texas)   dhj@rice.edu

A Theory of Information Processing

Signal processing concerns designing algorithms for manipulating signals to best achieve some signal-based criterion. Information theory focuses on how best to communicate and represent digital streams. What is missing from both is a consideration of the information represented by signals, no matter what their form. In particular, multimedia data streams are quite complex, varied, and highly interdependent. By melding aspects of signal processing and information theory, information processing theory quantifies how well signals represent information, regardless of their form, and how well systems process signals, extracting some information components at the expense of others.

Sharad Mehrotra (University of California at Irvine)

Clustering and Indexing of Multimedia Objects in the MARS System

The goal of the MARS project is the design and development of next-generation information systems that provide seamless access to multimedia information based on its rich internal content. Due to many fundamental limitations of retrieving multimedia information based solely on textual annotations, we have adopted a vision-centric approach in which objects are represented and retrieved based on low-level visual features (e.g., color, texture, and layout). These visual properties may be extracted automatically from images and video, making the approach scalable to large as well as heterogeneous multimedia collections.
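To make the low-level visual features concrete, here is a minimal sketch (illustrative only; not the MARS feature set or index) of retrieval by color histogram with histogram-intersection similarity:

    import numpy as np

    def color_histogram(image, bins=8):
        """Quantize each RGB channel into `bins` levels and build a joint histogram."""
        q = (image.astype(int) * bins) // 256          # pixel values assumed in 0..255
        idx = q[..., 0] * bins * bins + q[..., 1] * bins + q[..., 2]
        hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
        return hist / hist.sum()

    def histogram_intersection(h1, h2):
        return np.minimum(h1, h2).sum()                # 1.0 = identical color distributions

    # Toy database of random "images"
    db = [np.random.randint(0, 256, (32, 32, 3)) for _ in range(5)]
    query = db[2] + np.random.randint(-5, 6, (32, 32, 3))
    scores = [histogram_intersection(color_histogram(query.clip(0, 255)),
                                     color_histogram(im)) for im in db]
    print(int(np.argmax(scores)))                      # the query should match image 2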

Supporting content-based queries over visual feature representations poses many significant challenges to the existing practice of database management systems (DBMSs) and information retrieval (IR). Existing IR techniques, which deal primarily with textual information, need to be generalized to support content-based retrieval over multimedia. Furthermore, since visual feature representations define complex non-Euclidean vector spaces, techniques need to be developed to support such complex multidimensional information in DBMSs. Another challenge is to integrate multimedia IR techniques with DBMSs. A problem arises because existing DBMSs have no native support for the storage and processing of imprecise information, while content-based retrieval is inherently imprecise.

In this talk, I will provide an overview of the progress we have made in addressing some of the above challenges in supporting multimedia information in DBMSs. The focus of the talk will be on the problem of indexing and efficient retrieval of multimedia objects (viz., the dimensionality curse problem, support for arbitrary distance metrics, support for novel types of queries including refined queries in databases) and the solutions developed in the context of MARS.

Michael T. Orchard (Department of Electrical and Computer Engineering, Rice University; on leave from Princeton University)   morchard@ima.umn.edu

On Modelling Location Uncertainty in Images: A Coding Perspective

Virtually all image processing algorithms assume an underlying probability distribution on the space of images, which guides them in transforming an original (e.g., uncoded, noisy, degraded) image into the target (e.g., coded, denoised, restored) image. The unknown locations of events in the scene are perhaps the most important form of uncertainty characterized by this underlying probability distribution. The importance of location uncertainty can be recognized both in the processes by which natural images are generated and in the way humans perceive those images.

This talk uses the image coding perspective to explore how to accurately model image probability in light of the importance of location uncertainty. Through examples and thought experiments, we show that current image coding algorithms lack tools to efficiently characterize location uncertainty. We show that location uncertainty requires that the probability of natural images be aligned to nonlinear manifolds in image space. This talk describes work in progress, and audience discussion and participation are encouraged.
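A one-line illustration (ours, not the speaker's) of why location uncertainty points to nonlinear manifolds: let e_k denote an image containing a step edge at position k. Then

\[
\tfrac{1}{2}\left( e_{k} + e_{k+2} \right) \;\neq\; e_{k+1} ,
\]

since averaging two translated edges produces a blurred ramp rather than an edge at the intermediate location; the set of translates of a single pattern is therefore a curved (nonlinear) manifold in image space, not a linear subspace.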

Amy R. Reibman (AT&T Bell Labs)

Source Coding Alternatives for Video Transport Over Networks

We consider scalable video coding and multiple description video coding as alternatives for compressing video which is to be transported over a network. Each type of network and application has different requirements that are placed on the video system; hence each of the possible source coding methods is advantageous in different scenarios. The goal of this talk is threefold: first, to present a conceptual methodology for joint network and video optimization; second, to familiarize the audience with source coding alternatives for video, with a clear understanding of the advantages and disadvantages associated with each; and third, to begin to identify scenarios in which one alternative may be better than the others.

We begin by identifying a set of common parameters that begin to characterize how a generic network affects the video system. We then introduce the different options available for compressing video for networks, and indicate the trade-offs associated with each algorithm. Next, using some of the identified common parameters, we begin to identify scenarios in which each video coding algorithm may be advantageous.

Multiple description video coding may be advantageous when there is a high packet loss rate with little knowledge in the network as to the instantaneous loss rate, or when the loss pattern is a close approximation to the ideal multiple description channels. Scalable video coding has a range of attractive features, including easy rate adaptation and easy prioritization. However, existing algorithms may be too inefficient to provide advantages over one-layer coding in some applications.

Hanan Samet (University of Maryland - College Park)

Spatial Databases and Geographic Information Systems

An introduction is given to the spatial database issues involved in the design of geographic information systems (GIS) from the perspective of a computer scientist. Some of the topics to be discussed include the nature of a GIS and the functionalities that are desired in such systems. Representation issues will also be reviewed. The emphasis will be on indexing methods as well as the integration of spatial and nonspatial data. Demos will be shown of the SAND Spatial Browser as well as the VASCO JAVA applet found at http://www.cs.umd.edu/~hjs/quadtree/index.html which illustrate these ideas.
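As a minimal illustration of the kind of spatial indexing method discussed (in the spirit of the quadtree structures shown in the VASCO applet, though the details below are illustrative rather than taken from the demos), a small PR-style point quadtree in Python:

    class PointQuadtree:
        """Minimal PR-style point quadtree: each node covers a square region
        and splits into four children once it holds more than `capacity` points."""
        def __init__(self, x, y, size, capacity=1):
            self.x, self.y, self.size, self.capacity = x, y, size, capacity
            self.points, self.children = [], None

        def insert(self, px, py):
            if self.children is not None:
                return self._child(px, py).insert(px, py)
            self.points.append((px, py))
            if len(self.points) > self.capacity and self.size > 1e-9:
                half = self.size / 2
                self.children = [PointQuadtree(self.x + dx * half, self.y + dy * half,
                                               half, self.capacity)
                                 for dy in (0, 1) for dx in (0, 1)]
                pts, self.points = self.points, []
                for qx, qy in pts:
                    self._child(qx, qy).insert(qx, qy)

        def _child(self, px, py):
            half = self.size / 2
            col = 1 if px >= self.x + half else 0
            row = 1 if py >= self.y + half else 0
            return self.children[row * 2 + col]

    tree = PointQuadtree(0.0, 0.0, 100.0)
    for p in [(10, 10), (80, 20), (55, 75), (12, 9)]:
        tree.insert(*p)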

Shashi Shekhar (Computer Science Department, University of Minnesota)  http://www.cs.umn.edu/research/shashi-group/  shekhar@cs.umn.edu

An Overview of Spatial Databases for Digital Library

Spatial databases have been an active area of research for over two decades, addressing the growing data management and analysis needs of spatial applications such as Geographic Information Systems. This research has produced a taxonomy of models for space, spatial data types and operators, spatial query languages and processing strategies, as well as spatial indexes and clustering techniques. However, more research is needed to improve support for network and field data, as well as query processing (e.g. cost models, bulk load). Another important need is to apply the spatial data management accomplishments to newer applications such as data warehouses and multimedia information systems. The objective of this paper is to identify recent accomplishments and the research needs in the near term.

Julius Smith (Stanford University)

Musical Signal Models for Audio Rendering

This talk will summarize several lines of research going on in the field of music/audio signal processing that are applicable to audio compression and data reduction. While the techniques were motivated originally by the desire for realistic "virtual musical instruments" (including the human voice), the resulting rendering models may be efficiently transmitted to a receiver as a "specialized decoder" in software form, which is then "played" by a very sparse data stream. In most cases, there is also a straightforward tradeoff between rendering quality and computational expense at the receiver. Since all models are built from well-behaved audio signal processing components, the distortion at low complexity levels tends to be of a high-level character, sounding more like a different instrument or performance than a distorted waveform.
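A textbook example of such a rendering model (not necessarily one covered in the talk) is the Karplus-Strong plucked string: the "decoder" is a short delay line with a lowpass feedback loop, and the data stream driving it can be as sparse as a pitch, a duration, and an excitation burst. A minimal sketch:

    import numpy as np

    def karplus_strong(freq_hz, duration_s, sample_rate=44100, decay=0.996):
        """Karplus-Strong plucked-string model: a noise burst fed into a delay
        line with an averaging (lowpass) feedback filter."""
        n = int(sample_rate * duration_s)
        delay = max(2, int(round(sample_rate / freq_hz)))
        buf = np.random.uniform(-1.0, 1.0, delay)      # the excitation "data"
        out = np.empty(n)
        for i in range(n):
            out[i] = buf[i % delay]
            buf[i % delay] = decay * 0.5 * (buf[i % delay] + buf[(i + 1) % delay])
        return out

    tone = karplus_strong(440.0, 1.0)                  # one second of a plucked A4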

Nuno Vasconcelos (Cambridge Research Laboratory, Compaq Computer Corporation)  nuno@crl.dec.com   http://www.media.mit.edu/~nuno

A Decision-Theoretic View of Image Retrieval

The design of an effective architecture for image retrieval requires careful consideration of the interplay between feature selection, feature representation, and the similarity function. We introduce a decision-theoretic formulation of the retrieval problem that establishes guidelines for the joint design of all these components, leading to a Bayesian architecture with minimum probability of retrieval error. This architecture is shown to generalize a significant number of previous approaches, solving some of the most challenging problems they face: joint modeling of color and texture, explicit control of the trade-off between feature transformation and feature representation, good invariance properties, and unified support for local and global queries without image segmentation. Extensive experimental results show that Bayesian retrieval performs well on color, texture, and generic image databases in terms of both retrieval accuracy and perceptual relevance of similarity judgments.
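The minimum-probability-of-error criterion invoked here is the standard Bayes decision rule: if x denotes the query features and y indexes the image classes in the database, then retrieving

\[
g^{*}(x) \;=\; \arg\max_{y} P(Y = y \mid X = x) \;=\; \arg\max_{y}\, p(x \mid y)\, P(Y = y)
\]

minimizes the probability of retrieval error, so feature selection, the representation of the class densities p(x|y), and the similarity function can all be judged against this single criterion.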

Clement Yu (Department of EECS, University of Illinois at Chicago)  yu@eecs.uic.edu

Retrieval of Images of People from the Web

We build an image search engine which retrieves images of a person, given his/her name. The search engine depends on two types of evidence: visual and textual. A face detection module determines whether a given image contains a human face. A face recognition module determines a degree of similarity between a Web image and images in a database of images of persons. A text analysis module determines a degree of similarity between the text surrounding the image and the name of the person. The outputs of these three modules are combined in different ways to determine the likelihood that a Web image contains the image of the person. Experimental results are provided to demonstrate that our system is superior to existing commercial systems and research prototypes. A feedback mechanism is being added to our system.
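The abstract does not specify how the three pieces of evidence are fused; one simple hypothetical combination rule, for illustration only, is a weighted sum of the face-detection confidence, the face-recognition similarity, and the text similarity:

    def person_image_score(face_prob, face_sim, text_sim, w=(0.2, 0.4, 0.4)):
        """Hypothetical fusion of the three module outputs into one score in [0, 1];
        the weights are illustrative, not the system's."""
        return w[0] * face_prob + w[1] * face_sim + w[2] * text_sim

    # A page whose image contains a detected face, looks similar to known photos,
    # and whose surrounding text mentions the queried name:
    print(person_image_score(face_prob=0.9, face_sim=0.7, text_sim=0.8))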

 


Digital Libraries: Data Modeling and Representation

2000-2001 Program: Mathematics in Multimedia
