An Annotation Capture/Query System For Genomic Data

Monday, April 26, 1999 - 4:00pm - 4:45pm
Keller 3-180
Annotations, comments and remarks are fundamental components of most genomic, protein sequence and 3D protein structure databases. However, this kind of data is usually stored in an unstructured form (e.g., flat text fields or CLOBs) and so is difficult to search or sort. It is therefore not as useful as it could be for retrieving or correlating other content in the database. Furthermore, in most databases, the annotations are meant to be supplied only by those who submit the sequence or structure data that is the subject of the annotation. It is difficult for a general user of the data to register an opinion on or interpretation of the data, or to describe an important correlation between data items.

Another important aspect of genomic data is that its use and value will increase dramatically with the development of new databases of derived data, such as ones that will describe patterns, protein functions, conditions for expression, and metabolic pathways. For maximum benefit, these new databases will have to be integrated with existing databases that are used in drug development and testing. Thus, the database landscape is changing rapidly.

I represent a group at IBM's Almaden Research Center that studies the storage, retrieval and use of scientific and technical data. We have developed a novel approach to annotating data as part of a study done with a major petroleum company. The approach and prototype software are meant to address the use of oil production data by petroleum engineers, but we feel they could be applicable to the use of genomic data, for example, within a pharmaceutical company.

Our system provides a way to bring together data from multiple data sources, to create views on this data, and to capture and retrieve structured annotations about data in these views. Any authorized user can input and use the annotations, and data can be sorted and retrieved using annotation content. We feel that software, such as data mining applications, could generate annotations just as people do, and that these annotations could be used as filters for retrieving data for yet other software applications.
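
To make the idea of structured annotations concrete, here is a minimal Python sketch of annotations stored as typed fields rather than free text, so that data items can be retrieved by annotation content. It is an illustration only and not the prototype described above: the names Annotation, AnnotationStore, target_id, kind and value, along with the sample identifiers, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One structured annotation; every field name here is hypothetical."""
    target_id: str  # identifier of the annotated data item
    author: str     # a person or a program (e.g. a data mining tool)
    kind: str       # category of the remark, e.g. "function"
    value: str      # the annotation content itself


class AnnotationStore:
    """In-memory store that retrieves annotations by field values."""

    def __init__(self):
        self._annotations = []

    def add(self, annotation):
        self._annotations.append(annotation)

    def find(self, **criteria):
        # Keep annotations whose fields equal every given criterion.
        return [
            a for a in self._annotations
            if all(getattr(a, name) == wanted for name, wanted in criteria.items())
        ]

    def targets(self, **criteria):
        # Use annotation content as a filter over the annotated data items.
        return sorted({a.target_id for a in self.find(**criteria)})


store = AnnotationStore()
store.add(Annotation("seq-001", "alice", "function", "kinase"))
store.add(Annotation("seq-002", "miner-v1", "function", "kinase"))
store.add(Annotation("seq-003", "bob", "function", "protease"))

# Items that any author, human or software, has annotated as kinases.
kinase_ids = store.targets(kind="function", value="kinase")
```

Because each annotation carries an author field, the same query mechanism could serve annotations entered by people and those generated by software, and the resulting list of identifiers could be handed to another application as a filter.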