Talk abstract:
An Annotation Capture/Query System for
Genomic Data
William Swope
IBM Research Division
Almaden Research Center
San Jose, CA
swope@almaden.ibm.com
Annotations, comments and remarks are fundamental components
of most genomic, protein sequence and 3D protein structure databases.
However, this kind of data is usually stored in an unstructured
form (e.g., flat text fields or CLOBs) and is therefore difficult
to search or sort. As a result, it is less useful than it could
be for retrieving or correlating other content in the database.
Furthermore, in most databases, the annotations are meant to
be supplied only by those who submit the sequence or structure
data that is the subject of the annotation. It is difficult
for a general user of the data to register an opinion on or
interpretation of the data, or to describe an important correlation
between data items.
Another important aspect of genomic data is that its use and
value will increase dramatically with the development of new
databases of derived data such as ones that will describe patterns,
protein functions, conditions for expression, and metabolic
pathways. For maximum benefit these new databases will have
to be integrated with existing databases that are used in drug
development and testing. Thus, the database landscape is changing
rapidly.
I represent a group at IBM's Almaden Research Center that studies
the storage, retrieval and use of scientific and technical data.
We have developed a novel approach to annotating data as part
of a study done with a major petroleum company. The approach
and prototype software are meant to address the use of oil production
data by petroleum engineers, but we feel they could also apply
to the use of genomic data, for example, within a pharmaceutical
company.
Our system provides a way to bring together data from multiple
data sources, to create views on this data, and to capture and
retrieve structured annotations about data in these views. Any
authorized user can input and use the annotations, and data
can be sorted and retrieved using annotation content. We feel
that not only people but also software, such as data mining
applications, could generate annotations, and that these annotations
could be used as filters for retrieving data for yet other software
applications.
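
As a rough illustration only, and not a description of the prototype
itself, the idea of structured annotations over a view that combines
several data sources might be sketched as follows. Every name here
(Annotation, AnnotationStore, build_view, the "confidence" annotation
kind, and the sample records) is a hypothetical stand-in chosen for
the example; the abstract does not specify the actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical key: (data source name, record id within that source).
RecordKey = Tuple[str, str]


@dataclass(frozen=True)
class Annotation:
    target: RecordKey  # the record the annotation is about
    author: str        # a person or a program (e.g. a data mining tool)
    kind: str          # structured category, e.g. "confidence"
    value: object      # typed content, so it can be compared and filtered


@dataclass
class AnnotationStore:
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, annotation: Annotation) -> None:
        self.annotations.append(annotation)

    def find(self, kind: str, predicate: Callable[[object], bool]) -> List[RecordKey]:
        """Keys of records with an annotation of `kind` whose value passes
        `predicate`, in insertion order and without duplicates."""
        seen: Dict[RecordKey, None] = {}
        for a in self.annotations:
            if a.kind == kind and predicate(a.value):
                seen.setdefault(a.target, None)
        return list(seen)


def build_view(sources: Dict[str, Dict[str, dict]]) -> Dict[RecordKey, dict]:
    """Combine records from several sources into one view keyed by (source, id)."""
    return {
        (name, rid): rec
        for name, recs in sources.items()
        for rid, rec in recs.items()
    }


# Made-up sample data standing in for two separate databases.
sources = {
    "sequences": {"S1": {"length": 1200}, "S2": {"length": 450}},
    "structures": {"P1": {"resolution": 2.1}},
}
view = build_view(sources)

store = AnnotationStore()
store.add(Annotation(("sequences", "S1"), "curator_a", "confidence", 0.9))
store.add(Annotation(("sequences", "S2"), "mining_tool", "confidence", 0.4))
store.add(Annotation(("structures", "P1"), "curator_b", "confidence", 0.8))

# Use annotation content as a filter for retrieving the underlying data.
hits = store.find("confidence", lambda v: v >= 0.75)
selected = [view[key] for key in hits]
```

Because each annotation carries a typed value rather than free text,
the same store can be filtered by a person's query or by another
program, which is the kind of reuse the paragraph above describes.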
Back to IMA "HOT TOPICS" Workshop: Challenges and Opportunities
in Genomics: Production, Storage, Mining and Use
1998-1999
Mathematics in Biology