Talk abstract:
Matching Chemical and Biological Structures Using Graph-Theoretical
Techniques
Peter Willett, University of Sheffield
Subgraph isomorphism algorithms have formed the basis of most
substructure searching procedures since the very first chemical
information systems were developed in the early Sixties. In
such systems, a 2D chemical structure diagram is represented
by a graph in which the nodes and edges of a graph correspond
to the atoms and bonds of a chemical structure. The presence
of a query substructure within a database structure is checked
using a subgraph isomorphism algorithm, after an initial screening
search has been carried out to minimise the number of structures
that need undergo the time- consuming isomorphism match. This
is a very powerful way of accessing a chemical database, but
it has only recently been recognised that the same basic techniques
are equally applicable to other types of structure-matching
procedure.
In this talk, I will commence by giving a brief introduction
to graph-based approaches to 2D substructure searching, and
then go on to describe how analogous techniques can be used
for pharmacophore matching in databases of 3D structures, either
rigid or flexible. I will also describe developments of these
ideas which we have applied to searching in databases of 3D
protein structures, for patterns of helix and/or strand secondary
structure elements and for patterns of amino acid sidechains,
and, most recently, to searching databases of carbohydrate sequences.
In the second part of the talk, I will discuss the use of
maximal common subgraph (MCS) isomorphism algorithms in chemical
information systems. These are used for applications including
the detection of chemical reaction sites and the mapping of
3D pharmacophoric patterns. I will discuss several comparative
studies we have carried out to determine which MCS algorithms
are most appropriate for chemical applications, demonstrating
how run-time efficiency is crucially dependent on the structural
characteristics of the graphs that are to be matched, and conclude
by illustrating the use of an MCS algorithm to identify previously
unknown structural relationships between several pairs of 3D
protein structures.
Back to Workshop Schedule
1996-1997
Mathematics in High Performance Computing
|