Matching Chemical and Biological Structures Using Graph-Theoretical Techniques

Monday, April 7, 1997 - 10:30am - 11:30am
Keller 3-180
Peter Willett (University of Sheffield)
Subgraph isomorphism algorithms have formed the basis of most substructure searching procedures since the very first chemical information systems were developed in the early Sixties. In such systems, a 2D chemical structure diagram is represented by a graph in which the nodes and edges of a graph correspond to the atoms and bonds of a chemical structure. The presence of a query substructure within a database structure is checked using a subgraph isomorphism algorithm, after an initial screening search has been carried out to minimise the number of structures that need undergo the time- consuming isomorphism match. This is a very powerful way of accessing a chemical database, but it has only recently been recognised that the same basic techniques are equally applicable to other types of structure-matching procedure.

In this talk, I will commence by giving a brief introduction to graph-based approaches to 2D substructure searching, and then go on to describe how analogous techniques can be used for pharmacophore matching in databases of 3D structures, either rigid or flexible. I will also describe developments of these ideas which we have applied to searching in databases of 3D protein structures, for patterns of helix and/or strand secondary structure elements and for patterns of amino acid sidechains, and, most recently, to searching databases of carbohydrate sequences.

In the second part of the talk, I will discuss the use of maximal common subgraph (MCS) isomorphism algorithms in chemical information systems. These are used for applications including the detection of chemical reaction sites and the mapping of 3D pharmacophoric patterns. I will discuss several comparative studies we have carried out to determine which MCS algorithms are most appropriate for chemical applications, demonstrating how run-time efficiency is crucially dependent on the structural characteristics of the graphs that are to be matched, and conclude by illustrating the use of an MCS algorithm to identify previously unknown structural relationships between several pairs of 3D protein structures.