# The Network of Sequence Flow Between Protein Structures

Tuesday, January 15, 2008 - 4:15pm - 4:45pm

EE/CS 3-180

Ron Elber (The University of Texas at Austin)

Sequence-structure relationships in proteins are highly asymmetric, since many sequences fold into relatively few structures. What is the number of sequences that fold into a particular protein structure? Is it possible to switch between stable protein folds by point mutations? To address these questions we compute a directed graph of protein sequences and structures, based on experimentally determined protein shapes. Two thousand and sixty experimental structures from the Protein Data Bank were considered, providing good coverage of fold families. The graph is computed using an energy function that measures the stability of a sequence in a fold. A node in the graph is an experimental structure (together with its computationally matching sequences). A directed, weighted edge from node A to node B carries the number of sequences of A that switch to B because their energy in B is lower. The directed graph is highly connected at native energies, with "sinks" that attract many sequences from other folds. The sinks are rich in beta sheets. The in-degree of a particular protein shape correlates with the number of sequences that match this shape in empirically determined genomes. Properties of the strongly connected components of the graph are correlated with protein length and secondary structure.

Joint work with Leonid Meyerguz and Jon Kleinberg
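The graph construction described in the abstract can be illustrated with a small sketch. It is a toy example under stated assumptions, not the authors' method: the fold names, sequence labels, and energy values are invented, the function names are hypothetical, and the sketch simply compares the energy of each sequence across folds, without modeling point mutations or a real energy function. It builds the weighted edges, sums the weighted in-degree of each fold, and finds strongly connected components with Kosaraju's algorithm.

```python
from collections import defaultdict

def build_flow_graph(home, energy):
    """home[s] is the fold a sequence s belongs to; energy[s][f] is the
    (invented) energy of s in fold f. Returns {(a, b): count}, where count
    is the number of sequences of a with strictly lower energy in b."""
    edges = defaultdict(int)
    for seq, a in home.items():
        for b, e_b in energy[seq].items():
            if b != a and e_b < energy[seq][a]:
                edges[(a, b)] += 1
    return dict(edges)

def weighted_in_degree(edges):
    """Sum of incoming edge weights per fold."""
    indeg = defaultdict(int)
    for (_, b), w in edges.items():
        indeg[b] += w
    return dict(indeg)

def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm (recursive; fine for small toy graphs)."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for a, b in edges:
        fwd[a].append(b)
        rev[b].append(a)

    order, seen = [], set()
    def visit(n):
        seen.add(n)
        for m in fwd[n]:
            if m not in seen:
                visit(m)
        order.append(n)  # record finish order
    for n in nodes:
        if n not in seen:
            visit(n)

    comps, assigned = [], set()
    def collect(n, comp):
        assigned.add(n)
        comp.add(n)
        for m in rev[n]:
            if m not in assigned:
                collect(m, comp)
    for n in reversed(order):  # decreasing finish time, on the reversed graph
        if n not in assigned:
            comp = set()
            collect(n, comp)
            comps.append(comp)
    return comps

# Toy data: three folds and four sequences with made-up energies.
home = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
energy = {
    "s1": {"A": -1.0, "B": -2.0, "C": -0.5},
    "s2": {"A": -1.0, "B": -1.5, "C": -3.0},
    "s3": {"A": -2.5, "B": -2.0, "C": -1.0},
    "s4": {"A": -1.0, "B": -1.0, "C": -4.0},
}

edges = build_flow_graph(home, energy)
in_deg = weighted_in_degree(edges)
components = strongly_connected_components(["A", "B", "C"], edges)
```

In this toy graph, fold B has the largest weighted in-degree, so it plays the role of a sink in the sense used above, while folds A and B exchange sequences in both directions and therefore form one strongly connected component.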

MSC Code:

92D20

Keywords: