January 14-18, 2008
Protein motions, ranging from molecular flexibility to large-scale
conformational change, play an essential role in many biochemical
processes. For example, some devastating diseases such as Alzheimer's
and bovine spongiform encephalopathy (Mad Cow) are associated with
the misfolding of proteins. Despite the explosion in our knowledge of
structural and functional data, our understanding of protein movement
is still very limited because it is difficult to measure experimentally
and computationally expensive to simulate.
In this talk we describe a method we have developed for modeling protein
motions that is based on probabilistic roadmap methods (PRM) for motion
planning. Our technique yields an approximate map of a protein's potential
energy landscape and can be used to generate transitional motions of a
protein to the native state from unstructured conformations or between
specified conformations. We also describe new analysis tools that enable
us to extract kinetics information, such as folding rates, or to identify
and study the folding core. For example, we show how our map-based tools
for modeling and analyzing folding landscapes can capture subtle folding
differences between protein G and its mutants, NuG1 and NuG2. More
information regarding our work, including an archive of protein motions
generated with our technique, is available from our protein folding server:
Chemical reaction systems with a low to moderate number of molecules are typically modeled as continuous-time Markov chains. More explicitly, the state of the system is modeled as a vector giving the number of molecules of each species present, with each reaction modeled as a possible transition of the state. The model for the kth reaction is determined by a vector of inputs specifying the number of molecules of each chemical species consumed in the reaction, a vector of outputs specifying the number of molecules of each species created in the reaction, and a function of the state that gives the rate at which the reaction occurs. To understand how the probability distribution of the system changes in time, one could attempt to solve the Chemical Master Equation (CME); however, this is typically an extremely difficult task. Therefore, simulation methods such as the Stochastic Simulation Algorithm (Gillespie Algorithm) and tau-leaping have been developed to approximate the probability distribution of the system via Monte Carlo methods. I will demonstrate how using a random time change representation for these models leads naturally to simulation methods that achieve greater efficiency and stability than existing methods.
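As a rough illustration of the model described above (a state vector of molecule counts, input and output vectors per reaction, and a state-dependent rate function), the following is a minimal sketch of the standard Gillespie direct method. It is not the random time change method of the talk; the function name `gillespie`, its parameters, and the single-species decay example are assumptions made for illustration.

```python
import random

def gillespie(x0, inputs, outputs, rate_fns, t_end, rng):
    """Simulate one sample path with the Gillespie direct method.

    x0       -- initial molecule counts, one entry per species
    inputs   -- inputs[k][i]: molecules of species i consumed by reaction k
    outputs  -- outputs[k][i]: molecules of species i created by reaction k
    rate_fns -- rate_fns[k](x): rate of reaction k in state x
    """
    t, x = 0.0, list(x0)
    path = [(t, tuple(x))]
    while True:
        rates = [f(x) for f in rate_fns]
        total = sum(rates)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick reaction k with probability rates[k] / total.
        u = rng.random() * total
        k, acc = 0, rates[0]
        while acc <= u and k < len(rates) - 1:
            k += 1
            acc += rates[k]
        # Apply the transition: remove inputs, add outputs.
        x = [xi - a + b for xi, a, b in zip(x, inputs[k], outputs[k])]
        path.append((t, tuple(x)))
    return path

# Hypothetical example: a single species A that decays (A -> nothing)
# at rate 0.5 per molecule, starting from 20 molecules.
path = gillespie(
    x0=(20,),
    inputs=[(1,)],
    outputs=[(0,)],
    rate_fns=[lambda x: 0.5 * x[0]],
    t_end=100.0,
    rng=random.Random(0),
)
```

Averaging many such paths gives a Monte Carlo estimate of the distribution that the CME describes exactly.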
Due to the inherent complexity of the associated problems,
investigations of the basic principles of protein folding and evolution
are usually restricted to simplified protein models.
Our group has developed methods and programs for the exact and complete solution of
problems typical of studies using HP-like 3D lattice protein models.
The tasks addressed are the prediction of globally optimal structures, the listing of
suboptimal structures, sequence design, neutral network exploration, and
degeneracy computation. The methods used are based on fast,
non-heuristic techniques (constraint programming) rather than
stochastic approaches, which are not capable of answering many fundamental
questions. Thus, we are able to find optimal structures for HP sequences of length
greater than 200, including a proof of optimality. We have used these methods to
find unique folding sequences, to investigate neutral nets, and to design low-degeneracy
sequences for given structures.
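To make the objective concrete, here is a minimal sketch of the energy function of the basic HP model, assuming a simple cubic lattice and the usual convention of -1 per contact between non-consecutive H monomers. The function name `hp_energy` and the example fold are illustrative assumptions; this only scores a given structure and is not the constraint-programming search described above.

```python
def hp_energy(sequence, coords):
    """Energy of an HP sequence placed on the 3D cubic lattice.

    sequence -- string over {'H', 'P'}
    coords   -- list of integer (x, y, z) lattice points, one per monomer
    Returns -1 for every pair of H monomers that are lattice neighbors
    but not consecutive in the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("structure is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            raise ValueError("consecutive monomers must be lattice neighbors")

    index_at = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only in the three positive directions so that each
        # neighboring pair is counted exactly once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = index_at.get((x + dx, y + dy, z + dz))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return -contacts

# Hypothetical four-monomer example: folded into a square, the two
# terminal H monomers touch; stretched out, they do not.
folded = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
e_folded = hp_energy("HPPH", folded)
e_extended = hp_energy("HPPH", extended)
```

An optimal structure for a sequence is one that minimizes this energy over all self-avoiding walks, and the degeneracy is the number of structures attaining that minimum.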
Two groups of studies have recently provided insights into such intrinsic, structure-induced effects: elastic network models that permit us to visualize the cooperative changes in conformation that are most readily accessible near native state conditions, and information-theoretic approaches that elucidate the most efficient pathways of signal transmission favored by the overall architecture. Using a combination of these two approaches, we highlight, by way of application to the bacterial chaperonin complex GroEL-GroES, how the most cooperative modes of motion play a role in mediating the propagation of allosteric signals. A functional coupling between the global dynamics sampled under equilibrium conditions and the signal transduction pathways inherently favored by network topology appears to control allosteric effects.
Continuum electrostatics methods have become increasingly popular due
to their ability to provide approximate descriptions of solvation
energies and forces without the expensive sampling required by explicit
solvent models. In particular, the Poisson-Boltzmann equation (PBE)
provides electrostatic potentials, solvation energies, and forces by
modeling the solvent as a featureless, dielectric material, and the
mobile ions as a continuous distribution of charge. In this talk, I
will provide a review of PBE-based and new apolar continuum solvation
methods as well as approaches for assessing their performance by
comparison with explicit solvent simulations. In particular, I will
focus on the ability of these continuum solvent models to describe
solvation forces on proteins and nucleic acids and will comment on
strengths and weaknesses of these implicit solvent approaches.
Many small single-domain proteins undergo cooperative, switch-like
folding/unfolding transitions with very low populations of intermediate,
i.e., partially folded, conformations. The phenomenon of cooperative folding
is not readily accounted for by common notions about driving forces for
folding. I will discuss how common protein chain models with pairwise
additive interactions are insufficient to account for the folding
cooperativity of natural proteins, and how models with nonadditive
local-nonlocal coupling may rationalize cooperative folding rates that
are well correlated with native topology. The traditional formulation
of folding transition states entails a macroscopic folding free energy
barrier with both enthalpic and entropic components. I will explore the
microscopic origins of these thermodynamic signatures in terms of
conformational entropy as well as desolvation (dewetting) effects.
Notably, the existence of significant enthalpic folding barriers
raises fundamental questions about the validity of the funnel picture of
protein folding, because such enthalpic barriers appear to imply that
there are substantial uphill moves along a microscopic folding trajectory.
Using results from extensive atomic simulations, I will show how the
paradox can be resolved by a dramatic entropy-enthalpy compensation
at the rate-limiting step of folding. In this perspective, the height
of the enthalpic barrier is seen as related to the degree of cooperativity
of the folding process.
Multivalent ions (Mg2+) in RNA tertiary structure folding can be strongly correlated and thus cannot be treated by mean-field theories such as the Poisson-Boltzmann equation. We recently developed a statistical mechanical model (TBI) to account for ion correlation by considering an ensemble of discrete ion distributions. Experimental tests show that the TBI model gives improved predictions of nucleic acid folding stability over the Poisson-Boltzmann equation, which generally underestimates the (multivalent) ion-dependent folding stability because it ignores ion correlation. Using the TBI theory, we investigate the folding energy landscape for a simple system with loop-tethered short DNA helices and find that Na+ and Mg2+ play contrasting roles in helix–helix assembly. High [Na+] (>0.3 M) causes a reduced helix–helix electrostatic repulsion and a subsequent disordered packing of helices, while Mg2+ at concentrations > 1 mM is predicted to induce a more compact and ordered helix–helix packing. Mg2+ is much more efficient in causing nucleic acid compaction and is predicted to induce a collapse transition around 1 mM [Mg2+].
Lead generation is a major hurdle in small-molecule drug discovery, with an estimated 60% of projects failing for lack of lead matter or difficulty in optimizing leads for drug-like properties. It would be valuable to identify these less-druggable targets before incurring substantial expenditure and effort. We discovered that a model-based approach using basic biophysical principles yields good prediction of druggability based solely on the crystal structure of the target binding site. We quantitatively estimate the maximal affinity achievable by a drug-like molecule, and we show that these calculated values correlate with drug discovery outcomes. We experimentally test two predictions using high-throughput screening of a diverse compound collection. The collective results highlight the utility of our approach as well as strategies for tackling difficult targets. I will also discuss our approach to calculating protein curvature and some potential computational approaches for difficult targets.
This talk (and a related poster) describes Lie-group-theoretic techniques that can be applied in the analysis and modeling of protein conformations. Three topics are covered: (1) Conformational transitions between two known end states; (2) proper normalization of helix-helix crossing angle data in the PDB; (3) models of the conformational entropy of the ensemble of unfolded polypeptide conformations. Using the concept of convolution on the group of rigid-body motions, the probability density of position and orientation of the distal end of a polypeptide chain is obtained by convolving the distributions for shorter segments that make up the chain. This methodology can also be used in the analysis of loop entropy in folded proteins as well as the ensemble of unfolded conformations.
The geometrical problem of protein folding, especially in its later stages, involves two types of freedom: the full torsional flexibility of loops connecting nearly rigid structural pieces (helices, beta-sheets, etc.), and the relative placement of such pieces. We present a method for sampling the feasible conformations of protein loops, based on Triaxial Loop Closure (TLC), a simple and highly efficient inverse kinematic (IK) method for solving the loop closure problem. TLC is easily extended to incorporate additional constraints (i.e., position, orientation) or more general geometrical conditions. Due to its relative simplicity, TLC compares favorably to more general IK robotics algorithms, both in robustness and in speed. We consider two applications: (i) An algorithm for the rapid sampling of the conformations of protein loops including three or more residues, which uses quasirandom Sobol sampling of the Ramachandran regions. Ideas akin to Delaunay triangulation may be employed to ensure that loop shape space is sampled at a desired density. (ii) An efficient method for the sequential assembly of helical proteins via
maximal hydrophobic packing. The geometrical problem of considering all
possible mutual arrangements of a system of helices that are compatible with closing the corresponding loops is already too large to sample directly. We introduced a measure of hydrophobic packing by seeking to minimize the radius of gyration of the hydrophobic residues. Thus, we sequentially assemble the helices by sampling relative orientations of pairs of them that bring specified hydrophobic residues into proximity. For the best candidates, in terms of energy and hydrophobic radius of gyration, the loops are closed using the algorithm in (i) and another helix is added to the assembly, always seeking to maximize hydrophobic contact. We tested this iterative assembly method on 26 helical proteins, each containing up to 5 helices. The method heavily samples native-like conformations. The average RMSD-to-native of the best conformations for the 18 helix bundle proteins that have 2 or 3 helices is less than 2 Angstroms, with slightly worse errors for proteins containing more helices.
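The packing measure mentioned above, the radius of gyration of the hydrophobic residues, can be sketched as follows. This is a minimal illustration that assumes one representative point per residue (for example, an alpha-carbon position) and an assumed set of hydrophobic one-letter codes; neither choice is taken from the talk, and the name `hydrophobic_rg` is hypothetical.

```python
import math

# Assumption: which residue types count as hydrophobic is a modeling
# choice; this particular set is only illustrative.
HYDROPHOBIC = frozenset("AVLIMFWC")

def hydrophobic_rg(sequence, coords):
    """Radius of gyration of the hydrophobic residues.

    sequence -- one-letter amino acid codes
    coords   -- one (x, y, z) point per residue, in Angstroms
    """
    pts = [c for aa, c in zip(sequence, coords) if aa in HYDROPHOBIC]
    if not pts:
        raise ValueError("no hydrophobic residues in sequence")
    n = len(pts)
    centroid = [sum(p[k] for p in pts) / n for k in range(3)]
    # Mean squared distance of the hydrophobic points from their centroid.
    msd = sum(
        sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in pts
    ) / n
    return math.sqrt(msd)

# Hypothetical three-residue example: the two leucines sit 2 Angstroms
# apart; the glycine is ignored because it is not in HYDROPHOBIC.
rg = hydrophobic_rg("LGL", [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0), (2.0, 0.0, 0.0)])
```

Candidate helix arrangements with a smaller value of this quantity bury their hydrophobic residues more compactly, which is the sense in which minimizing it favors hydrophobic packing.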
Sequence-structure relationships in proteins are highly asymmetric since
many sequences fold into relatively few structures. What is the number of
sequences that fold into a particular protein structure? Is it possible to
switch between stable protein folds by point mutations? To address these
questions we compute a directed graph of sequences and structures of
proteins, which is based on experimentally determined protein shapes. Two
thousand and sixty experimental structures from the Protein Data Bank were
considered, providing a good coverage of fold families. The graph is
computed using an energy function that measures stability of a sequence in a
fold. A node in the graph is an experimental structure (and the
computationally matching sequences). A directed and weighted edge between
nodes A and B is the number of sequences of A that switch to B because the
energy of B is lower. The directed graph is highly connected at native
energies with "sinks" that attract many sequences from other folds. The
sinks are rich in beta sheets. The in-degree of a particular protein shape
correlates with the number of sequences that match this shape in
empirically determined genomes. Properties of strongly connected components
of the graph are correlated with protein length and secondary structure.
Joint work with Leonid Meyerguz and Jon Kleinberg
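The graph construction described in this abstract can be sketched on toy data as follows. The fold names, sequence labels, and the lookup-table energy function are all hypothetical stand-ins for the real structures and energy function, and the sketch assumes that a sequence contributes to an edge toward every fold with lower energy (the abstract does not say whether only the lowest-energy fold counts).

```python
from collections import defaultdict

def build_fold_graph(sequences_by_fold, energy):
    """Weighted directed edges (A, B) -> number of sequences of fold A
    whose energy in fold B is lower than in A."""
    edges = defaultdict(int)
    for fold_a, seqs in sequences_by_fold.items():
        for seq in seqs:
            e_a = energy(seq, fold_a)
            for fold_b in sequences_by_fold:
                if fold_b != fold_a and energy(seq, fold_b) < e_a:
                    edges[(fold_a, fold_b)] += 1
    return dict(edges)

def weighted_in_degree(edges):
    """Total weight of the edges arriving at each fold."""
    indeg = defaultdict(int)
    for (_, fold_b), weight in edges.items():
        indeg[fold_b] += weight
    return dict(indeg)

# Toy data (hypothetical): three folds and a tabulated energy function.
sequences_by_fold = {"A": ["s1", "s2"], "B": ["s3"], "C": []}
toy_energy = {
    ("s1", "A"): -1, ("s1", "B"): -2, ("s1", "C"): 0,
    ("s2", "A"): -1, ("s2", "B"): -3, ("s2", "C"): -2,
    ("s3", "A"): -1, ("s3", "B"): -5, ("s3", "C"): -4,
}
edges = build_fold_graph(sequences_by_fold, lambda s, f: toy_energy[(s, f)])
in_degree = weighted_in_degree(edges)
```

In this toy graph fold B receives the most incoming weight and has no outgoing edges, so it plays the role of a sink.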