Research Area Summary:

We investigate the energetics, structures, thermodynamics, and kinetics of biomolecules using computational tools. Of particular interest are first-principles calculations of biomolecular interactions, structure prediction of multiprotein complexes, and the thermodynamic and kinetic properties of protein-protein association. We also build the computational tools needed to simulate biomolecular interactions.


Coarse-grained modeling of multiprotein complexes

Figure 1. Structure of the ubiquitin-UIM1 complex. Ubiquitin is shown in red, while the experimental structure of UIM1 is shown in blue. The lowest-energy structure of UIM1 from the coarse-grained model simulation is shown in green.

Protein-protein interactions play a key role in many important biological processes. For proteins to function properly, they need to bind to their appropriate binding partners in the correct binding mode. One of the biggest challenges in computational biophysics is to predict the bound structures of protein complexes, given the structures of the individual proteins. Several models, ranging from atomic-level to coarse-grained, have shown only limited success. Computing the binding affinities of protein complexes also remains a challenge.

Recently it has been shown that proteins form weakly bound transient encounter complexes before they form a tight native complex. Such complexes are formed mostly by long-range electrostatic interactions and are thought to enhance the formation of the native complex, since they reduce the dimensionality of the search from three to two. In addition, many proteins contain intrinsically disordered regions that are difficult to resolve experimentally. In such cases, characterizing a single structure is not very meaningful; an ensemble approach is more appropriate.

Here we develop a residue-level coarse-grained model to study the structures and dynamics of multiprotein complexes. Each residue is represented as a sphere centered on the corresponding alpha-carbon. The potential energy function is built from knowledge-based contact potentials, while intrinsically disordered regions are modeled as flexible polymeric beads with appropriate bond, angle, and torsion potentials. We have shown that this coarse-grained model is sufficient to capture the thermodynamics and structures of weakly binding protein complexes with binding affinities (dissociation constants) larger than 1 µM.
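As a rough illustration of the kind of energy function described above, the following Python sketch evaluates a contact term between two bead chains and a harmonic bond term for a flexible chain. It is a minimal sketch under stated assumptions, not the group's actual model: the cutoff, bond length, stiffness, and the toy contact table `eps` are hypothetical values, and the angle, torsion, and electrostatic terms are omitted.

```python
import math

# Hypothetical parameters, chosen only for illustration.
CONTACT_CUTOFF = 8.0  # angstrom; beads closer than this count as "in contact"
BOND_LENGTH = 3.8     # angstrom; assumed spacing of consecutive alpha-carbon beads
BOND_K = 10.0         # harmonic bond stiffness, arbitrary energy units


def contact_energy(coords_a, types_a, coords_b, types_b, eps):
    """Sum pairwise contact energies between the beads of two proteins.

    eps maps an alphabetically sorted pair of residue types to an energy,
    standing in for a knowledge-based contact potential.
    """
    total = 0.0
    for xa, ta in zip(coords_a, types_a):
        for xb, tb in zip(coords_b, types_b):
            if math.dist(xa, xb) < CONTACT_CUTOFF:
                total += eps[tuple(sorted((ta, tb)))]
    return total


def bond_energy(coords):
    """Harmonic bond energy between consecutive beads of a flexible chain."""
    return sum(
        0.5 * BOND_K * (math.dist(p, q) - BOND_LENGTH) ** 2
        for p, q in zip(coords, coords[1:])
    )


if __name__ == "__main__":
    # Toy two-residue example with a made-up contact table.
    eps = {("I", "L"): -1.5}
    e = contact_energy([(0.0, 0.0, 0.0)], ["L"], [(5.0, 0.0, 0.0)], ["I"], eps)
    print("contact energy:", e)
```

A step-function contact is used here only for brevity; a smooth distance-dependent form would be the more usual choice in a simulation.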

The model has wider applications in determining the structures and dynamics of multiprotein assemblies. It can be combined with low-resolution experimental data, such as cryo-EM and SAXS, to obtain high-resolution structures of protein assemblies with intrinsically disordered regions. It can also be applied to study cooperative binding in multiprotein complexes and protein-protein interactions in the presence of macromolecules that mimic the environment of living cells.
Principal Investigator: Youngchan Kim


Kernel Energy Method (KEM)

Figure 2. The insulin molecule is composed of two chains, A in blue (shown as two shorter helices) and B in green-red (shown as one longer helix). The whole molecule is divided into five kernels as shown. (The insulin figure was generated with KING Viewer.) The full insulin molecule (chains A and B) yields a calculated total energy of E_HF = -21104.7660 au. The KEM result, E_KEM = -21104.7656 au, differs from this by only 0.0004 au.

The principal molecules of biology and medicinal chemistry are, or are related to, peptides, proteins, DNA, and RNA. It is now possible to use the full power of quantum mechanics to study these molecules. The first-principles approach to large biomolecules has received little attention up to now because of the computational difficulties posed by the number of atoms in the molecules. Since the biochemical molecules of medicinal chemistry are so often large, containing thousands or even tens of thousands of atoms, the computational difficulties of the full quantum problem have been prohibitive. Two things have happened to change this perspective: (1) the advance of parallel supercomputers, and (2) the discovery of a quantum formalism (quantum crystallography and the use of quantum kernels) that is well suited to parallel computation.

Figure 3. Application of KEM to drug design. The efficacy of drugs is based upon a geometrical "lock and key" fit of the drug to the target, complemented by an electronic interaction between the two. From left to right, the pictures show, first, an abstract sketch of a drug molecule within a reactive "pocket" of its target; the dashed lines indicate interactions with the various kernels that compose the target. Second is a crystal structure showing the drug-target geometry. Third is a ball-and-stick schematic representation of the drug-target interactions, shown as dashed lines. Possible targets are indicated, along with the equation that defines the interaction energy. The KEM delivers the ab initio quantum mechanical interaction energy between the drug and its target.

Recent applications of these advances include the calculation of the ab initio quantum mechanical molecular energy of peptides, the protein insulin, DNA, RNA, a virus, and a ribosome. The results were found to have high accuracy, while the computational difficulty of representing a molecule increases only modestly with the number of atoms. The calculations are simplified by adopting the approximation that a full biological molecule can be represented by smaller "kernels" of atoms. The use of kernels makes it possible to apply quantum mechanics to the molecules of medicinal chemistry. Thus, problems of medicinal chemistry, such as the rational design of drugs, protein folding, and the rational design of proteins, may be illuminated by quantum mechanical analysis. So far KEM has been applied to several molecules, including insulin and vesicular stomatitis virus nucleoprotein.
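The kernel approximation described above can be sketched in a few lines of Python. The sketch assumes the double-kernel form commonly given in the KEM literature, E_KEM = sum over pairs (i<j) of E_ij, minus (n - 2) times the sum over i of E_i, where E_i is the energy of kernel i alone and E_ij the energy of the pair of kernels i and j computed together. The formula is not stated on this page, and the function names and data layout are hypothetical. The pair interaction energy is taken as E_ij - E_i - E_j.

```python
from itertools import combinations


def kem_energy(single, double):
    """Approximate total energy from single- and double-kernel energies.

    single: list of kernel energies E_i, one per kernel
    double: dict mapping (i, j) with i < j to the double-kernel energy E_ij
    Assumes every pair of kernels is included in `double`.
    """
    n = len(single)
    pair_sum = sum(double[pair] for pair in combinations(range(n), 2))
    return pair_sum - (n - 2) * sum(single)


def interaction_energy(single, double, i, j):
    """Interaction energy between kernels i and j (requires i < j)."""
    return double[(i, j)] - single[i] - single[j]


if __name__ == "__main__":
    # Toy numbers (not real quantum-chemical energies): three kernels whose
    # pair energies are exactly self energies plus a pairwise coupling.
    single = [-10.0, -20.0, -30.0]
    coupling = {(0, 1): -0.5, (0, 2): -0.25, (1, 2): -1.0}
    double = {p: single[p[0]] + single[p[1]] + v for p, v in coupling.items()}
    print("KEM energy:", kem_energy(single, double))
```

In this toy case the energy is strictly pairwise additive, so the kernel sum recovers the exact total (the self energies plus all couplings); for a real molecule the formula is an approximation whose error comes from interactions involving three or more kernels. Because each single- and double-kernel calculation is independent, they can be distributed across processors.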
Principal Investigator: Lulu Huang


Transient encounter complexes

Figure 4. Transient encounter complexes of the Hpr-E1N complex. E1N is shown in grey (with red spots indicating the binding sites of Hpr). The experimental structure of Hpr is shown in green, while the simulated structures are shown in yellow (near the native binding site) and blue (far from the native binding site). The three panels show the structures from different angles.

Recent NMR experiments on binary protein complexes and DNA-protein complexes show that a small population (~10%) of transient encounter complexes is present in solution in addition to native complexes. It has been speculated that proteins form these transient encounter complexes on the way to forming native complexes. However, resolving the structures of such complexes and understanding their role in the protein-binding process is difficult for both experiment and simulation. We have undertaken Monte Carlo simulations of the Hpr-E1N complex to visualize the transient encounter complexes. Using NMR paramagnetic relaxation enhancement data, we were able to refine the simulated structures to reproduce the experimental NMR data. The results show that coarse-grained simulations, together with high-resolution NMR data, can enhance the visualization of transient encounter complexes, which are difficult to resolve with conventional X-ray or computational models. The role of these complexes, if any, in protein binding requires further study.
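For readers unfamiliar with the sampling technique, the following Python sketch shows a generic Monte Carlo loop of the kind mentioned above. It assumes the standard Metropolis acceptance rule and simple random displacement moves, neither of which is specified on this page; the function names, the move set, and the toy energy function in the usage example are hypothetical and stand in for a real protein-protein energy model.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Metropolis rule: always accept downhill moves, and accept uphill
    moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def run_mc(energy, x0, n_steps, step_size, kT, seed=0):
    """Sample configurations of a coordinate vector x under energy(x).

    Each step proposes a uniform random displacement of every coordinate
    and accepts or rejects it with the Metropolis rule. Returns the list
    of configurations visited, one per step (rejected moves repeat the
    current configuration).
    """
    rng = random.Random(seed)
    x = list(x0)
    e = energy(x)
    samples = []
    for _ in range(n_steps):
        trial = [xi + rng.uniform(-step_size, step_size) for xi in x]
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e, kT, rng):
            x, e = trial, e_trial
        samples.append(list(x))
    return samples


if __name__ == "__main__":
    # Toy energy: a harmonic well standing in for a binding funnel.
    def harmonic(x):
        return 0.5 * sum(xi * xi for xi in x)

    samples = run_mc(harmonic, [3.0, 0.0, 0.0], n_steps=1000, step_size=0.5, kT=1.0)
    print("final configuration:", samples[-1])
```

The resulting ensemble of configurations, rather than any single structure, is what would then be compared against experimental observables such as paramagnetic relaxation enhancement data.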
Principal Investigator: Youngchan Kim