Docking Protocol (RosettaDock)
Metadata
Authors: Brian Weitzner (brian.weitzner@jhu.edu), Monica Berrondo (mberron1@jhu.edu), Krishna Kilambi (kkpraneeth@jhu.edu), Robin Thottungal (raugust1@jhu.edu), Sidhartha Chaudhury (sidc@jhu.edu), Chu Wang (chuwang@gmail.com), Jeffrey Gray (jgray@jhu.edu)
Last edited 7/18/2011. Corresponding PI Jeffrey Gray (jgray@jhu.edu).
Code and Demo
- Application source code:
rosetta/rosetta_source/src/apps/public/docking/docking_protocol.cc
- Main mover source code:
rosetta/rosetta_source/src/protocols/docking/DockingProtocol.cc
- To see demos of some different use cases see integration tests located in
rosetta/rosetta_tests/integration/docking*
(docking_full_protocol, docking_local_refine, docking_local_refine_min, docking_low_res, docking_distance_constraint, docking_site_constraint).
To run docking, type the following in a commandline:
[path to executable]/docking_protocol.[platform|linux/mac][compile|gcc/ixx]release –database [path to database] @options
Note: these demos will only generate one decoy. To generate a large number of decoys you will need to add –nstruct N (where N is the number of decoys to build) to the list of flags.
References
We recommend the following articles for further studies of RosettaDock methodology and applications:
- Gray, J. J.; Moughon, S.; Wang, C.; Schueler-Furman, O.; Kuhlman, B.; Rohl, C. A.; Baker, D., Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. Journal of Molecular Biology 2003, 331, (1), 281-299.
- Wang, C., Schueler-Furman, O., Baker, D. (2005). Improved side-chain modeling for protein-protein docking Protein Sci 14, 1328-1339.
- Wang, C., Bradley, P. and Baker, D. (2007) Protein-protein docking with backbone flexibility. Journal of Molecular Biology, 2007 Oct 19;373(2):503-19. Epub 2007 Aug 2.
- Chaudhury, S., Berrondo, M., Weitzner, B. D., Muthu, P., Bergman, H., Gray, J. J.; (2011) Benchmarking and analysis of protein docking performance in RosettaDock v3.2. PLoS One, Accepted for Publication
Application purpose
Determine the structure of protein-protein complexes by using rigid body perturbations of the protein chains.
Algorithm
The following description has been adapted from Chaudhury, et al., 2011, PLoS One:
RosettaDock is a Monte Carlo (MC) based multi-scale docking algorithm that incorporates both a low-resolution, centroid-mode, coarse-grain stage and a high-resolution, all-atom refinement stage that optimizes both rigid-body orientation and side-chain conformation. The algorithm roughly follows the biophysical theory of an encounter complex formation followed by a transition to a bound state. Typically the algorithm starts from either a random initial orientation of the two partners (global docking), or an initial orientation that is randomly perturbed from a user-defined starting pose (local perturbation). From there, the partner proteins are represented coarsely, where side chains are replaced by a single unified pseudo-atom, or centroid. During this phase, a 500-step Monte Carlo search is made with adaptive rotation and translational steps adjusted dynamically to achieve an acceptance rate of 25%. The ScoreFunction used in this stage primarily consists of a ‘bump’ term, a contact term, and docking-specific statistical residue environment and residue-residue pair-wise potentials.
Once the centroid-mode stage is complete, the lowest energy structure accessed during that stage is selected for high-resolution refinement. During high-resolution refinement, centroid pseudo-atoms are replaced with the side-chain atoms at their initial unbound conformations. Then 50 Monte Carlo+Minimization (MCM) steps are made in which:
- The rigid-body position is perturbed by a random direction and magnitude specified by a Gaussian distribution around 0.1 Å and 3.0˚
- The rigid-body orientation is energy-minimized
- The side-chain conformations are optimized with RotamerTrials, followed by a test of the Metropolis criteria.
Every eight steps, an additional combinatorial side-chain optimization is carried out using the full side-chain packing algorithm, followed by an additional Metropolis criteria check. To reduce the time devoted to the computationally expensive energy-minimization for unproductive rigid-body moves, minimization is skipped if a rigid-body move results in a change in score of greater than +15. The all-atom score function used in this stage primarily consists of Van der Waals attractive and repulsive terms, a solvation term, an explicit hydrogen bonding term, a statistical residue-residue pair-wise interaction term, an internal side-chain conformational energy term, and an electrostatic term.
For particular targets, a variety of RosettaDock sampling strategies are often used to improve the chance of achieving an accurate structure prediction. If no prior structural or biochemical information is known about the protein interaction of interest, global docking is used to randomize the initial docking poses. From there, score filters and clustering are used to identify clusters of acceptable low-energy structures for further docking and refinement. In most cases, there is some known information about the complex, either in the form of related protein complexes or in biochemical or bioinformatics data which identify probable regions of interaction on the protein partners. In these cases users manually arrange the starting docking pose to a configuration that is compatible with the information and carry out a local docking perturbation. Additionally, users can set distance-based filters that bias sampling towards those docking poses that are compatible with specified constraints.
Modes
- Full protocol - The full protocol involved first exploring the conformational space of the partners through a low resolution search followed by all-atom local refinement through a Monte Carlo+Minimization algorithm.
- Low Resolution Docking Only - The protein is represented as backbone plus centroid representation of sidechains, i.e. the sidechain is represented as one giant atom to save CPU time. In this stage RosettaDock attempts to find the rough orientation of the docking partners for the high-resolution search.
- High Resolution Docking Only (Local Refinement) - All atoms in the protein are represented, and the position found in the low-resolution search is optimized. Rigid body MCM is alternated with sidechain repacking so that the sidechains can adjust to a new, more favorable orientation and vice versa. The high-resolution stage uses up the most CPU time of Rosetta.
- Local High Resolution Minimization - Similar to Local refinement, all atoms in the protein are represented. A single run of minimization, as opposed to a series of Monte Carlo+Minimization cycles, is done to minimize the energy of the rigid body position. Minimization is followed by side-chain packing.
- Global or local docking can be achieved by using different starting perturbation flags to generate a starting structure from the input structure.
Input Files
The only required input file is a prepacked pdb file containing two proteins with different chain IDs. Starting structures must be prepacked because the side chains are only packed at the interface during docking. Running docking prepack protocol ensures that the side chains outside of the docking interface have a low energy conformation which provides a better reference state for scoring. For more information on prepacking, see the Docking Prepack protocol documentation.
Note: The following flags should be given to every docking simulation: -ex1 -ex2aro.
If you are using a starting structure with more than two polypeptide chains, you should include the -partners flag. If this flag is omitted, docking will dock the first two polypeptide chains in the strucutre.
Standard Docking options
Basic protocol options
Flag | Description | Type |
---|---|---|
-partners [P1_P2] | Defines docking partners by chain ID for multichain docking. For example, "-partners LH_A" moves chain A around the dimer of chains L and H. | String |
Starting Perturbation Flags
Flag | Description | Type |
---|---|---|
-randomize1 | Randomize the orientation of the first docking partner. (Only works with 2 partner docking). | Boolean |
-randomize2 | Randomize the orientation of the second docking partner. (Only works with 2 partner docking). | Boolean |
-spin | Spin a second docking partner around axes from center of mass of the first partner to the second partner. | Boolean |
-dock_pert [T] [R] | To create a starting strucutre from the input structure, randomly perturb the input structure using a gaussian for translation and rotation with standard deviations [T] and [R]. Recommended usage is "-dock_pert 3 8" | RealVector |
-uniform_trans [R] | Uniform random repositioning of the second partner about the first partner within a sphere of the given radius, [R]. | Real |
Packing Flags
Flag | Description | Type |
---|---|---|
-norepack1 | Do not repack the sidechains on the first docking partner. (Only works with 2 partner docking). | Boolean |
-norepack2 | Do not repack the sidechains on the second docking partner. | Boolean |
-sc_min | Perform extra side chain minimization steps during packing steps. | Boolean |
Full Protocol Flags
Default mode of docking. No additional flags necessary.
Low Resolution Docking Only Flags
Flag | Description | Type |
---|---|---|
-low_res_protocol_only | Only run the low resolution part of the protocol (skips all high resolution steps and only outputs low resolution structure). | Boolean |
High Resolution Docking Only Flags
Flag | Description | Type |
---|---|---|
-docking_local_refine | Refine the docking position in high resolution only (skips all low resolution steps of the protocol). Uses small perturbations of the positions, no large moves. | Boolean |
Local High Resolution Minimization Flags
Flag | Description | Type |
---|---|---|
-docking_local_refine | Refine the docking position in high resolution only (skips all low resolution steps of the protocol). Uses small perturbations of the positions, no large moves. | Boolean |
-dock_min | Does a single round of minimization in high resolution, skipping the mcm protocol. | Boolean |
Relevant common Rosetta Flags
Flag | Description | Type |
---|---|---|
-s [S] OR -silent [S] | Specify the file name of the starting structure, S. | String |
-native [S] | Specify the file name of the native structure, [S], for which to compare in RMSD calculations. If a native file is not passed in, all calculations are done using the starting structure as native. | String |
-nstruct [I] | Specify the number of decoys, [I], to generate. | Integer |
-database [P] | The path to the Rosetta database (e.g. ~/rosetta_database). | String |
-use_input_sc | Use accepted rotamers from the input structure between Monte Carlo+Minimization (MCM) cycles. Unlike the -unboundrot flag from Rosetta++, not all rotamers from the input structure are added each time to the rotamer library, but only those accepted at the end of each round the remaining conformations are lost. | Boolean |
-ex1/-ex1aro -ex2/-ex2aro -ex3 -ex4 | Adding extra side-chain rotamers (highly recommended). The -ex1 and -ex2aro flags were used in our own tests, and therefore are recommended as default values. | Boolean/Integer |
-constraints:cst_file [S] | Specify the name of the constraint file, [S]. | String |
Expert Flags
Flag | Description | Type |
---|---|---|
-dock_mcm_trans_magnitude [T] | The magnitude of the translational perturbation during MCM steps in docking in Angstroms. Defaults to 0.7 Å | Real |
-dock_mcm_rot_magnitude [R] | The magnitude of the rotational perturbation during MCM steps in docking in degress. Defaults to 5.0º | Real |
-docking_centroid_outer_cycles [C] | Number of cycles to use in outer loop of docking low resolution protocol. Defaults to 10. | Integer |
-docking_centroid_inner_cycles [C] | Number of cycles to use in inner loop of docking low resolution protocol. Defaults to 50. | Integer |
Tips
The docking run can take two forms. Sometimes biochemical and genetic information can be used to localize the binding site to a small region on one or both partners. In this case, one performs a perturbation run, exploring only a small region of space around the suspected binding site. For predictions where there is no biological information about the interface, one usually performs a global search, exploring all the conformational space of both partners.
Global docking gives best results when there are less than 450 residues in the complex. (see Daily, 2005 )
The most commonly used options are -dock_pert 3 8, which will create starting structures using 3 Angstrom translations and rotations within 8 degrees.
Note that one should always run the Docking Prepack application on the starting structure in order to remove high-energy rotamers that could cause erroneous decoy ranking.
-
There are two scores that one should inspect:
- total: The total score is an overall measure of the energy of the complex
- I_sc (interface score): I_sc is the total score of the complex minus the total score of each partner in isolation. Typcial values for I_sc of good decoys are in the range of -5 to -10.
Always use the -ex1 and -ex2aro options to allow more fine grained rotamer selection.
For perturbation runs, generate at least 1,000 decoys. For global runs, generate between 10,000 and 100,000 (we know this requires a lot of cpu time and disk space).
-
Docking now supports AtomPairConstraint, AmbiguousConstraint and SiteConstraint. To use a constraint with docking, you only need to add the option -constraints:cst_file constraint_file . See the docking_distance_constraint and docking_site_constraint integration tests for examples. A SiteConstraint allows you to specify that a particular residue should be in contact with a particular chain. An example of a SiteConstraint is:
SiteConstraint CA 4A D FLAT_HARMONIC 0 1 5
This will add a FLAT_HARMONIC potential with the parameters 0 1 5 (recommended; see this page for more on constraint files) around the distance between the CA of residue 4 (PDB numbering) on chain A and the closest CA on chain D to the ScoreFunction.
Expected Outputs
One PDB file for each candidate docked model generated and a 1 scorefile for each run summarizing all generated models.
Post Processing
Sort scorefile by score using commandline sort function. For global docking simualations, one should generate at least 10,000 decoys and ideally 100,000 decoys should be produced. Sort by score (pay attention to I_sc and total!) and cluster the top 200 decoys by pairwise RMSD. Since global docking in Rosetta 3 has not been thoroughly tested, we do not have scripts available to automate this process. We recommend using the scripts as mentioned in our Rosetta++ tutorial . Some scripts may require some modification.
New things since last release
- Supports the modern job distributor (jd2).
- Support for complex foldtrees including poses that have ligands.
- Support for constraints .