Docking
Thanks to advances in biophysics, we can often determine the structure of proteins using experimental techniques such as cryo-EM or X-ray crystallography. These structures can be powerful aids in designing small molecules. Molecular docking performs geometric calculations to find a “binding pose” in which the small molecule interacts with the protein in a suitable binding pocket (that is, a region of the protein with a groove in which the small molecule can rest). For more information about docking, see the Autodock Vina paper:
Trott, Oleg, and Arthur J. Olson. “AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading.” Journal of Computational Chemistry 31.2 (2010): 455-461.
Binding Pocket Discovery
DeepChem has some utilities to help find binding pockets on proteins automatically. These utilities are simple for now, but we will improve them in future versions of DeepChem.

class deepchem.dock.binding_pocket.BindingPocketFinder
Abstract superclass for binding pocket detectors.
When working with a new protein or other macromolecule, it is often unclear which regions of the macromolecule are good targets for ligands or other molecules to interact with. This abstract class provides a template for child classes that algorithmically locate potential binding pockets.
Note that potential interaction sites can be found by many different methods; this abstract class doesn’t specify the technique to be used.

__init__()
Initialize self. See help(type(self)) for accurate signature.


class deepchem.dock.binding_pocket.ConvexHullPocketFinder(scoring_model: Optional[deepchem.models.models.Model] = None, pad: float = 5.0)
Implementation that uses the convex hull of the protein to find pockets.
Based on https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4112621/pdf/147268071418.pdf

__init__(scoring_model: Optional[deepchem.models.models.Model] = None, pad: float = 5.0)
Initialize the pocket finder.
Parameters:
 scoring_model (Model, optional (default None)) – If specified, use this model to prune pockets.
 pad (float, optional (default 5.0)) – The number of angstroms to pad around a binding pocket’s atoms to get a binding pocket box.

find_all_pockets(protein_file: str) → List[deepchem.utils.coordinate_box_utils.CoordinateBox]
Find list of binding pockets on protein.
Parameters:
 protein_file (str) – Protein to load in.
Returns: List of binding pockets on protein. Each pocket is a CoordinateBox.
Return type: List[CoordinateBox]

find_pockets(macromolecule_file: str) → List[deepchem.utils.coordinate_box_utils.CoordinateBox]
Find list of suitable binding pockets on protein.
This function computes putative binding pockets on the protein using its convex hull. Each face of the hull is converted into a coordinate box used for binding.
Parameters:
 macromolecule_file (str) – Location of the macromolecule file to load.
Returns: List of pockets. Each pocket is a CoordinateBox.
Return type: List[CoordinateBox]
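The role of the pad parameter can be illustrated with a small sketch. It assumes a pocket box is the axis-aligned bounding box of a set of atoms (here, the three vertices of one hull face) grown by pad angstroms on every side. The helper padded_box and the plain-tuple box representation are hypothetical stand-ins, not DeepChem's CoordinateBox.

```python
import numpy as np


def padded_box(points, pad=5.0):
    """Return ((x_min, x_max), (y_min, y_max), (z_min, z_max)) around points.

    Hypothetical helper: an axis-aligned bounding box grown by `pad`
    angstroms in every direction.
    """
    points = np.asarray(points, dtype=float)
    lower = points.min(axis=0) - pad
    upper = points.max(axis=0) + pad
    return tuple((float(lo), float(hi)) for lo, hi in zip(lower, upper))


# Three vertices of one made-up hull face, in angstroms.
face = [[0.0, 0.0, 0.0], [2.0, 0.0, 1.0], [0.0, 3.0, 0.0]]
box = padded_box(face, pad=5.0)
```

A larger pad gives the docking search more room around the face at the cost of a larger search volume.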

Pose Generation
Pose generation is the task of finding a “pose”, that is, a geometric configuration of a small molecule interacting with a protein. Pose generation is a complex process, so for now DeepChem relies on external software to perform it. This software is installed and invoked under the hood.

class deepchem.dock.pose_generation.PoseGenerator
A Pose Generator computes low energy conformations for molecular complexes.
Many questions in structural biophysics reduce to that of computing the binding free energy of molecular complexes. A key step towards computing the binding free energy of a complex is to find low energy “poses”, that is, energetically favorable conformations of molecules with respect to each other. One application of this technique is to find low energy poses for protein-ligand interactions.

__init__()
Initialize self. See help(type(self)) for accurate signature.

generate_poses(molecular_complex: Tuple[str, str], centroid: Optional[numpy.ndarray] = None, box_dims: Optional[numpy.ndarray] = None, exhaustiveness: int = 10, num_modes: int = 9, num_pockets: Optional[int] = None, out_dir: Optional[str] = None, generate_scores: bool = False)
Generates a list of low energy poses for a molecular complex.
Parameters:
 molecular_complex (Tuple[str, str]) – A representation of a molecular complex. This tuple is (protein_file, ligand_file).
 centroid (np.ndarray, optional (default None)) – The centroid to dock against. Is computed if not specified.
 box_dims (np.ndarray, optional (default None)) – A numpy array of shape (3,) holding the size of the box to dock. If not specified, it is set to the size of the molecular complex plus 5 angstroms.
 exhaustiveness (int, optional (default 10)) – Tells pose generator how exhaustive it should be with pose generation.
 num_modes (int, optional (default 9)) – Tells pose generator how many binding modes it should generate at each invocation.
 num_pockets (int, optional (default None)) – If specified, self.pocket_finder must be set. Will only generate poses for the first num_pockets returned by self.pocket_finder.
 out_dir (str, optional (default None)) – If specified, write generated poses to this directory.
 generate_scores (bool, optional (default False)) – If True, the pose generator will return scores for complexes. This is typically used when invoking external docking programs that compute scores.
Returns: A list of molecular complexes in energetically favorable poses.
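The defaults described for centroid and box_dims can be sketched as follows. This is an illustration under stated assumptions, not DeepChem's code: it takes the centroid to be the mean atom position and the box size to be the coordinate range along each axis plus 5 angstroms. The helper name default_docking_box is hypothetical.

```python
import numpy as np


def default_docking_box(coords, padding=5.0):
    """Return (centroid, box_dims) for an (N, 3) array of coordinates.

    Assumption: centroid is the mean position, and box_dims is the
    per-axis extent of the coordinates plus `padding` angstroms.
    """
    coords = np.asarray(coords, dtype=float)
    centroid = coords.mean(axis=0)
    box_dims = (coords.max(axis=0) - coords.min(axis=0)) + padding
    return centroid, box_dims


# Two made-up atom positions spanning 10 x 4 x 2 angstroms.
centroid, box_dims = default_docking_box([[0.0, 0.0, 0.0], [10.0, 4.0, 2.0]])
```

Passing explicit centroid and box_dims arrays of shape (3,) overrides these defaults and restricts the search to a region of interest.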


class deepchem.dock.pose_generation.VinaPoseGenerator(sixty_four_bits: bool = True, pocket_finder: Optional[deepchem.dock.binding_pocket.BindingPocketFinder] = None)
Uses Autodock Vina to generate binding poses.
This class uses Autodock Vina to make predictions of binding poses. It downloads the Autodock Vina executable for your system to your specified DEEPCHEM_DATA_DIR (an environment variable you set) and invokes the executable to perform pose generation for you.
Note
This class requires RDKit to be installed.

__init__(sixty_four_bits: bool = True, pocket_finder: Optional[deepchem.dock.binding_pocket.BindingPocketFinder] = None)
Initializes Vina Pose Generator.
Parameters:
 sixty_four_bits (bool, optional (default True)) – Specifies whether this is a 64-bit machine. Needed to download the correct executable.
 pocket_finder (BindingPocketFinder, optional (default None)) – If specified, should be an instance of dc.dock.BindingPocketFinder.

generate_poses(molecular_complex: Tuple[str, str], centroid: Optional[numpy.ndarray] = None, box_dims: Optional[numpy.ndarray] = None, exhaustiveness: int = 10, num_modes: int = 9, num_pockets: Optional[int] = None, out_dir: Optional[str] = None, generate_scores: bool = False) → Union[Tuple[List[Tuple[Any, Any]], List[float]], List[Tuple[Any, Any]]]
Generates the docked complex and outputs files for docked complex.
TODO: How can this work on Windows? We need to install a .msi file and invoke it correctly from Python for this to work.
Parameters:  molecular_complex (Tuple[str, str]) – A representation of a molecular complex. This tuple is (protein_file, ligand_file).
 centroid (np.ndarray, optional) – The centroid to dock against. Is computed if not specified.
 box_dims (np.ndarray, optional) – A numpy array of shape (3,) holding the size of the box to dock. If not specified is set to size of molecular complex plus 5 angstroms.
 exhaustiveness (int, optional (default 10)) – Tells Autodock Vina how exhaustive it should be with pose generation.
 num_modes (int, optional (default 9)) – Tells Autodock Vina how many binding modes it should generate at each invocation.
 num_pockets (int, optional (default None)) – If specified, self.pocket_finder must be set. Will only generate poses for the first num_pockets returned by self.pocket_finder.
 out_dir (str, optional) – If specified, write generated poses to this directory.
 generate_scores (bool, optional (default False)) – If True, the pose generator will return scores for complexes. This is typically used when invoking external docking programs that compute scores.
Returns: Tuple of (docked_poses, scores) or docked_poses. docked_poses is a list of docked molecular complexes. Each entry in this list contains a (protein_mol, ligand_mol) pair of RDKit molecules. scores is a list of binding free energies predicted by Vina.
Return type: Tuple[docked_poses, scores] or docked_poses
Raises: ValueError if num_pockets is set but self.pocket_finder is None.
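Because generate_poses returns either a bare list of poses or a (docked_poses, scores) tuple depending on generate_scores, calling code may want to normalize the result before using it. A minimal sketch follows; split_poses_and_scores is a hypothetical helper, not part of DeepChem, and it relies only on the return shapes documented above.

```python
def split_poses_and_scores(result):
    """Normalize a generate_poses result to (poses, scores_or_None).

    Hypothetical helper. Assumes the two documented return shapes:
    a list of (protein_mol, ligand_mol) pairs, or a tuple of
    (that list, list of scores).
    """
    if isinstance(result, tuple):
        # generate_scores=True: (docked_poses, scores)
        poses, scores = result
        return list(poses), list(scores)
    # generate_scores=False: docked_poses only
    return list(result), None
```

With this helper, downstream code can always unpack two values and check whether scores is None.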

Docking
The dc.dock.docking module provides a generic docking implementation that relies on the provided pose generation and pose scoring utilities to perform docking.

class deepchem.dock.docking.Docker(pose_generator: deepchem.dock.pose_generation.PoseGenerator, featurizer: Optional[deepchem.feat.base_classes.ComplexFeaturizer] = None, scoring_model: Optional[deepchem.models.models.Model] = None)
A generic molecular docking class.
This class provides a docking engine which uses provided models for featurization, pose generation, and scoring. Most pieces of docking software are command line tools that are invoked from the shell. The goal of this class is to provide a clean Python API for invoking molecular docking programmatically.
The implementation of this class is lightweight and generic. It’s expected that the majority of the heavy lifting will be done by pose generation and scoring classes that are provided to this class.

__init__(pose_generator: deepchem.dock.pose_generation.PoseGenerator, featurizer: Optional[deepchem.feat.base_classes.ComplexFeaturizer] = None, scoring_model: Optional[deepchem.models.models.Model] = None)
Builds model.
Parameters:  pose_generator (PoseGenerator) – The pose generator to use for this model
 featurizer (ComplexFeaturizer, optional (default None)) – Featurizer associated with scoring_model
 scoring_model (Model, optional (default None)) – Should make predictions on molecular complex.

dock(molecular_complex: Tuple[str, str], centroid: Optional[numpy.ndarray] = None, box_dims: Optional[numpy.ndarray] = None, exhaustiveness: int = 10, num_modes: int = 9, num_pockets: Optional[int] = None, out_dir: Optional[str] = None, use_pose_generator_scores: bool = False) → Union[Generator[Tuple[Any, Any], None, None], Generator[Tuple[Tuple[Any, Any], float], None, None]]
Generic docking function.
This docking function uses this object’s featurizer, pose generator, and scoring model to make docking predictions. This function is written in generic style so
Parameters:  molecular_complex (Tuple[str, str]) – A representation of a molecular complex. This tuple is (protein_file, ligand_file).
 centroid (np.ndarray, optional (default None)) – The centroid to dock against. Is computed if not specified.
 box_dims (np.ndarray, optional (default None)) – A numpy array of shape (3,) holding the size of the box to dock. If not specified is set to size of molecular complex plus 5 angstroms.
 exhaustiveness (int, optional (default 10)) – Tells pose generator how exhaustive it should be with pose generation.
 num_modes (int, optional (default 9)) – Tells pose generator how many binding modes it should generate at each invocation.
 num_pockets (int, optional (default None)) – If specified, self.pocket_finder must be set. Will only generate poses for the first num_pockets returned by self.pocket_finder.
 out_dir (str, optional (default None)) – If specified, write generated poses to this directory.
 use_pose_generator_scores (bool, optional (default False)) – If True, ask pose generator to generate scores. This cannot be True if self.featurizer and self.scoring_model are set since those will be used to generate scores in that case.
Returns: A generator. If use_pose_generator_scores==True or self.scoring_model is set, then will yield tuples (posed_complex, score). Else will yield posed_complex.
Return type: Generator[Tuple[posed_complex, score]] or Generator[posed_complex]
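Since dock returns a generator, results are produced lazily and can be consumed only once. The sketch below shows one way to pick the best pose when scores are requested from the pose generator. The helper names build_vina_docker and best_scored_pose are hypothetical, and treating the lowest score as the best is an assumption that fits Vina's predicted binding free energies. The import paths are the module paths documented on this page.

```python
def build_vina_docker():
    """Build a Docker backed by VinaPoseGenerator.

    Requires DeepChem and RDKit; imported lazily so that this module
    can be loaded even where DeepChem is not installed.
    """
    from deepchem.dock.docking import Docker
    from deepchem.dock.pose_generation import VinaPoseGenerator
    return Docker(VinaPoseGenerator())


def best_scored_pose(docker, protein_file, ligand_file, **dock_kwargs):
    """Return the (posed_complex, score) pair with the lowest score.

    Assumption: lower scores are better, as with Vina free energies.
    """
    results = docker.dock(
        (protein_file, ligand_file),
        use_pose_generator_scores=True,
        **dock_kwargs,
    )
    # dock() yields lazily; min() consumes the generator exactly once.
    return min(results, key=lambda pair: pair[1])
```

A call might look like best_scored_pose(build_vina_docker(), "protein.pdb", "ligand.sdf", num_modes=3), where the file names are placeholders.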

Pose Scoring
This module contains some utilities for computing docking scoring functions directly in Python. For now, support for custom pose scoring is limited.

deepchem.dock.pose_scoring.pairwise_distances(coords1: numpy.ndarray, coords2: numpy.ndarray) → numpy.ndarray
Returns matrix of pairwise Euclidean distances.
Parameters:  coords1 (np.ndarray) – A numpy array of shape (N, 3)
 coords2 (np.ndarray) – A numpy array of shape (M, 3)
Returns: An (N, M) array with pairwise distances.
Return type: np.ndarray
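For intuition, pairwise Euclidean distances between two coordinate sets can be computed with NumPy broadcasting. This is a sketch of the technique, not necessarily how DeepChem implements it; the name pairwise_distances_sketch is hypothetical.

```python
import numpy as np


def pairwise_distances_sketch(coords1, coords2):
    """Return an (N, M) matrix of distances between (N, 3) and (M, 3) points."""
    # Broadcast to an (N, M, 3) array of coordinate differences.
    diff = coords1[:, None, :] - coords2[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Entry (i, j) of the result is the distance between row i of coords1 and row j of coords2.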

deepchem.dock.pose_scoring.cutoff_filter(d: numpy.ndarray, x: numpy.ndarray, cutoff=8.0) → numpy.ndarray
Applies a cutoff filter on pairwise distances.
Parameters:  d (np.ndarray) – Pairwise distances matrix. A numpy array of shape (N, M)
 x (np.ndarray) – Matrix of shape (N, M)
 cutoff (float, optional (default 8)) – Cutoff for selection in Angstroms
Returns: An (N, M) array in which entries of x whose distance exceeds the cutoff are set to 0.
Return type: np.ndarray
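A cutoff filter of this kind can be sketched with np.where. The name cutoff_filter_sketch is hypothetical, and treating the cutoff as exclusive (entries at exactly the cutoff distance are zeroed) is an assumption.

```python
import numpy as np


def cutoff_filter_sketch(d, x, cutoff=8.0):
    """Keep x where the distance d is below cutoff; set other entries to 0."""
    return np.where(d < cutoff, x, np.zeros_like(x))
```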

deepchem.dock.pose_scoring.vina_nonlinearity(c: numpy.ndarray, w: float, Nrot: int) → numpy.ndarray
Computes nonlinearity used in Vina.
Parameters:  c (np.ndarray) – A numpy array of shape (N, M)
 w (float) – Weighting term
 Nrot (int) – Number of rotatable bonds in this molecule
Returns: An (N, M) array with activations under a nonlinearity.
Return type: np.ndarray
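As an illustration, the conformation-independent function published for Vina divides the summed interaction by 1 + w * Nrot, which penalizes flexible ligands. The sketch below assumes that form; the name vina_nonlinearity_sketch is hypothetical.

```python
import numpy as np


def vina_nonlinearity_sketch(c, w, Nrot):
    """Scale interaction values c by 1 / (1 + w * Nrot) (assumed Vina form)."""
    return c / (1.0 + w * Nrot)
```

With Nrot = 0 the values pass through unchanged; more rotatable bonds shrink their magnitude.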

deepchem.dock.pose_scoring.vina_repulsion(d: numpy.ndarray) → numpy.ndarray
Computes Autodock Vina’s repulsion interaction term.
Parameters:
 d (np.ndarray) – A numpy array of shape (N, M).
Returns: An (N, M) array with repulsion terms.
Return type: np.ndarray
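The repulsion term published for Vina is quadratic in the surface distance when atoms overlap (d < 0) and zero otherwise. A sketch assuming that form, with the hypothetical name vina_repulsion_sketch:

```python
import numpy as np


def vina_repulsion_sketch(d):
    """Return d**2 where d < 0 (overlapping atoms) and 0 elsewhere."""
    return np.where(d < 0, d ** 2, np.zeros_like(d))
```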

deepchem.dock.pose_scoring.vina_hydrophobic(d: numpy.ndarray) → numpy.ndarray
Computes Autodock Vina’s hydrophobic interaction term.
Here, d is the set of surface distances as defined in [1].
Parameters:
 d (np.ndarray) – A numpy array of shape (N, M).
Returns: An (N, M) array of hydrophobic interactions in a piecewise linear curve.
Return type: np.ndarray
References
[1] Jain, Ajay N. “Scoring noncovalent protein-ligand interactions: a continuous differentiable function tuned to compute binding affinities.” Journal of Computer-Aided Molecular Design 10.5 (1996): 427-440.
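The piecewise linear shape can be sketched as follows, assuming the breakpoints published for Vina: the term is 1 for d < 0.5 angstroms, 0 for d > 1.5 angstroms, and falls linearly in between. The name vina_hydrophobic_sketch is hypothetical.

```python
import numpy as np


def vina_hydrophobic_sketch(d):
    """Piecewise linear term: 1 below 0.5 A, 0 above 1.5 A, linear between."""
    return np.where(d < 0.5, 1.0, np.where(d < 1.5, 1.5 - d, 0.0))
```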

deepchem.dock.pose_scoring.vina_hbond(d: numpy.ndarray) → numpy.ndarray
Computes Autodock Vina’s hydrogen bond interaction term.
Here, d is the set of surface distances as defined in [1].
Parameters:
 d (np.ndarray) – A numpy array of shape (N, M).
Returns: An (N, M) array of hydrogen bond interactions in a piecewise linear curve.
Return type: np.ndarray
References
[1] Jain, Ajay N. “Scoring noncovalent protein-ligand interactions: a continuous differentiable function tuned to compute binding affinities.” Journal of Computer-Aided Molecular Design 10.5 (1996): 427-440.
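A sketch of the hydrogen bond term, assuming the breakpoints published for Vina: the term is 1 for d < -0.7 angstroms, 0 for d > 0, and linear in between. The name vina_hbond_sketch is hypothetical.

```python
import numpy as np


def vina_hbond_sketch(d):
    """Piecewise linear term: 1 below -0.7 A, 0 above 0 A, linear between."""
    return np.where(d < -0.7, 1.0, np.where(d < 0, -d / 0.7, 0.0))
```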

deepchem.dock.pose_scoring.vina_gaussian_first(d: numpy.ndarray) → numpy.ndarray
Computes Autodock Vina’s first Gaussian interaction term.
Here, d is the set of surface distances as defined in [1].
Parameters:
 d (np.ndarray) – A numpy array of shape (N, M).
Returns: An (N, M) array of Gaussian interaction terms.
Return type: np.ndarray
References
[1] Jain, Ajay N. “Scoring noncovalent protein-ligand interactions: a continuous differentiable function tuned to compute binding affinities.” Journal of Computer-Aided Molecular Design 10.5 (1996): 427-440.

deepchem.dock.pose_scoring.vina_gaussian_second(d: numpy.ndarray) → numpy.ndarray
Computes Autodock Vina’s second Gaussian interaction term.
Here, d is the set of surface distances as defined in [1].
Parameters:
 d (np.ndarray) – A numpy array of shape (N, M).
Returns: An (N, M) array of Gaussian interaction terms.
Return type: np.ndarray
References
[1] Jain, Ajay N. “Scoring noncovalent protein-ligand interactions: a continuous differentiable function tuned to compute binding affinities.” Journal of Computer-Aided Molecular Design 10.5 (1996): 427-440.
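The two Gaussian steric terms can be sketched together, assuming the constants published for Vina: the first is centered at 0 with width 0.5 angstroms, and the second is centered at 3 angstroms with width 2 angstroms. The function names are hypothetical.

```python
import numpy as np


def vina_gaussian_first_sketch(d):
    """First Gaussian term: exp(-(d / 0.5)**2), peaked at d = 0."""
    return np.exp(-(d / 0.5) ** 2)


def vina_gaussian_second_sketch(d):
    """Second Gaussian term: exp(-((d - 3) / 2)**2), peaked at d = 3."""
    return np.exp(-((d - 3.0) / 2.0) ** 2)
```

The first term rewards close surface contact, while the broader second term captures longer-range attraction.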

deepchem.dock.pose_scoring.vina_energy_term(coords1: numpy.ndarray, coords2: numpy.ndarray, weights: numpy.ndarray, wrot: float, Nrot: int) → numpy.ndarray
Computes the Vina Energy function for two molecular conformations.
Parameters:  coords1 (np.ndarray) – Molecular coordinates of shape (N, 3)
 coords2 (np.ndarray) – Molecular coordinates of shape (M, 3)
 weights (np.ndarray) – A numpy array of shape (5,). The 5 values are weights for repulsion interaction term, hydrophobic interaction term, hydrogen bond interaction term, first Gaussian interaction term and second Gaussian interaction term.
 wrot (float) – The scaling factor for nonlinearity
 Nrot (int) – Number of rotatable bonds in this calculation
Returns: A scalar value with free energy
Return type: np.ndarray
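To show how the pieces fit together, the sketch below combines the five interaction terms with the given weights, applies a distance cutoff, rescales by the rotatable-bond nonlinearity, and sums to a scalar. It is an illustration under stated assumptions rather than DeepChem's implementation: the constants are those published for Vina, the 8 angstrom cutoff is taken from the cutoff_filter default, and raw interatomic distances are used in place of surface distances (which would subtract atomic radii). The name vina_energy_sketch is hypothetical.

```python
import numpy as np


def vina_energy_sketch(coords1, coords2, weights, wrot, Nrot, cutoff=8.0):
    """Return a scalar energy for (N, 3) and (M, 3) coordinate arrays."""
    # Pairwise distances, shape (N, M). Simplification: raw distances.
    diff = coords1[:, None, :] - coords2[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Order matches the weights: repulsion, hydrophobic, hbond, gauss1, gauss2.
    terms = np.stack([
        np.where(d < 0, d ** 2, 0.0),
        np.where(d < 0.5, 1.0, np.where(d < 1.5, 1.5 - d, 0.0)),
        np.where(d < -0.7, 1.0, np.where(d < 0, -d / 0.7, 0.0)),
        np.exp(-(d / 0.5) ** 2),
        np.exp(-((d - 3.0) / 2.0) ** 2),
    ])
    # Weighted sum over the five terms gives an (N, M) interaction matrix.
    interactions = np.tensordot(weights, terms, axes=1)
    # Drop pairs at or beyond the cutoff, then apply the nonlinearity.
    interactions = np.where(d < cutoff, interactions, 0.0)
    return float(np.sum(interactions / (1.0 + wrot * Nrot)))
```

Because raw distances are never negative, the repulsion and hydrogen bond terms vanish in this simplified version; they contribute only when d is a true surface distance.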