Docking

Thanks to advances in biophysics, we can often determine the structure of proteins with experimental techniques such as cryo-EM or X-ray crystallography. These structures can be powerful aids in designing small molecules. Molecular docking performs geometric calculations to find a “binding pose” in which the small molecule interacts with the protein in a suitable binding pocket (that is, a region of the protein with a groove in which the small molecule can rest). For more information about docking, see the Autodock Vina paper:

Trott, Oleg, and Arthur J. Olson. “AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading.” Journal of computational chemistry 31.2 (2010): 455-461.

Binding Pocket Discovery

DeepChem has some utilities to help find binding pockets on proteins automatically. For now, these utilities are simple, but we will improve these in future versions of DeepChem.

class deepchem.dock.binding_pocket.BindingPocketFinder[source]

Abstract superclass for binding pocket detectors

When working with a new protein or other macromolecule, it is often unclear which regions of the macromolecule are good targets for ligands or other molecules to interact with. This abstract class provides a template for child classes that algorithmically locate potential binding pockets.

Note that potential interaction sites can be found by many different methods, and that this abstract class doesn’t specify the technique to be used.

__init__

Initialize self. See help(type(self)) for accurate signature.

find_pockets(molecule: Any)[source]

Finds potential binding pockets in proteins.

Parameters:molecule (object) – Some representation of a molecule.
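
The template-and-child-class pattern described above can be sketched in plain Python. This is an illustration only, not DeepChem's implementation: PocketFinderSketch, WholeMoleculeFinder, and the Box tuple are hypothetical names, and the molecule is assumed to be a simple list of (x, y, z) coordinates.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple

# Hypothetical stand-in for a pocket: ((x_min, x_max), (y_min, y_max), (z_min, z_max)).
Box = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]


class PocketFinderSketch(ABC):
    """Illustrative template; not DeepChem's BindingPocketFinder."""

    @abstractmethod
    def find_pockets(self, molecule: Any) -> List[Box]:
        """Return candidate binding pockets for `molecule`."""


class WholeMoleculeFinder(PocketFinderSketch):
    """Toy child class: reports one box that encloses every atom."""

    def find_pockets(self, molecule: Any) -> List[Box]:
        # Assumption: `molecule` is a list of (x, y, z) coordinates.
        xs, ys, zs = zip(*molecule)
        return [((min(xs), max(xs)), (min(ys), max(ys)), (min(zs), max(zs)))]


atoms = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (-1.0, 0.5, 1.0)]
pockets = WholeMoleculeFinder().find_pockets(atoms)
```

The abstract method fixes only the interface; each child class is free to choose its own detection technique.
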
class deepchem.dock.binding_pocket.ConvexHullPocketFinder(scoring_model: Optional[deepchem.models.models.Model] = None, pad: float = 5.0)[source]

Implementation that uses convex hull of protein to find pockets.

Based on https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4112621/pdf/1472-6807-14-18.pdf

__init__(scoring_model: Optional[deepchem.models.models.Model] = None, pad: float = 5.0)[source]

Initialize the pocket finder.

Parameters:
  • scoring_model (Model, optional (default None)) – If specified, use this model to prune pockets.
  • pad (float, optional (default 5.0)) – The number of angstroms to pad around a binding pocket’s atoms to get a binding pocket box.
find_all_pockets(protein_file: str) → List[deepchem.utils.coordinate_box_utils.CoordinateBox][source]

Find list of binding pockets on protein.

Parameters:protein_file (str) – Protein to load in.
Returns:List of binding pockets on protein. Each pocket is a CoordinateBox
Return type:List[CoordinateBox]
find_pockets(macromolecule_file: str) → List[deepchem.utils.coordinate_box_utils.CoordinateBox][source]

Find list of suitable binding pockets on protein.

This function computes putative binding pockets on this protein. This class uses the ConvexHull to compute binding pockets. Each face of the hull is converted into a coordinate box used for binding.

Parameters:macromolecule_file (str) – Location of the macromolecule file to load
Returns:List of pockets. Each pocket is a CoordinateBox
Return type:List[CoordinateBox]
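
As a rough sketch of the step where a hull face becomes a padded coordinate box, the snippet below takes the vertices of a single triangular face and grows their axis-aligned bounding box by pad angstroms. The function face_to_box and the tuple-based Box type are hypothetical, and computing the convex hull itself is left out. DeepChem's own code may round or represent boxes differently.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]


def face_to_box(face: Sequence[Point], pad: float = 5.0) -> Box:
    """Axis-aligned box around the vertices of one hull face, grown by `pad`."""
    xs, ys, zs = zip(*face)
    return (
        (min(xs) - pad, max(xs) + pad),
        (min(ys) - pad, max(ys) + pad),
        (min(zs) - pad, max(zs) + pad),
    )


# One triangular face, coordinates in angstroms (made-up values).
triangle = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.0, 1.0)]
box = face_to_box(triangle, pad=5.0)
```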

Pose Generation

Pose generation is the task of finding a “pose”, that is, a geometric configuration of a small molecule interacting with a protein. Pose generation is a complex process, so for now DeepChem relies on external software to perform it. This software is installed and invoked under the hood.

class deepchem.dock.pose_generation.PoseGenerator[source]

A Pose Generator computes low energy conformations for molecular complexes.

Many questions in structural biophysics reduce to that of computing the binding free energy of molecular complexes. A key step towards computing the binding free energy of a complex is to find low energy “poses”, that is, energetically favorable conformations of the molecules with respect to each other. One application of this technique is to find low energy poses for protein-ligand interactions.

__init__

Initialize self. See help(type(self)) for accurate signature.

generate_poses(molecular_complex: Tuple[str, str], centroid: Optional[numpy.ndarray] = None, box_dims: Optional[numpy.ndarray] = None, exhaustiveness: int = 10, num_modes: int = 9, num_pockets: Optional[int] = None, out_dir: Optional[str] = None, generate_scores: bool = False)[source]

Generates a list of low energy poses for molecular complex

Parameters:
  • molecular_complex (Tuple[str, str]) – A representation of a molecular complex. This tuple is (protein_file, ligand_file).
  • centroid (np.ndarray, optional (default None)) – The centroid to dock against. Is computed if not specified.
  • box_dims (np.ndarray, optional (default None)) – A numpy array of shape (3,) holding the size of the box to dock. If not specified is set to size of molecular complex plus 5 angstroms.
  • exhaustiveness (int, optional (default 10)) – Tells pose generator how exhaustive it should be with pose generation.
  • num_modes (int, optional (default 9)) – Tells pose generator how many binding modes it should generate at each invocation.
  • num_pockets (int, optional (default None)) – If specified, self.pocket_finder must be set. Will only generate poses for the first num_pockets returned by self.pocket_finder.
  • out_dir (str, optional (default None)) – If specified, write generated poses to this directory.
  • generate_scores (bool, optional (default False)) – If True, the pose generator will return scores for complexes. This is used typically when invoking external docking programs that compute scores.
Returns:

A list of molecular complexes in energetically favorable poses.
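
The defaults for centroid and box_dims can be pictured with a short NumPy sketch. It assumes the centroid is the mean atomic position and the box is the per-axis coordinate extent plus 5 angstroms. The helper default_docking_box is hypothetical, and the exact computation inside a given pose generator may differ.

```python
import numpy as np


def default_docking_box(coords: np.ndarray, pad: float = 5.0):
    """Return (centroid, box_dims) for an (N, 3) array of atomic coordinates."""
    centroid = coords.mean(axis=0)
    # Extent along each axis, plus the padding mentioned in the parameter docs.
    box_dims = (coords.max(axis=0) - coords.min(axis=0)) + pad
    return centroid, box_dims


# Made-up coordinates in angstroms.
coords = np.array([[0.0, 0.0, 0.0], [10.0, 4.0, 2.0], [2.0, 8.0, 5.0]])
centroid, box_dims = default_docking_box(coords)
```

Both results have shape (3,), matching the shape documented for box_dims.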

class deepchem.dock.pose_generation.VinaPoseGenerator(sixty_four_bits: bool = True, pocket_finder: Optional[deepchem.dock.binding_pocket.BindingPocketFinder] = None)[source]

Uses Autodock Vina to generate binding poses.

This class uses Autodock Vina to make predictions of binding poses. It downloads the Autodock Vina executable for your system to the directory given by DEEPCHEM_DATA_DIR (an environment variable that you set) and invokes the executable to perform pose generation for you.

Note

This class requires RDKit to be installed.

__init__(sixty_four_bits: bool = True, pocket_finder: Optional[deepchem.dock.binding_pocket.BindingPocketFinder] = None)[source]

Initializes Vina Pose Generator

Parameters:
  • sixty_four_bits (bool, optional (default True)) – Specifies whether this is a 64-bit machine. Needed to download the correct executable.
  • pocket_finder (BindingPocketFinder, optional (default None)) – If specified should be an instance of dc.dock.BindingPocketFinder.
generate_poses(molecular_complex: Tuple[str, str], centroid: Optional[numpy.ndarray] = None, box_dims: Optional[numpy.ndarray] = None, exhaustiveness: int = 10, num_modes: int = 9, num_pockets: Optional[int] = None, out_dir: Optional[str] = None, generate_scores: bool = False) → Union[Tuple[List[Tuple[Any, Any]], List[float]], List[Tuple[Any, Any]]][source]

Generates the docked complex and outputs files for docked complex.

TODO: How can this work on Windows? We need to install a .msi file and invoke it correctly from Python for this to work.

Parameters:
  • molecular_complex (Tuple[str, str]) – A representation of a molecular complex. This tuple is (protein_file, ligand_file).
  • centroid (np.ndarray, optional) – The centroid to dock against. Is computed if not specified.
  • box_dims (np.ndarray, optional) – A numpy array of shape (3,) holding the size of the box to dock. If not specified is set to size of molecular complex plus 5 angstroms.
  • exhaustiveness (int, optional (default 10)) – Tells Autodock Vina how exhaustive it should be with pose generation.
  • num_modes (int, optional (default 9)) – Tells Autodock Vina how many binding modes it should generate at each invocation.
  • num_pockets (int, optional (default None)) – If specified, self.pocket_finder must be set. Will only generate poses for the first num_pockets returned by self.pocket_finder.
  • out_dir (str, optional) – If specified, write generated poses to this directory.
  • generate_scores (bool, optional (default False)) – If True, the pose generator will return scores for complexes. This is used typically when invoking external docking programs that compute scores.
Returns:

Tuple of (docked_poses, scores) or docked_poses. docked_poses is a list of docked molecular complexes. Each entry in this list contains a (protein_mol, ligand_mol) pair of RDKit molecules. scores is a list of binding free energies predicted by Vina.

Return type:

Tuple[docked_poses, scores] or docked_poses

Raises:

ValueError if num_pockets is set but self.pocket_finder is None.
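
When scores are requested, the return value is a pair of parallel lists, so a common follow-up is to rank poses by score. The sketch below is illustrative: rank_poses is a hypothetical helper, the poses are placeholder strings rather than RDKit molecules, and the scores are made up. It relies only on the fact that a lower (more negative) binding free energy is more favorable.

```python
from typing import Any, List, Tuple

Pose = Tuple[Any, Any]  # (protein_mol, ligand_mol); plain strings stand in below


def rank_poses(poses: List[Pose], scores: List[float]) -> List[Tuple[Pose, float]]:
    """Pair each pose with its score and sort from most to least favorable."""
    if len(poses) != len(scores):
        raise ValueError("poses and scores must have the same length")
    # Lower (more negative) binding free energy means a more favorable pose.
    return sorted(zip(poses, scores), key=lambda pair: pair[1])


# Made-up values shaped like the (docked_poses, scores) return value.
docked_poses = [("protein", "ligand_a"), ("protein", "ligand_b"), ("protein", "ligand_c")]
scores = [-6.1, -8.4, -7.0]
ranked = rank_poses(docked_poses, scores)
best_pose, best_score = ranked[0]
```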

Docking

The dc.dock.docking module provides a generic docking implementation that depends on provided pose generation and pose scoring utilities to perform docking.

class deepchem.dock.docking.Docker(pose_generator: deepchem.dock.pose_generation.PoseGenerator, featurizer: Optional[deepchem.feat.base_classes.ComplexFeaturizer] = None, scoring_model: Optional[deepchem.models.models.Model] = None)[source]

A generic molecular docking class

This class provides a docking engine which uses provided models for featurization, pose generation, and scoring. Most pieces of docking software are command line tools that are invoked from the shell. The goal of this class is to provide a clean Python API for invoking molecular docking programmatically.

The implementation of this class is lightweight and generic. It’s expected that the majority of the heavy lifting will be done by pose generation and scoring classes that are provided to this class.

__init__(pose_generator: deepchem.dock.pose_generation.PoseGenerator, featurizer: Optional[deepchem.feat.base_classes.ComplexFeaturizer] = None, scoring_model: Optional[deepchem.models.models.Model] = None)[source]

Builds model.

Parameters:
  • pose_generator (PoseGenerator) – The pose generator to use for this model
  • featurizer (ComplexFeaturizer, optional (default None)) – Featurizer associated with scoring_model
  • scoring_model (Model, optional (default None)) – Should make predictions on molecular complex.
dock(molecular_complex: Tuple[str, str], centroid: Optional[numpy.ndarray] = None, box_dims: Optional[numpy.ndarray] = None, exhaustiveness: int = 10, num_modes: int = 9, num_pockets: Optional[int] = None, out_dir: Optional[str] = None, use_pose_generator_scores: bool = False) → Union[Generator[Tuple[Any, Any], None, None], Generator[Tuple[Tuple[Any, Any], float], None, None]][source]

Generic docking function.

This docking function uses this object’s featurizer, pose generator, and scoring model to make docking predictions. This function is written in a generic style.

Parameters:
  • molecular_complex (Tuple[str, str]) – A representation of a molecular complex. This tuple is (protein_file, ligand_file).
  • centroid (np.ndarray, optional (default None)) – The centroid to dock against. Is computed if not specified.
  • box_dims (np.ndarray, optional (default None)) – A numpy array of shape (3,) holding the size of the box to dock. If not specified is set to size of molecular complex plus 5 angstroms.
  • exhaustiveness (int, optional (default 10)) – Tells pose generator how exhaustive it should be with pose generation.
  • num_modes (int, optional (default 9)) – Tells pose generator how many binding modes it should generate at each invocation.
  • num_pockets (int, optional (default None)) – If specified, self.pocket_finder must be set. Will only generate poses for the first num_pockets returned by self.pocket_finder.
  • out_dir (str, optional (default None)) – If specified, write generated poses to this directory.
  • use_pose_generator_scores (bool, optional (default False)) – If True, ask pose generator to generate scores. This cannot be True if self.featurizer and self.scoring_model are set since those will be used to generate scores in that case.
Returns:

A generator. If use_pose_generator_scores==True or self.scoring_model is set, then will yield tuples (posed_complex, score). Else will yield posed_complex.

Return type:

Generator[Tuple[posed_complex, score]] or Generator[posed_complex]
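
Because the result is a generator whose items take one of two shapes, calling code needs to know which shape to expect. The stand-in below mimics only that yield contract. dock_sketch is a hypothetical function, not Docker.dock, and the poses and scores are placeholder values.

```python
from typing import Any, Iterator, List, Optional, Tuple

Pose = Tuple[Any, Any]


def dock_sketch(poses: List[Pose], scores: Optional[List[float]] = None) -> Iterator[Any]:
    """Yield (pose, score) pairs when scores exist, otherwise bare poses."""
    if scores is None:
        yield from poses
    else:
        yield from zip(poses, scores)


poses = [("protein", "pose_1"), ("protein", "pose_2")]

unscored = list(dock_sketch(poses))
scored = list(dock_sketch(poses, scores=[-7.2, -6.5]))

# With scores, each item unpacks into a complex and a float.
for posed_complex, score in scored:
    print(posed_complex, score)
```

Since a generator is lazy, nothing is produced until it is iterated. Wrapping it in list(...) as above forces all items at once.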

Pose Scoring

This module contains some utilities for computing docking scoring functions directly in Python. For now, support for custom pose scoring is limited.

deepchem.dock.pose_scoring.pairwise_distances(coords1: numpy.ndarray, coords2: numpy.ndarray) → numpy.ndarray[source]

Returns matrix of pairwise Euclidean distances.

Parameters:
  • coords1 (np.ndarray) – A numpy array of shape (N, 3)
  • coords2 (np.ndarray) – A numpy array of shape (M, 3)
Returns:

An (N, M) array of pairwise distances.

Return type:

np.ndarray
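
One way to obtain such a matrix is NumPy broadcasting. The function below is a minimal sketch with a hypothetical name (pairwise_distances_sketch) and is not necessarily how DeepChem computes it.

```python
import numpy as np


def pairwise_distances_sketch(coords1: np.ndarray, coords2: np.ndarray) -> np.ndarray:
    """Euclidean distances between every row of coords1 and every row of coords2."""
    # (N, 1, 3) - (1, M, 3) broadcasts to (N, M, 3).
    diff = coords1[:, None, :] - coords2[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 3.0, 4.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
d = pairwise_distances_sketch(a, b)  # shape (2, 3)
```

Entry d[i, j] is the distance between row i of the first array and row j of the second.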

deepchem.dock.pose_scoring.cutoff_filter(d: numpy.ndarray, x: numpy.ndarray, cutoff=8.0) → numpy.ndarray[source]

Applies a cutoff filter on pairwise distances

Parameters:
  • d (np.ndarray) – Pairwise distances matrix. A numpy array of shape (N, M)
  • x (np.ndarray) – Matrix of shape (N, M)
  • cutoff (float, optional (default 8.0)) – Cutoff for selection in Angstroms
Returns:

An (N, M) array in which entries whose distance exceeds the cutoff are set to 0.

Return type:

np.ndarray
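
The filter can be sketched with np.where. The name cutoff_filter_sketch is hypothetical, and treating a distance exactly equal to the cutoff as "too large" is an assumption of this sketch.

```python
import numpy as np


def cutoff_filter_sketch(d: np.ndarray, x: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Keep x where the matching distance is below cutoff; zero it elsewhere."""
    # Assumption: a distance exactly equal to the cutoff is treated as too large.
    return np.where(d < cutoff, x, np.zeros_like(x))


d = np.array([[1.0, 9.5], [7.9, 12.0]])
x = np.array([[0.3, 0.4], [-0.2, 0.6]])
filtered = cutoff_filter_sketch(d, x)
```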

deepchem.dock.pose_scoring.vina_nonlinearity(c: numpy.ndarray, w: float, Nrot: int) → numpy.ndarray[source]

Computes non-linearity used in Vina.

Parameters:
  • c (np.ndarray) – A numpy array of shape (N, M)
  • w (float) – Weighting term
  • Nrot (int) – Number of rotatable bonds in this molecule
Returns:

A (N, M) array with activations under a nonlinearity.

Return type:

np.ndarray
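
In the Autodock Vina paper cited at the top of this page, the nonlinearity divides the interaction value by 1 + w * Nrot, which penalizes ligands with many rotatable bonds. The sketch below applies that formula elementwise. The function name and the sample values are illustrative.

```python
import numpy as np


def vina_nonlinearity_sketch(c: np.ndarray, w: float, Nrot: int) -> np.ndarray:
    """Scale interaction values down as the number of rotatable bonds grows."""
    return c / (1.0 + w * Nrot)


c = np.array([[-1.0, -0.5], [0.0, -2.0]])
scaled = vina_nonlinearity_sketch(c, w=0.5, Nrot=2)  # divides every entry by 2
```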

deepchem.dock.pose_scoring.vina_repulsion(d: numpy.ndarray) → numpy.ndarray[source]

Computes Autodock Vina’s repulsion interaction term.

Parameters:d (np.ndarray) – A numpy array of shape (N, M).
Returns:A (N, M) array with repulsion terms.
Return type:np.ndarray
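
In the Vina paper the repulsion term is d squared when the surface distance d is negative (overlapping atoms) and zero otherwise. A minimal NumPy sketch, with a hypothetical function name and made-up distances:

```python
import numpy as np


def vina_repulsion_sketch(d: np.ndarray) -> np.ndarray:
    """d**2 where atoms overlap (d < 0), zero otherwise."""
    return np.where(d < 0, d ** 2, np.zeros_like(d))


d = np.array([[-0.5, 0.0], [1.2, -2.0]])
rep = vina_repulsion_sketch(d)
```
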
deepchem.dock.pose_scoring.vina_hydrophobic(d: numpy.ndarray) → numpy.ndarray[source]

Computes Autodock Vina’s hydrophobic interaction term.

Here, d is the set of surface distances as defined in [1].

Parameters:d (np.ndarray) – A numpy array of shape (N, M).
Returns:A (N, M) array of hydrophobic interactions in a piecewise linear curve.
Return type:np.ndarray

References

[1]Jain, Ajay N. “Scoring noncovalent protein-ligand interactions: a continuous differentiable function tuned to compute binding affinities.” Journal of computer-aided molecular design 10.5 (1996): 427-440.
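
The piecewise linear curve from the Vina paper equals 1 for d below 0.5 angstroms, 0 for d above 1.5 angstroms, and falls linearly in between. The sketch below encodes that shape. The function name and test distances are illustrative.

```python
import numpy as np


def vina_hydrophobic_sketch(d: np.ndarray) -> np.ndarray:
    """1 below 0.5, 0 above 1.5, linear ramp (1.5 - d) in between."""
    return np.where(d < 0.5, np.ones_like(d),
                    np.where(d < 1.5, 1.5 - d, np.zeros_like(d)))


d = np.array([[0.0, 1.0], [1.25, 3.0]])
hyd = vina_hydrophobic_sketch(d)
```
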
deepchem.dock.pose_scoring.vina_hbond(d: numpy.ndarray) → numpy.ndarray[source]

Computes Autodock Vina’s hydrogen bond interaction term.

Here, d is the set of surface distances as defined in [1].

Parameters:d (np.ndarray) – A numpy array of shape (N, M).
Returns:A (N, M) array of hydrogen bond interactions in a piecewise linear curve.
Return type:np.ndarray

References

[1]Jain, Ajay N. “Scoring noncovalent protein-ligand interactions: a continuous differentiable function tuned to compute binding affinities.” Journal of computer-aided molecular design 10.5 (1996): 427-440.
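
For hydrogen bonds, the Vina paper uses a curve that equals 1 for d below -0.7 angstroms, 0 for d at or above 0, and is linear in between. A minimal sketch with a hypothetical function name:

```python
import numpy as np


def vina_hbond_sketch(d: np.ndarray) -> np.ndarray:
    """1 below -0.7, 0 from 0 upward, linear ramp (-d / 0.7) in between."""
    return np.where(d < -0.7, np.ones_like(d),
                    np.where(d < 0, -d / 0.7, np.zeros_like(d)))


d = np.array([[-1.0, -0.35], [0.0, 0.5]])
hb = vina_hbond_sketch(d)
```
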
deepchem.dock.pose_scoring.vina_gaussian_first(d: numpy.ndarray) → numpy.ndarray[source]

Computes Autodock Vina’s first Gaussian interaction term.

Here, d is the set of surface distances as defined in [1].

Parameters:d (np.ndarray) – A numpy array of shape (N, M).
Returns:A (N, M) array of gaussian interaction terms.
Return type:np.ndarray

References

[1]Jain, Ajay N. “Scoring noncovalent protein-ligand interactions: a continuous differentiable function tuned to compute binding affinities.” Journal of computer-aided molecular design 10.5 (1996): 427-440.
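
The first Gaussian term in the Vina paper is exp(-(d / 0.5)^2), a narrow peak centered at a surface distance of zero. The sketch below evaluates it elementwise. The function name is illustrative.

```python
import numpy as np


def vina_gaussian_first_sketch(d: np.ndarray) -> np.ndarray:
    """Narrow Gaussian centered at d = 0 with width 0.5 angstroms."""
    return np.exp(-((d / 0.5) ** 2))


d = np.array([[0.0, 0.5], [1.0, -0.5]])
g1 = vina_gaussian_first_sketch(d)
```
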
deepchem.dock.pose_scoring.vina_gaussian_second(d: numpy.ndarray) → numpy.ndarray[source]

Computes Autodock Vina’s second Gaussian interaction term.

Here, d is the set of surface distances as defined in [1].

Parameters:d (np.ndarray) – A numpy array of shape (N, M).
Returns:A (N, M) array of gaussian interaction terms.
Return type:np.ndarray

References

[1]Jain, Ajay N. “Scoring noncovalent protein-ligand interactions: a continuous differentiable function tuned to compute binding affinities.” Journal of computer-aided molecular design 10.5 (1996): 427-440.
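
The second Gaussian term in the Vina paper is exp(-((d - 3) / 2)^2), a broader peak centered at a surface distance of 3 angstroms. A minimal sketch with a hypothetical function name:

```python
import numpy as np


def vina_gaussian_second_sketch(d: np.ndarray) -> np.ndarray:
    """Broad Gaussian centered at d = 3 angstroms with width 2 angstroms."""
    return np.exp(-(((d - 3.0) / 2.0) ** 2))


d = np.array([[3.0, 5.0], [1.0, 7.0]])
g2 = vina_gaussian_second_sketch(d)
```
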
deepchem.dock.pose_scoring.vina_energy_term(coords1: numpy.ndarray, coords2: numpy.ndarray, weights: numpy.ndarray, wrot: float, Nrot: int) → numpy.ndarray[source]

Computes the Vina Energy function for two molecular conformations

Parameters:
  • coords1 (np.ndarray) – Molecular coordinates of shape (N, 3)
  • coords2 (np.ndarray) – Molecular coordinates of shape (M, 3)
  • weights (np.ndarray) – A numpy array of shape (5,). The 5 values are weights for repulsion interaction term, hydrophobic interaction term, hydrogen bond interaction term, first Gaussian interaction term and second Gaussian interaction term.
  • wrot (float) – The scaling factor for nonlinearity
  • Nrot (int) – Number of rotatable bonds in this calculation
Returns:

A scalar value with free energy

Return type:

np.ndarray
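
Putting the pieces together, the energy can be sketched as a weighted sum of the five interaction terms, followed by the cutoff filter, the rotatable-bond nonlinearity, and a final sum over all atom pairs. This is an illustration under stated assumptions, not DeepChem's code: vina_energy_sketch is a hypothetical name, raw interatomic distances stand in for surface distances, the 8 angstrom cutoff mirrors the cutoff_filter default, and the weights in the example are made up rather than taken from Vina.

```python
import numpy as np


def vina_energy_sketch(coords1, coords2, weights, wrot, Nrot, cutoff=8.0):
    """Weighted sum of the five interaction terms, filtered, scaled, and summed."""
    # Pairwise distances, shape (N, M). Assumption: raw interatomic distances
    # stand in for the surface distances used in the Vina paper.
    diff = coords1[:, None, :] - coords2[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))

    # Terms in the order given for `weights` above.
    terms = [
        np.where(d < 0, d ** 2, 0.0),                              # repulsion
        np.where(d < 0.5, 1.0, np.where(d < 1.5, 1.5 - d, 0.0)),   # hydrophobic
        np.where(d < -0.7, 1.0, np.where(d < 0, -d / 0.7, 0.0)),   # hydrogen bond
        np.exp(-((d / 0.5) ** 2)),                                 # first Gaussian
        np.exp(-(((d - 3.0) / 2.0) ** 2)),                         # second Gaussian
    ]
    interactions = sum(w * t for w, t in zip(weights, terms))

    # Drop pairs beyond the cutoff, apply the rotatable-bond scaling, and sum.
    thresholded = np.where(d < cutoff, interactions, 0.0)
    return np.sum(thresholded / (1.0 + wrot * Nrot))


ligand = np.array([[0.0, 0.0, 0.0]])
protein = np.array([[1.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
hydrophobic_only = np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # made-up weights
energy = vina_energy_sketch(ligand, protein, hydrophobic_only, wrot=0.5, Nrot=2)
```

In this example only the atom at distance 1 contributes: its hydrophobic term is 0.5, which the nonlinearity halves to 0.25, while the atom at distance 20 lies beyond the cutoff.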