Docking

Thanks to advances in biophysics, we are often able to find the structure of proteins from experimental techniques like Cryo-EM or X-ray crystallography. These structures can be powerful aides in designing small molecules. The technique of Molecular docking performs geometric calculations to find a “binding pose” with the small molecule interacting with the protein in question in a suitable binding pocket (that is, a region on the protein which has a groove in which the small molecule can rest). For more information about docking, check out the Autodock Vina paper:

Trott, Oleg, and Arthur J. Olson. “AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading.” Journal of computational chemistry 31.2 (2010): 455-461.

Binding Pocket Discovery

DeepChem has some utilities to help find binding pockets on proteins automatically. For now, these utilities are simple, but we will improve these in future versions of DeepChem.

class BindingPocketFinder[source]

Abstract superclass for binding pocket detectors

Many times when working with a new protein or other macromolecule, it’s not clear what zones of the macromolecule may be good targets for potential ligands or other molecules to interact with. This abstract class provides a template for child classes that algorithmically locate potential binding pockets that are good potential interaction sites.

Note that potential interactions sites can be found by many different methods, and that this abstract class doesn’t specify the technique to be used.

find_pockets(molecule: Any)[source]

Finds potential binding pockets in proteins.

Parameters:

molecule (object) – Some representation of a molecule.

class ConvexHullPocketFinder(scoring_model: Model | None = None, pad: float = 5.0)[source]

Implementation that uses convex hull of protein to find pockets.

Based on https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4112621/pdf/1472-6807-14-18.pdf

__init__(scoring_model: Model | None = None, pad: float = 5.0)[source]

Initialize the pocket finder.

Parameters:
  • scoring_model (Model, optional (default None)) – If specified, use this model to prune pockets.

  • pad (float, optional (default 5.0)) – The number of angstroms to pad around a binding pocket’s atoms to get a binding pocket box.

find_all_pockets(protein_file: str) List[CoordinateBox][source]

Find list of binding pockets on protein.

Parameters:

protein_file (str) – Protein to load in.

Returns:

List of binding pockets on protein. Each pocket is a CoordinateBox

Return type:

List[CoordinateBox]

find_pockets(macromolecule_file: str) List[CoordinateBox][source]

Find list of suitable binding pockets on protein.

This function computes putative binding pockets on this protein. This class uses the ConvexHull to compute binding pockets. Each face of the hull is converted into a coordinate box used for binding.

Parameters:

macromolecule_file (str) – Location of the macromolecule file to load

Returns:

List of pockets. Each pocket is a CoordinateBox

Return type:

List[CoordinateBox]

Pose Generation

Pose generation is the task of finding a “pose”, that is a geometric configuration of a small molecule interacting with a protein. Pose generation is a complex process, so for now DeepChem relies on external software to perform pose generation. This software is invoked and installed under the hood.

class PoseGenerator[source]

A Pose Generator computes low energy conformations for molecular complexes.

Many questions in structural biophysics reduce to that of computing the binding free energy of molecular complexes. A key step towards computing the binding free energy of two complexes is to find low energy “poses”, that is energetically favorable conformations of molecules with respect to each other. One application of this technique is to find low energy poses for protein-ligand interactions.

generate_poses(molecular_complex: Tuple[str, str], centroid: ndarray | None = None, box_dims: ndarray | None = None, exhaustiveness: int = 10, num_modes: int = 9, num_pockets: int | None = None, out_dir: str | None = None, generate_scores: bool = False)[source]

Generates a list of low energy poses for molecular complex

Parameters:
  • molecular_complexes (Tuple[str, str]) – A representation of a molecular complex. This tuple is (protein_file, ligand_file).

  • centroid (np.ndarray, optional (default None)) – The centroid to dock against. Is computed if not specified.

  • box_dims (np.ndarray, optional (default None)) – A numpy array of shape (3,) holding the size of the box to dock. If not specified is set to size of molecular complex plus 5 angstroms.

  • exhaustiveness (int, optional (default 10)) – Tells pose generator how exhaustive it should be with pose generation.

  • num_modes (int, optional (default 9)) – Tells pose generator how many binding modes it should generate at each invocation.

  • num_pockets (int, optional (default None)) – If specified, self.pocket_finder must be set. Will only generate poses for the first num_pockets returned by self.pocket_finder.

  • out_dir (str, optional (default None)) – If specified, write generated poses to this directory.

  • generate_score (bool, optional (default False)) – If True, the pose generator will return scores for complexes. This is used typically when invoking external docking programs that compute scores.

Return type:

A list of molecular complexes in energetically favorable poses.

class VinaPoseGenerator(pocket_finder: BindingPocketFinder | None = None)[source]

Uses Autodock Vina to generate binding poses.

This class uses Autodock Vina to make make predictions of binding poses.

Example

>> import deepchem as dc >> vpg = dc.dock.VinaPoseGenerator(pocket_finder=None) >> protein_file = ‘1jld_protein.pdb’ >> ligand_file = ‘1jld_ligand.sdf’ >> poses, scores = vpg.generate_poses( .. (protein_file, ligand_file), .. exhaustiveness=1, .. num_modes=1, .. out_dir=tmp, .. generate_scores=True)

Note

This class requires RDKit and vina to be installed. As on 9-March-22, Vina is not available on Windows. Hence, this utility is currently available only on Ubuntu and MacOS.

__init__(pocket_finder: BindingPocketFinder | None = None)[source]

Initializes Vina Pose Generator

Parameters:

pocket_finder (BindingPocketFinder, optional (default None)) – If specified should be an instance of dc.dock.BindingPocketFinder.

generate_poses(molecular_complex: Tuple[str, str], centroid: ndarray | None = None, box_dims: ndarray | None = None, exhaustiveness: int = 10, num_modes: int = 9, num_pockets: int | None = None, out_dir: str | None = None, generate_scores: bool | None = False, **kwargs) Tuple[List[Tuple[Any, Any]], List[float]] | List[Tuple[Any, Any]][source]

Generates the docked complex and outputs files for docked complex.

Parameters:
  • molecular_complexes (Tuple[str, str]) – A representation of a molecular complex. This tuple is (protein_file, ligand_file). The protein should be a pdb file and the ligand should be an sdf file.

  • centroid (np.ndarray, optional) – The centroid to dock against. Is computed if not specified.

  • box_dims (np.ndarray, optional) – A numpy array of shape (3,) holding the size of the box to dock. If not specified is set to size of molecular complex plus 5 angstroms.

  • exhaustiveness (int, optional (default 10)) – Tells Autodock Vina how exhaustive it should be with pose generation. A higher value of exhaustiveness implies more computation effort for the docking experiment.

  • num_modes (int, optional (default 9)) – Tells Autodock Vina how many binding modes it should generate at each invocation.

  • num_pockets (int, optional (default None)) – If specified, self.pocket_finder must be set. Will only generate poses for the first num_pockets returned by self.pocket_finder.

  • out_dir (str, optional) – If specified, write generated poses to this directory.

  • generate_score (bool, optional (default False)) – If True, the pose generator will return scores for complexes. This is used typically when invoking external docking programs that compute scores.

  • kwargs – The kwargs - cpu, min_rmsd, max_evals, energy_range supported by VINA are as documented in https://autodock-vina.readthedocs.io/en/latest/vina.html

Returns:

Tuple of (docked_poses, scores) or docked_poses. docked_poses is a list of docked molecular complexes. Each entry in this list contains a (protein_mol, ligand_mol) pair of RDKit molecules. scores is a list of binding free energies predicted by Vina.

Return type:

Tuple[docked_poses, scores] or docked_poses

Raises:

ValueError

class GninaPoseGenerator[source]

Use GNINA to generate binding poses.

This class uses GNINA (a deep learning framework for molecular docking) to generate binding poses. It downloads the GNINA executable to DEEPCHEM_DATA_DIR (an environment variable you set) and invokes the executable to perform pose generation.

GNINA uses pre-trained convolutional neural network (CNN) scoring functions to rank binding poses based on learned representations of 3D protein-ligand interactions. It has been shown to outperform AutoDock Vina in virtual screening applications [1]_.

If you use the GNINA molecular docking engine, please cite the relevant papers: https://github.com/gnina/gnina#citation The primary citation for GNINA is [1]_.

References

“Protein–Ligand Scoring with Convolutional Neural Networks.” Journal of chemical information and modeling (2017).

Note

  • GNINA currently only works on Linux operating systems.

  • GNINA requires CUDA >= 10.1 for fast CNN scoring.

  • Almost all dependencies are included in the most compatible way

    possible, which reduces performance. Build GNINA from source for production use.

__init__()[source]

Initialize GNINA pose generator.

generate_poses(molecular_complex: Tuple[str, str], centroid: ndarray | None = None, box_dims: ndarray | None = None, exhaustiveness: int = 10, num_modes: int = 9, num_pockets: int | None = None, out_dir: str | None = None, generate_scores: bool = True, **kwargs) Tuple[List[Tuple[Any, Any]], ndarray] | List[Tuple[Any, Any]][source]

Generates the docked complex and outputs files for docked complex.

Parameters:
  • molecular_complexes (Tuple[str, str]) – A representation of a molecular complex. This tuple is (protein_file, ligand_file).

  • centroid (np.ndarray, optional (default None)) – The centroid to dock against. Is computed if not specified.

  • box_dims (np.ndarray, optional (default None)) – A numpy array of shape (3,) holding the size of the box to dock. If not specified is set to size of molecular complex plus 4 angstroms.

  • exhaustiveness (int (default 8)) – Tells GNINA how exhaustive it should be with pose generation.

  • num_modes (int (default 9)) – Tells GNINA how many binding modes it should generate at each invocation.

  • out_dir (str, optional) – If specified, write generated poses to this directory.

  • generate_scores (bool, optional (default True)) – If True, the pose generator will return scores for complexes. This is used typically when invoking external docking programs that compute scores.

  • kwargs – Any args supported by GNINA as documented https://github.com/gnina/gnina#usage

Returns:

Tuple of (docked_poses, scores) or docked_poses. docked_poses is a list of docked molecular complexes. Each entry in this list contains a (protein_mol, ligand_mol) pair of RDKit molecules. scores is an array of binding affinities (kcal/mol), CNN pose scores, and CNN affinities predicted by GNINA.

Return type:

Tuple[docked_poses, scores] or docked_poses

Docking

The dc.dock.docking module provides a generic docking implementation that depends on provide pose generation and pose scoring utilities to perform docking. This implementation is generic.

class Docker(pose_generator: PoseGenerator, featurizer: ComplexFeaturizer | None = None, scoring_model: Model | None = None)[source]

A generic molecular docking class

This class provides a docking engine which uses provided models for featurization, pose generation, and scoring. Most pieces of docking software are command line tools that are invoked from the shell. The goal of this class is to provide a python clean API for invoking molecular docking programmatically.

The implementation of this class is lightweight and generic. It’s expected that the majority of the heavy lifting will be done by pose generation and scoring classes that are provided to this class.

__init__(pose_generator: PoseGenerator, featurizer: ComplexFeaturizer | None = None, scoring_model: Model | None = None)[source]

Builds model.

Parameters:
  • pose_generator (PoseGenerator) – The pose generator to use for this model

  • featurizer (ComplexFeaturizer, optional (default None)) – Featurizer associated with scoring_model

  • scoring_model (Model, optional (default None)) – Should make predictions on molecular complex.

dock(molecular_complex: Tuple[str, str], centroid: ndarray | None = None, box_dims: ndarray | None = None, exhaustiveness: int = 10, num_modes: int = 9, num_pockets: int | None = None, out_dir: str | None = None, use_pose_generator_scores: bool = False) Generator[Tuple[Any, Any], None, None] | Generator[Tuple[Tuple[Any, Any], float], None, None][source]

Generic docking function.

This docking function uses this object’s featurizer, pose generator, and scoring model to make docking predictions. This function is written in generic style so

Parameters:
  • molecular_complex (Tuple[str, str]) – A representation of a molecular complex. This tuple is (protein_file, ligand_file).

  • centroid (np.ndarray, optional (default None)) – The centroid to dock against. Is computed if not specified.

  • box_dims (np.ndarray, optional (default None)) – A numpy array of shape (3,) holding the size of the box to dock. If not specified is set to size of molecular complex plus 5 angstroms.

  • exhaustiveness (int, optional (default 10)) – Tells pose generator how exhaustive it should be with pose generation.

  • num_modes (int, optional (default 9)) – Tells pose generator how many binding modes it should generate at each invocation.

  • num_pockets (int, optional (default None)) – If specified, self.pocket_finder must be set. Will only generate poses for the first num_pockets returned by self.pocket_finder.

  • out_dir (str, optional (default None)) – If specified, write generated poses to this directory.

  • use_pose_generator_scores (bool, optional (default False)) – If True, ask pose generator to generate scores. This cannot be True if self.featurizer and self.scoring_model are set since those will be used to generate scores in that case.

Returns:

A generator. If use_pose_generator_scores==True or self.scoring_model is set, then will yield tuples (posed_complex, score). Else will yield posed_complex.

Return type:

Generator[Tuple[posed_complex, score]] or Generator[posed_complex]

Pose Scoring

This module contains some utilities for computing docking scoring functions directly in Python. For now, support for custom pose scoring is limited.

pairwise_distances(coords1: ndarray, coords2: ndarray) ndarray[source]

Returns matrix of pairwise Euclidean distances.

Parameters:
  • coords1 (np.ndarray) – A numpy array of shape (N, 3)

  • coords2 (np.ndarray) – A numpy array of shape (M, 3)

Returns:

A (N,M) array with pairwise distances.

Return type:

np.ndarray

cutoff_filter(d: ndarray, x: ndarray, cutoff=8.0) ndarray[source]

Applies a cutoff filter on pairwise distances

Parameters:
  • d (np.ndarray) – Pairwise distances matrix. A numpy array of shape (N, M)

  • x (np.ndarray) – Matrix of shape (N, M)

  • cutoff (float, optional (default 8)) – Cutoff for selection in Angstroms

Returns:

A (N,M) array with values where distance is too large thresholded to 0.

Return type:

np.ndarray

vina_nonlinearity(c: ndarray, w: float, Nrot: int) ndarray[source]

Computes non-linearity used in Vina.

Parameters:
  • c (np.ndarray) – A numpy array of shape (N, M)

  • w (float) – Weighting term

  • Nrot (int) – Number of rotatable bonds in this molecule

Returns:

A (N, M) array with activations under a nonlinearity.

Return type:

np.ndarray

vina_repulsion(d: ndarray) ndarray[source]

Computes Autodock Vina’s repulsion interaction term.

Parameters:

d (np.ndarray) – A numpy array of shape (N, M).

Returns:

A (N, M) array with repulsion terms.

Return type:

np.ndarray

vina_hydrophobic(d: ndarray) ndarray[source]

Computes Autodock Vina’s hydrophobic interaction term.

Here, d is the set of surface distances as defined in [1]_

Parameters:

d (np.ndarray) – A numpy array of shape (N, M).

Returns:

A (N, M) array of hydrophoboic interactions in a piecewise linear curve.

Return type:

np.ndarray

References

vina_hbond(d: ndarray) ndarray[source]

Computes Autodock Vina’s hydrogen bond interaction term.

Here, d is the set of surface distances as defined in [1]_

Parameters:

d (np.ndarray) – A numpy array of shape (N, M).

Returns:

A (N, M) array of hydrophoboic interactions in a piecewise linear curve.

Return type:

np.ndarray

References

vina_gaussian_first(d: ndarray) ndarray[source]

Computes Autodock Vina’s first Gaussian interaction term.

Here, d is the set of surface distances as defined in [1]_

Parameters:

d (np.ndarray) – A numpy array of shape (N, M).

Returns:

A (N, M) array of gaussian interaction terms.

Return type:

np.ndarray

References

vina_gaussian_second(d: ndarray) ndarray[source]

Computes Autodock Vina’s second Gaussian interaction term.

Here, d is the set of surface distances as defined in [1]_

Parameters:

d (np.ndarray) – A numpy array of shape (N, M).

Returns:

A (N, M) array of gaussian interaction terms.

Return type:

np.ndarray

References

vina_energy_term(coords1: ndarray, coords2: ndarray, weights: ndarray, wrot: float, Nrot: int) ndarray[source]

Computes the Vina Energy function for two molecular conformations

Parameters:
  • coords1 (np.ndarray) – Molecular coordinates of shape (N, 3)

  • coords2 (np.ndarray) – Molecular coordinates of shape (M, 3)

  • weights (np.ndarray) – A numpy array of shape (5,). The 5 values are weights for repulsion interaction term, hydrophobic interaction term, hydrogen bond interaction term, first Gaussian interaction term and second Gaussian interaction term.

  • wrot (float) – The scaling factor for nonlinearity

  • Nrot (int) – Number of rotatable bonds in this calculation

Returns:

A scalar value with free energy

Return type:

np.ndarray