Utilities

DeepChem has a broad collection of utility functions. Many of these may be of independent interest to users since they deal with some tricky aspects of processing scientific datatypes.

Data Utilities

Array Utilities

pad_array(x: ndarray, shape: Tuple | int, fill: float = 0.0, both: bool = False) ndarray[source]

Pad an array with a fill value.

Parameters:
  • x (np.ndarray) – A numpy array.

  • shape (Tuple or int) – Desired shape. If int, all dimensions are padded to that size.

  • fill (float, optional (default 0.0)) – The padded value.

  • both (bool, optional (default False)) – If True, split the padding on both sides of each axis. If False, padding is applied to the end of each axis.

Returns:

A padded numpy array

Return type:

np.ndarray
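
The effect of the both flag can be illustrated with plain NumPy. The block below is a minimal sketch of the documented behaviour, not DeepChem's implementation: pad_sketch is a hypothetical helper built on np.pad, and the way an odd amount of padding is split between the two sides is an assumption.

```python
import numpy as np

def pad_sketch(x, shape, fill=0.0, both=False):
    """Hypothetical stand-in that mimics the documented pad_array behaviour."""
    x = np.asarray(x)
    if isinstance(shape, int):
        shape = (shape,) * x.ndim  # an int applies to every dimension
    widths = []
    for current, target in zip(x.shape, shape):
        diff = target - current
        if diff < 0:
            raise ValueError("target shape must not be smaller than x.shape")
        if both:
            before = diff - diff // 2  # assumption: an odd remainder goes in front
            after = diff // 2
        else:
            before, after = 0, diff  # pad only at the end of the axis
        widths.append((before, after))
    return np.pad(x, widths, mode="constant", constant_values=fill)

x = np.ones((2, 2))
end_padded = pad_sketch(x, 4)           # original values sit in the top-left corner
centered = pad_sketch(x, 4, both=True)  # original values sit in the middle
```

With both=False the original block stays at index 0 on every axis; with both=True it is surrounded by fill values.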

Data Directory

The DeepChem data directory is where downloaded MoleculeNet datasets are stored.

get_data_dir() str[source]

Get the DeepChem data directory.

Returns:

The default path to store DeepChem data. If you want to change this path, please set your own path to DEEPCHEM_DATA_DIR as an environment variable.

Return type:

str
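
The environment-variable override described above can be sketched as follows. data_dir_sketch is a hypothetical helper, and the fallback to the system temporary directory is an assumption made for this illustration.

```python
import os
import tempfile

def data_dir_sketch():
    """Hypothetical helper: environment variable first, then a fallback."""
    # Assumption: fall back to the system temp directory when the variable is unset.
    return os.environ.get("DEEPCHEM_DATA_DIR", tempfile.gettempdir())

# Point the data directory at a custom location for this process.
os.environ["DEEPCHEM_DATA_DIR"] = os.path.join(tempfile.gettempdir(), "deepchem_data")
custom_dir = data_dir_sketch()
```

Setting the variable in the shell before starting Python has the same effect as assigning it through os.environ.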

URL Handling

download_url(url: str, dest_dir: str = '/tmp', name: str | None = None)[source]

Download a file to disk.

Parameters:
  • url (str) – The URL to download from

  • dest_dir (str) – The directory to save the file in

  • name (str) – The file name to save it as. If omitted, it will try to extract a file name from the URL
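
When name is omitted, a file name has to be derived from the URL. One plausible way to do that is shown below; guess_file_name and the example URL are hypothetical, and the exact rule DeepChem applies may differ.

```python
import os
from urllib.parse import urlparse

def guess_file_name(url):
    """Hypothetical helper: take the last path component of the URL."""
    path = urlparse(url).path  # drops the query string and fragment
    return os.path.basename(path)

file_name = guess_file_name("https://example.com/datasets/tox21.csv.gz?download=1")
```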

File Handling

untargz_file(file: str, dest_dir: str = '/tmp', name: str | None = None)[source]

Untar and unzip a .tar.gz file to disk.

Parameters:
  • file (str) – The filepath to decompress

  • dest_dir (str) – The directory to save the file in

  • name (str) – The file name to save it as. If omitted, it will use the file name

unzip_file(file: str, dest_dir: str = '/tmp', name: str | None = None)[source]

Unzip a .zip file to disk.

Parameters:
  • file (str) – The filepath to decompress

  • dest_dir (str) – The directory to save the file in

  • name (str) – The directory name to unzip it to. If omitted, it will use the file name

load_data(input_files: List[str], shard_size: int | None = None) Iterator[Any][source]

Loads data from files.

Parameters:
  • input_files (List[str]) – List of filenames.

  • shard_size (int, default None) – Size of shard to yield

Returns:

Iterator which iterates over provided files.

Return type:

Iterator[Any]

Notes

The supported file types are SDF, CSV and Pickle.

load_sdf_files(input_files: List[str], clean_mols: bool = True, tasks: List[str] = [], shard_size: int | None = None) Iterator[DataFrame][source]

Load SDF file into dataframe.

Parameters:
  • input_files (List[str]) – List of filenames

  • clean_mols (bool, default True) – Whether to sanitize molecules.

  • tasks (List[str], default []) – Each entry in tasks is treated as a property in the SDF file and is retrieved with mol.GetProp(str(task)) where mol is the RDKit mol loaded from a given SDF entry.

  • shard_size (int, default None) – The shard size to yield at one time.

Returns:

Generator that yields dataframes, each holding one shard of shard_size rows.

Return type:

Iterator[pd.DataFrame]

Notes

This function requires RDKit to be installed.

load_csv_files(input_files: List[str], shard_size: int | None = None) Iterator[DataFrame][source]

Load data as pandas dataframe from CSV files.

Parameters:
  • input_files (List[str]) – List of filenames

  • shard_size (int, default None) – The shard size to yield at one time.

Returns:

Generator that yields dataframes, each holding one shard of shard_size rows.

Return type:

Iterator[pd.DataFrame]

load_json_files(input_files: List[str], shard_size: int | None = None) Iterator[DataFrame][source]

Load data as pandas dataframe.

Parameters:
  • input_files (List[str]) – List of json filenames.

  • shard_size (int, default None) – Chunksize for reading json files.

Returns:

Generator that yields dataframes, each holding one shard of shard_size rows.

Return type:

Iterator[pd.DataFrame]

Notes

To load shards from a json file into a Pandas dataframe, the file must be originally saved with df.to_json('filename.json', orient='records', lines=True)
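
The orient='records', lines=True layout stores one JSON object per line, which is what makes reading in shards possible. The sketch below reproduces that layout with the standard library only; read_shards and the column names are hypothetical and serve only to illustrate the format.

```python
import json
import os
import tempfile
from itertools import islice

records = [
    {"smiles": "C", "y": 0.1},
    {"smiles": "CC", "y": 0.2},
    {"smiles": "CCC", "y": 0.3},
]

# Write one JSON object per line, the layout produced by orient='records', lines=True.
path = os.path.join(tempfile.mkdtemp(), "filename.json")
with open(path, "w") as handle:
    for record in records:
        handle.write(json.dumps(record) + "\n")

def read_shards(filename, shard_size):
    """Hypothetical reader: yield lists of at most shard_size records."""
    with open(filename) as handle:
        while True:
            lines = list(islice(handle, shard_size))
            if not lines:
                return
            yield [json.loads(line) for line in lines]

shards = list(read_shards(path, shard_size=2))
```

Because every line is a complete record, a reader never has to parse the whole file to produce the first shard.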

load_pickle_files(input_files: List[str]) Iterator[Any][source]

Load dataset from pickle files.

Parameters:

input_files (List[str]) – The list of pickle filenames. This function can also load gzipped pickle files such as XXXX.pkl.gz.

Returns:

Generator that yields the object loaded from each pickle file.

Return type:

Iterator[Any]
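
Handling both plain and gzipped pickle files usually comes down to choosing the right opener. The following is a minimal sketch under that assumption; load_pickle_sketch is a hypothetical helper and not the library's loader.

```python
import gzip
import os
import pickle
import tempfile

def load_pickle_sketch(filename):
    """Hypothetical loader: use gzip.open for .gz files, open otherwise."""
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "rb") as handle:
        return pickle.load(handle)

workdir = tempfile.mkdtemp()
plain_path = os.path.join(workdir, "data.pkl")
gzip_path = os.path.join(workdir, "data.pkl.gz")
payload = {"ids": ["mol-1", "mol-2"]}

# Save the same object in both formats.
with open(plain_path, "wb") as handle:
    pickle.dump(payload, handle)
with gzip.open(gzip_path, "wb") as handle:
    pickle.dump(payload, handle)

loaded = [load_pickle_sketch(p) for p in (plain_path, gzip_path)]
```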

load_from_disk(filename: str) Any[source]

Load a dataset from file.

Parameters:

filename (str) – The name of the file to load data from.

Returns:

A loaded object from file.

Return type:

Any

save_to_disk(dataset: Any, filename: str, compress: int = 3)[source]

Save a dataset to file.

Parameters:
  • dataset (Any) – The data to save.

  • filename (str) – Path to save data.

  • compress (int, default 3) – The compress option when dumping joblib file.

load_dataset_from_disk(save_dir: str) Tuple[bool, Tuple[DiskDataset, DiskDataset, DiskDataset] | None, List[Transformer]][source]

Loads MoleculeNet train/valid/test/transformers from disk.

Expects that data was saved using save_dataset_to_disk below. Expects the following directory structure for save_dir:

    save_dir/
      |
      ---> train_dir/
      |
      ---> valid_dir/
      |
      ---> test_dir/
      |
      ---> transformers.pkl

Parameters:

save_dir (str) – Directory name to load datasets.

Returns:

  • loaded (bool) – Whether the load succeeded

  • all_dataset (Tuple[DiskDataset, DiskDataset, DiskDataset]) – The train, valid, test datasets

  • transformers (List[Transformer]) – The transformers used for this dataset
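
Because loading depends on the directory layout described above, it can be useful to check a directory before trying to load from it. The helper below is a hypothetical sketch that only looks for the four expected entries; it does not validate their contents.

```python
import os
import tempfile

EXPECTED_ENTRIES = ("train_dir", "valid_dir", "test_dir", "transformers.pkl")

def missing_entries(save_dir):
    """Hypothetical check: list expected entries that are absent from save_dir."""
    return [
        name for name in EXPECTED_ENTRIES
        if not os.path.exists(os.path.join(save_dir, name))
    ]

# Build an incomplete layout to show what the check reports.
save_dir = tempfile.mkdtemp()
for sub in ("train_dir", "valid_dir"):
    os.mkdir(os.path.join(save_dir, sub))

missing = missing_entries(save_dir)
```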

save_dataset_to_disk(save_dir: str, train: DiskDataset, valid: DiskDataset, test: DiskDataset, transformers: List[Transformer])[source]

Utility used by MoleculeNet to save train/valid/test datasets.

This utility function saves a train/valid/test split of a dataset along with transformers in the same directory. The saved datasets will take the following structure:

    save_dir/
      |
      ---> train_dir/
      |
      ---> valid_dir/
      |
      ---> test_dir/
      |
      ---> transformers.pkl

Parameters:
  • save_dir (str) – Directory name to save datasets to.

  • train (DiskDataset) – Training dataset to save.

  • valid (DiskDataset) – Validation dataset to save.

  • test (DiskDataset) – Test dataset to save.

  • transformers (List[Transformer]) – List of transformers to save to disk.

Molecular Utilities

class ConformerGenerator(max_conformers: int = 1, rmsd_threshold: float = 0.5, force_field: str = 'uff', pool_multiplier: int = 10)[source]

Generate molecule conformers.

Notes

Procedure:

  1. Generate a pool of conformers.

  2. Minimize conformers.

  3. Prune conformers using an RMSD threshold.

Note that pruning is done after minimization, which differs from the protocol described in the references [1] [2].

References

Notes

This class requires RDKit to be installed.

__init__(max_conformers: int = 1, rmsd_threshold: float = 0.5, force_field: str = 'uff', pool_multiplier: int = 10)[source]

Parameters:
  • max_conformers (int, optional (default 1)) – Maximum number of conformers to generate (after pruning).

  • rmsd_threshold (float, optional (default 0.5)) – RMSD threshold for pruning conformers. If None or negative, no pruning is performed.

  • force_field (str, optional (default 'uff')) – Force field to use for conformer energy calculation and minimization. Options are ‘uff’, ‘mmff94’, and ‘mmff94s’.

  • pool_multiplier (int, optional (default 10)) – Factor to multiply by max_conformers to generate the initial conformer pool. Since conformers are pruned after energy minimization, increasing the size of the pool increases the chance of identifying max_conformers unique conformers.
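
The pruning step can be pictured as a greedy pass over conformers sorted by energy. The sketch below works on precomputed energies and a conformer-conformer RMSD matrix given as plain lists; prune_by_rmsd is a hypothetical helper, and the use of >= for the comparison is an assumption.

```python
def prune_by_rmsd(energies, rmsd, threshold, max_conformers):
    """Hypothetical greedy pruning over precomputed energies and RMSD values.

    energies: list of floats, one per conformer.
    rmsd: square nested list, rmsd[i][j] is the RMSD between conformers i and j.
    """
    # Visit conformers from lowest to highest energy.
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    keep = []
    for i in order:
        if len(keep) >= max_conformers:
            break
        # Keep i only if it is far enough from every conformer kept so far.
        if all(rmsd[i][j] >= threshold for j in keep):
            keep.append(i)
    return keep

energies = [3.0, 1.0, 2.0]
rmsd = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.8],
    [0.9, 0.8, 0.0],
]
kept = prune_by_rmsd(energies, rmsd, threshold=0.5, max_conformers=3)
```

Here conformer 0 is dropped because it lies within 0.5 of the lower-energy conformer 1. A larger pool gives this pass more candidates to choose from, which is the motivation for pool_multiplier.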

generate_conformers(mol: Any) Any[source]

Generate conformers for a molecule.

This function returns a copy of the original molecule with embedded conformers.

Parameters:

mol (rdkit.Chem.rdchem.Mol) – RDKit Mol object

Returns:

mol – A new RDKit Mol object containing the chosen conformers, sorted by increasing energy.

Return type:

rdkit.Chem.rdchem.Mol

embed_molecule(mol: Any) Any[source]

Generate conformers, possibly with pruning.

Parameters:

mol (rdkit.Chem.rdchem.Mol) – RDKit Mol object

Returns:

mol – RDKit Mol object with embedded multiple conformers.

Return type:

rdkit.Chem.rdchem.Mol

get_molecule_force_field(mol: Any, conf_id: int | None = None, **kwargs) Any[source]

Get a force field for a molecule.

Parameters:
  • mol (rdkit.Chem.rdchem.Mol) – RDKit Mol object with embedded conformers.

  • conf_id (int, optional) – ID of the conformer to associate with the force field.

  • kwargs (dict, optional) – Keyword arguments for force field constructor.

Returns:

ff – RDKit force field instance for a molecule.

Return type:

rdkit.ForceField.rdForceField.ForceField

minimize_conformers(mol: Any) None[source]

Minimize molecule conformers.

Parameters:

mol (rdkit.Chem.rdchem.Mol) – RDKit Mol object with embedded conformers.

get_conformer_energies(mol: Any) ndarray[source]

Calculate conformer energies.

Parameters:

mol (rdkit.Chem.rdchem.Mol) – RDKit Mol object with embedded conformers.

Returns:

energies – Minimized conformer energies.

Return type:

np.ndarray

prune_conformers(mol: Any) Any[source]

Prune conformers from a molecule using an RMSD threshold, starting with the lowest energy conformer.

Parameters:

mol (rdkit.Chem.rdchem.Mol) – RDKit Mol object

Returns:

new_mol – A new rdkit.Chem.rdchem.Mol containing the chosen conformers, sorted by increasing energy.

Return type:

rdkit.Chem.rdchem.Mol

static get_conformer_rmsd(mol: Any) ndarray[source]

Calculate conformer-conformer RMSD.

Parameters:

mol (rdkit.Chem.rdchem.Mol) – RDKit Mol object

Returns:

rmsd – A conformer-conformer RMSD value. The shape is (NumConformers, NumConformers)

Return type:

np.ndarray

class MoleculeLoadException(*args, **kwargs)[source]

__init__(*args, **kwargs)[source]

get_xyz_from_mol(mol)[source]

Extracts a numpy array of coordinates from a molecule.

Returns a (N, 3) numpy array of 3d coords of given rdkit molecule

Parameters:

mol (rdkit Molecule) – Molecule to extract coordinates for

Return type:

Numpy ndarray of shape (N, 3) where N = mol.GetNumAtoms().

add_hydrogens_to_mol(mol, is_protein=False)[source]

Add hydrogens to a molecule object

Parameters:
  • mol (Rdkit Mol) – Molecule to hydrogenate

  • is_protein (bool, optional (default False)) – Whether this molecule is a protein.

Return type:

Rdkit Mol

Note

This function requires RDKit and PDBFixer to be installed.

compute_charges(mol)[source]

Attempt to compute Gasteiger Charges on Mol

This also has the side effect of calculating charges on mol. The mol passed into this function has to already have been sanitized

Parameters:

mol (rdkit molecule) – Molecule to compute charges for. It must already have been sanitized.

Return type:

No return since updates in place.

Note

This function requires RDKit to be installed.

load_molecule(molecule_file, add_hydrogens=True, calc_charges=True, sanitize=True, is_protein=False)[source]

Converts molecule file to (xyz-coords, rdkit mol object)

Given molecule_file, returns a tuple of xyz coords of molecule and an rdkit object representing that molecule in that order (xyz, rdkit_mol). This ordering convention is used in the code in a few places.

Parameters:
  • molecule_file (str) – filename for molecule

  • add_hydrogens (bool, optional (default True)) – If True, add hydrogens via pdbfixer

  • calc_charges (bool, optional (default True)) – If True, add charges via rdkit

  • sanitize (bool, optional (default True)) – If True, sanitize molecules via rdkit

  • is_protein (bool, optional (default False)) – If True, this molecule is loaded as a protein. This flag will affect some of the cleanup procedures applied.

Returns:

A tuple (xyz, mol) if the file contains a single molecule. Otherwise, a list of such tuples, one for each molecule in the file.

Note

This function requires RDKit to be installed.

write_molecule(mol, outfile, is_protein=False)[source]

Write molecule to a file

This function writes a representation of the provided molecule to the specified outfile. Doesn’t return anything.

Parameters:
  • mol (rdkit Mol) – Molecule to write

  • outfile (str) – Filename to write mol to

  • is_protein (bool, optional) – Is this molecule a protein?

Note

This function requires RDKit to be installed.

Raises:

ValueError – if outfile isn’t of a supported format.

Molecular Fragment Utilities

It’s often convenient to manipulate subsets of a molecule. The MolecularFragment class aids in such manipulations.

class MolecularFragment(atoms: Sequence[Any], coords: ndarray)[source]

A class that represents a fragment of a molecule.

It’s often convenient to represent a fragment of a molecule. For example, if two molecules form a molecular complex, it may be useful to create two fragments which represent the subsets of each molecule that are close to the other molecule (in the contact region).

Ideally, we’d be able to do this directly in RDKit, but manipulating molecular fragments doesn’t seem to be supported functionality.

Examples

>>> import numpy as np
>>> from rdkit import Chem
>>> from deepchem.utils.fragment_utils import MolecularFragment
>>> mol = Chem.MolFromSmiles("C")
>>> coords = np.array([[0.0, 0.0, 0.0]])
>>> atom = mol.GetAtoms()[0]
>>> fragment = MolecularFragment([atom], coords)

__init__(atoms: Sequence[Any], coords: ndarray)[source]

Initialize this object.

Parameters:
  • atoms (Iterable[rdkit.Chem.rdchem.Atom]) – Each entry in this list should be an RDKit Atom.

  • coords (np.ndarray) – Array of locations for atoms of shape (N, 3) where N == len(atoms).

GetAtoms() List[AtomShim][source]

Returns the list of atoms

Returns:

list of atoms in this fragment.

Return type:

List[AtomShim]

GetNumAtoms() int[source]

Returns the number of atoms

Returns:

Number of atoms in this fragment.

Return type:

int

GetCoords() ndarray[source]

Returns 3D coordinates for this fragment as numpy array.

Returns:

A numpy array of shape (N, 3) with coordinates for this fragment. Here, N is the number of atoms.

Return type:

np.ndarray

class AtomShim(atomic_num: int, partial_charge: float, atom_coords: ndarray)[source]

This is a shim object wrapping an atom.

We use this class instead of raw RDKit atoms since manipulating a large number of rdkit Atoms seems to result in segfaults. Wrapping the basic information in an AtomShim seems to avoid issues.

__init__(atomic_num: int, partial_charge: float, atom_coords: ndarray)[source]

Initialize this object

Parameters:
  • atomic_num (int) – Atomic number for this atom.

  • partial_charge (float) – The partial Gasteiger charge for this atom

  • atom_coords (np.ndarray) – Of shape (3,) with the coordinates of this atom

GetAtomicNum() int[source]

Returns atomic number for this atom.

Returns:

Atomic number for this atom.

Return type:

int

GetPartialCharge() float[source]

Returns partial charge for this atom.

Returns:

A partial Gasteiger charge for this atom.

Return type:

float

GetCoords() ndarray[source]

Returns 3D coordinates for this atom as numpy array.

Returns:

Numpy array of shape (3,) with coordinates for this atom.

Return type:

np.ndarray

strip_hydrogens(coords: ndarray, mol: Any | MolecularFragment) Tuple[ndarray, MolecularFragment][source]

Strip the hydrogens from input molecule

Parameters:
  • coords (np.ndarray) – The coords must be of shape (N, 3) and correspond to coordinates of mol.

  • mol (rdkit.Chem.rdchem.Mol or MolecularFragment) – The molecule to strip

Returns:

A tuple of (coords, mol_frag) where coords is a numpy array of coordinates with the hydrogen coordinates removed. mol_frag is a MolecularFragment.

Return type:

Tuple[np.ndarray, MolecularFragment]

Notes

This function requires RDKit to be installed.

merge_molecular_fragments(molecules: List[MolecularFragment]) MolecularFragment | None[source]

Helper method to merge a list of molecular fragments into one.

Parameters:

molecules (List[MolecularFragment]) – List of MolecularFragment objects.

Returns:

Returns a merged MolecularFragment

Return type:

Optional[MolecularFragment]

get_contact_atom_indices(fragments: List[Tuple[ndarray, Any]], cutoff: float = 4.5) List[List[int]][source]

Compute the atoms close to the contact region.

Molecular complexes can get very large. This can make it unwieldy to compute functions on them. To improve memory usage, it can be very useful to trim out atoms that aren’t close to contact regions. This function computes pairwise distances between all pairs of molecules in the molecular complex. If an atom is within cutoff distance of any atom on another molecule in the complex, it is regarded as a contact atom. Otherwise it is trimmed.

Parameters:
  • fragments (List[Tuple[np.ndarray, rdkit.Chem.rdchem.Mol]]) – As returned by rdkit_utils.load_complex, a list of tuples of (coords, mol) where coords is a (N_atoms, 3) array and mol is the rdkit molecule object.

  • cutoff (float, optional (default 4.5)) – The cutoff distance in angstroms.

Returns:

A list of length len(fragments). Each entry in this list is a list of atom indices from that molecule which should be kept, in sorted order.

Return type:

List[List[int]]
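
The contact rule described above can be sketched with the standard library alone. This is an illustration, not DeepChem's code: contact_indices_sketch is a hypothetical helper that takes only coordinates (the real function takes (coords, mol) tuples), and the strict < comparison against the cutoff is an assumption.

```python
import math

def contact_indices_sketch(coords_per_molecule, cutoff=4.5):
    """Hypothetical helper: coords_per_molecule is a list of lists of (x, y, z)."""
    kept = []
    for m, coords in enumerate(coords_per_molecule):
        indices = []
        for i, atom in enumerate(coords):
            # An atom is a contact atom if any atom of another molecule is close.
            close = any(
                math.dist(atom, other) < cutoff
                for n, others in enumerate(coords_per_molecule) if n != m
                for other in others
            )
            if close:
                indices.append(i)
        kept.append(indices)
    return kept

ligand = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
protein = [(1.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
contacts = contact_indices_sketch([ligand, protein])
```

Only the first atom of each molecule is kept, since those two atoms are 1 angstrom apart while every other pair is farther than the cutoff.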

reduce_molecular_complex_to_contacts(fragments: List[Tuple[ndarray, Any]], cutoff: float = 4.5) List[Tuple[ndarray, MolecularFragment]][source]

Reduce a molecular complex to only those atoms near a contact.

Molecular complexes can get very large. This can make it unwieldy to compute functions on them. To improve memory usage, it can be very useful to trim out atoms that aren’t close to contact regions. This function takes in a molecular complex and returns a new molecular complex representation that contains only contact atoms. The contact atoms are computed by calling get_contact_atom_indices under the hood.

Parameters:
  • fragments (List[Tuple[np.ndarray, rdkit.Chem.rdchem.Mol]]) – As returned by rdkit_utils.load_complex, a list of tuples of (coords, mol) where coords is a (N_atoms, 3) array and mol is the rdkit molecule object.

  • cutoff (float) – The cutoff distance in angstroms.

Returns:

A list of length len(fragments). Each entry in this list is a tuple of (coords, MolecularFragment). The coords is stripped down to (N_contact_atoms, 3) where N_contact_atoms is the number of contact atoms for this complex. MolecularFragment is used since it’s tricky to make an RDKit sub-molecule.

Return type:

List[Tuple[np.ndarray, MolecularFragment]]

Coordinate Box Utilities

class CoordinateBox(x_range: Tuple[float, float], y_range: Tuple[float, float], z_range: Tuple[float, float])[source]

A coordinate box that represents a block in space.

Molecular complexes are typically represented with atoms as coordinate points. Each complex is naturally associated with a number of different box regions. For example, the bounding box is a box that contains all atoms in the molecular complex. A binding pocket box is a box that focuses in on a binding region of a protein to a ligand. An interface box is the region in which two proteins have a bulk interaction.

The CoordinateBox class is designed to represent such regions of space. It consists of the coordinates of the box, and the collection of atoms that live in this box alongside their coordinates.

__init__(x_range: Tuple[float, float], y_range: Tuple[float, float], z_range: Tuple[float, float])[source]

Initialize this box.

Parameters:
  • x_range (Tuple[float, float]) – A tuple of (x_min, x_max) with the min and max x-coordinates.

  • y_range (Tuple[float, float]) – A tuple of (y_min, y_max) with the min and max y-coordinates.

  • z_range (Tuple[float, float]) – A tuple of (z_min, z_max) with the min and max z-coordinates.

Raises:

ValueError

__contains__(point: Sequence[float]) bool[source]

Check whether a point is in this box.

Parameters:

point (Sequence[float]) – 3-tuple or list of length 3 or np.ndarray of shape (3,). The (x, y, z) coordinates of a point in space.

Returns:

True if point is contained in this box.

Return type:

bool
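
Since the method is named __contains__, Python's in operator (point in box) dispatches to it. The underlying test is a per-axis range check, sketched below with a hypothetical point_in_box function; treating the box boundaries as inside is an assumption.

```python
def point_in_box(point, x_range, y_range, z_range):
    """Hypothetical check; assumes the box boundaries count as inside."""
    ranges = (x_range, y_range, z_range)
    return all(low <= value <= high for value, (low, high) in zip(point, ranges))

inside = point_in_box((0.5, 0.5, 0.5), (0, 1), (0, 1), (0, 1))
outside = point_in_box((1.5, 0.5, 0.5), (0, 1), (0, 1), (0, 1))
```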

center() Tuple[float, float, float][source]

Computes the center of this box.

Returns:

(x, y, z) the coordinates of the center of the box.

Return type:

Tuple[float, float, float]

Examples

>>> from deepchem.utils.coordinate_box_utils import CoordinateBox
>>> box = CoordinateBox((0, 1), (0, 1), (0, 1))
>>> box.center()
(0.5, 0.5, 0.5)

volume() float[source]

Computes and returns the volume of this box.

Returns:

The volume of this box. Can be 0 if box is empty

Return type:

float

Examples

>>> from deepchem.utils.coordinate_box_utils import CoordinateBox
>>> box = CoordinateBox((0, 1), (0, 1), (0, 1))
>>> box.volume()
1

contains(other: CoordinateBox) bool[source]

Test whether this box contains another.

This method checks whether other is contained in this box.

Parameters:

other (CoordinateBox) – The box to check is contained in this box.

Returns:

True if other is contained in this box.

Return type:

bool

Raises:

ValueError

intersect_interval(interval1: Tuple[float, float], interval2: Tuple[float, float]) Tuple[float, float][source]

Computes the intersection of two intervals.

Parameters:
  • interval1 (Tuple[float, float]) – Should be (x1_min, x1_max)

  • interval2 (Tuple[float, float]) – Should be (x2_min, x2_max)

Returns:

x_intersect – Should be the intersection. If the intersection is empty returns (0, 0) to represent the empty set. Otherwise is (max(x1_min, x2_min), min(x1_max, x2_max)).

Return type:

Tuple[float, float]
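
The rule stated above translates almost directly into code. The function below is a hypothetical re-implementation for illustration; treating two intervals that merely touch as a zero-width intersection rather than an empty one is an assumption.

```python
def intersect_interval_sketch(interval1, interval2):
    """Hypothetical re-implementation of the documented intersection rule."""
    low = max(interval1[0], interval2[0])
    high = min(interval1[1], interval2[1])
    if low > high:
        return (0, 0)  # empty intersection, represented as documented
    return (low, high)

overlap = intersect_interval_sketch((0, 2), (1, 3))
empty = intersect_interval_sketch((0, 1), (2, 3))
```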

union(box1: CoordinateBox, box2: CoordinateBox) CoordinateBox[source]

Merges provided boxes to find the smallest union box.

This method merges the two provided boxes.

Parameters:
  • box1 (CoordinateBox) – First box to merge.

  • box2 (CoordinateBox) – Second box to merge.

Returns:

Smallest CoordinateBox that contains both box1 and box2

Return type:

CoordinateBox
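
The smallest box containing two boxes takes, on each axis, the smaller of the two minima and the larger of the two maxima. The sketch below shows this on plain tuples; union_ranges is a hypothetical helper that represents a box as three (min, max) ranges rather than as a CoordinateBox.

```python
def union_ranges(box1, box2):
    """Hypothetical helper; each box is ((x_min, x_max), (y_min, y_max), (z_min, z_max))."""
    return tuple(
        (min(r1[0], r2[0]), max(r1[1], r2[1]))
        for r1, r2 in zip(box1, box2)
    )

box_a = ((0, 1), (0, 1), (0, 1))
box_b = ((0.5, 2), (-1, 0.5), (0, 1))
merged = union_ranges(box_a, box_b)
```

Note that the result can include space covered by neither input box, since it is a bounding box and not a set union.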

merge_overlapping_boxes(boxes: List[CoordinateBox], threshold: float = 0.8) List[CoordinateBox][source]

Merge boxes which have an overlap greater than threshold.

Parameters:
  • boxes (list[CoordinateBox]) – A list of CoordinateBox objects.

  • threshold (float, default 0.8) – The volume fraction of the boxes that must overlap for them to be merged together.

Returns:

List[CoordinateBox] of merged boxes. This list will have length less than or equal to the length of boxes.

Return type:

List[CoordinateBox]

get_face_boxes(coords: ndarray, pad: float = 5.0) List[CoordinateBox][source]

For each face of the convex hull, compute a coordinate box around it.

The convex hull of a macromolecule will have a series of triangular faces. For each such triangular face, we construct a bounding box around this triangle. Think of this box as attempting to capture some binding interaction region whose exterior is controlled by the box. Note that this box will likely be a crude approximation, but the advantage of this technique is that it only uses simple geometry to provide some basic biological insight into the molecule at hand.

The pad parameter is used to control the amount of padding around the face to be used for the coordinate box.

Parameters:
  • coords (np.ndarray) – A numpy array of shape (N, 3). The coordinates of a molecule.

  • pad (float, optional (default 5.0)) – The number of angstroms to pad.

Returns:

boxes – List of CoordinateBox

Return type:

List[CoordinateBox]

Examples

>>> import numpy as np
>>> from deepchem.utils.coordinate_box_utils import get_face_boxes
>>> coords = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
>>> boxes = get_face_boxes(coords, pad=5)

Evaluation Utils

class Evaluator(model, dataset: Dataset, transformers: List[Transformer])[source]

Class that evaluates a model on a given dataset.

The evaluator class is used to evaluate a dc.models.Model class on a given dc.data.Dataset object. The evaluator is aware of dc.trans.Transformer objects so will automatically undo any transformations which have been applied.

Examples

Evaluators allow a model to be evaluated directly with a scikit-learn metric function. Let’s do a bit of setup, constructing our dataset and model.

>>> import deepchem as dc
>>> import numpy as np
>>> X = np.random.rand(10, 5)
>>> y = np.random.rand(10, 1)
>>> dataset = dc.data.NumpyDataset(X, y)
>>> model = dc.models.MultitaskRegressor(1, 5)
>>> transformers = []

Then you can evaluate this model as follows

>>> import sklearn
>>> from deepchem.utils.evaluate import Evaluator
>>> evaluator = Evaluator(model, dataset, transformers)
>>> multitask_scores = evaluator.compute_model_performance(
...     sklearn.metrics.mean_absolute_error)

Evaluators can also be used with dc.metrics.Metric objects in case you want to customize your metric further.

>>> evaluator = Evaluator(model, dataset, transformers)
>>> metric = dc.metrics.Metric(dc.metrics.mae_score)
>>> multitask_scores = evaluator.compute_model_performance(metric)

__init__(model, dataset: Dataset, transformers: List[Transformer])[source]

Initialize this evaluator

Parameters:
  • model (Model) – Model to evaluate. Note that this must be a regression or classification model and not a generative model.

  • dataset (Dataset) – Dataset object to evaluate model on.

  • transformers (List[Transformer]) – List of dc.trans.Transformer objects. These transformations must have been applied to dataset previously. The dataset will be untransformed for metric evaluation.

output_statistics(scores: Dict[str, float], stats_out: str)[source]

Write computed stats to file.

Parameters:
  • scores (dict) – Dictionary mapping names of metrics to scores.

  • stats_out (str) – Name of file to write scores to.

output_predictions(y_preds: ndarray, csv_out: str)[source]

Writes predictions to file.

Writes predictions made on self.dataset to a specified file on disk. self.dataset.ids are used to format predictions.

Parameters:
  • y_preds (np.ndarray) – Predictions to output

  • csv_out (str) – Name of file to write predictions to.

compute_model_performance(metrics: Metric | Callable[[...], Any] | List[Metric] | List[Callable[[...], Any]], csv_out: str | None = None, stats_out: str | None = None, per_task_metrics: bool = False, use_sample_weights: bool = False, n_classes: int = 2) Dict[str, float] | Tuple[Dict[str, float], Dict[str, float]][source]

Computes statistics of model on test data and saves results to csv.

Parameters:
  • metrics (dc.metrics.Metric/list[dc.metrics.Metric]/function) – The set of metrics provided. This class attempts to do some intelligent handling of input. If a single dc.metrics.Metric object is provided or a list is provided, it will evaluate self.model on these metrics. If a function is provided, it is assumed to be a metric function that this method will attempt to wrap in a dc.metrics.Metric object. A metric function must accept two arguments, y_true, y_pred both of which are np.ndarray objects and return a floating point score. The metric function may also accept a keyword argument sample_weight to account for per-sample weights.

  • csv_out (str, optional (DEPRECATED)) – Filename to write CSV of model predictions.

  • stats_out (str, optional (DEPRECATED)) – Filename to write computed statistics.

  • per_task_metrics (bool, optional) – If true, return computed metric for each task on multitask dataset.

  • use_sample_weights (bool, optional (default False)) – If set, use per-sample weights w.

  • n_classes (int, optional (default 2)) – If specified, will use n_classes as the number of unique classes in self.dataset. Note that this argument will be ignored for regression metrics.

Returns:

  • multitask_scores (dict) – Dictionary mapping names of metrics to metric scores.

  • all_task_scores (dict, optional) – If per_task_metrics == True, then returns a second dictionary of scores for each task separately.
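
The contract for a plain metric function (two np.ndarray arguments, a float result, and an optional sample_weight keyword) can be illustrated with a small example. weighted_mae is a hypothetical metric written for this sketch, not a function provided by DeepChem.

```python
import numpy as np

def weighted_mae(y_true, y_pred, sample_weight=None):
    """Hypothetical metric: mean absolute error with optional per-sample weights."""
    errors = np.abs(np.asarray(y_true) - np.asarray(y_pred)).ravel()
    if sample_weight is None:
        return float(errors.mean())
    weights = np.asarray(sample_weight, dtype=float).ravel()
    return float(np.average(errors, weights=weights))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.5, 5.0])
plain = weighted_mae(y_true, y_pred)
weighted = weighted_mae(y_true, y_pred, sample_weight=[1.0, 1.0, 0.0])
```

A function with this shape is the kind of callable that the metrics argument is described as wrapping in a dc.metrics.Metric object.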

class GeneratorEvaluator(model, generator: Iterable[Tuple[Any, Any, Any]], transformers: List[Transformer], labels: List | None = None, weights: List | None = None)[source]

Evaluate models on a stream of data.

This class is a partner class to Evaluator. Instead of operating over datasets this class operates over a generator which yields batches of data to feed into provided model.

Examples

>>> import deepchem as dc
>>> import numpy as np
>>> X = np.random.rand(10, 5)
>>> y = np.random.rand(10, 1)
>>> dataset = dc.data.NumpyDataset(X, y)
>>> model = dc.models.MultitaskRegressor(1, 5)
>>> generator = model.default_generator(dataset, pad_batches=False)
>>> transformers = []

Then you can evaluate this model as follows

>>> import sklearn
>>> from deepchem.utils.evaluate import GeneratorEvaluator
>>> evaluator = GeneratorEvaluator(model, generator, transformers)
>>> multitask_scores = evaluator.compute_model_performance(
...     sklearn.metrics.mean_absolute_error)

Evaluators can also be used with dc.metrics.Metric objects in case you want to customize your metric further. (Note that a given generator can only be used once, so we have to redefine the generator here.)

>>> generator = model.default_generator(dataset, pad_batches=False)
>>> evaluator = GeneratorEvaluator(model, generator, transformers)
>>> metric = dc.metrics.Metric(dc.metrics.mae_score)
>>> multitask_scores = evaluator.compute_model_performance(metric)

__init__(model, generator: Iterable[Tuple[Any, Any, Any]], transformers: List[Transformer], labels: List | None = None, weights: List | None = None)[source]

Parameters:
  • model (Model) – Model to evaluate.

  • generator (generator) – Generator which yields batches to feed into the model. For a KerasModel, it should be a tuple of the form (inputs, labels, weights). The “correct” way to create this generator is to use model.default_generator as shown in the example above.

  • transformers (List[Transformer]) – Transformers to “undo” when applied to the model’s outputs

  • labels (list of Layer) – layers which are keys in the generator to compare to outputs

  • weights (list of Layer) – layers which are keys in the generator for weight matrices

compute_model_performance(metrics: Metric | Callable[[...], Any] | List[Metric] | List[Callable[[...], Any]], per_task_metrics: bool = False, use_sample_weights: bool = False, n_classes: int = 2) Dict[str, float] | Tuple[Dict[str, float], Dict[str, float]][source]

Computes statistics of model on test data and saves results to csv.

Parameters:
  • metrics (dc.metrics.Metric/list[dc.metrics.Metric]/function) – The set of metrics provided. This class attempts to do some intelligent handling of input. If a single dc.metrics.Metric object is provided or a list is provided, it will evaluate self.model on these metrics. If a function is provided, it is assumed to be a metric function that this method will attempt to wrap in a dc.metrics.Metric object. A metric function must accept two arguments, y_true, y_pred both of which are np.ndarray objects and return a floating point score.

  • per_task_metrics (bool, optional) – If true, return computed metric for each task on multitask dataset.

  • use_sample_weights (bool, optional (default False)) – If set, use per-sample weights w.

  • n_classes (int, optional (default 2)) – If specified, will assume that all metrics are classification metrics and will use n_classes as the number of unique classes in self.dataset.

Returns:

  • multitask_scores (dict) – Dictionary mapping names of metrics to metric scores.

  • all_task_scores (dict, optional) – If per_task_metrics == True, then returns a second dictionary of scores for each task separately.

relative_difference(x: ndarray, y: ndarray) ndarray[source]

Compute the relative difference between x and y

The two argument arrays must have the same shape.

Parameters:
  • x (np.ndarray) – First input array

  • y (np.ndarray) – Second input array

Returns:

z – We will have z == np.abs(x-y) / np.abs(max(x, y)).

Return type:

np.ndarray
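
The formula above can be written out with NumPy as follows. This is a hypothetical sketch rather than the library's code, and it assumes that max(x, y) is meant elementwise, which is why np.maximum is used.

```python
import numpy as np

def relative_difference_sketch(x, y):
    """Hypothetical version; assumes max(x, y) is taken elementwise."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    return np.abs(x - y) / np.abs(np.maximum(x, y))

z = relative_difference_sketch([1.0, 2.0, 4.0], [2.0, 2.0, 1.0])
```

With this definition the result is undefined wherever the larger of the two entries is zero, so inputs containing such pairs need separate handling.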

Genomic Utilities

seq_one_hot_encode(sequences, letters: str = 'ATCGN') ndarray[source]

One hot encodes list of genomic sequences.

Sequences encoded have shape (N_sequences, N_letters, sequence_length, 1). These sequences will be processed as images with one color channel.

Parameters:
  • sequences (np.ndarray or Iterator[Bio.SeqRecord]) – Iterable object of genetic sequences

  • letters (str, optional (default "ATCGN")) – String with the set of possible letters in the sequences.

Raises:

ValueError – If sequences are of different lengths.

Returns:

A numpy array of shape (N_sequences, N_letters, sequence_length, 1).

Return type:

np.ndarray
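To make the output layout concrete, here is a minimal NumPy sketch of the encoding described above. It assumes the sequences are plain Python strings (not Bio.SeqRecord objects), and the function name is hypothetical.

```python
import numpy as np


def seq_one_hot_sketch(sequences, letters="ATCGN"):
    """One-hot encode equal-length strings to (N, len(letters), L, 1)."""
    sequences = list(sequences)
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("All sequences must have the same length")
    # Row index of each letter in the one-hot "image".
    index = {letter: i for i, letter in enumerate(letters)}
    out = np.zeros((len(sequences), len(letters), length, 1))
    for n, seq in enumerate(sequences):
        for pos, letter in enumerate(seq):
            out[n, index[letter], pos, 0] = 1.0
    return out


encoded = seq_one_hot_sketch(["ACGT", "TTNA"])
```

Each position of each sequence sets exactly one entry along the letters axis, so the array sums to N_sequences * sequence_length.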

encode_bio_sequence(fname: str, file_type: str = 'fasta', letters: str = 'ATCGN') ndarray[source]

Loads a sequence file and returns an array of one-hot sequences.

Parameters:
  • fname (str) – Filename of fasta file.

  • file_type (str, optional (default "fasta")) – The type of file encoding to process, e.g. fasta or fastq. This is passed to Biopython.SeqIO.parse.

  • letters (str, optional (default "ATCGN")) – The set of letters that the sequences consist of, e.g. ATCG.

Returns:

A numpy array of shape (N_sequences, N_letters, sequence_length, 1).

Return type:

np.ndarray

Notes

This function requires BioPython to be installed.

hhblits(dataset_path, database=None, data_dir=None, evalue=0.001, num_iterations=2, num_threads=4)[source]

Run hhblits multisequence alignment search on a dataset. This function requires the hhblits binary to be installed and in the path. This function also requires a Hidden Markov Model reference database to be provided. Both can be found here: https://github.com/soedinglab/hh-suite

The database should be in the deepchem data directory or specified as an argument. To set the deepchem data directory, run this command in your environment:

export DEEPCHEM_DATA_DIR=<path to data directory>

Parameters:
  • dataset_path (str) – Path to single sequence or multiple sequence alignment (MSA) dataset. Results will be saved in this directory.

  • database (str) – Name of database to search against. Note this is not the path, but the name of the database.

  • data_dir (str) – Path to database directory.

  • evalue (float) – E-value cutoff.

  • num_iterations (int) – Number of iterations.

  • num_threads (int) – Number of threads.

Returns:

  • results (.a3m file) – MSA file containing the results of the hhblits search.

  • results (.hhr file) – hhsuite results file containing the results of the hhblits search.

Examples

>>> from deepchem.utils.sequence_utils import hhblits
>>> msa_path = hhblits('test/data/example.fasta', database='example_db', data_dir='test/data/', evalue=0.001, num_iterations=2, num_threads=4)

hhsearch(dataset_path, database=None, data_dir=None, evalue=0.001, num_iterations=2, num_threads=4)[source]

Run hhsearch multisequence alignment search on a dataset. This function requires the hhsearch binary to be installed and in the path. This function also requires a Hidden Markov Model reference database to be provided. Both can be found here: https://github.com/soedinglab/hh-suite

The database should be in the deepchem data directory or specified as an argument. To set the deepchem data directory, run this command in your environment:

export DEEPCHEM_DATA_DIR=<path to data directory>

Examples

>>> from deepchem.utils.sequence_utils import hhsearch
>>> msa_path = hhsearch('test/data/example.fasta', database='example_db', data_dir='test/data/', evalue=0.001, num_iterations=2, num_threads=4)

Parameters:
  • dataset_path (str) – Path to multiple sequence alignment dataset. Results will be saved in this directory.

  • database (str) – Name of database to search against. Note this is not the path, but the name of the database.

  • data_dir (str) – Path to database directory.

  • evalue (float) – E-value cutoff.

  • num_iterations (int) – Number of iterations.

  • num_threads (int) – Number of threads.

Returns:

  • results (.a3m file) – MSA file containing the results of the hhsearch search.

  • results (.hhr file) – hhsuite results file containing the results of the hhsearch search.

MSA_to_dataset(msa_path)[source]

Convert a multiple sequence alignment to a NumpyDataset object.

Geometry Utilities

unit_vector(vector: ndarray) ndarray[source]

Returns the unit vector of the vector.

Parameters:

vector (np.ndarray) – A numpy array of shape (3,), where 3 is (x,y,z).

Returns:

A numpy array of shape (3,). The unit vector of the input vector.

Return type:

np.ndarray

angle_between(vector_i: ndarray, vector_j: ndarray) float[source]

Returns the angle in radians between vectors “vector_i” and “vector_j”

Note that this function always returns the smaller of the two angles between the vectors (value between 0 and pi).

Parameters:
  • vector_i (np.ndarray) – A numpy array of shape (3,), where 3 is (x,y,z).

  • vector_j (np.ndarray) – A numpy array of shape (3,), where 3 is (x,y,z).

Returns:

The angle in radians between the two vectors.

Return type:

float

Examples

>>> print("%0.06f" % angle_between((1, 0, 0), (0, 1, 0)))
1.570796
>>> print("%0.06f" % angle_between((1, 0, 0), (1, 0, 0)))
0.000000
>>> print("%0.06f" % angle_between((1, 0, 0), (-1, 0, 0)))
3.141593
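One common way to obtain an angle in [0, pi] is to normalize both vectors and take the arccosine of their dot product. The sketch below illustrates that approach under hypothetical function names; it is not taken from the library source.

```python
import numpy as np


def unit_vector_sketch(vector):
    """Scale a vector to length 1."""
    v = np.asarray(vector, dtype=float)
    return v / np.linalg.norm(v)


def angle_between_sketch(vector_i, vector_j) -> float:
    """Angle in radians between two vectors, in [0, pi]."""
    cos_angle = np.dot(unit_vector_sketch(vector_i),
                       unit_vector_sketch(vector_j))
    # Clipping guards against rounding pushing the cosine outside [-1, 1].
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
```

Because arccos only returns values in [0, pi], the result is always the smaller of the two angles between the vectors.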
generate_random_unit_vector() ndarray[source]

Generate a random unit vector on the sphere S^2.

Citation: http://mathworld.wolfram.com/SpherePointPicking.html

Pseudocode:
  1. Choose a random theta in [0, 2*pi]

  2. Choose a random z in [-1, 1]

  3. Compute output vector u: (x,y,z) = (sqrt(1-z^2)*cos(theta), sqrt(1-z^2)*sin(theta),z)

Returns:

u – A numpy array of shape (3,). u is a unit vector.

Return type:

np.ndarray
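The pseudocode above translates almost line for line into NumPy. The following is a sketch with a hypothetical function name; the optional rng argument is an addition for reproducibility and is not part of the documented signature.

```python
import numpy as np


def random_unit_vector_sketch(rng=None):
    """Sample a point uniformly on the unit sphere S^2."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, 2.0 * np.pi)  # step 1
    z = rng.uniform(-1.0, 1.0)             # step 2
    r = np.sqrt(1.0 - z * z)               # radius of the circle at height z
    return np.array([r * np.cos(theta), r * np.sin(theta), z])  # step 3


u = random_unit_vector_sketch(np.random.default_rng(0))
```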

generate_random_rotation_matrix() ndarray[source]

Generates a random rotation matrix.

  1. Generate a random unit vector u, randomly sampled from the unit sphere (see function generate_random_unit_vector() for details).

  2. Generate a second random unit vector v.

  3. If the absolute value of u dot v > 0.99, repeat. (This is important for numerical stability. Intuition: we want them to be as linearly independent as possible, or else the orthogonalized version of v will be much shorter in magnitude compared to u. I assume in Stack they took this from Gram-Schmidt orthogonalization?)

  4. v' = v - (u dot v)*u, i.e. subtract out the component of v that's in u's direction.

  5. Normalize v' (this isn't in Stack but I assume it must be done).

  6. Find w = u cross v'.

  7. u, v', and w will form the columns of a rotation matrix, R.

The intuition is that u, v' and w are, respectively, what the standard basis vectors e1, e2, and e3 will be mapped to under the transformation.

Returns:

R – A numpy array of shape (3, 3). R is a rotation matrix.

Return type:

np.ndarray
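The steps listed for this function can be sketched in NumPy as below. The names are hypothetical, the rng argument is an addition for reproducibility, and "repeat" is interpreted here as resampling only v; that interpretation is an assumption.

```python
import numpy as np


def _random_unit_vector(rng):
    """Uniform sample on the unit sphere (theta / z construction)."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    z = rng.uniform(-1.0, 1.0)
    r = np.sqrt(1.0 - z * z)
    return np.array([r * np.cos(theta), r * np.sin(theta), z])


def random_rotation_matrix_sketch(rng=None):
    """Build a rotation matrix from two random unit vectors."""
    rng = np.random.default_rng() if rng is None else rng
    u = _random_unit_vector(rng)
    v = _random_unit_vector(rng)
    # Resample v while it is nearly parallel to u (assumed meaning of "repeat").
    while abs(np.dot(u, v)) > 0.99:
        v = _random_unit_vector(rng)
    v_perp = v - np.dot(u, v) * u        # remove the component along u
    v_perp /= np.linalg.norm(v_perp)     # normalize
    w = np.cross(u, v_perp)              # third orthonormal axis
    return np.column_stack([u, v_perp, w])


R = random_rotation_matrix_sketch(np.random.default_rng(0))
```

Since the columns are orthonormal and w = u cross v', the result satisfies R.T @ R = I with determinant +1, which is what distinguishes a rotation from a reflection.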

is_angle_within_cutoff(vector_i: ndarray, vector_j: ndarray, angle_cutoff: float) bool[source]

A utility function to compute whether two vectors are within a cutoff from 180 degrees apart.

Parameters:
  • vector_i (np.ndarray) – A numpy array of shape (3,), where 3 is (x,y,z).

  • vector_j (np.ndarray) – A numpy array of shape (3,), where 3 is (x,y,z).

  • angle_cutoff (float) – The deviation from 180 (in degrees)

Returns:

Whether two vectors are within a cutoff from 180 degrees apart

Return type:

bool
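A minimal sketch of this check, assuming the angle is measured in [0, 180] degrees and the comparison is strict (both are assumptions, and the function name is hypothetical):

```python
import numpy as np


def is_angle_within_cutoff_sketch(vector_i, vector_j, angle_cutoff: float) -> bool:
    """True if the angle between the vectors is within angle_cutoff of 180 degrees."""
    vi = np.asarray(vector_i, dtype=float)
    vj = np.asarray(vector_j, dtype=float)
    cos_angle = np.dot(vi, vj) / (np.linalg.norm(vi) * np.linalg.norm(vj))
    angle_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    # The angle never exceeds 180, so only the lower bound needs checking.
    return bool(angle_deg > 180.0 - angle_cutoff)
```

For example, nearly antiparallel vectors such as (1, 0, 0) and (-1, 0.1, 0) pass a 10 degree cutoff, while perpendicular vectors do not.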

Graph Utilities

fourier_encode_dist(x, num_encodings=4, include_self=True)[source]

Fourier encode the input tensor x based on the specified number of encodings.

This function applies a Fourier encoding to the input tensor x by dividing it by a range of scales (2^i for i in range(num_encodings)) and then concatenating the sine and cosine of the scaled values. Optionally, the original input tensor can be included in the output.

Parameters:
  • x (torch.Tensor) – Input tensor to be Fourier encoded.

  • num_encodings (int, optional, default=4) – Number of Fourier encodings to apply.

  • include_self (bool, optional, default=True) – Whether to include the original input tensor in the output.

Returns:

Fourier encoded tensor.

Return type:

torch.Tensor

Examples

>>> import torch
>>> x = torch.tensor([1.0, 2.0, 3.0])
>>> encoded_x = fourier_encode_dist(x, num_encodings=4, include_self=True)
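To show what the encoding computes, here is a NumPy sketch of the description above (the documented function operates on torch tensors). The function name is hypothetical, and the ordering of the output columns (sines, then cosines, then the original value) is an assumption.

```python
import numpy as np


def fourier_encode_sketch(x, num_encodings=4, include_self=True):
    """Divide x by scales 2^i and concatenate sin and cos of the results."""
    x = np.asarray(x, dtype=float)[..., np.newaxis]  # shape (..., 1)
    scales = 2.0 ** np.arange(num_encodings)         # 1, 2, 4, 8, ...
    scaled = x / scales                              # shape (..., num_encodings)
    parts = [np.sin(scaled), np.cos(scaled)]
    if include_self:
        parts.append(x)                              # keep the raw value
    return np.concatenate(parts, axis=-1)


encoded = fourier_encode_sketch([1.0, 2.0, 3.0])
```

With num_encodings=4 and include_self=True, each input value expands to 2 * 4 + 1 = 9 features in this sketch.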
aggregate_mean(h, **kwargs)[source]

Compute the mean of the input tensor along the second to last dimension.

Parameters:

h (torch.Tensor) – Input tensor.

Returns:

Mean of the input tensor along the second to last dimension.

Return type:

torch.Tensor

aggregate_max(h, **kwargs)[source]

Compute the max of the input tensor along the second to last dimension.

Parameters:

h (torch.Tensor) – Input tensor.

Returns:

Max of the input tensor along the second to last dimension.

Return type:

torch.Tensor

aggregate_min(h, **kwargs)[source]

Compute the min of the input tensor along the second to last dimension.

Parameters:
  • h (torch.Tensor) – Input tensor.

  • **kwargs – Additional keyword arguments.

Returns:

Min of the input tensor along the second to last dimension.

Return type:

torch.Tensor

aggregate_std(h, **kwargs)[source]

Compute the standard deviation of the input tensor along the second to last dimension.

Parameters:

h (torch.Tensor) – Input tensor.

Returns:

Standard deviation of the input tensor along the second to last dimension.

Return type:

torch.Tensor

aggregate_var(h, **kwargs)[source]

Compute the variance of the input tensor along the second to last dimension.

Parameters:

h (torch.Tensor) – Input tensor.

Returns:

Variance of the input tensor along the second to last dimension.

Return type:

torch.Tensor

aggregate_moment(h, n=3, **kwargs)[source]

Compute the nth moment of the input tensor along the second to last dimension.

Parameters:
  • h (torch.Tensor) – Input tensor.

  • n (int, optional, default=3) – The order of the moment to compute.

Returns:

Nth moment of the input tensor along the second to last dimension.

Return type:

torch.Tensor

aggregate_sum(h, **kwargs)[source]

Compute the sum of the input tensor along the second to last dimension.

Parameters:

h (torch.Tensor) – Input tensor.

Returns:

Sum of the input tensor along the second to last dimension.

Return type:

torch.Tensor
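"Along the second to last dimension" corresponds to axis -2. The NumPy sketch below illustrates the reductions on a small array; the (batch, neighbors, features) layout is a hypothetical example, and the documented functions take torch tensors instead.

```python
import numpy as np

# One batch entry with 3 neighbors and 2 features each: shape (1, 3, 2).
h = np.array([[[1.0, 2.0],
               [3.0, 6.0],
               [5.0, 4.0]]])

# Each reduction collapses the neighbor axis (axis=-2), leaving shape (1, 2).
h_mean = h.mean(axis=-2)
h_max = h.max(axis=-2)
h_min = h.min(axis=-2)
h_sum = h.sum(axis=-2)
```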

scale_identity(h, D=None, avg_d=None)[source]

Identity scaling function.

Parameters:
  • h (torch.Tensor) – Input tensor.

  • D (torch.Tensor, optional) – Degree tensor.

  • avg_d (dict, optional) – Dictionary containing averages over the training set.

Returns:

Scaled input tensor.

Return type:

torch.Tensor

scale_amplification(h, D, avg_d)[source]

Amplification scaling function. log(D + 1) / d * h where d is the average of the log(D + 1) in the training set

Parameters:
  • h (torch.Tensor) – Input tensor.

  • D (torch.Tensor) – Degree tensor.

  • avg_d (dict) – Dictionary containing averages over the training set.

Returns:

Scaled input tensor.

Return type:

torch.Tensor

scale_attenuation(h, D, avg_d)[source]

Attenuation scaling function. (log(D + 1))^-1 / d * h, where d is the average of (log(D + 1))^-1 in the training set

Parameters:
  • h (torch.Tensor) – Input tensor.

  • D (torch.Tensor) – Degree tensor.

  • avg_d (dict) – Dictionary containing averages over the training set.

Returns:

Scaled input tensor.

Return type:

torch.Tensor
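The two scaling formulas can be sketched in NumPy as follows. This is an illustration only: the function names are hypothetical, and the training-set average d is passed directly as a float instead of being looked up in the avg_d dictionary, whose keys are not specified here.

```python
import numpy as np


def scale_amplification_sketch(h, D, d):
    """log(D + 1) / d * h, with d the training-set mean of log(D + 1)."""
    return h * (np.log(D + 1.0) / d)


def scale_attenuation_sketch(h, D, d):
    """(log(D + 1))^-1 / d * h, with d the training-set mean of (log(D + 1))^-1."""
    return h * ((1.0 / np.log(D + 1.0)) / d)


h = np.ones((2, 2))              # two nodes, two features
D = np.array([[1.0], [3.0]])     # node degrees, broadcast over features
d_amp = np.mean(np.log(D + 1.0))
d_att = np.mean(1.0 / np.log(D + 1.0))
amplified = scale_amplification_sketch(h, D, d_amp)
attenuated = scale_attenuation_sketch(h, D, d_att)
```

In this toy case the higher-degree node is scaled up by amplification and scaled down by attenuation, relative to the lower-degree node.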

Hash Function Utilities

hash_ecfp(ecfp: str, size: int = 1024) int[source]

Returns an int < size representing given ECFP fragment.

Input must be a string. This utility function is used for various ECFP based fingerprints.

Parameters:
  • ecfp (str) – String to hash. Usually an ECFP fragment.

  • size (int, optional (default 1024)) – Hash to an int in range [0, size)

Returns:

ecfp_hash – An int < size representing given ECFP fragment

Return type:

int
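One way to map a string to an int in [0, size) is to take a cryptographic digest and reduce it modulo size. The sketch below uses MD5 for that purpose; the choice of digest and the function name are assumptions made for illustration, not a statement about the library's implementation.

```python
import hashlib


def hash_fragment_sketch(fragment: str, size: int = 1024) -> int:
    """Deterministically map a string to an int in [0, size)."""
    digest = hashlib.md5(fragment.encode("utf-8")).hexdigest()
    return int(digest, 16) % size


bucket = hash_fragment_sketch("C(=O)N", size=1024)
```

Unlike Python's built-in hash(), a digest-based hash gives the same value across interpreter runs, which matters when fingerprints are stored and compared later.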

hash_ecfp_pair(ecfp_pair: Tuple[str, str], size: int = 1024) int[source]

Returns an int < size representing that ECFP pair.

Input must be a tuple of strings. This utility is primarily used for spatial contact featurizers. For example, if a protein and ligand have close contact region, the first string could be the protein’s fragment and the second the ligand’s fragment. The pair could be hashed together to achieve one hash value for this contact region.

Parameters:
  • ecfp_pair (Tuple[str, str]) – Pair of ECFP fragment strings

  • size (int, optional (default 1024)) – Hash to an int in range [0, size)

Returns:

ecfp_hash – An int < size representing given ECFP pair.

Return type:

int

vectorize(hash_function: Callable[[Any, int], int], feature_dict: Dict[int, str] | None = None, size: int = 1024, feature_list: List | None = None) ndarray[source]

Helper function to vectorize a spatial description from a hash.

Hash functions are used to perform spatial featurizations in DeepChem. However, it’s necessary to convert backwards from the hash function to feature vectors. This function aids in this conversion procedure. It creates a vector of zeros of length size. It then loops through feature_dict, uses hash_function to hash the stored value to an integer in range [0, size) and bumps that index.

Parameters:
  • hash_function (Function, Callable[[str, int], int]) – Should accept two arguments, feature, and size and return a hashed integer. Here feature is the item to hash, and size is an int. For example, if size=1024, then hashed values must fall in range [0, 1024).

  • feature_dict (Dict, optional (default None)) – Maps unique keys to features computed.

  • size (int (default 1024)) – Length of generated bit vector

  • feature_list (List, optional (default None)) – List of features.

Returns:

feature_vector – A numpy array of shape (size,)

Return type:

np.ndarray
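The procedure described above (start from zeros, hash each stored feature, bump that index) can be sketched as follows. The names are hypothetical, only the feature_dict path is shown, and counting repeated hashes more than once is an assumption about what "bumps" means.

```python
import numpy as np


def toy_hash(feature: str, size: int) -> int:
    """Toy hash for illustration: sum of character codes modulo size."""
    return sum(ord(ch) for ch in feature) % size


def vectorize_sketch(hash_function, feature_dict, size=1024):
    """Count how many features hash to each index in [0, size)."""
    feature_vector = np.zeros(size)
    for feature in feature_dict.values():
        feature_vector[hash_function(feature, size)] += 1
    return feature_vector


# Keys are arbitrary unique ids; values are the features to hash.
vec = vectorize_sketch(toy_hash, {0: "CC", 1: "CO", 2: "CC"}, size=1024)
```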

Voxel Utils

convert_atom_to_voxel(coordinates: ndarray, atom_index: int, box_width: float, voxel_width: float) ndarray[source]

Converts atom coordinates to an i,j,k grid index.

This function offsets molecular atom coordinates by (box_width/2, box_width/2, box_width/2) and then divides by voxel_width to compute the voxel indices.

Parameters:
  • coordinates (np.ndarray) – Array with coordinates of all atoms in the molecule, shape (N, 3).

  • atom_index (int) – Index of an atom in the molecule.

  • box_width (float) – Size of the box in Angstroms.

  • voxel_width (float) – Size of a voxel in Angstroms

Returns:

indices – A 1D numpy array of length 3 with [i, j, k], the voxel coordinates of specified atom.

Return type:

np.ndarray
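The offset-then-divide computation can be sketched as below. Rounding down to the nearest integer index is an assumption, and the function name is hypothetical.

```python
import numpy as np


def atom_to_voxel_sketch(coordinates, atom_index, box_width, voxel_width):
    """Shift by box_width / 2, divide by voxel_width, and floor to ints."""
    shifted = np.asarray(coordinates, dtype=float)[atom_index] + box_width / 2.0
    return np.floor(shifted / voxel_width).astype(int)


coords = np.array([[0.0, 0.0, 0.0],
                   [-7.5, 1.2, 7.9]])
center = atom_to_voxel_sketch(coords, 0, box_width=16.0, voxel_width=1.0)
corner = atom_to_voxel_sketch(coords, 1, box_width=16.0, voxel_width=1.0)
```

With a 16 Angstrom box and 1 Angstrom voxels, an atom at the origin lands in voxel [8, 8, 8], the middle of the grid.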

convert_atom_pair_to_voxel(coordinates_tuple: Tuple[ndarray, ndarray], atom_index_pair: Tuple[int, int], box_width: float, voxel_width: float) ndarray[source]

Converts a pair of atoms to i,j,k grid indexes.

Parameters:
  • coordinates_tuple (Tuple[np.ndarray, np.ndarray]) – A tuple containing two molecular coordinate arrays of shapes (N, 3) and (M, 3).

  • atom_index_pair (Tuple[int, int]) – A tuple of indices for the atoms in the two molecules.

  • box_width (float) – Size of the box in Angstroms.

  • voxel_width (float) – Size of a voxel in Angstroms

Returns:

indices_list – A numpy array of shape (2, 3), where 3 is [i, j, k] of the voxel coordinates of specified atom.

Return type:

np.ndarray

voxelize(get_voxels: Callable[[...], Any], coordinates: Any, box_width: float = 16.0, voxel_width: float = 1.0, hash_function: Callable[[...], Any] | None = None, feature_dict: Dict[Any, Any] | None = None, feature_list: List[int | Tuple[int]] | None = None, nb_channel: int = 16, dtype: str = 'int') ndarray[source]

Helper function to voxelize inputs.

This helper function helps convert a hash function which specifies spatial features of a molecular complex into a voxel tensor. This utility is used by various featurizers that generate voxel grids.

Parameters:
  • get_voxels (Function) – Function that voxelizes inputs

  • coordinates (Any) – Contains the 3D coordinates of a molecular system. This should have whatever type get_voxels() expects as its first argument.

  • box_width (float, optional (default 16.0)) – Size of a box in which voxel features are calculated. Box is centered on a ligand centroid.

  • voxel_width (float, optional (default 1.0)) – Size of a 3D voxel in a grid in Angstroms.

  • hash_function (Function) – Used to map feature choices to voxel channels.

  • feature_dict (Dict, optional (default None)) – Keys are atom indices or tuples of atom indices, the values are computed features. If hash_function is not None, then the values are hashed using the hash function into [0, nb_channel) and this channel at the voxel for the given key is incremented by 1 for each dictionary entry. If hash_function is None, then the value must be a vector of size (nb_channel,) which is added to the existing channel values at that voxel grid.

  • feature_list (List, optional (default None)) – List of atom indices or tuples of atom indices. This can only be used if nb_channel==1. Increments the voxels corresponding to these indices by 1 for each entry.

  • nb_channel (int, optional (default 16)) – The number of feature channels computed per voxel. Should be a power of 2.

  • dtype (str ('int' or 'float'), optional (default 'int')) – The type of the numpy ndarray created to hold features.

Returns:

feature_tensor – The voxel of the input with the shape (voxels_per_edge, voxels_per_edge, voxels_per_edge, nb_channel).

Return type:

np.ndarray
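As a simplified sketch of the feature_list case (nb_channel == 1), the code below counts atoms per voxel. It is not the library's implementation: the function name is hypothetical, voxels_per_edge is assumed to be box_width / voxel_width, and atoms that fall outside the box are assumed to be skipped.

```python
import numpy as np


def voxelize_list_sketch(coordinates, feature_list, box_width=16.0, voxel_width=1.0):
    """Increment a single-channel voxel grid once per listed atom index."""
    n = int(round(box_width / voxel_width))  # assumed voxels per edge
    grid = np.zeros((n, n, n, 1), dtype=int)
    coords = np.asarray(coordinates, dtype=float)
    for atom_index in feature_list:
        ijk = np.floor((coords[atom_index] + box_width / 2.0) / voxel_width).astype(int)
        # Assumption: atoms outside the box are ignored.
        if np.all(ijk >= 0) and np.all(ijk < n):
            grid[ijk[0], ijk[1], ijk[2], 0] += 1
    return grid


coords = np.array([[0.0, 0.0, 0.0], [0.2, 0.3, 0.4], [100.0, 0.0, 0.0]])
grid = voxelize_list_sketch(coords, feature_list=[0, 1, 2])
```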

Graph Convolution Utilities

one_hot_encode(val: int | str, allowable_set: List[str] | List[int], include_unknown_set: bool = False) List[float][source]

One hot encoder for elements of a provided set.

Examples

>>> one_hot_encode("a", ["a", "b", "c"])
[1.0, 0.0, 0.0]
>>> one_hot_encode(2, [0, 1, 2])
[0.0, 0.0, 1.0]
>>> one_hot_encode(3, [0, 1, 2])
[0.0, 0.0, 0.0]
>>> one_hot_encode(3, [0, 1, 2], True)
[0.0, 0.0, 0.0, 1.0]

Parameters:
  • val (int or str) – The value must be present in allowable_set.

  • allowable_set (List[int] or List[str]) – List of allowable quantities.

  • include_unknown_set (bool, default False) – If true, the index of all values not in allowable_set is len(allowable_set).

Returns:

A one-hot vector of val. If include_unknown_set is False, the length is len(allowable_set). If include_unknown_set is True, the length is len(allowable_set) + 1.

Return type:

List[float]

Raises:

ValueError – If include_unknown_set is False and val is not in allowable_set.

get_atom_type_one_hot(atom: Any, allowable_set: List[str] = ['C', 'N', 'O', 'F', 'P', 'S', 'Cl', 'Br', 'I'], include_unknown_set: bool = True) List[float][source]

Get a one-hot feature of an atom type.

Parameters:
  • atom (rdkit.Chem.rdchem.Atom) – RDKit atom object

  • allowable_set (List[str]) – The atom types to consider. The default set is [“C”, “N”, “O”, “F”, “P”, “S”, “Cl”, “Br”, “I”].

  • include_unknown_set (bool, default True) – If true, the index of all atoms not in allowable_set is len(allowable_set).

Returns:

A one-hot vector of atom types. If include_unknown_set is False, the length is len(allowable_set). If include_unknown_set is True, the length is len(allowable_set) + 1.

Return type:

List[float]

construct_hydrogen_bonding_info(mol: Any) List[Tuple[int, str]][source]

Construct hydrogen bonding information for a molecule.

Parameters:

mol (rdkit.Chem.rdchem.Mol) – RDKit mol object

Returns:

A list of tuples (atom_index, hydrogen_bonding_type). The hydrogen_bonding_type value is “Acceptor” or “Donor”.

Return type:

List[Tuple[int, str]]

get_atom_hydrogen_bonding_one_hot(atom: Any, hydrogen_bonding: List[Tuple[int, str]]) List[float][source]

Get a one-hot feature indicating whether an atom is a hydrogen bond donor or acceptor.

Parameters:
  • atom (rdkit.Chem.rdchem.Atom) – RDKit atom object

  • hydrogen_bonding (List[Tuple[int, str]]) – The return value of construct_hydrogen_bonding_info. The value is a list of tuples (atom_index, hydrogen_bonding) like (1, “Acceptor”).

Returns:

A one-hot vector of the hydrogen bonding type. The first element indicates “Donor”, and the second element indicates “Acceptor”.

Return type:

List[float]

get_atom_is_in_aromatic_one_hot(atom: Any) List[float][source]

Get a one-hot feature about whether an atom is in an aromatic system or not.

Parameters:

atom (rdkit.Chem.rdchem.Atom) – RDKit atom object

Returns:

A vector of whether an atom is in an aromatic system or not.

Return type:

List[float]

get_atom_hybridization_one_hot(atom: Any, allowable_set: List[str] = ['SP', 'SP2', 'SP3'], include_unknown_set: bool = False) List[float][source]

Get a one-hot feature of hybridization type.

Parameters:
  • atom (rdkit.Chem.rdchem.Atom) – RDKit atom object

  • allowable_set (List[str]) – The hybridization types to consider. The default set is [“SP”, “SP2”, “SP3”]

  • include_unknown_set (bool, default False) – If true, the index of all types not in allowable_set is len(allowable_set).

Returns:

A one-hot vector of the hybridization type. If include_unknown_set is False, the length is len(allowable_set). If include_unknown_set is True, the length is len(allowable_set) + 1.

Return type:

List[float]

get_atom_total_num_Hs_one_hot(atom: Any, allowable_set: List[int] = [0, 1, 2, 3, 4], include_unknown_set: bool = True) List[float][source]

Get a one-hot feature of the number of hydrogens which an atom has.

Parameters:
  • atom (rdkit.Chem.rdchem.Atom) – RDKit atom object

  • allowable_set (List[int]) – The number of hydrogens to consider. The default set is [0, 1, …, 4]

  • include_unknown_set (bool, default True) – If true, the index of all types not in allowable_set is len(allowable_set).

Returns:

A one-hot vector of the number of hydrogens which an atom has. If include_unknown_set is False, the length is len(allowable_set). If include_unknown_set is True, the length is len(allowable_set) + 1.

Return type:

List[float]

get_atom_chirality_one_hot(atom: Any) List[float][source]

Get a one-hot feature about an atom's chirality type.

Parameters:

atom (rdkit.Chem.rdchem.Atom) – RDKit atom object

Returns:

A one-hot vector of the chirality type. The first element indicates “R”, and the second element indicates “S”.

Return type:

List[float]

get_atom_formal_charge(atom: Any) List[float][source]

Get the formal charge of an atom.

Parameters:

atom (rdkit.Chem.rdchem.Atom) – RDKit atom object

Returns:

A vector of the formal charge.

Return type:

List[float]

get_atom_partial_charge(atom: Any) List[float][source]

Get the partial charge of an atom.

Parameters:

atom (rdkit.Chem.rdchem.Atom) – RDKit atom object

Returns:

A vector of the partial charge.

Return type:

List[float]

Notes

Before using this function, you must compute Gasteiger charges for the molecule, e.g. with AllChem.ComputeGasteigerCharges(mol).

get_atom_total_degree_one_hot(atom: Any, allowable_set: List[int] = [0, 1, 2, 3, 4, 5], include_unknown_set: bool = True) List[float][source]

Get a one-hot feature of the degree which an atom has.

Parameters:
  • atom (rdkit.Chem.rdchem.Atom) – RDKit atom object

  • allowable_set (List[int]) – The degree to consider. The default set is [0, 1, …, 5]

  • include_unknown_set (bool, default True) – If true, the index of all types not in allowable_set is len(allowable_set).

Returns:

A one-hot vector of the degree which an atom has. If include_unknown_set is False, the length is len(allowable_set). If include_unknown_set is True, the length is len(allowable_set) + 1.

Return type:

List[float]

get_bond_type_one_hot(bond: Any, allowable_set: List[str] = ['SINGLE', 'DOUBLE', 'TRIPLE', 'AROMATIC'], include_unknown_set: bool = False) List[float][source]

Get a one-hot feature of bond type.

Parameters:
  • bond (rdkit.Chem.rdchem.Bond) – RDKit bond object

  • allowable_set (List[str]) – The bond types to consider. The default set is [“SINGLE”, “DOUBLE”, “TRIPLE”, “AROMATIC”].

  • include_unknown_set (bool, default False) – If true, the index of all types not in allowable_set is len(allowable_set).

Returns:

A one-hot vector of the bond type. If include_unknown_set is False, the length is len(allowable_set). If include_unknown_set is True, the length is len(allowable_set) + 1.

Return type:

List[float]

get_bond_is_in_same_ring_one_hot(bond: Any) List[float][source]

Get a one-hot feature about whether the atoms of a bond are in the same ring or not.

Parameters:

bond (rdkit.Chem.rdchem.Bond) – RDKit bond object

Returns:

A one-hot vector of whether a bond is in the same ring or not.

Return type:

List[float]

get_bond_is_conjugated_one_hot(bond: Any) List[float][source]

Get a one-hot feature about whether a bond is conjugated or not.

Parameters:

bond (rdkit.Chem.rdchem.Bond) – RDKit bond object

Returns:

A one-hot vector of whether a bond is conjugated or not.

Return type:

List[float]

get_bond_stereo_one_hot(bond: Any, allowable_set: List[str] = ['STEREONONE', 'STEREOANY', 'STEREOZ', 'STEREOE'], include_unknown_set: bool = True) List[float][source]

Get a one-hot feature of the stereo configuration of a bond.

Parameters:
  • bond (rdkit.Chem.rdchem.Bond) – RDKit bond object

  • allowable_set (List[str]) – The stereo configuration types to consider. The default set is [“STEREONONE”, “STEREOANY”, “STEREOZ”, “STEREOE”].

  • include_unknown_set (bool, default True) – If true, the index of all types not in allowable_set is len(allowable_set).

Returns:

A one-hot vector of the stereo configuration of a bond. If include_unknown_set is False, the length is len(allowable_set). If include_unknown_set is True, the length is len(allowable_set) + 1.

Return type:

List[float]

get_bond_graph_distance_one_hot(bond: Any, graph_dist_matrix: ndarray, allowable_set: List[int] = [1, 2, 3, 4, 5, 6, 7], include_unknown_set: bool = True) List[float][source]

Get a one-hot feature of graph distance.

Parameters:
  • bond (rdkit.Chem.rdchem.Bond) – RDKit bond object

  • graph_dist_matrix (np.ndarray) – The return value of Chem.GetDistanceMatrix(mol). The shape is (num_atoms, num_atoms).

  • allowable_set (List[int]) – The graph distance types to consider. The default set is [1, 2, …, 7].

  • include_unknown_set (bool, default True) – If true, the index of all types not in allowable_set is len(allowable_set).

Returns:

A one-hot vector of the graph distance. If include_unknown_set is False, the length is len(allowable_set). If include_unknown_set is True, the length is len(allowable_set) + 1.

Return type:

List[float]

Grover Utilities

extract_grover_attributes(molgraph: BatchGraphData)[source]

Utility to extract grover attributes for grover model

Parameters:

molgraph (BatchGraphData) – A batched graph data representing a collection of molecules.

Returns:

graph_attributes – A tuple containing atom features, bond features, atom to bond mapping, bond to atom mapping, bond to reverse bond mapping, atom to atom mapping, atom scope, bond scope, functional group labels and other additional features.

Return type:

Tuple

Example

>>> import deepchem as dc
>>> from deepchem.feat.graph_data import BatchGraphData
>>> smiles = ['CC', 'CCC', 'CC(=O)C']
>>> featurizer = dc.feat.GroverFeaturizer(features_generator=dc.feat.CircularFingerprint())
>>> graphs = featurizer.featurize(smiles)
>>> molgraph = BatchGraphData(graphs)
>>> attributes = extract_grover_attributes(molgraph)

Debug Utilities

Docking Utilities

These utilities assist in file preparation and processing for molecular docking.

write_vina_conf(protein_filename: str, ligand_filename: str, centroid: ndarray, box_dims: ndarray, conf_filename: str, num_modes: int = 9, exhaustiveness: int | None = None) None[source]

Writes Vina configuration file to disk.

Autodock Vina accepts a configuration file which provides options under which Vina is invoked. This utility function writes a Vina configuration file which directs Autodock Vina to perform docking under the provided options.

Parameters:
  • protein_filename (str) – Filename for protein

  • ligand_filename (str) – Filename for the ligand

  • centroid (np.ndarray) – A numpy array with shape (3,) holding centroid of system

  • box_dims (np.ndarray) – A numpy array of shape (3,) holding the size of the box to dock

  • conf_filename (str) – Filename to write Autodock Vina configuration to.

  • num_modes (int, optional (default 9)) – The number of binding modes Autodock Vina should find

  • exhaustiveness (int, optional) – The exhaustiveness of the search to be performed by Vina
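For orientation, a Vina configuration file is a plain-text list of "key = value" options. The sketch below writes one using standard Vina option names (receptor, ligand, center_x, size_x, num_modes, exhaustiveness); the function name and the exact formatting are assumptions, and the file written by the library may differ in detail.

```python
import os
import tempfile


def write_vina_conf_sketch(protein_filename, ligand_filename, centroid,
                           box_dims, conf_filename, num_modes=9,
                           exhaustiveness=None):
    """Write a minimal Vina-style configuration file (illustrative only)."""
    lines = [
        f"receptor = {protein_filename}",
        f"ligand = {ligand_filename}",
        f"center_x = {centroid[0]}",
        f"center_y = {centroid[1]}",
        f"center_z = {centroid[2]}",
        f"size_x = {box_dims[0]}",
        f"size_y = {box_dims[1]}",
        f"size_z = {box_dims[2]}",
        f"num_modes = {num_modes}",
    ]
    # Only emit exhaustiveness when the caller sets it.
    if exhaustiveness is not None:
        lines.append(f"exhaustiveness = {exhaustiveness}")
    with open(conf_filename, "w") as f:
        f.write("\n".join(lines) + "\n")


# Hypothetical file names; written to a temporary directory.
conf_path = os.path.join(tempfile.mkdtemp(), "conf.txt")
write_vina_conf_sketch("protein.pdbqt", "ligand.pdbqt", (1.0, 2.0, 3.0),
                       (20.0, 20.0, 20.0), conf_path, exhaustiveness=8)
with open(conf_path) as f:
    conf_text = f.read()
```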

write_gnina_conf(protein_filename: str, ligand_filename: str, conf_filename: str, num_modes: int = 9, exhaustiveness: int | None = None, **kwargs) None[source]

Writes GNINA configuration file to disk.

GNINA accepts a configuration file which provides options under which GNINA is invoked. This utility function writes a configuration file which directs GNINA to perform docking under the provided options.

Parameters:
  • protein_filename (str) – Filename for protein

  • ligand_filename (str) – Filename for the ligand

  • conf_filename (str) – Filename to write GNINA configuration to.

  • num_modes (int, optional (default 9)) – The number of binding modes GNINA should find

  • exhaustiveness (int, optional) – The exhaustiveness of the search to be performed by GNINA

  • kwargs – Args supported by GNINA documented here https://github.com/gnina/gnina#usage

load_docked_ligands(pdbqt_output: str) Tuple[List[Any], List[float]][source]

This function loads ligands docked by Autodock Vina.

Autodock Vina writes outputs to disk in a PDBQT file format. This PDBQT file can contain multiple docked “poses”. Recall that a pose is an energetically favorable 3D conformation of a molecule. This utility function reads and loads the structures for multiple poses from Vina’s output file.

Parameters:

pdbqt_output (str) – Should be the filename of a file generated by Autodock Vina’s docking software.

Returns:

Tuple of molecules, scores. molecules is a list of rdkit molecules with 3D information. scores is the associated vina score.

Return type:

Tuple[List[rdkit.Chem.rdchem.Mol], List[float]]

Notes

This function requires RDKit to be installed.
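To illustrate where the scores come from, the sketch below extracts per-pose scores from PDBQT text. It assumes each pose carries a "REMARK VINA RESULT:" line whose first number is the score, and it does not build RDKit molecules; the function name and the sample text are hypothetical.

```python
def parse_vina_scores_sketch(pdbqt_text: str):
    """Collect the score from each 'REMARK VINA RESULT:' line."""
    scores = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Fields: REMARK, VINA, RESULT:, score, rmsd l.b., rmsd u.b.
            scores.append(float(line.split()[3]))
    return scores


# Made-up two-pose output with the atom records omitted.
sample = """MODEL 1
REMARK VINA RESULT:    -7.3      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:    -6.9      1.532      2.104
ENDMDL
"""
scores = parse_vina_scores_sketch(sample)
```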

prepare_inputs(protein: str, ligand: str, replace_nonstandard_residues: bool = True, remove_heterogens: bool = True, remove_water: bool = True, add_hydrogens: bool = True, pH: float = 7.0, optimize_ligand: bool = True, pdb_name: str | None = None) Tuple[Any, Any][source]

This prepares protein-ligand complexes for docking.

Autodock Vina requires PDB files for proteins and ligands with sensible inputs. This function uses PDBFixer and RDKit to ensure that inputs are reasonable and ready for docking. Default values are given for convenience, but fixing PDB files is complicated and human judgement is required to produce protein structures suitable for docking. Always inspect the results carefully before trying to perform docking.

Parameters:
  • protein (str) – Filename for protein PDB file or a PDBID.

  • ligand (str) – Either a filename for a ligand PDB file or a SMILES string.

  • replace_nonstandard_residues (bool (default True)) – Replace nonstandard residues with standard residues.

  • remove_heterogens (bool (default True)) – Removes residues that are not standard amino acids or nucleotides.

  • remove_water (bool (default True)) – Remove water molecules.

  • add_hydrogens (bool (default True)) – Add missing hydrogens at the protonation state given by pH.

  • pH (float (default 7.0)) – Most common form of each residue at given pH value is used.

  • optimize_ligand (bool (default True)) – If True, optimize ligand with RDKit. Required for SMILES inputs.

  • pdb_name (Optional[str]) – If given, write sanitized protein and ligand to files called “pdb_name.pdb” and “ligand_pdb_name.pdb”

Returns:

Tuple of protein_molecule, ligand_molecule with 3D information.

Return type:

Tuple[RDKitMol, RDKitMol]

Note

This function requires RDKit and OpenMM to be installed. Read more about PDBFixer here: https://github.com/openmm/pdbfixer.

Examples

>>> p, m = prepare_inputs('3cyx', 'CCC')
>>> p.GetNumAtoms()
>>> m.GetNumAtoms()
>>> p, m = prepare_inputs('3cyx', 'CCC', remove_heterogens=False)
>>> p.GetNumAtoms()

read_gnina_log(log_file: str) ndarray[source]

Read GNINA logfile and get docking scores.

GNINA writes computed binding affinities to a logfile.

Parameters:

log_file (str) – Filename of logfile generated by GNINA.

Returns:

scores – Array of binding affinity (kcal/mol), CNN pose score, and CNN affinity for each binding mode.

Return type:

np.array, dimension (num_modes, 3)

Fake Data Generator

The utilities here are used to generate random sample data which can be used for testing model architectures or other purposes.

class FakeGraphGenerator(min_nodes: int = 10, max_nodes: int = 10, n_node_features: int = 5, avg_degree: int = 4, n_edge_features: int = 3, n_classes: int = 2, task: str = 'graph', **kwargs)[source]

Generates random graphs which can be used for testing or other purposes.

The generated graph supports both node-level and graph-level labels.

Example

>>> from deepchem.utils.fake_data_generator import FakeGraphGenerator
>>> fgg  = FakeGraphGenerator(min_nodes=8, max_nodes=10,  n_node_features=5, avg_degree=8, n_edge_features=3, n_classes=2, task='graph', z=5)
>>> graphs = fgg.sample(n_graphs=10)
>>> type(graphs)
<class 'deepchem.data.datasets.NumpyDataset'>
>>> type(graphs.X[0])
<class 'deepchem.feat.graph_data.GraphData'>
>>> len(graphs) == 10  # num_graphs
True

Note

The FakeGraphGenerator class is based on the torch_geometric.datasets.FakeDataset class.

__init__(min_nodes: int = 10, max_nodes: int = 10, n_node_features: int = 5, avg_degree: int = 4, n_edge_features: int = 3, n_classes: int = 2, task: str = 'graph', **kwargs)[source]
Parameters:
  • min_nodes (int, default 10) – Minimum number of permissible nodes in a graph

  • max_nodes (int, default 10) – Maximum number of permissible nodes in a graph

  • n_node_features (int, default 5) – Average number of node features in a graph

  • avg_degree (int, default 4) – Average degree of the graph (avg_degree should be a positive number greater than the min_nodes)

  • n_edge_features (int, default 3) – Average number of features in the edge

  • task (str, default 'graph') – Indicates node-level labels or graph-level labels

  • kwargs (optional) – Additional graph attributes and their shapes, e.g. global_features = 5

sample(n_graphs: int = 100) NumpyDataset[source]

Samples graphs

Parameters:

n_graphs (int, default 100) – Number of graphs to generate

Returns:

graphs – Generated Graphs

Return type:

NumpyDataset
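The parameters above can be read as a recipe for building one random graph. The following is a minimal sketch of that recipe using only the standard library; it is not the DeepChem implementation, and the function name `sample_fake_graph` and the dictionary layout are assumptions made for illustration.

```python
import random


def sample_fake_graph(min_nodes=8, max_nodes=10, n_node_features=5,
                      avg_degree=4, n_edge_features=3, n_classes=2,
                      seed=None):
    """Build one random graph as a plain dict (hypothetical layout)."""
    rng = random.Random(seed)
    # Number of nodes is drawn uniformly, inclusive on both ends.
    n_nodes = rng.randint(min_nodes, max_nodes)
    # Directed edges: n_nodes * avg_degree edges gives a mean out-degree
    # of avg_degree.
    n_edges = n_nodes * avg_degree
    src = [rng.randrange(n_nodes) for _ in range(n_edges)]
    dst = [rng.randrange(n_nodes) for _ in range(n_edges)]
    node_features = [[rng.random() for _ in range(n_node_features)]
                     for _ in range(n_nodes)]
    edge_features = [[rng.random() for _ in range(n_edge_features)]
                     for _ in range(n_edges)]
    # A single graph-level class label.
    label = rng.randrange(n_classes)
    return {
        "node_features": node_features,
        "edge_index": [src, dst],
        "edge_features": edge_features,
        "label": label,
    }


graph = sample_fake_graph(seed=0)
print(len(graph["node_features"]), len(graph["edge_index"][0]))
```

For node-level tasks, the same sketch would draw one label per node instead of one per graph.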

Electron Sampler

The utilities here are used to sample electrons in a given molecule and update them using Monte Carlo methods, which can be used for methods such as Variational Monte Carlo.

class ElectronSampler(central_value: ndarray, f: Callable[[ndarray], ndarray], batch_no: int = 10, x: ndarray = array([], dtype=float64), steps: int = 200, steps_per_update: int = 10, seed: int | None = None, symmetric: bool = True, simultaneous: bool = True)[source]

This class initializes the electrons’ positions using a Gaussian distribution around each nucleus and updates them using Markov chain Monte Carlo (MCMC) moves.

Using the probability obtained from the squared magnitude of the wavefunction of a molecule or atom, MCMC steps can be performed to obtain the electrons’ positions and further update the wavefunction. This approach is primarily used in methods such as Variational Monte Carlo to sample electrons around the nuclei. Sampling can be done in two ways:

  • Simultaneous: All the electrons’ positions are updated at once.

  • Single-electron: MCMC steps are performed on only one particular electron, given its index value.

Further, these moves can be proposed in two ways:

  • Symmetric: In this configuration, the standard deviation is the same for all the steps.

  • Asymmetric: In this configuration, the standard deviation is not uniform and is typically obtained from a function such as the harmonic mean of distances.

Irrespective of these choices, the initialization is done in the same way: around the respective nucleus, with the number of electrons specified.

Example

>>> import numpy as np
>>> from deepchem.utils.electron_sampler import ElectronSampler
>>> test_f = lambda x: 2*np.log(np.random.uniform(low=0,high=1.0,size=np.shape(x)[0]))
>>> distribution=ElectronSampler(central_value=np.array([[1,1,3],[3,2,3]]),f=test_f,seed=0,batch_no=2,steps=1000,)
>>> distribution.gauss_initialize_position(np.array([[1],[2]]))

>>> print(distribution.x)
[[[[1.03528105 1.00800314 3.01957476]]

  [[3.01900177 1.99697286 2.99793562]]

  [[3.00821197 2.00288087 3.02908547]]]


 [[[1.04481786 1.03735116 2.98045444]]

  [[3.01522075 2.0024335  3.00887726]]

  [[3.00667349 2.02988158 2.99589683]]]]

>>> distribution.move()
0.5115

>>> print(distribution.x)
[[[[-0.32441754  1.23330263  2.67927645]]

  [[ 3.42250997  2.23617126  3.55806632]]

  [[ 3.37491385  1.54374006  3.13575241]]]


 [[[ 0.49067726  1.03987841  3.70277884]]

  [[ 3.5631939   1.68703947  2.5685874 ]]

  [[ 2.84560249  1.73998364  3.41274181]]]]

__init__(central_value: ndarray, f: Callable[[ndarray], ndarray], batch_no: int = 10, x: ndarray = array([], dtype=float64), steps: int = 200, steps_per_update: int = 10, seed: int | None = None, symmetric: bool = True, simultaneous: bool = True)[source]
Parameters:
  • central_value (np.ndarray) – Contains each nucleus’ coordinates in a 2D array. The shape of the array should be (number_of_nucleus, 3). Ex: [[1,2,3],[3,4,5],..]

  • f (Callable[[np.ndarray],np.ndarray]) – A function that returns twice the log probability of the wavefunction of the molecular system when called. It should take a 4D array of the electrons’ positions (x) as its argument and return a numpy array containing the log probabilities of each batch.

  • batch_no (int, optional (default 10)) – Number of batches of the electron’s positions to be initialized.

  • x (np.ndarray, optional (default np.array([]))) – Contains the electrons’ coordinates in a 4D array. The shape of the array should be (batch_no, no_of_electrons, 1, 3). Can be a 1D empty array when the electrons’ positions are yet to be initialized.

  • steps (int, optional (default 200)) – The number of MCMC steps to be performed when the moves are called.

  • steps_per_update (int (default 10)) – The number of steps after which the parameters of the MCMC gets updated.

  • seed (int, optional (default None)) – Random seed to use.

  • symmetric (bool, optional (default True)) – If True, symmetric moves will be used; otherwise asymmetric moves will be used.

  • simultaneous (bool, optional (default True)) – If True, MCMC steps will be performed on all the electrons; otherwise only a single electron gets updated.

sampled_electrons[source]

Keeps track of the sampled electrons at every step, must be empty at start.

Type:

np.ndarray

harmonic_mean(y: ndarray) ndarray[source]

Calculates the harmonic mean of the distances of the value ‘y’ from self.central_value. The numpy array returned is typically scaled up to get the standard deviation matrix.

Parameters:

y (np.ndarray) – Containing the data distribution. Shape of y should be (batch,no_of_electron,1,3)

Returns:

Contains the harmonic mean of the data distribution of each batch. Shape of the array obtained (batch_no, no_of_electrons,1,1)

Return type:

np.ndarray
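As an illustration of the quantity described above, here is a minimal sketch of the harmonic mean of one electron's distances to a set of nuclei. It uses plain Python rather than batched arrays, and the helper name `harmonic_mean_distance` is hypothetical; the library method operates on the full (batch, no_of_electron, 1, 3) array.

```python
import math


def harmonic_mean_distance(electron, nuclei):
    """Harmonic mean of the distances from one electron to each nucleus."""
    distances = [math.dist(electron, nucleus) for nucleus in nuclei]
    # Harmonic mean: n / sum(1 / d_i). Short distances dominate the result.
    return len(distances) / sum(1.0 / d for d in distances)


nuclei = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
electron = (1.0, 0.0, 0.0)  # distances to the nuclei are 1 and 3
hm = harmonic_mean_distance(electron, nuclei)
print(hm)  # 2 / (1/1 + 1/3) = 1.5
```

Because the nearest nucleus dominates, scaling this value gives a proposal width that shrinks as an electron approaches a nucleus.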

log_prob_gaussian(y: ndarray, mu: ndarray, sigma: ndarray) ndarray[source]

Calculates the log probability of a gaussian distribution, given the mean and standard deviation

Parameters:
  • y (np.ndarray) – Data for which the Gaussian log probability is to be found.

  • mu (np.ndarray) – Means with respect to which the log probability is calculated. Same shape as y or should be broadcastable to y.

  • sigma (np.ndarray) – The standard deviation of the Gaussian distribution. Same shape as y or should be broadcastable to y.

Returns:

Log probability of gaussian distribution, with the shape - (batch_no,).

Return type:

np.ndarray
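For reference, the textbook log density of independent Gaussians can be sketched as follows. This is an assumption-labelled illustration, not the library's code: the function name `gaussian_log_prob` is hypothetical, it works on flat lists rather than 4D arrays, and an implementation used only for acceptance ratios may drop the constant term.

```python
import math


def gaussian_log_prob(y, mu, sigma):
    """Sum of log N(y_i; mu_i, sigma_i) over all coordinates."""
    return sum(
        -0.5 * ((yi - mi) / si) ** 2      # quadratic term
        - math.log(si)                    # width normalization
        - 0.5 * math.log(2.0 * math.pi)   # constant term
        for yi, mi, si in zip(y, mu, sigma))


# Two coordinates sitting exactly at their means with unit width.
lp = gaussian_log_prob([0.0, 1.0], [0.0, 1.0], [1.0, 1.0])
print(lp)  # equals -log(2 * pi)
```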

gauss_initialize_position(no_sample: ndarray, stddev: float = 0.02)[source]

Initializes the positions around each central value, used as the mean, by sampling from a Gaussian distribution, and updates self.x.

Parameters:
  • no_sample (np.ndarray) – Contains the number of samples to initialize under each mean. Should be in the form [[3],[2],..], which here means that 3 samples are taken around the first entry and 2 samples around the second entry of self.central_value.

  • stddev (float, optional (default 0.02)) – The standard deviation with which the electrons’ coordinates are initialized.
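The initialization can be pictured with the following standard-library sketch, which places the requested number of electrons around each nucleus for every batch. The function name `gauss_initialize` and the nested-list layout are assumptions for illustration; the library stores the result as a numpy array in self.x.

```python
import random


def gauss_initialize(central_value, no_sample, batch_no=2, stddev=0.02,
                     seed=0):
    """Return nested lists shaped (batch_no, no_of_electrons, 1, 3)."""
    rng = random.Random(seed)
    batches = []
    for _ in range(batch_no):
        electrons = []
        for center, (count,) in zip(central_value, no_sample):
            for _ in range(count):
                # One electron: a 1 x 3 block jittered around its nucleus.
                electrons.append([[rng.gauss(c, stddev) for c in center]])
        batches.append(electrons)
    return batches


# One electron around the first nucleus, two around the second.
x = gauss_initialize([[1, 1, 3], [3, 2, 3]], [[1], [2]])
print(len(x), len(x[0]), len(x[0][0]), len(x[0][0][0]))
```

With these inputs the result has 2 batches of 3 electrons each, matching the shape printed in the class example.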

electron_update(lp1, lp2, move_prob, ratio, x2) ndarray[source]

Performs sampling & parameter updates of electrons and appends the sampled electrons to self.sampled_electrons.

Parameters:
  • lp1 (np.ndarray) – Log probability of initial parameter state.

  • lp2 (np.ndarray) – Log probability of the new sampled state.

  • move_prob (np.ndarray) – Sampled log probability of the electron moving from the initial to the final state, sampled asymmetrically or symmetrically.

  • ratio (np.ndarray) – Ratio of lp1 and lp2 state.

  • x2 (np.ndarray) – Numpy array of the new sampled electrons.

Returns:

lp1 – The updated log probability of the initial parameter state.

Return type:

np.ndarray

move(stddev: float = 0.02, asymmetric_func: Callable[[ndarray], ndarray] | None = None, index: int | None = None) float[source]

Performs Metropolis-Hastings moves for self.x (the electrons). The type of move to be followed (simultaneous or single-electron, symmetric or asymmetric) is specified when the class is instantiated. The self.x array is replaced with a new array at the end of each step containing the electrons’ new positions.

Parameters:
  • asymmetric_func (Callable[[np.ndarray],np.ndarray], optional (default None)) – Should be specified for an asymmetric move. The function should take only one argument, y: a numpy array with respect to which the mean should be calculated. This function should return the mean for the asymmetric proposal. For FermiNet, this function is the harmonic mean of the distance between the electron and the nucleus.

  • stddev (float, optional (default 0.02)) – Specifies the standard deviation in the case of symmetric moves and the scaling factor of the standard deviation matrix in the case of asymmetric moves.

  • index (int, optional (default None)) – Specifies the index of the electron to be updated in the case of a single electron move.

Returns:

accepted move ratio of the MCMC steps.

Return type:

float
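To make the symmetric case concrete, here is a minimal sketch of random-walk Metropolis sampling for a single walker. It is not the library's implementation: the function name `metropolis_moves`, the toy log density, and the step width are assumptions, and it handles one position vector rather than batches of electrons.

```python
import math
import random


def metropolis_moves(f, x, steps=1000, stddev=0.5, seed=0):
    """Symmetric random-walk Metropolis; f(x) is the log of the target density."""
    rng = random.Random(seed)
    lp = f(x)
    accepted = 0
    for _ in range(steps):
        # Symmetric Gaussian proposal centred on the current position.
        x_new = [xi + rng.gauss(0.0, stddev) for xi in x]
        lp_new = f(x_new)
        # For a symmetric proposal, acceptance depends only on the
        # density ratio exp(lp_new - lp), capped at 1.
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = x_new, lp_new
            accepted += 1
    return x, accepted / steps


# Toy wavefunction psi = exp(-r^2 / 2), so 2 * log|psi| = -r^2.
log_density = lambda pos: -sum(c * c for c in pos)
position, accept_ratio = metropolis_moves(log_density, [0.0, 0.0, 0.0])
print(accept_ratio)
```

An asymmetric move would additionally include the ratio of forward and reverse proposal probabilities in the acceptance test, which is where a Gaussian log probability of the proposal enters.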

Density Functional Theory Utilities

The utilities here are used to create an object that contains information about a system’s self-consistent iteration steps and other processes.

class Lattice(a: Tensor)[source]

Lattice is an object that describes the periodicity of the lattice. Note that this object does not know about atoms. For the integrated object between the lattice and atoms, please see Sol

Examples

>>> import torch
>>> from deepchem.utils.dft_utils import Lattice
>>> a = torch.tensor([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
>>> lattice = Lattice(a)
>>> lattice.lattice_vectors()
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
>>> lattice.recip_vectors()
tensor([[6.2832, 0.0000, 0.0000],
        [0.0000, 6.2832, 0.0000],
        [0.0000, 0.0000, 6.2832]])
>>> lattice.volume() # volume of the unit cell
tensor(1.)
>>> lattice.get_lattice_ls(1.0) # get the neighboring lattice vectors
tensor([[ 0.,  0., -1.],
        [ 0., -1.,  0.],
        [-1.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]])
>>> lattice.get_gvgrids(6.0) # get the neighboring G-vectors
(tensor([[ 0.0000,  0.0000, -6.2832],
        [ 0.0000, -6.2832,  0.0000],
        [-6.2832,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000],
        [ 6.2832,  0.0000,  0.0000],
        [ 0.0000,  6.2832,  0.0000],
        [ 0.0000,  0.0000,  6.2832]]), tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000]))
>>> lattice.estimate_ewald_eta(1e-5) # estimate the ewald's sum eta
1.8
__init__(a: Tensor)[source]

Initialize the lattice object.

2D or 1D repetition are not implemented yet

Parameters:

a (torch.Tensor) – The lattice vectors with shape (ndim, ndim) with ndim == 3

lattice_vectors() Tensor[source]

Returns the 3D lattice vectors (nv, ndim) with nv == 3

recip_vectors() Tensor[source]

Returns the 3D reciprocal vectors with norm == 2 * pi with shape (nv, ndim) with nv == 3

Note: torch.det(self.a) should not be equal to zero.

volume() Tensor[source]

Returns the volume of a lattice.
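The relationship between lattice vectors, reciprocal vectors, and cell volume can be checked with a short standard-library sketch. It uses the usual cross-product construction, b1 = 2π (a2 × a3) / (a1 · (a2 × a3)) and cyclic permutations; the helper names are hypothetical and this is not the library's tensor-based code.

```python
import math


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def reciprocal_vectors(a):
    """Return (reciprocal vectors, cell volume) for 3 lattice vectors."""
    a1, a2, a3 = a
    # Signed cell volume; it must be nonzero (non-degenerate lattice).
    triple = dot(a1, cross(a2, a3))
    scale = 2.0 * math.pi / triple
    b1 = [scale * c for c in cross(a2, a3)]
    b2 = [scale * c for c in cross(a3, a1)]
    b3 = [scale * c for c in cross(a1, a2)]
    return [b1, b2, b3], abs(triple)


a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b, volume = reciprocal_vectors(a)
print(round(b[0][0], 4), volume)  # 6.2832 1.0
```

By construction, the dot product of a_i with b_j is 2π when i == j and 0 otherwise, which is why a zero determinant is not allowed.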

property params: Tuple[Tensor, ...][source]

Returns the list of parameters of this object

get_lattice_ls(rcut: float, exclude_zeros: bool = False) Tensor[source]

Returns a tensor that contains the coordinates of the neighboring lattices.

Parameters:
  • rcut (float) – The threshold of the distance from the main cell to be included in the neighbor.

  • exclude_zeros (bool (default: False)) – If True, then it will exclude the vector that are all zeros.

Returns:

ls – Tensor with size (nb, ndim) containing the coordinates of the neighboring cells.

Return type:

torch.Tensor
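A brute-force way to picture the neighbor search is to enumerate integer combinations of the lattice vectors and keep those within the cutoff. The sketch below assumes a plain criterion, norm ≤ rcut, and a fixed search bound `nmax`; both are assumptions for illustration, and the library's selection rule and ordering may differ.

```python
import itertools
import math


def lattice_points_within(a, rcut, exclude_zeros=False, nmax=3):
    """Translation vectors n1*a1 + n2*a2 + n3*a3 with norm <= rcut."""
    points = []
    for n in itertools.product(range(-nmax, nmax + 1), repeat=3):
        if exclude_zeros and n == (0, 0, 0):
            continue
        # Cartesian coordinates of this lattice translation.
        vec = [sum(n[i] * a[i][k] for i in range(3)) for k in range(3)]
        if math.hypot(*vec) <= rcut:
            points.append(vec)
    return points


a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
neighbors = lattice_points_within(a, rcut=1.0)
print(len(neighbors))  # 7: the origin plus six nearest cells
```

For the unit cubic lattice this gives the same seven vectors as the class example, though not necessarily in the same order.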

get_gvgrids(gcut: float, exclude_zeros: bool = False) Tuple[Tensor, Tensor][source]

Returns a tensor that contains the coordinate in reciprocal space of the neighboring Brillouin zones.

Parameters:
  • gcut (float) – Cut off for generating the G-points.

  • exclude_zeros (bool (default: False)) – If True, then it will exclude the vector that are all zeros.

Returns:

  • gvgrids (torch.Tensor) – Tensor with size (ng, ndim) containing the G-coordinates of the Brillouin zones.

  • weights (torch.Tensor) – Tensor with size (ng) representing the weights of the G-points.

estimate_ewald_eta(precision: float) float[source]

Estimates the Ewald sum’s eta for the nuclei interaction energy. The precision is assumed to be a relative precision. The formula is obtained by estimating the sum as an integral.

Parameters:

precision (float) – The precision of the ewald’s sum.

Returns:

eta – The estimated eta.

Return type:

float

class SpinParam(u: T, d: T)[source]

Data structure to store different values for spin-up and spin-down electrons.

Examples

>>> import torch
>>> from deepchem.utils.dft_utils import SpinParam
>>> dens_u = torch.ones(1)
>>> dens_d = torch.zeros(1)
>>> sp = SpinParam(u=dens_u, d=dens_d)
>>> sp.u
tensor([1.])
>>> sp.sum()
tensor([1.])
>>> sp.reduce(torch.multiply)
tensor([0.])
__init__(u: T, d: T)[source]

Initialize the SpinParam object.

Parameters:
  • u (any type) – The parameters that correspond to the spin-up electrons.

  • d (any type) – The parameters that correspond to the spin-down electrons.

sum()[source]

Returns the sum of up and down parameters.

reduce(fcn: Callable) T[source]

Reduce up and down parameters with the given function.
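The behavior of sum() and reduce() can be mimicked with a small generic container. This is a sketch under the assumption that the structure is a simple pair; the class name `SpinPair` is hypothetical and is not the library's SpinParam.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass
class SpinPair(Generic[T]):
    """Holds one value for spin-up (u) and one for spin-down (d)."""
    u: T
    d: T

    def sum(self) -> T:
        # Total over both spin channels.
        return self.u + self.d

    def reduce(self, fcn: Callable[[T, T], T]) -> T:
        # Combine the two channels with any binary function.
        return fcn(self.u, self.d)


sp = SpinPair(u=1.0, d=0.0)
print(sp.sum(), sp.reduce(operator.mul))  # 1.0 0.0
```

Keeping the container generic means the same code works for floats, tensors, or nested structures that define addition.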

class ValGrad(value: Tensor, grad: Tensor | None = None, lapl: Tensor | None = None, kin: Tensor | None = None)[source]

Data structure that contains local information about density profiles. It is used as an umbrella class for density profiles and the derivative of the potential w.r.t. density profiles.

Examples

>>> import torch
>>> from deepchem.utils.dft_utils import ValGrad
>>> dens = torch.ones(1)
>>> grad = torch.zeros(1)
>>> lapl = torch.ones(1)
>>> kin = torch.ones(1)
>>> vg = ValGrad(value=dens, grad=grad, lapl=lapl, kin=kin)
>>> vg + vg
ValGrad(value=tensor([2.]), grad=tensor([0.]), lapl=tensor([2.]), kin=tensor([2.]))
>>> vg * 5
ValGrad(value=tensor([5.]), grad=tensor([0.]), lapl=tensor([5.]), kin=tensor([5.]))
__init__(value: Tensor, grad: Tensor | None = None, lapl: Tensor | None = None, kin: Tensor | None = None)[source]

Initialize the ValGrad object.

Parameters:
  • value (torch.Tensor) – Tensors containing the value of the local information.

  • grad (torch.Tensor or None) – If tensor, it represents the gradient of the local information with shape (..., 3) where ... should be the same shape as value.

  • lapl (torch.Tensor or None) – If tensor, represents the laplacian value of the local information. It should have the same shape as value.

  • kin (torch.Tensor or None) – If tensor, represents the local kinetic energy density. It should have the same shape as value.

__add__(b)[source]

Add two ValGrad objects together.

__mul__(f: float | int | Tensor)[source]

Multiply the ValGrad object with a scalar.
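The arithmetic shown in the examples can be sketched with a dataclass whose optional fields are combined field by field. The class name `LocalInfo` is hypothetical, floats stand in for tensors, and the rule that a missing field stays missing is an assumption, not confirmed library behavior.

```python
from dataclasses import dataclass
from typing import Optional


def _add(a, b):
    # Assumption: if either side lacks the field, the result lacks it too.
    return None if a is None or b is None else a + b


def _scale(a, f):
    return None if a is None else a * f


@dataclass
class LocalInfo:
    value: float
    grad: Optional[float] = None
    lapl: Optional[float] = None
    kin: Optional[float] = None

    def __add__(self, other):
        return LocalInfo(self.value + other.value,
                         _add(self.grad, other.grad),
                         _add(self.lapl, other.lapl),
                         _add(self.kin, other.kin))

    def __mul__(self, f):
        return LocalInfo(self.value * f, _scale(self.grad, f),
                         _scale(self.lapl, f), _scale(self.kin, f))


vg = LocalInfo(value=1.0, grad=0.0, lapl=1.0, kin=1.0)
print(vg + vg)
print(vg * 5)
```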

class CGTOBasis(angmom: int, alphas: Tensor, coeffs: Tensor)[source]

Data structure that contains information about a contracted gaussian type orbital (CGTO).

Examples

>>> import torch
>>> from deepchem.utils.dft_utils import CGTOBasis
>>> alphas = torch.ones(1)
>>> coeffs = torch.ones(1)
>>> cgto = CGTOBasis(angmom=0, alphas=alphas, coeffs=coeffs)
>>> cgto.wfnormalize_()
CGTOBasis(angmom=0, alphas=tensor([1.]), coeffs=tensor([2.5265]), normalized=True)
__init__(angmom: int, alphas: Tensor, coeffs: Tensor)[source]

Initialize the CGTOBasis object.

Parameters:
  • angmom (int) – The angular momentum of the basis.

  • alphas (torch.Tensor) – The gaussian exponents of the basis. Shape: (nbasis,)

  • coeffs (torch.Tensor) – The coefficients of the basis. Shape: (nbasis,)

wfnormalize_() CGTOBasis[source]

Wavefunction normalization

The normalization is obtained from CINTgto_norm from libcint/src/misc.c, or https://github.com/sunqm/libcint/blob/b8594f1d27c3dad9034984a2a5befb9d607d4932/src/misc.c#L80

Please note that the square of a normalized wavefunction does not integrate to 1, but e.g. for s: 4*pi, p: (4*pi/3)
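For a single primitive, the radial normalization used by CINTgto_norm can be sketched in a few lines. The Python helper names below are hypothetical, and the sketch covers only the per-primitive factor, not the additional normalization over the contraction.

```python
import math


def gaussian_int(n, alpha):
    """Integral of x**n * exp(-alpha * x**2) for x from 0 to infinity."""
    n1 = (n + 1) * 0.5
    return math.gamma(n1) / (2.0 * alpha ** n1)


def primitive_norm(angmom, alpha):
    """Radial normalization factor of one Gaussian primitive."""
    return 1.0 / math.sqrt(gaussian_int(2 * angmom + 2, 2.0 * alpha))


norm_s = primitive_norm(0, 1.0)
print(round(norm_s, 4))  # 2.5265, the coefficient in the example above

# The radial integral of r^2 * (N * exp(-alpha r^2))^2 is 1, so the
# integral over all space carries the extra angular factor 4*pi for s.
radial = norm_s ** 2 * gaussian_int(2, 2.0)
print(radial)
```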

class AtomCGTOBasis(atomz: int | float | Tensor, bases: List[CGTOBasis], pos: List[List[float]] | ndarray | Tensor)[source]

Data structure that contains information about an atom and its contracted gaussian type orbital (CGTO).

Examples

>>> import torch
>>> from deepchem.utils.dft_utils import AtomCGTOBasis, CGTOBasis
>>> alphas = torch.ones(1)
>>> coeffs = torch.ones(1)
>>> cgto = CGTOBasis(angmom=0, alphas=alphas, coeffs=coeffs)
>>> atomcgto = AtomCGTOBasis(atomz=1, bases=[cgto], pos=[[0.0, 0.0, 0.0]])
>>> atomcgto
AtomCGTOBasis(atomz=1, bases=[CGTOBasis(angmom=0, alphas=tensor([1.]), coeffs=tensor([1.]), normalized=False)], pos=tensor([[0., 0., 0.]]))
__init__(atomz: int | float | Tensor, bases: List[CGTOBasis], pos: List[List[float]] | ndarray | Tensor)[source]

Initialize the AtomCGTOBasis object.

Parameters:
  • atomz (ZType) – Atomic number of the atom.

  • bases (List[CGTOBasis]) – List of CGTOBasis objects.

  • pos (AtomPosType) – Position of the atom. Shape: (ndim,)

class BaseXC[source]

This is the base class for the exchange-correlation (XC) functional. The XC functional is used to calculate the exchange-correlation energy and potential. The XC functional is usually divided into three families: LDA, GGA, and Meta-GGA. The LDA is the simplest one, which only depends on the density. The GGA depends on the density and its gradient. The Meta-GGA depends on the density, its gradient, and its Laplacian.

Examples

>>> from typing import Union
>>> import torch
>>> from deepchem.utils.dft_utils import ValGrad, SpinParam
>>> from deepchem.utils.dft_utils import BaseXC
>>> class MyXC(BaseXC):
...     @property
...     def family(self) -> int:
...         return 1
...     def get_edensityxc(self, densinfo: Union[ValGrad, SpinParam[ValGrad]]) -> torch.Tensor:
...         if isinstance(densinfo, ValGrad):
...             return densinfo.value.pow(2)
...         else:
...             return densinfo.u.value.pow(2) + densinfo.d.value.pow(2)
...     def get_vxc(self, densinfo: Union[ValGrad, SpinParam[ValGrad]]) -> Union[ValGrad, SpinParam[ValGrad]]:
...         if isinstance(densinfo, ValGrad):
...             return ValGrad(value=2*densinfo.value)
...         else:
...             return SpinParam(u=ValGrad(value=2*densinfo.u.value),
...                              d=ValGrad(value=2*densinfo.d.value))
>>> xc = MyXC()
>>> densinfo = ValGrad(value=torch.tensor([1., 2., 3.], requires_grad=True))
>>> xc.get_edensityxc(densinfo)
tensor([1., 4., 9.], grad_fn=<PowBackward0>)
>>> xc.get_vxc(densinfo)
ValGrad(value=tensor([2., 4., 6.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None)
>>> densinfo = SpinParam(u=ValGrad(value=torch.tensor([1., 2., 3.], requires_grad=True)),
...                      d=ValGrad(value=torch.tensor([4., 5., 6.], requires_grad=True)))
>>> xc.get_edensityxc(densinfo)
tensor([17., 29., 45.], grad_fn=<AddBackward0>)
>>> xc.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([2., 4., 6.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([ 8., 10., 12.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None))
abstract property family: int[source]

Returns 1 for LDA, 2 for GGA, and 4 for Meta-GGA.

abstract get_edensityxc(densinfo: ValGrad | SpinParam[ValGrad]) Tensor[source]

Returns the xc energy density (energy per unit volume)

Parameters:

densinfo (Union[ValGrad, SpinParam[ValGrad]]) – The density information. If the XC is unpolarized, then densinfo is ValGrad. If the XC is polarized, then densinfo is SpinParam[ValGrad]. The ValGrad contains the value and gradient of the density. The SpinParam[ValGrad] contains the value and gradient of the density for each spin channel.

Returns:

The energy density of the XC.

Return type:

torch.Tensor

get_vxc(densinfo: ValGrad | SpinParam[ValGrad])[source]

Returns the ValGrad for the xc potential given the density info for unpolarized case.

This is the default implementation of vxc if there is no implementation in the specific class of XC.

Parameters:

densinfo (Union[ValGrad, SpinParam[ValGrad]]) – The density information. If the XC is unpolarized, then densinfo is ValGrad. If the XC is polarized, then densinfo is SpinParam[ValGrad]. The ValGrad contains the value and gradient of the density. The SpinParam[ValGrad] contains the value and gradient of the density for each spin channel.

Returns:

The ValGrad for the xc potential. If the XC is unpolarized, then the return is ValGrad. If the XC is polarized, then the return is SpinParam[ValGrad].

Return type:

Union[ValGrad, SpinParam[ValGrad]]

getparamnames(methodname: str, prefix: str = '') List[str][source]

This method should list tensor names that affect the output of the method with name indicated in methodname. If the methodname is not on the list in this function, it should raise KeyError.

Parameters:
  • methodname (str) – The name of the method of the class.

  • prefix (str) – The prefix to be prepended to the parameter names. This usually contains the dots.

Returns:

Sequence of name of parameters affecting the output of the method.

Return type:

List[str]

Raises:

KeyError – If the list in this function does not contain methodname.
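The contract described above (return prefixed parameter names for known methods, raise KeyError otherwise) can be sketched as follows. The class `ScaledFunctional` and its single parameter `scale` are hypothetical and only illustrate the calling convention.

```python
from typing import List


class ScaledFunctional:
    """Hypothetical object with one tunable parameter named 'scale'."""

    def __init__(self, scale: float):
        self.scale = scale

    def get_edensityxc(self, density: float) -> float:
        return self.scale * density ** 2

    def getparamnames(self, methodname: str, prefix: str = "") -> List[str]:
        # Only methods whose output depends on parameters are listed.
        if methodname == "get_edensityxc":
            return [prefix + "scale"]
        raise KeyError(f"getparamnames has no entry for {methodname!r}")


xc = ScaledFunctional(0.5)
names = xc.getparamnames("get_edensityxc", prefix="xc.")
print(names)  # ['xc.scale']
```

A dotted prefix such as "xc." lets a containing object report the full attribute path of each parameter.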

__add__(other: Any) Any[source]

Add two BaseXC together

Examples

>>> from typing import Union
>>> import torch
>>> from deepchem.utils.dft_utils import ValGrad, SpinParam
>>> from deepchem.utils.dft_utils import BaseXC, AddBaseXC
>>> class MyXC(BaseXC):
...     @property
...     def family(self) -> int:
...         return 1
...     def get_edensityxc(self, densinfo: Union[ValGrad, SpinParam[ValGrad]]) -> torch.Tensor:
...         if isinstance(densinfo, ValGrad):
...             return densinfo.value.pow(2)
...         else:
...             return densinfo.u.value.pow(2) + densinfo.d.value.pow(2)
...     def get_vxc(self, densinfo: Union[ValGrad, SpinParam[ValGrad]]) -> Union[ValGrad, SpinParam[ValGrad]]:
...         if isinstance(densinfo, ValGrad):
...             return ValGrad(value=2*densinfo.value)
...         else:
...             return SpinParam(u=ValGrad(value=2*densinfo.u.value),
...                              d=ValGrad(value=2*densinfo.d.value))
>>> xc = MyXC()
>>> densinfo = ValGrad(value=torch.tensor([1., 2., 3.], requires_grad=True))
>>> xc.get_edensityxc(densinfo)
tensor([1., 4., 9.], grad_fn=<PowBackward0>)
>>> xc.get_vxc(densinfo)
ValGrad(value=tensor([2., 4., 6.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None)
>>> densinfo = SpinParam(u=ValGrad(value=torch.tensor([1., 2., 3.], requires_grad=True)),
...                      d=ValGrad(value=torch.tensor([4., 5., 6.], requires_grad=True)))
>>> xc.get_edensityxc(densinfo)
tensor([17., 29., 45.], grad_fn=<AddBackward0>)
>>> xc.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([2., 4., 6.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([ 8., 10., 12.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None))
>>> xc2 = AddBaseXC(xc, xc)
>>> xc2.get_edensityxc(densinfo)
tensor([34., 58., 90.], grad_fn=<AddBackward0>)
>>> xc2.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([ 4.,  8., 12.], grad_fn=<AddBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([16., 20., 24.], grad_fn=<AddBackward0>), grad=None, lapl=None, kin=None))
>>> xc3 = xc + xc
>>> xc3.get_edensityxc(densinfo)
tensor([34., 58., 90.], grad_fn=<AddBackward0>)
>>> xc3.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([ 4.,  8., 12.], grad_fn=<AddBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([16., 20., 24.], grad_fn=<AddBackward0>), grad=None, lapl=None, kin=None))
Parameters:

other (BaseXC) – The BaseXC to be added with.

Returns:

The BaseXC that is the sum of the two BaseXC.

Return type:

BaseXC

__mul__(other: float | int | Tensor)[source]

Multiply a BaseXC with a float or a tensor.

Examples

>>> from typing import Union
>>> import torch
>>> from deepchem.utils.dft_utils import ValGrad, SpinParam
>>> from deepchem.utils.dft_utils import BaseXC, MulBaseXC
>>> class MyXC(BaseXC):
...     @property
...     def family(self) -> int:
...         return 1
...     def get_edensityxc(self, densinfo: Union[ValGrad, SpinParam[ValGrad]]) -> torch.Tensor:
...         if isinstance(densinfo, ValGrad):
...             return densinfo.value.pow(2)
...         else:
...             return densinfo.u.value.pow(2) + densinfo.d.value.pow(2)
...     def get_vxc(self, densinfo: Union[ValGrad, SpinParam[ValGrad]]) -> Union[ValGrad, SpinParam[ValGrad]]:
...         if isinstance(densinfo, ValGrad):
...             return ValGrad(value=2*densinfo.value)
...         else:
...             return SpinParam(u=ValGrad(value=2*densinfo.u.value),
...                              d=ValGrad(value=2*densinfo.d.value))
>>> xc = MyXC()
>>> densinfo = ValGrad(value=torch.tensor([1., 2., 3.], requires_grad=True))
>>> xc.get_edensityxc(densinfo)
tensor([1., 4., 9.], grad_fn=<PowBackward0>)
>>> xc.get_vxc(densinfo)
ValGrad(value=tensor([2., 4., 6.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None)
>>> densinfo = SpinParam(u=ValGrad(value=torch.tensor([1., 2., 3.], requires_grad=True)),
...                      d=ValGrad(value=torch.tensor([4., 5., 6.], requires_grad=True)))
>>> xc.get_edensityxc(densinfo)
tensor([17., 29., 45.], grad_fn=<AddBackward0>)
>>> xc.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([2., 4., 6.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([ 8., 10., 12.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None))
>>> xc2 = MulBaseXC(xc, 2.)
>>> xc2.get_edensityxc(densinfo)
tensor([34., 58., 90.], grad_fn=<MulBackward0>)
>>> xc2.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([ 4.,  8., 12.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([16., 20., 24.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None))
>>> xc3 = xc * 2.
>>> xc3.get_edensityxc(densinfo)
tensor([34., 58., 90.], grad_fn=<MulBackward0>)
>>> xc3.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([ 4.,  8., 12.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([16., 20., 24.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None))
Parameters:

other (Union[float, int, torch.Tensor]) – The float or tensor to be multiplied with.

Returns:

The BaseXC that is the product of the BaseXC and the float or tensor.

Return type:

BaseXC

__rmul__(other: float | int | Tensor)[source]

Multiply a BaseXC with a float or a tensor.

Examples

>>> from typing import Union
>>> import torch
>>> from deepchem.utils.dft_utils import ValGrad, SpinParam
>>> from deepchem.utils.dft_utils import BaseXC, MulBaseXC
>>> class MyXC(BaseXC):
...     @property
...     def family(self) -> int:
...         return 1
...     def get_edensityxc(self, densinfo: Union[ValGrad, SpinParam[ValGrad]]) -> torch.Tensor:
...         if isinstance(densinfo, ValGrad):
...             return densinfo.value.pow(2)
...         else:
...             return densinfo.u.value.pow(2) + densinfo.d.value.pow(2)
...     def get_vxc(self, densinfo: Union[ValGrad, SpinParam[ValGrad]]) -> Union[ValGrad, SpinParam[ValGrad]]:
...         if isinstance(densinfo, ValGrad):
...             return ValGrad(value=2*densinfo.value)
...         else:
...             return SpinParam(u=ValGrad(value=2*densinfo.u.value),
...                              d=ValGrad(value=2*densinfo.d.value))
>>> xc = MyXC()
>>> densinfo = ValGrad(value=torch.tensor([1., 2., 3.], requires_grad=True))
>>> xc.get_edensityxc(densinfo)
tensor([1., 4., 9.], grad_fn=<PowBackward0>)
>>> xc.get_vxc(densinfo)
ValGrad(value=tensor([2., 4., 6.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None)
>>> densinfo = SpinParam(u=ValGrad(value=torch.tensor([1., 2., 3.], requires_grad=True)),
...                      d=ValGrad(value=torch.tensor([4., 5., 6.], requires_grad=True)))
>>> xc.get_edensityxc(densinfo)
tensor([17., 29., 45.], grad_fn=<AddBackward0>)
>>> xc.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([2., 4., 6.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([ 8., 10., 12.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None))
>>> xc2 = MulBaseXC(xc, 2.)
>>> xc2.get_edensityxc(densinfo)
tensor([34., 58., 90.], grad_fn=<MulBackward0>)
>>> xc2.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([ 4.,  8., 12.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([16., 20., 24.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None))
>>> xc3 = 2. * xc
>>> xc3.get_edensityxc(densinfo)
tensor([34., 58., 90.], grad_fn=<MulBackward0>)
>>> xc3.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([ 4.,  8., 12.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([16., 20., 24.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None))
Parameters:

other (Union[float, int, torch.Tensor]) – The float or tensor to be multiplied with.

Returns:

The BaseXC that is the product of the BaseXC and the float or tensor.

Return type:

BaseXC
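The three operators above follow a common composite pattern: each one returns a new wrapper object that defers to its operands. The following is a minimal sketch of that pattern with plain floats; the class names are hypothetical and stand in for BaseXC, AddBaseXC, and MulBaseXC without reproducing their code.

```python
class Functional:
    """Hypothetical base class; subclasses implement energy()."""

    def energy(self, density: float) -> float:
        raise NotImplementedError

    def __add__(self, other):
        return Sum(self, other)

    def __mul__(self, factor):
        return Scaled(self, factor)

    # 2.0 * xc and xc * 2.0 build the same composite.
    __rmul__ = __mul__


class Square(Functional):
    def energy(self, density):
        return density ** 2


class Sum(Functional):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def energy(self, density):
        return self.a.energy(density) + self.b.energy(density)


class Scaled(Functional):
    def __init__(self, a, factor):
        self.a, self.factor = a, factor

    def energy(self, density):
        return self.factor * self.a.energy(density)


xc = Square()
print((xc + xc).energy(3.0), (2.0 * xc).energy(3.0))  # 18.0 18.0
```

Because the composites are themselves functionals, expressions such as 0.5 * (xc + xc) nest without extra code.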

class AddBaseXC(a: BaseXC, b: BaseXC)[source]

Add two BaseXC together

Examples

>>> from typing import Union
>>> import torch
>>> from deepchem.utils.dft_utils import ValGrad, SpinParam
>>> from deepchem.utils.dft_utils import BaseXC, AddBaseXC
>>> class MyXC(BaseXC):
...     @property
...     def family(self) -> int:
...         return 1
...     def get_edensityxc(self, densinfo: Union[ValGrad, SpinParam[ValGrad]]) -> torch.Tensor:
...         if isinstance(densinfo, ValGrad):
...             return densinfo.value.pow(2)
...         else:
...             return densinfo.u.value.pow(2) + densinfo.d.value.pow(2)
...     def get_vxc(self, densinfo: Union[ValGrad, SpinParam[ValGrad]]) -> Union[ValGrad, SpinParam[ValGrad]]:
...         if isinstance(densinfo, ValGrad):
...             return ValGrad(value=2*densinfo.value)
...         else:
...             return SpinParam(u=ValGrad(value=2*densinfo.u.value),
...                              d=ValGrad(value=2*densinfo.d.value))
>>> xc = MyXC()
>>> densinfo = ValGrad(value=torch.tensor([1., 2., 3.], requires_grad=True))
>>> xc.get_edensityxc(densinfo)
tensor([1., 4., 9.], grad_fn=<PowBackward0>)
>>> xc.get_vxc(densinfo)
ValGrad(value=tensor([2., 4., 6.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None)
>>> densinfo = SpinParam(u=ValGrad(value=torch.tensor([1., 2., 3.], requires_grad=True)),
...                      d=ValGrad(value=torch.tensor([4., 5., 6.], requires_grad=True)))
>>> xc.get_edensityxc(densinfo)
tensor([17., 29., 45.], grad_fn=<AddBackward0>)
>>> xc.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([2., 4., 6.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([ 8., 10., 12.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None))
>>> xc2 = AddBaseXC(xc, xc)
>>> xc2.get_edensityxc(densinfo)
tensor([34., 58., 90.], grad_fn=<AddBackward0>)
>>> xc2.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([ 4.,  8., 12.], grad_fn=<AddBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([16., 20., 24.], grad_fn=<AddBackward0>), grad=None, lapl=None, kin=None))
>>> xc3 = xc + xc
>>> xc3.get_edensityxc(densinfo)
tensor([34., 58., 90.], grad_fn=<AddBackward0>)
>>> xc3.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([ 4.,  8., 12.], grad_fn=<AddBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([16., 20., 24.], grad_fn=<AddBackward0>), grad=None, lapl=None, kin=None))
__init__(a: BaseXC, b: BaseXC) None[source]

Initialize the AddBaseXC

Parameters:
  • a (BaseXC) – BaseXC to be added to.

  • b (BaseXC) – BaseXC to be added with.

property family[source]

Returns 1 for LDA, 2 for GGA, and 4 for Meta-GGA.

get_vxc(densinfo: ValGrad | SpinParam[ValGrad]) ValGrad | SpinParam[ValGrad][source]

Returns the xc potential for the given density info, as a ValGrad in the unpolarized case or a SpinParam[ValGrad] in the polarized case.

Parameters:

densinfo (Union[ValGrad, SpinParam[ValGrad]]) – The density information. If the XC is unpolarized, then densinfo is ValGrad. If the XC is polarized, then densinfo is SpinParam[ValGrad]. The ValGrad contains the value and gradient of the density. The SpinParam[ValGrad] contains the value and gradient of the density for each spin channel.

Returns:

The ValGrad for the xc potential. If the XC is unpolarized, then the return is ValGrad. If the XC is polarized, then the return is SpinParam[ValGrad].

Return type:

Union[ValGrad, SpinParam[ValGrad]]

get_edensityxc(densinfo: ValGrad | SpinParam[ValGrad]) Tensor[source]

Returns the xc energy density (energy per unit volume)

Parameters:

densinfo (Union[ValGrad, SpinParam[ValGrad]]) – The density information. If the XC is unpolarized, then densinfo is ValGrad. If the XC is polarized, then densinfo is SpinParam[ValGrad]. The ValGrad contains the value and gradient of the density. The SpinParam[ValGrad] contains the value and gradient of the density for each spin channel.

Returns:

The energy density of the XC.

Return type:

torch.Tensor

getparamnames(methodname: str, prefix: str = '') List[str][source]

This method should list tensor names that affect the output of the method with name indicated in methodname. If the methodname is not on the list in this function, it should raise KeyError.

Parameters:
  • methodname (str) – The name of the method of the class.

  • prefix (str) – The prefix to be prepended to the parameter names. This usually contains the dots.

Returns:

Sequence of names of the parameters affecting the output of the method.

Return type:

List[str]

Raises:

KeyError – If the list in this function does not contain methodname.
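
As a rough illustration of the getparamnames contract described above, the following standalone sketch returns prefixed names for a known method and raises KeyError otherwise. The class ToyModule and its attributes are hypothetical and not part of DeepChem; plain floats stand in for tensors.

```python
from typing import List


class ToyModule:
    """Hypothetical stand-in for a class that follows the getparamnames contract."""

    def __init__(self):
        # Pretend these attributes are tensors that influence the outputs.
        self.weight = 1.0
        self.bias = 0.0

    def forward(self, x: float) -> float:
        return self.weight * x + self.bias

    def getparamnames(self, methodname: str, prefix: str = "") -> List[str]:
        if methodname == "forward":
            # Both attributes affect the output of forward().
            return [prefix + "weight", prefix + "bias"]
        raise KeyError("Unknown method name: %s" % methodname)


module = ToyModule()
names = module.getparamnames("forward", prefix="xc.")
```

The prefix lets a containing object report fully qualified names such as "xc.weight" for the parameters of one of its members.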

class MulBaseXC(a: BaseXC, b: float | Tensor)[source]

Multiply a BaseXC with a float or a tensor

Examples

>>> import torch
>>> from deepchem.utils.dft_utils import ValGrad, SpinParam
>>> from deepchem.utils.dft_utils import BaseXC, MulBaseXC
>>> class MyXC(BaseXC):
...     @property
...     def family(self) -> int:
...         return 1
...     def get_edensityxc(self, densinfo: Union[ValGrad, SpinParam[ValGrad]]) -> torch.Tensor:
...         if isinstance(densinfo, ValGrad):
...             return densinfo.value.pow(2)
...         else:
...             return densinfo.u.value.pow(2) + densinfo.d.value.pow(2)
...     def get_vxc(self, densinfo: Union[ValGrad, SpinParam[ValGrad]]) -> Union[ValGrad, SpinParam[ValGrad]]:
...         if isinstance(densinfo, ValGrad):
...             return ValGrad(value=2*densinfo.value)
...         else:
...             return SpinParam(u=ValGrad(value=2*densinfo.u.value),
...                              d=ValGrad(value=2*densinfo.d.value))
>>> xc = MyXC()
>>> densinfo = ValGrad(value=torch.tensor([1., 2., 3.], requires_grad=True))
>>> xc.get_edensityxc(densinfo)
tensor([1., 4., 9.], grad_fn=<PowBackward0>)
>>> xc.get_vxc(densinfo)
ValGrad(value=tensor([2., 4., 6.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None)
>>> densinfo = SpinParam(u=ValGrad(value=torch.tensor([1., 2., 3.], requires_grad=True)),
...                      d=ValGrad(value=torch.tensor([4., 5., 6.], requires_grad=True)))
>>> xc.get_edensityxc(densinfo)
tensor([17., 29., 45.], grad_fn=<AddBackward0>)
>>> xc.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([2., 4., 6.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([ 8., 10., 12.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None))
>>> xc2 = MulBaseXC(xc, 2.)
>>> xc2.get_edensityxc(densinfo)
tensor([34., 58., 90.], grad_fn=<MulBackward0>)
>>> xc2.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([ 4.,  8., 12.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([16., 20., 24.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None))
>>> xc3 = xc * 2.
>>> xc3.get_edensityxc(densinfo)
tensor([34., 58., 90.], grad_fn=<MulBackward0>)
>>> xc3.get_vxc(densinfo)
SpinParam(u=ValGrad(value=tensor([ 4.,  8., 12.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None), d=ValGrad(value=tensor([16., 20., 24.], grad_fn=<MulBackward0>), grad=None, lapl=None, kin=None))
__init__(a: BaseXC, b: float | Tensor) None[source]

Initialize the MulBaseXC

Parameters:
  • a (BaseXC) – BaseXC to be multiplied to.

  • b (Union[float, torch.Tensor]) – float or tensor to be multiplied with.

property family[source]

Returns 1 for LDA, 2 for GGA, and 4 for Meta-GGA.

get_vxc(densinfo: ValGrad | SpinParam[ValGrad]) ValGrad | SpinParam[ValGrad][source]

Returns the xc potential for the given density info, as a ValGrad in the unpolarized case or a SpinParam[ValGrad] in the polarized case.

Parameters:

densinfo (Union[ValGrad, SpinParam[ValGrad]]) – The density information. If the XC is unpolarized, then densinfo is ValGrad. If the XC is polarized, then densinfo is SpinParam[ValGrad]. The ValGrad contains the value and gradient of the density. The SpinParam[ValGrad] contains the value and gradient of the density for each spin channel.

Returns:

The ValGrad for the xc potential. If the XC is unpolarized, then the return is ValGrad. If the XC is polarized, then the return is SpinParam[ValGrad].

Return type:

Union[ValGrad, SpinParam[ValGrad]]

get_edensityxc(densinfo: ValGrad | SpinParam[ValGrad]) Tensor[source]

Returns the xc energy density (energy per unit volume)

Parameters:

densinfo (Union[ValGrad, SpinParam[ValGrad]]) – The density information. If the XC is unpolarized, then densinfo is ValGrad. If the XC is polarized, then densinfo is SpinParam[ValGrad]. The ValGrad contains the value and gradient of the density. The SpinParam[ValGrad] contains the value and gradient of the density for each spin channel.

Returns:

The energy density of the XC.

Return type:

torch.Tensor

getparamnames(methodname: str, prefix: str = '') List[str][source]

This method should list tensor names that affect the output of the method with name indicated in methodname. If the methodname is not on the list in this function, it should raise KeyError.

Parameters:
  • methodname (str) – The name of the method of the class.

  • prefix (str) – The prefix to be prepended to the parameter names. This usually contains the dots.

Returns:

Sequence of names of the parameters affecting the output of the method.

Return type:

List[str]

Raises:

KeyError – If the list in this function does not contain methodname.
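
The way `xc + xc` and `xc * 2.` produce composite functionals in the examples above can be pictured with a small torch-free sketch. All class names here (ToyXC, ToyAdd, ToyMul) are hypothetical, and scalars stand in for tensors; this shows the composition pattern only, not DeepChem's implementation.

```python
class ToyXC:
    """Hypothetical scalar analogue of an XC functional (not DeepChem's BaseXC)."""

    def get_edensityxc(self, density: float) -> float:
        return density ** 2

    def __add__(self, other: "ToyXC") -> "ToyXC":
        return ToyAdd(self, other)

    def __mul__(self, factor: float) -> "ToyXC":
        return ToyMul(self, factor)


class ToyAdd(ToyXC):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def get_edensityxc(self, density):
        # Sum of the energy densities of both operands.
        return self.a.get_edensityxc(density) + self.b.get_edensityxc(density)


class ToyMul(ToyXC):
    def __init__(self, a, factor):
        self.a, self.factor = a, factor

    def get_edensityxc(self, density):
        # Energy density of the operand scaled by the factor.
        return self.a.get_edensityxc(density) * self.factor


xc = ToyXC()
added = (xc + xc).get_edensityxc(3.0)
scaled = (xc * 2.0).get_edensityxc(3.0)
```

Because the composites are themselves ToyXC objects, they can be combined further, for example `(xc + xc) * 0.5`.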

class BaseGrid[source]

BaseGrid is a class that regulates the integration points over the spatial dimensions.

Examples

>>> import torch
>>> from deepchem.utils.dft_utils import BaseGrid
>>> class Grid(BaseGrid):
...     def __init__(self):
...         super(Grid, self).__init__()
...         self.ngrid = 10
...         self.ndim = 3
...         self.dvolume = torch.ones(self.ngrid, dtype=self.dtype, device=self.device)
...         self.rgrid = torch.ones((self.ngrid, self.ndim), dtype=self.dtype, device=self.device)
...     def get_dvolume(self):
...         return self.dvolume
...     def get_rgrid(self):
...         return self.rgrid
>>> grid = Grid()
>>> grid.get_dvolume()
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> grid.get_rgrid()
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

References

Kasim, Muhammad F., and Sam M. Vinko. “Learning the exchange-correlation functional from nature with fully differentiable density functional theory.” Physical Review Letters 127.12 (2021): 126403. https://github.com/diffqc/dqc/blob/0fe821fc92cb3457fb14f6dff0c223641c514ddb/dqc/grid/base_grid.py

abstract property dtype: dtype[source]

dtype of the grid points.

Returns:

dtype of the grid points

Return type:

torch.dtype

abstract property device: device[source]

device of the grid points

Returns:

device of the grid points

Return type:

torch.device

abstract property coord_type: str[source]

Type of the coordinate returned in get_rgrid. It can be ‘cartesian’ or ‘spherical’.

Returns:

Type of the coordinate returned in get_rgrid. It can be ‘cartesian’ or ‘spherical’.

Return type:

str

abstract get_dvolume() Tensor[source]

Obtain the torch.tensor containing the dV elements for the integration.

Returns:

The dV elements for the integration. *BG is the length of the BaseGrid.

Return type:

torch.tensor (*BG, ngrid)

abstract get_rgrid() Tensor[source]

Returns the grid points position in the specified coordinate in self.coord_type.

Returns:

The grid points position. *BG is the length of the BaseGrid.

Return type:

torch.tensor (*BG, ngrid, ndim)

abstract getparamnames(methodname: str, prefix: str = '') List[str][source]

Return a list with the parameter names corresponding to the given method (methodname)

Returns:

List of parameter names of methodname

Return type:

List[str]
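
The dV elements and grid positions returned by get_dvolume and get_rgrid are the ingredients of a numerical quadrature. A minimal sketch, assuming a hypothetical 1D uniform midpoint grid and plain Python lists instead of tensors:

```python
# Hypothetical 1D uniform grid on [0, 1] using the midpoint rule.
ngrid = 1000
dx = 1.0 / ngrid
rgrid = [(i + 0.5) * dx for i in range(ngrid)]  # grid point positions
dvolume = [dx] * ngrid                            # dV element of each point


def integrate(f, rgrid, dvolume):
    # Approximate the integral of f as sum_i f(r_i) * dV_i.
    return sum(f(r) * dv for r, dv in zip(rgrid, dvolume))


integral = integrate(lambda r: r ** 2, rgrid, dvolume)  # exact value is 1/3
```

A real grid would typically be three-dimensional with non-uniform weights, but the sum over points weighted by dV has the same form.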

class BaseDF[source]

BaseDF represents the density fitting object used in calculating the electron repulsion (and xc energy) in Hamiltonian.

Density fitting in density functional theory (DFT) is a technique used to reduce the computational cost of evaluating electron repulsion integrals. In DFT, the key quantity is the electron density rather than the wave function, and the electron repulsion integrals are four-centre two-electron integrals, which makes them computationally demanding.

Density fitting exploits the fact that many-electron integrals can be expressed as a sum of two-electron integrals involving auxiliary basis functions. By approximating these auxiliary basis functions, often referred to as fitting functions, the computational cost can be significantly reduced.
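
As general background (not taken from the DeepChem source), the Coulomb-metric form of density fitting approximates a four-centre integral over basis functions i, j, k, l using auxiliary functions P and Q:

\[(ij|kl) \approx \sum_{P,Q} (ij|P)\, [\mathbf{J}^{-1}]_{PQ}\, (Q|kl), \qquad \mathbf{J}_{PQ} = (P|Q)\]

Here (P|Q) are 2-centre integrals of the auxiliary basis and (ij|P) are 3-centre integrals between the auxiliary basis and the basis. Reading these as the j2c and j3c properties of BaseDF is an assumption based on their descriptions.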

Examples

>>> from deepchem.utils.dft_utils import BaseDF
>>> import torch
>>> class MyDF(BaseDF):
...     def __init__(self):
...         super(MyDF, self).__init__()
...     def get_j2c(self):
...         return torch.ones((3, 3))
...     def get_j3c(self):
...         return torch.ones((3, 3, 3))
>>> df = MyDF()
>>> df.get_j2c()
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
abstract build() BaseDF[source]

Construct the matrices required to perform the calculation and return self.

Returns:

The constructed density fitting object.

Return type:

BaseDF

abstract get_elrep(dm: Tensor) LinearOperator[source]

Construct the electron repulsion linear operator from the given density matrix using the density fitting method.

Parameters:

dm (torch.Tensor) – The density matrix.

Returns:

The electron repulsion linear operator.

Return type:

LinearOperator

abstract property j2c: Tensor[source]

Returns the 2-centre 2-electron integrals of the auxiliary basis.

Returns:

The 2-centre 2-electron integrals of the auxiliary basis.

Return type:

torch.Tensor

abstract property j3c: Tensor[source]

Return the 3-centre 2-electron integrals of the auxiliary basis and the basis.

Returns:

The 3-centre 2-electron integrals of the auxiliary basis and the basis.

Return type:

torch.Tensor

abstract getparamnames(methodname: str, prefix: str = '') List[str][source]

This method should list tensor names that affect the output of the method with name indicated in methodname.

Parameters:
  • methodname (str) – The name of the method of the class.

  • prefix (str (default="")) – The prefix to be prepended to the parameter names. This usually contains the dots.

Returns:

Sequence of names of the parameters affecting the output of the method.

Return type:

List[str]

class BaseHamilton[source]

Hamilton is a class that provides the LinearOperator of the Hamiltonian components.

The Hamiltonian represents the total energy operator for a system of interacting electrons. The Kohn-Sham DFT approach introduces a set of fictitious non-interacting electrons that move in an effective potential. The total energy functional, which includes the kinetic energy of these fictitious electrons and their interaction with an effective potential (including the electron-electron interaction), is minimized to obtain the ground-state electronic structure.

The Kohn-Sham Hamiltonian is a key component of this approach, representing the operator that governs the evolution of the Kohn-Sham orbitals. It includes terms for the kinetic energy of electrons, the external potential (usually from nuclei), and the exchange-correlation potential that accounts for the electron-electron interactions.

The Fock matrix represents the one-electron part of the Hamiltonian matrix. Its components include kinetic energy, nuclear attraction, and electron-electron repulsion integrals. The Fock matrix is pivotal in solving the electronic Schrödinger equation and determining the electronic structure of molecular systems.

Examples

>>> import torch
>>> from deepchem.utils.dft_utils import BaseHamilton
>>> class MyHamilton(BaseHamilton):
...    def __init__(self):
...        self._nao = 2
...        self._kpts = torch.tensor([[0.0, 0.0, 0.0]])
...        self._df = None
...    @property
...    def nao(self):
...        return self._nao
...    @property
...    def kpts(self):
...        return self._kpts
...    @property
...    def df(self):
...        return self._df
...    def build(self):
...        return self
...    def get_nuclattr(self):
...        return torch.ones((1, 1, self.nao, self.nao))
>>> ham = MyHamilton()
>>> hamilton = ham.build()
>>> hamilton.get_nuclattr()
tensor([[[[1., 1.],
          [1., 1.]]]])
abstract property nao: int[source]

Number of atomic orbital basis functions.

Returns:

Number of atomic orbital basis functions.

Return type:

int

abstract property kpts: Tensor[source]

List of k-points in the Hamiltonian.

Returns:

List of k-points in the Hamiltonian. Shape: (nkpts, ndim)

Return type:

torch.Tensor

abstract property df: BaseDF | None[source]

Returns the density fitting object (if any) attached to this Hamiltonian object.

Returns:

The density fitting object (if any) attached to this Hamiltonian object.

Return type:

Optional[BaseDF]

abstract build()[source]

Construct the elements needed for the Hamiltonian. Heavy-lifting operations should be put here.

abstract setup_grid(grid: BaseGrid, xc: BaseXC | None = None) None[source]

Set up the basis (with its gradient) on the spatial grid and prepare the gradients of the atomic orbitals (ao) required by the xc. If xc is not given, then only set up the grid with the ao (without any gradients of the ao).

Parameters:
  • grid (BaseGrid) – Grid used to set up this Hamiltonian.

  • xc (Optional[BaseXC] (default None)) – Exchange-correlation functional of this Hamiltonian.

abstract get_nuclattr() LinearOperator[source]

LinearOperator of the nuclear Coulomb attraction.

Nuclear Coulomb attraction is the electrostatic force binding electrons to a nucleus. Positively charged protons attract negatively charged electrons, creating stability in quantum systems. This force plays a fundamental role in determining the structure and behavior of atoms, contributing significantly to the overall potential energy in atomic physics.

Returns:

LinearOperator of the nuclear Coulomb attraction. Shape: (*BH, nao, nao)

Return type:

LinearOperator

abstract get_kinnucl() LinearOperator[source]

Returns the LinearOperator of the one-electron operator (i.e. kinetic and nuclear attraction). Action of a LinearOperator on a function is a linear transformation. In the case of one-electron operators, these transformations are essential for solving the Schrödinger equation and understanding the behavior of electrons in an atomic or molecular system.

Returns:

LinearOperator of the one-electron operator. Shape: (*BH, nao, nao)

Return type:

LinearOperator

abstract get_overlap() LinearOperator[source]

Returns the LinearOperator representing the overlap of the basis. The overlap of the basis refers to the degree to which atomic or molecular orbitals in a quantum mechanical system share common space.

Returns:

LinearOperator representing the overlap of the basis. Shape: (*BH, nao, nao)

Return type:

LinearOperator

abstract get_elrep(dm: Tensor) LinearOperator[source]

Obtains the LinearOperator of the Coulomb electron repulsion operator. Known as the J-matrix.

In the context of electronic structure theory, it accounts for the repulsive interaction between electrons in a many-electron system. The J-matrix elements involve the Coulombic interactions between pairs of electrons, influencing the total energy and behavior of the system.

Parameters:

dm (torch.Tensor) – Density matrix. Shape: (*BD, nao, nao)

Returns:

LinearOperator of the Coulomb electron repulsion operator. Shape: (*BDH, nao, nao)

Return type:

LinearOperator

abstract get_exchange(dm: Tensor | SpinParam[Tensor]) LinearOperator | SpinParam[LinearOperator][source]

Obtains the LinearOperator of the exchange operator. It is -0.5 * K where K is the K matrix obtained from 2-electron integral.

Exchange operator is a mathematical representation of the exchange interaction between identical particles, such as electrons. The exchange operator quantifies the effect of interchanging the positions of two particles.

Parameters:

dm (Union[torch.Tensor, SpinParam[torch.Tensor]]) – Density matrix. Shape: (*BD, nao, nao)

Returns:

LinearOperator of the exchange operator. Shape: (*BDH, nao, nao)

Return type:

Union[LinearOperator, SpinParam[LinearOperator]]
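
To make the J and K matrices mentioned in get_elrep and get_exchange concrete, here is a small pure-Python sketch. The integral values and density matrix are made up, and the index convention shown is one common choice for a real symmetric density matrix; it is not claimed to match DeepChem's internal tensor layout.

```python
nao = 2

# Hypothetical 2-electron integrals (ij|kl); the values are made up for illustration.
eri = [[[[1.0 / (1 + i + j + k + l) for l in range(nao)]
         for k in range(nao)]
        for j in range(nao)]
       for i in range(nao)]

# Hypothetical density matrix with one electron in the first basis function.
dm = [[1.0, 0.0],
      [0.0, 0.0]]


def coulomb(eri, dm):
    # J_ij = sum_kl (ij|kl) D_kl
    return [[sum(eri[i][j][k][l] * dm[k][l]
                 for k in range(nao) for l in range(nao))
             for j in range(nao)] for i in range(nao)]


def exchange(eri, dm):
    # K_ij = sum_kl (ik|jl) D_kl; the exchange operator is -0.5 * K
    return [[-0.5 * sum(eri[i][k][j][l] * dm[k][l]
                        for k in range(nao) for l in range(nao))
             for j in range(nao)] for i in range(nao)]


j_mat = coulomb(eri, dm)
x_mat = exchange(eri, dm)
```

Both results have shape (nao, nao), matching the operator shapes documented above without the batch dimensions.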

abstract get_vext(vext: Tensor) LinearOperator[source]

Returns a LinearOperator of the external potential in the grid.

\[\mathbf{V}_{ij} = \int b_i(\mathbf{r}) V(\mathbf{r}) b_j(\mathbf{r})\ d\mathbf{r}\]

The external potential is the potential energy that a particle experiences at each point of a discretized space or grid. In quantum mechanics or computational physics, an external potential is often introduced to represent the influence of external forces when solving for the behavior of particles.

Parameters:

vext (torch.Tensor) – External potential in the grid. Shape: (*BR, ngrid)

Returns:

LinearOperator of the external potential in the grid. Shape: (*BRH, nao, nao)

Return type:

LinearOperator
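
On a grid, the integral in the formula for the external potential matrix becomes a weighted sum over grid points. The sketch below assumes a hypothetical 1D grid, two made-up basis functions, and a harmonic potential; none of these come from DeepChem.

```python
import math

# Hypothetical 1D grid on [-5, 5] with uniform weights (midpoint rule).
ngrid = 2000
dx = 10.0 / ngrid
rgrid = [-5.0 + (g + 0.5) * dx for g in range(ngrid)]
dvolume = [dx] * ngrid

# Two made-up basis functions and a harmonic external potential V(x) = x^2.
basis = [lambda x: math.exp(-x * x),
         lambda x: x * math.exp(-x * x)]
vext = [x * x for x in rgrid]


def vext_matrix(basis, vext, rgrid, dvolume):
    # V_ij ~= sum_g b_i(r_g) * V(r_g) * b_j(r_g) * dV_g
    nao = len(basis)
    vals = [[b(r) for r in rgrid] for b in basis]  # basis values on the grid
    return [[sum(vals[i][g] * vext[g] * vals[j][g] * dvolume[g]
                 for g in range(len(rgrid)))
             for j in range(nao)] for i in range(nao)]


v_mat = vext_matrix(basis, vext, rgrid, dvolume)
```

The off-diagonal element vanishes here because the integrand is odd, and the first diagonal element approaches the analytic value sqrt(pi) / (2 * 2**1.5).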

abstract get_vxc(dm: Tensor | SpinParam[Tensor]) LinearOperator | SpinParam[LinearOperator][source]

Returns a LinearOperator for the exchange-correlation potential

The exchange-correlation potential combines two effects:

1. Exchange potential: Arises from the antisymmetry of the electron wave function. It quantifies the tendency of electrons to avoid each other due to their indistinguishability.

2. Correlation potential: Accounts for the electron-electron correlation effects that arise from the repulsion between electrons.

TODO: check what is needed for Meta-GGA involving kinetics and for exact exchange.

Parameters:

dm (Union[torch.Tensor, SpinParam[torch.Tensor]]) – Density matrix. Shape: (*BD, nao, nao)

Returns:

LinearOperator for the exchange-correlation potential. Shape: (*BDH, nao, nao)

Return type:

Union[LinearOperator, SpinParam[LinearOperator]]

abstract ao_orb2dm(orb: Tensor, orb_weight: Tensor) Tensor[source]

Convert the atomic orbital to the density matrix.

Parameters:
  • orb (torch.Tensor) – Atomic orbital. Shape: (*BO, nao, norb)

  • orb_weight (torch.Tensor) – Orbital weight. Shape: (*BW, norb)

Returns:

Density matrix. Shape: (*BOWH, nao, nao)

Return type:

torch.Tensor
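
A minimal sketch of the orbital-to-density-matrix conversion, assuming the conventional definition D = C diag(w) C^T for real orbitals (the documentation above does not spell out the formula). The helper name orb_to_dm is hypothetical, and nested lists stand in for tensors.

```python
def orb_to_dm(orb, orb_weight):
    # D_ij = sum_k C_ik * w_k * C_jk, i.e. D = C diag(w) C^T for real orbitals.
    nao, norb = len(orb), len(orb[0])
    return [[sum(orb[i][k] * orb_weight[k] * orb[j][k] for k in range(norb))
             for j in range(nao)] for i in range(nao)]


# Hypothetical example: two basis functions, one doubly occupied orbital.
s = 2 ** -0.5
orb = [[s], [s]]          # shape (nao=2, norb=1)
orb_weight = [2.0]        # occupation number of the orbital
dm = orb_to_dm(orb, orb_weight)
trace = dm[0][0] + dm[1][1]
```

With an orthonormal basis, the trace of the density matrix equals the total orbital weight, which is 2 in this example.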

abstract aodm2dens(dm: Tensor, xyz: Tensor) Tensor[source]

Get the density value in the Cartesian coordinate.

Parameters:
  • dm (torch.Tensor) – Density matrix. Shape: (*BD, nao, nao)

  • xyz (torch.Tensor) – Cartesian coordinate. Shape: (*BR, ndim)

Returns:

Density value in the Cartesian coordinate. Shape: (*BRD)

Return type:

torch.Tensor

abstract get_e_hcore(dm: Tensor) Tensor[source]

Get the energy from the one-electron Hamiltonian. The input is total density matrix.

Parameters:

dm (torch.Tensor) – Total Density matrix.

Returns:

Energy from the one-electron Hamiltonian.

Return type:

torch.Tensor

abstract get_e_elrep(dm: Tensor) Tensor[source]

Get the energy from the electron repulsion. The input is total density matrix.

Parameters:

dm (torch.Tensor) – Total Density matrix.

Returns:

Energy from the electron repulsion.

Return type:

torch.Tensor

abstract get_e_exchange(dm: Tensor | SpinParam[Tensor]) Tensor[source]

Get the energy from the exact exchange.

Parameters:

dm (Union[torch.Tensor, SpinParam[torch.Tensor]]) – Density matrix.

Returns:

Energy from the exact exchange.

Return type:

torch.Tensor

abstract get_e_xc(dm: Tensor | SpinParam[Tensor]) Tensor[source]

Returns the exchange-correlation energy using the xc object given in .setup_grid()

Parameters:

dm (Union[torch.Tensor, SpinParam[torch.Tensor]]) – Density matrix. Shape: (*BD, nao, nao)

Returns:

Exchange-correlation energy.

Return type:

torch.Tensor

abstract ao_orb_params2dm(ao_orb_params: Tensor, ao_orb_coeffs: Tensor, orb_weight: Tensor, with_penalty: float | None = None) List[Tensor][source]

Convert the atomic orbital free parameters (parametrized in such a way so it is not bounded) to the density matrix.

Parameters:
  • ao_orb_params (torch.Tensor) – The tensor that parametrizes the atomic orbitals in an unbounded space.

  • ao_orb_coeffs (torch.Tensor) – The tensor that helps ao_orb_params in describing the orbital. The difference with ao_orb_params is that ao_orb_coeffs is not differentiable and not to be optimized in variational method.

  • orb_weight (torch.Tensor) – The orbital weights.

  • with_penalty (float or None) – If a float, it returns a tuple of tensors where the first element is dm, and the second element is the penalty multiplied by the penalty weights. The penalty is to compensate the overparameterization of ao_orb_params, stabilizing the Hessian for gradient calculation.

Returns:

The density matrix from the orbital parameters and (if with_penalty) the penalty of the overparameterization of ao_orb_params.

Return type:

torch.Tensor or tuple of torch.Tensor

Notes

  • The penalty should be 0 if ao_orb_params is from dm2ao_orb_params.

  • The density matrix should be recoverable when put through dm2ao_orb_params and ao_orb_params2dm.

abstract dm2ao_orb_params(dm: Tensor, norb: int) Tuple[Tensor, Tensor][source]

Convert from the density matrix to the orbital parameters. The map is not one-to-one but one-to-many: more than one set of orbital parameters may describe the same density matrix. For restricted systems, only one of the density matrices (dm.u or dm.d) is sufficient.

Parameters:
  • dm (torch.Tensor) – The density matrix.

  • norb (int) – The number of orbitals for the system.

Returns:

The atomic orbital parameters for the first returned value and the atomic orbital coefficients for the second value.

Return type:

tuple of 2 torch.Tensor

abstract getparamnames(methodname: str, prefix: str = '') List[str][source]

Return the paramnames

Parameters:
  • methodname (str) – The name of the method.

  • prefix (str (default "")) – The prefix of the paramnames.

Returns:

The paramnames.

Return type:

List[str]

class _Config(THRESHOLD_MEMORY: int = 10737418240, CHUNK_MEMORY: int = 16777216, VERBOSE: int = 0)[source]

Contains the configuration for the DFT module

Examples

>>> from deepchem.utils.dft_utils.config import config
>>> Memory_usage = 1024**4 # Sample Memory usage by some Object/Matrix
>>> if Memory_usage > config.THRESHOLD_MEMORY :
...     print("Overload")
Overload
THRESHOLD_MEMORY[source]

Threshold memory (matrix above this size should not be constructed)

Type:

int (default=10*1024**3)

CHUNK_MEMORY[source]

The memory for splitting big tensors into chunks.

Type:

int (default=16*1024**2)

VERBOSE[source]

Allowed verbosity level (defines the level of detail). Used by the logger for maintaining logs.

Type:

int (default=0)

Usage

1. HamiltonCGTO: uses it for splitting big tensors into chunks.

__init__(THRESHOLD_MEMORY: int = 10737418240, CHUNK_MEMORY: int = 16777216, VERBOSE: int = 0) None[source]
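
One way a memory budget such as CHUNK_MEMORY could be turned into row ranges for chunked processing is sketched below. The helper names and the row-wise splitting strategy are assumptions for illustration; the documentation does not describe how DeepChem actually performs the split.

```python
CHUNK_MEMORY = 16 * 1024 ** 2  # bytes, the documented default


def rows_per_chunk(ncols: int, itemsize: int = 8, chunk_memory: int = CHUNK_MEMORY) -> int:
    # Largest number of rows whose storage fits in one chunk (at least one row).
    return max(1, chunk_memory // (ncols * itemsize))


def chunk_ranges(nrows: int, ncols: int, itemsize: int = 8):
    # Yield (start, stop) row ranges covering the full tensor.
    step = rows_per_chunk(ncols, itemsize)
    for start in range(0, nrows, step):
        yield start, min(start + step, nrows)


ranges = list(chunk_ranges(nrows=5000, ncols=1024))
```

For a float64 tensor with 1024 columns, each chunk holds 2048 rows, so 5000 rows are covered by three ranges.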
class BaseOrbParams[source]

Class that provides free-parameterization of orthogonal orbitals.

Examples

>>> import torch
>>> from deepchem.utils.dft_utils import BaseOrbParams
>>> class MyOrbParams(BaseOrbParams):
...     @staticmethod
...     def params2orb(params, coeffs, with_penalty):
...         return params, coeffs
...     @staticmethod
...     def orb2params(orb):
...         return orb, torch.tensor([0], dtype=orb.dtype, device=orb.device)
>>> params = torch.randn(3, 4, 5)
>>> coeffs = torch.randn(3, 4, 5)
>>> with_penalty = 0.1
>>> orb, penalty = MyOrbParams.params2orb(params, coeffs, with_penalty)
>>> params2, coeffs2 = MyOrbParams.orb2params(orb)
>>> torch.allclose(params, params2)
True
static params2orb(params: Tensor, coeffs: Tensor, with_penalty: float = 0.0) List[Tensor][source]

Convert the parameters & coefficients to the orthogonal orbitals. params is the tensor to be optimized in variational method, while coeffs is a tensor that is needed to get the orbital, but it is not optimized in the variational method.

Parameters:
  • params (torch.Tensor) – The free parameters to be optimized.

  • coeffs (torch.Tensor) – The coefficients to get the orthogonal orbitals.

  • with_penalty (float (default 0.0)) – If not 0.0, return the penalty term for the free parameters.

Returns:

  • orb (torch.Tensor) – The orthogonal orbitals.

  • penalty (torch.Tensor) – The penalty term for the free parameters. If with_penalty is 0.0, this is not returned.

static orb2params(orb: Tensor) List[Tensor][source]

Get the free parameters from the orthogonal orbitals. Returns params and coeffs described in params2orb.

Parameters:

orb (torch.Tensor) – The orthogonal orbitals.

Returns:

  • params (torch.Tensor) – The free parameters to be optimized.

  • coeffs (torch.Tensor) – The coefficients to get the orthogonal orbitals.

class QROrbParams[source]

Orthogonal orbital parameterization using QR decomposition. The orthogonal orbital is represented by:

P = QR

Where Q is the parameters defining the rotation of the orthogonal tensor, and R is the coefficients tensor.

Examples

>>> import torch
>>> from deepchem.utils.dft_utils import QROrbParams
>>> params = torch.randn(3, 3)
>>> coeffs = torch.randn(4, 3)
>>> with_penalty = 0.1
>>> orb, penalty = QROrbParams.params2orb(params, coeffs, with_penalty)
>>> params2, coeffs2 = QROrbParams.orb2params(orb)
static params2orb(params: Tensor, coeffs: Tensor, with_penalty: float = 0.0) List[Tensor][source]

Convert the parameters & coefficients to the orthogonal orbitals. params is the tensor to be optimized in variational method, while coeffs is a tensor that is needed to get the orbital, but it is not optimized in the variational method.

Parameters:
  • params (torch.Tensor) – The free parameters to be optimized.

  • coeffs (torch.Tensor) – The coefficients to get the orthogonal orbitals.

  • with_penalty (float (default 0.0)) – If not 0.0, return the penalty term for the free parameters.

Returns:

  • orb (torch.Tensor) – The orthogonal orbitals.

  • penalty (torch.Tensor) – The penalty term for the free parameters. If with_penalty is 0.0, this is not returned.

static orb2params(orb: Tensor) List[Tensor][source]

Get the free parameters from the orthogonal orbitals. Returns params and coeffs described in params2orb.

Parameters:

orb (torch.Tensor) – The orthogonal orbitals.

Returns:

  • params (torch.Tensor) – The free parameters to be optimized.

  • coeffs (torch.Tensor) – The coefficients to get the orthogonal orbitals.
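
The Q factor of a QR decomposition has orthonormal columns, which is what makes it usable as a parameterization of orthogonal orbitals. The sketch below obtains such columns with classical Gram-Schmidt in plain Python, purely as an illustration; it is not the routine used by QROrbParams, and the input values are made up.

```python
import math


def gram_schmidt(columns):
    # Orthonormalise a list of column vectors (classical Gram-Schmidt).
    basis = []
    for col in columns:
        vec = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(q, col))
            vec = [v - proj * qi for v, qi in zip(vec, q)]
        norm = math.sqrt(sum(v * v for v in vec))
        basis.append([v / norm for v in vec])
    return basis


# Hypothetical unconstrained parameters: two columns of length three.
params = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
q_cols = gram_schmidt(params)
overlap = [[sum(a * b for a, b in zip(u, v)) for v in q_cols] for u in q_cols]
```

The overlap matrix of the resulting columns is the identity up to rounding, whatever unconstrained values the parameters take (as long as the columns are linearly independent).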

class MatExpOrbParams[source]

Orthogonal orbital parameterization using matrix exponential. The orthogonal orbital is represented by:

P = matrix_exp(Q) @ C

where C is an orthogonal coefficient tensor, and Q is the parameters defining the rotation of the orthogonal tensor.

Examples

>>> import torch
>>> from deepchem.utils.dft_utils import MatExpOrbParams
>>> params = torch.randn(3, 3)
>>> coeffs = torch.randn(4, 3)
>>> with_penalty = 0.1
>>> orb, penalty = MatExpOrbParams.params2orb(params, coeffs, with_penalty)
>>> params2, coeffs2 = MatExpOrbParams.orb2params(orb)
static params2orb(params: Tensor, coeffs: Tensor, with_penalty: float = 0.0) List[Tensor][source]

Convert the parameters & coefficients to the orthogonal orbitals. params is the tensor to be optimized in variational method, while coeffs is a tensor that is needed to get the orbital, but it is not optimized in the variational method.

Parameters:
  • params (torch.Tensor) – The free parameters to be optimized. (*, nparams)

  • coeffs (torch.Tensor) – The coefficients to get the orthogonal orbitals. (*, nao, norb)

  • with_penalty (float (default 0.0)) – If not 0.0, return the penalty term for the free parameters.

Returns:

  • orb (torch.Tensor) – The orthogonal orbitals.

  • penalty (torch.Tensor) – The penalty term for the free parameters. If with_penalty is 0.0, this is not returned.

static orb2params(orb: Tensor) List[Tensor][source]

Get the free parameters from the orthogonal orbitals. Returns params and coeffs described in params2orb.

Parameters:

orb (torch.Tensor) – The orthogonal orbitals.

Returns:

  • params (torch.Tensor) – The free parameters to be optimized.

  • coeffs (torch.Tensor) – The coefficients to get the orthogonal orbitals.
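
The matrix exponential of a real antisymmetric matrix is orthogonal, which is why it can describe a rotation of orthogonal orbitals. The sketch below checks this for a 2x2 case with a truncated Taylor series in plain Python. It assumes Q is (or is mapped to) an antisymmetric matrix; how MatExpOrbParams builds that matrix from params is not shown in the documentation.

```python
import math


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_exp(mat, nterms=30):
    # Truncated Taylor series: exp(A) = sum_n A^n / n!
    n = len(mat)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, nterms):
        term = [[x / order for x in row] for row in matmul(term, mat)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


theta = 0.3
skew = [[0.0, -theta], [theta, 0.0]]   # antisymmetric generator
rot = matrix_exp(skew)                 # should be a 2D rotation by theta
expected = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
gram = matmul([list(r) for r in zip(*rot)], rot)  # rot^T @ rot
```

Multiplying an orthogonal coefficient matrix by such a rotation keeps it orthogonal, so the parameters can vary freely during optimization.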

class parse_moldesc(moldesc: str | Tuple[List[str] | List[int | float | Tensor] | Tensor, List[List[float]] | ndarray | Tensor], dtype: dtype = torch.float64, device: device = device(type='cpu'))[source]

Parse the string of the molecular descriptor and return tensors of atomzs and atom positions.

Examples

>>> from deepchem.utils.dft_utils import parse_moldesc
>>> system = {
...     'type': 'mol',
...     'kwargs': {
...         'moldesc': 'H 0.86625 0 0; F -0.86625 0 0',
...         'basis': '6-311++G(3df,3pd)'
...     }
... }
>>> atomzs, atomposs = parse_moldesc(system["kwargs"]["moldesc"])
>>> atomzs
tensor([1., 9.], dtype=torch.float64)
>>> atomposs
tensor([[ 0.8662,  0.0000,  0.0000],
        [-0.8662,  0.0000,  0.0000]], dtype=torch.float64)
Parameters:
  • moldesc (Union[str, Tuple[AtomZsType, AtomPosType]]) – String that describes the system, e.g. "H -1 0 0; H 1 0 0" for H2 molecule separated by 2 Bohr.

  • dtype (torch.dtype (default torch.float64)) – The datatype of the returned atomic positions.

  • device (torch.device (default torch.device('cpu'))) – The device to store the returned tensors.

Returns:

  • atomzs (torch.Tensor) – The tensor of atomzs [atomic numbers].

  • atompos (torch.Tensor) – The tensor of atomic positions [Bohr].
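
The string format accepted by parse_moldesc ("symbol x y z" entries separated by semicolons) can be parsed along the following lines. This is a simplified sketch with a hypothetical helper name and a deliberately tiny element table; it returns plain lists rather than tensors and is not DeepChem's implementation.

```python
from typing import List, Tuple

# Tiny hypothetical lookup table; a real implementation would cover all elements.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9}


def parse_moldesc_sketch(moldesc: str) -> Tuple[List[int], List[List[float]]]:
    atomzs, atomposs = [], []
    for entry in moldesc.split(";"):
        fields = entry.split()
        if not fields:
            continue  # tolerate a trailing semicolon
        symbol, coords = fields[0], fields[1:]
        if len(coords) != 3:
            raise ValueError("Expected three coordinates in: %r" % entry)
        atomzs.append(ATOMIC_NUMBERS[symbol])
        atomposs.append([float(c) for c in coords])
    return atomzs, atomposs


atomzs, atomposs = parse_moldesc_sketch("H 0.86625 0 0; F -0.86625 0 0")
```

The real function additionally accepts a tuple of atomic numbers and positions and places the results on the requested dtype and device.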

class BaseSystem[source]

System is a class describing the environment before doing the quantum chemistry calculation. It contains the information of the atoms, the external electric field, the spin, the charge, etc. It also contains the Hamiltonian object and the grid object for the calculation. The system object is also responsible for setting up the cache for the parameters that can be read/written from/to the cache file.

Examples

>>> from deepchem.utils.dft_utils import BaseSystem
>>> from deepchem.utils.dft_utils import BaseHamilton
>>> from deepchem.utils.dft_utils import BaseGrid
>>> class MySystem(BaseSystem):
...     def __init__(self):
...         self.hamiltonian = BaseHamilton()
...         self.grid = BaseGrid()
...     def get_hamiltonian(self):
...         return self.hamiltonian
...     def get_grid(self):
...         return self.grid
...     def requires_grid(self):
...         return True
>>> system = MySystem()
>>> system.requires_grid()
True
abstract densityfit(method: str | None = None, auxbasis: str | List[CGTOBasis] | List[str] | List[List[CGTOBasis]] | Dict[str | int, List[CGTOBasis] | str] | None = None) BaseSystem[source]

Indicate that the system’s Hamiltonian will use density fitting.

Parameters:
  • method (Optional[str] (default None)) – The density fitting method to use.

  • auxbasis (Optional[BasisInpType] (default None)) – Auxiliary basis set to use for density fitting.

Returns:

The system with density fitting enabled.

Return type:

BaseSystem

abstract get_hamiltonian() BaseHamilton[source]

Hamiltonian object for the system.

Returns:

Hamiltonian object for the system.

Return type:

BaseHamilton

abstract set_cache(fname: str, paramnames: List[str] | None = None) BaseSystem[source]

Set up the cache to read/write some parameters from/to the given file. If paramnames is not given, then read/write all cache-able parameters specified by each class.

Parameters:
  • fname (str) – The file name of the cache file.

  • paramnames (Optional[List[str]] (default None)) – The list of parameter names to read/write from the cache file.

Returns:

The system with cache enabled.

Return type:

BaseSystem

abstract get_orbweight(polarized: bool = False) Tensor | SpinParam[Tensor][source]

Returns the atomic orbital weights. If polarized is False, it returns the total orbital weights as a single tensor. Otherwise, it returns a SpinParam holding the orbital weights for spin-up and spin-down.

Parameters:

polarized (bool (default False)) – Whether to return the orbital weights for each spin.

Returns:

The orbital weights. Shape (*BS, norb)

Return type:

Union[torch.Tensor, SpinParam[torch.Tensor]]
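The difference between the two return forms can be sketched in plain Python. This sketch assumes the orbital weights are occupation numbers filled from the lowest orbital upward; SpinPair and orbweight_sketch are hypothetical names for illustration, not DeepChem's SpinParam or its implementation.

```python
from typing import List, NamedTuple, Union


class SpinPair(NamedTuple):
    """Hypothetical container with one entry per spin channel."""
    u: List[float]  # spin-up weights
    d: List[float]  # spin-down weights


def orbweight_sketch(n_up: int, n_down: int, norb: int,
                     polarized: bool = False) -> Union[List[float], SpinPair]:
    """Occupy the lowest orbitals first (an assumption of this sketch)."""
    up = [1.0 if i < n_up else 0.0 for i in range(norb)]
    down = [1.0 if i < n_down else 0.0 for i in range(norb)]
    if polarized:
        return SpinPair(u=up, d=down)
    # Unpolarized: report the total weight per orbital.
    return [a + b for a, b in zip(up, down)]


total = orbweight_sketch(n_up=5, n_down=3, norb=6)
per_spin = orbweight_sketch(n_up=5, n_down=3, norb=6, polarized=True)
print(total)       # [2.0, 2.0, 2.0, 1.0, 1.0, 0.0]
print(per_spin.d)  # [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

In both forms the last dimension has length norb, matching the documented shape (*BS, norb).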

abstract get_nuclei_energy() Tensor[source]

Returns the nuclei-nuclei repulsion energy.

Returns:

The nuclei-nuclei repulsion energy.

Return type:

torch.Tensor
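For point-like nuclei, the repulsion energy in atomic units is the sum of Z_i * Z_j / |R_i - R_j| over all unique pairs of atoms. The following is a minimal sketch of that formula using plain Python lists; nuclei_energy_sketch is a hypothetical helper, not the method a concrete system class necessarily uses.

```python
import math
from itertools import combinations
from typing import Sequence


def nuclei_energy_sketch(atomzs: Sequence[float],
                         atompos: Sequence[Sequence[float]]) -> float:
    """Sum Z_i * Z_j / |R_i - R_j| over unique atom pairs (positions in Bohr)."""
    energy = 0.0
    for i, j in combinations(range(len(atomzs)), 2):
        energy += atomzs[i] * atomzs[j] / math.dist(atompos[i], atompos[j])
    return energy


# HF molecule with the geometry used in the parse_moldesc example.
hf_energy = nuclei_energy_sketch([1, 9], [[0.86625, 0, 0], [-0.86625, 0, 0]])
print(hf_energy)  # equals 9 / 1.7325 up to floating-point rounding
```

With positions in Bohr and atomic numbers as charges, the result is in Hartree.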

abstract setup_grid() None[source]

Construct the integration grid for the system.

abstract get_grid() BaseGrid[source]

Returns the grid of the system.

abstract requires_grid() bool[source]

Returns True if the system needs the grid to be constructed, and False otherwise.

abstract getparamnames(methodname: str, prefix: str = '') List[str][source]

Return a list with the parameter names corresponding to the given method (methodname).

Parameters:
  • methodname (str) – The name of the method.

  • prefix (str (default "")) – The prefix of the parameter names.

Returns:

List of parameter names of methodname

Return type:

List[str]

abstract make_copy(**kwargs) BaseSystem[source]

Returns a copy of the system that is identical to the original except for the new parameters set in kwargs.

Parameters:

kwargs – New parameters to set in the copy.

Returns:

Copy of the system, identical to the original except for the new parameters set in kwargs.

Return type:

BaseSystem
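The copy-with-overrides behaviour described for make_copy can be illustrated with a small sketch. ToySystem below is a hypothetical stand-in built on a frozen dataclass; it is not a DeepChem class, and concrete BaseSystem subclasses may implement make_copy differently.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToySystem:
    """Hypothetical stand-in for a system description."""
    moldesc: str
    charge: int = 0
    spin: int = 0

    def make_copy(self, **kwargs) -> "ToySystem":
        # dataclasses.replace builds a new instance and overrides only
        # the fields named in kwargs; the original is left untouched.
        return replace(self, **kwargs)


neutral = ToySystem(moldesc="H 0.86625 0 0; F -0.86625 0 0")
cation = neutral.make_copy(charge=1, spin=1)
print(cation.charge, neutral.charge)  # 1 0
```

Returning a new object rather than mutating in place keeps the original system usable for further calculations.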

abstract property atompos: Tensor[source]

Atom positions with shape (natoms, ndim).

Returns:

Atom positions with shape (natoms, ndim).

Return type:

torch.Tensor

abstract property atomzs: Tensor[source]

Atomic numbers with shape (natoms,).

Returns:

Atomic numbers with shape (natoms,).

Return type:

torch.Tensor

abstract property atommasses: Tensor[source]

Atomic masses with shape (natoms,) in atomic units.

Returns:

Atomic masses with shape (natoms,) in atomic units.

Return type:

torch.Tensor

abstract property spin: int | float | Tensor[source]

Total spin of the system.

Returns:

Total spin of the system.

Return type:

ZType

abstract property charge: int | float | Tensor[source]

Charge of the system.

Returns:

Charge of the system.

Return type:

ZType

abstract property numel: int | float | Tensor[source]

Total number of electrons in the system.

Returns:

Total number of electrons in the system.

Return type:

ZType
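The atomic numbers, the charge, and the electron count are tied together: the number of electrons equals the sum of the atomic numbers minus the net charge. A minimal sketch follows; electron_count is a hypothetical helper for illustration and is not part of BaseSystem.

```python
from typing import Sequence


def electron_count(atomzs: Sequence[int], charge: int = 0) -> int:
    """Number of electrons = total nuclear charge minus net system charge."""
    return sum(atomzs) - charge


print(electron_count([1, 9]))             # neutral HF: 10
print(electron_count([8, 1], charge=-1))  # hydroxide anion: 10
print(electron_count([1, 1], charge=1))   # H2 cation: 1
```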

abstract property efield: Tuple[Tensor, ...] | None[source]

External electric field of the system, or None if there is no electric field.

class RadialGrid(ngrid: int, grid_integrator: str = 'chebyshev', grid_transform: str | BaseGridTransform = 'logm3', dtype: dtype = torch.float64, device: device = device(type='cpu'))[source]

Grid for a radially symmetric system. This grid consists of two specifiers, grid_integrator and grid_transform.

grid_integrator specifies how to perform an integration on the fixed interval from -1 to 1.

grid_transform maps the integration from the coordinate of grid_integrator to the actual coordinate.

Examples

>>> from deepchem.utils.dft_utils import RadialGrid
>>> grid = RadialGrid(100, grid_integrator="chebyshev",
...                   grid_transform="logm3")
>>> grid.get_rgrid().shape
torch.Size([100, 1])
>>> grid.get_dvolume().shape
torch.Size([100])
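The split between an integrator on [-1, 1] and a coordinate transform can be sketched without DeepChem. The sketch below uses a generic Chebyshev-type rule (second kind) and a simple, hypothetical transform r = (1 + x) / (1 - x); these are assumptions for illustration and may differ in detail from DeepChem's "chebyshev" integrator and "logm3" transform.

```python
import math
from typing import Callable, List, Tuple


def chebyshev_nodes(ngrid: int) -> Tuple[List[float], List[float]]:
    """Nodes x_i in (-1, 1) and weights w_i with sum(w_i * f(x_i)) ~ integral of f on [-1, 1]."""
    h = math.pi / (ngrid + 1)
    xs = [math.cos(i * h) for i in range(1, ngrid + 1)]
    ws = [h * math.sin(i * h) for i in range(1, ngrid + 1)]
    return xs, ws


def to_radial(x: float) -> Tuple[float, float]:
    """Hypothetical transform from x in (-1, 1) to r in (0, inf); returns (r, dr/dx)."""
    return (1.0 + x) / (1.0 - x), 2.0 / (1.0 - x) ** 2


def integrate_radial(f: Callable[[float], float], ngrid: int) -> float:
    """Approximate the integral of 4*pi*r^2*f(r) over r in [0, inf)."""
    xs, ws = chebyshev_nodes(ngrid)
    total = 0.0
    for x, w in zip(xs, ws):
        r, drdx = to_radial(x)
        # w * drdx * 4*pi*r^2 plays the role of a volume element.
        total += w * drdx * 4.0 * math.pi * r * r * f(r)
    return total


def density_1s(r: float) -> float:
    """Hydrogen 1s electron density in atomic units."""
    return math.exp(-2.0 * r) / math.pi


n_electrons = integrate_radial(density_1s, 100)
print(n_electrons)  # should be close to 1, the electron count of hydrogen
```

Here the product of the quadrature weight, the Jacobian dr/dx, and 4*pi*r^2 corresponds conceptually to the per-point volume element that a radial grid exposes.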
__init__(ngrid: int, grid_integrator: str = 'chebyshev', grid_transform: str | BaseGridTransform = 'logm3', dtype: dtype = torch.float64, device: device = device(type='cpu'))[source]

Initialize the RadialGrid.

Parameters:
  • ngrid (int) – Number of grid points.

  • grid_integrator (str (default "chebyshev")) – The grid integrator to use. Available options are “chebyshev”, “chebyshev2”, and “uniform”.

  • grid_transform (Union[str, BaseGridTransform] (default "logm3")) – The grid transformation to use. Available options are “logm3”, “de2”, and “treutlerm4”.

  • dtype (torch.dtype, optional (default torch.float64)) – The data type to use for the grid.

  • device (torch.device, optional (default torch.device('cpu'))) – The device to use for the grid.

property coord_type[source]

Returns the coordinate type of the grid.

Returns:

The coordinate type of the grid. For RadialGrid, this is “radial”.

Return type: