Model Classes

DeepChem maintains an extensive collection of models for scientific applications. Because its focus is on facilitating scientific work, DeepChem supports a broad range of machine learning frameworks (currently scikit-learn, xgboost, TensorFlow, and PyTorch), since different frameworks are more or less suited to different scientific applications.

Model Cheatsheet

If you’re just getting started with DeepChem, you’re probably interested in the basics. The place to start is this “model cheatsheet”, which lists the various types of custom DeepChem models. Wrappers such as SklearnModel and GBDTModel, which wrap external machine learning libraries, are excluded, but the tables should otherwise be complete.

A note on how to read these tables: each row describes what is needed to invoke a given model. Some models must be used with specific Transformer or Featurizer objects. Most models are trained by calling model.fit; otherwise, the name of the fit method is given in the Comment column. To run a model, make sure its backend (Keras and TensorFlow, PyTorch, or JAX) is installed. You can thus read off what is needed to train a model from the tables below.

General purpose

General purpose models

| Model | Reference | Classifier/Regressor | Acceptable Featurizers | Backend | Comment |
| --- | --- | --- | --- | --- | --- |
| CNN | | Classifier/Regressor | | Keras | |
| MultitaskClassifier | | Classifier | CircularFingerprint, RDKitDescriptors, CoulombMatrixEig, RdkitGridFeaturizer, BindingPocketFeaturizer, ElementPropertyFingerprint | PyTorch | |
| MultitaskFitTransformRegressor | | Regressor | CircularFingerprint, RDKitDescriptors, CoulombMatrixEig, RdkitGridFeaturizer, BindingPocketFeaturizer, ElementPropertyFingerprint | PyTorch | any Transformer can be used |
| MultitaskIRVClassifier | | Classifier | CircularFingerprint, RDKitDescriptors, CoulombMatrixEig, RdkitGridFeaturizer, BindingPocketFeaturizer, ElementPropertyFingerprint | Keras/PyTorch | use IRVTransformer |
| MultitaskRegressor | | Regressor | CircularFingerprint, RDKitDescriptors, CoulombMatrixEig, RdkitGridFeaturizer, BindingPocketFeaturizer, ElementPropertyFingerprint | PyTorch | |
| ProgressiveMultitaskClassifier | ref | Classifier | CircularFingerprint, RDKitDescriptors, CoulombMatrixEig, RdkitGridFeaturizer, BindingPocketFeaturizer, ElementPropertyFingerprint | Keras | |
| ProgressiveMultitaskRegressor | ref | Regressor | CircularFingerprint, RDKitDescriptors, CoulombMatrixEig, RdkitGridFeaturizer, BindingPocketFeaturizer, ElementPropertyFingerprint | Keras | |
| RobustMultitaskClassifier | ref | Classifier | CircularFingerprint, RDKitDescriptors, CoulombMatrixEig, RdkitGridFeaturizer, BindingPocketFeaturizer, ElementPropertyFingerprint | Keras | |
| RobustMultitaskRegressor | ref | Regressor | CircularFingerprint, RDKitDescriptors, CoulombMatrixEig, RdkitGridFeaturizer, BindingPocketFeaturizer, ElementPropertyFingerprint | Keras | |
| SeqToSeq | ref | | | PyTorch | fit method: fit_sequences |
| WGAN | ref | Adversarial | | Keras | fit method: fit_gan |
| UNet | ref | Classifier/Regressor | | PyTorch | |
| PINNModel | ref | Classifier/Regressor | | PyTorch | |
| FNOModel | ref | Classifier/Regressor | | PyTorch | |

Molecules

Many models implemented in DeepChem were designed for small to medium-sized organic molecules, most often drug-like compounds. If your data is very different (e.g. molecules contain ‘exotic’ elements not present in the original dataset) or cannot be represented well using SMILES (e.g. metal complexes, crystals), some adaptations to the featurization and/or model might be needed to get reasonable results.

Molecular models

| Model | Reference | Type | Acceptable Featurizers | Backend | Comment |
| --- | --- | --- | --- | --- | --- |
| ScScoreModel | ref | Classifier | CircularFingerprint | Keras | |
| AtomicConvModel | ref | Classifier/Regressor | ComplexNeighborListFragmentAtomicCoordinates | Keras | |
| AttentiveFPModel | ref | Classifier/Regressor | MolGraphConvFeaturizer | PyTorch | |
| ChemCeption | ref | Classifier/Regressor | SmilesToImage | Keras/PyTorch | |
| DAGModel | ref | Classifier/Regressor | ConvMolFeaturizer | Keras | use DAGTransformer |
| GATModel | ref | Classifier/Regressor | MolGraphConvFeaturizer | DGL/PyTorch | |
| GCNModel | ref | Classifier/Regressor | MolGraphConvFeaturizer | DGL/PyTorch | |
| GraphConvModel | ref | Classifier/Regressor | ConvMolFeaturizer | Keras | |
| MEGNetModel | ref | Classifier/Regressor | | PyTorch/PyTorch Geometric | |
| MPNNModel | ref | Classifier/Regressor | MolGraphConvFeaturizer | DGL/PyTorch | |
| PagtnModel | ref | Classifier/Regressor | PagtnMolGraphFeaturizer, MolGraphConvFeaturizer | DGL/PyTorch | |
| Smiles2Vec | ref | Classifier/Regressor | SmilesToSeq | Keras/PyTorch | |
| TextCNNModel | ref | Classifier/Regressor | | Keras/PyTorch | |
| DTNNModel | ref | Regressor | CoulombMatrix | PyTorch | |
| MATModel | ref | Regressor | MATFeaturizer | PyTorch | |
| WeaveModel | ref | Regressor | WeaveFeaturizer | Keras | |
| BasicMolGANModel | ref | Generator | MolGanFeaturizer | Keras | fit method: fit_gan |
| DMPNNModel | ref | Classifier/Regressor | DMPNNFeaturizer | PyTorch | |
| InfoGraph | ref | Classifier/Regressor | MolGraphConvFeaturizer | PyTorch | |
| InfoGraphStar | ref | Classifier/Regressor | MolGraphConvFeaturizer | PyTorch | |
| GNNModular | ref | Classifier/Regressor | SNAPFeaturizer | PyTorch | |
| InfoMax3DModular | ref | Unsupervised | RDKitConformerFeaturizer | PyTorch | |
| Chemberta | ref | Classifier/Regressor | RobertaTokenizer | PyTorch | |
| MoLFormer | ref | Classifier/Regressor | DummyFeaturizer | PyTorch | |
| ProgressiveMultitaskModel | ref | Classifier/Regressor | CircularFingerprint, RDKitDescriptors, CoulombMatrixEig, RdkitGridFeaturizer, BindingPocketFeaturizer, ElementPropertyFingerprint | PyTorch | |

Materials

The following models were designed specifically for (inorganic) materials.

Material models

| Model | Reference | Type | Acceptable Featurizers | Backend | Comment |
| --- | --- | --- | --- | --- | --- |
| CGCNNModel | ref | Classifier/Regressor | CGCNNFeaturizer | DGL/PyTorch | crystal graph CNN |
| MEGNetModel | ref | Classifier/Regressor | | PyTorch/PyTorch Geometric | |
| LCNNModel | ref | Regressor | LCNNFeaturizer | PyTorch | lattice CNN |

Model

class Model(model=None, model_dir: str | None = None, **kwargs)[source]

Abstract base class for DeepChem models.

__init__(model=None, model_dir: str | None = None, **kwargs) None[source]

Abstract class for all models.

This is intended only for convenience of subclass implementations and should not be invoked directly.

Parameters:
  • model (object) – Wrapper around ScikitLearn/Keras/Tensorflow model object.

  • model_dir (str, optional (default None)) – Path to directory where model will be stored. If not specified, model will be stored in a temporary directory.

fit_on_batch(X: Sequence, y: Sequence, w: Sequence)[source]

Perform a single step of training.

Parameters:
  • X (np.ndarray) – the inputs for the batch

  • y (np.ndarray) – the labels for the batch

  • w (np.ndarray) – the weights for the batch

predict_on_batch(X: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes])[source]

Makes predictions on given batch of new data.

Parameters:

X (np.ndarray) – Features

reload() None[source]

Reload trained model from disk.

static get_model_filename(model_dir: str) str[source]

Given model directory, obtain filename for the model itself.

static get_params_filename(model_dir: str) str[source]

Given model directory, obtain filename for the model parameters.

save() None[source]

Dispatcher function for saving.

Each subclass is responsible for overriding this method.

fit(dataset: Dataset)[source]

Fits a model on data in a Dataset object.

Parameters:

dataset (Dataset) – the Dataset to train on

predict(dataset: Dataset, transformers: List[Transformer] = []) ndarray | Sequence[ndarray][source]

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (Dataset) – Dataset to make prediction on

  • transformers (List[Transformer]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

Returns:

A numpy array of predictions the model produces.

Return type:

np.ndarray
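To illustrate what undoing transformations means for predictions, here is a minimal plain-Python sketch. The `ShiftScale` class and the `undo_all` helper are hypothetical stand-ins and not DeepChem classes. The sketch assumes that transformers are undone in the reverse of the order in which they were applied.

```python
class ShiftScale:
    """Hypothetical stand-in for a label transformer: y -> (y - shift) / scale."""

    def __init__(self, shift, scale):
        self.shift = shift
        self.scale = scale

    def transform(self, y):
        return [(v - self.shift) / self.scale for v in y]

    def untransform(self, y):
        return [v * self.scale + self.shift for v in y]


def undo_all(y, transformers):
    # Assumption: the last transformation applied is the first one undone.
    for t in reversed(transformers):
        y = t.untransform(y)
    return y


transformers = [ShiftScale(10.0, 2.0), ShiftScale(1.0, 4.0)]
labels = [12.0, 20.0]

# Apply the transformations in order, as would happen before training.
transformed = labels
for t in transformers:
    transformed = t.transform(transformed)

# Map values in the transformed space back to the original label space.
restored = undo_all(transformed, transformers)
```

In this sketch a model trained on `transformed` would predict in the transformed space, and `undo_all` brings those predictions back to the original units.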

evaluate(dataset: Dataset, metrics: List[Metric], transformers: List[Transformer] = [], per_task_metrics: bool = False, use_sample_weights: bool = False, n_classes: int = 2)[source]

Evaluates the performance of this model on specified dataset.

This function uses Evaluator under the hood to perform model evaluation and therefore inherits its limitations: only regression and classification models can be evaluated in this fashion. For generator models, you will need to override this method to perform a custom evaluation.

Keyword arguments specified here will be passed to Evaluator.compute_model_performance.

Parameters:
  • dataset (Dataset) – Dataset object.

  • metrics (Metric / List[Metric] / function) – The set of metrics provided. This class attempts to do some intelligent handling of input. If a single dc.metrics.Metric object is provided or a list is provided, it will evaluate self.model on these metrics. If a function is provided, it is assumed to be a metric function that this method will attempt to wrap in a dc.metrics.Metric object. A metric function must accept two arguments, y_true, y_pred both of which are np.ndarray objects and return a floating point score. The metric function may also accept a keyword argument sample_weight to account for per-sample weights.

  • transformers (List[Transformer]) – List of dc.trans.Transformer objects. These transformations must have been applied to dataset previously. The dataset will be untransformed for metric evaluation.

  • per_task_metrics (bool, optional (default False)) – If true, return computed metric for each task on multitask dataset.

  • use_sample_weights (bool, optional (default False)) – If set, use per-sample weights w.

  • n_classes (int, optional (default 2)) – If specified, n_classes is used as the number of unique classes in the dataset. This argument is ignored for regression metrics.

Returns:

  • multitask_scores (dict) – Dictionary mapping names of metrics to metric scores.

  • all_task_scores (dict, optional) – If per_task_metrics == True is passed as a keyword argument, then returns a second dictionary of scores for each task separately.
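As an illustration of the metric-function form described for the metrics parameter, the sketch below defines a function that takes y_true and y_pred, returns a float, and optionally accepts sample_weight. The name `weighted_mae` is hypothetical, and the function is written over plain Python sequences so that it runs without NumPy; it also iterates over one-dimensional arrays.

```python
def weighted_mae(y_true, y_pred, sample_weight=None):
    """Mean absolute error, optionally weighted per sample."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total = sum(
        w * abs(t - p) for t, p, w in zip(y_true, y_pred, sample_weight))
    return float(total / sum(sample_weight))


# Unweighted: errors are 0, 1 and 2, so the mean is 1.0.
plain_score = weighted_mae([1.0, 2.0, 3.0], [1.0, 3.0, 5.0])

# A zero weight removes the last sample from the average.
weighted_score = weighted_mae([1.0, 2.0, 3.0], [1.0, 3.0, 5.0],
                              sample_weight=[1.0, 1.0, 0.0])
```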

get_task_type() str[source]

Currently models can only be classifiers or regressors.

get_num_tasks() int[source]

Get number of tasks.

Scikit-Learn Models

Scikit-learn’s models can be wrapped so that they can interact conveniently with DeepChem. Scikit-learn models are often more robust and easier to train, which makes them a good first model to try.

SklearnModel

class SklearnModel(model: BaseEstimator, model_dir: str | None = None, **kwargs)[source]

Wrapper class that wraps scikit-learn models as DeepChem models.

When you’re working with scikit-learn and DeepChem, at times it can be useful to wrap a scikit-learn model as a DeepChem model. The reason for this might be that you want to do an apples-to-apples comparison of a scikit-learn model to another DeepChem model, or perhaps you want to use the hyperparameter tuning capabilities in dc.hyper. The SklearnModel class provides a wrapper around scikit-learn models that allows scikit-learn models to be trained on Dataset objects and evaluated with the same metrics as other DeepChem models.

Example

>>> import deepchem as dc
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> # Generate random data and create a dataset
>>> X, y = np.random.randn(5, 1), np.random.randn(5)
>>> dataset = dc.data.NumpyDataset(X, y)
>>> # Wrapping a Sklearn Linear Regression model using DeepChem models API
>>> sklearn_model = LinearRegression()
>>> dc_model = dc.models.SklearnModel(sklearn_model)
>>> dc_model.fit(dataset)  # fitting dataset

Notes

All SklearnModels perform learning solely in memory. This means that it may not be possible to train SklearnModel on large Dataset objects.

__init__(model: BaseEstimator, model_dir: str | None = None, **kwargs)[source]
Parameters:
  • model (BaseEstimator) – The model instance which inherits a scikit-learn BaseEstimator Class.

  • model_dir (str, optional (default None)) – If specified the model will be stored in this directory. Else, a temporary directory will be used.

  • model_instance (BaseEstimator (DEPRECATED)) – The model instance which inherits a scikit-learn BaseEstimator Class.

  • kwargs (dict) – kwargs[‘use_weights’] is a bool which determines if we pass weights into self.model.fit().

fit(dataset: Dataset) None[source]

Fits scikit-learn model to data.

Parameters:

dataset (Dataset) – The Dataset to train this model on.

predict_on_batch(X: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]) ndarray[source]

Makes predictions on batch of data.

Parameters:

X (np.ndarray) – A numpy array of features.

Returns:

The value is a return value of predict_proba or predict method of the scikit-learn model. If the scikit-learn model has both methods, the value is always a return value of predict_proba.

Return type:

np.ndarray
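The dispatch rule described above (prefer predict_proba when the estimator has it, otherwise fall back to predict) can be sketched as follows. The helper `predict_on_batch_sketch` and the two toy estimator classes are hypothetical and only mimic the scikit-learn method names; this is not the wrapper's actual code.

```python
def predict_on_batch_sketch(estimator, X):
    # Prefer class probabilities when the estimator provides them.
    if hasattr(estimator, "predict_proba"):
        return estimator.predict_proba(X)
    return estimator.predict(X)


class OnlyPredict:
    """Toy estimator exposing predict only (like a regressor)."""

    def predict(self, X):
        return [0 for _ in X]


class Both(OnlyPredict):
    """Toy estimator exposing both predict and predict_proba."""

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


regression_like = predict_on_batch_sketch(OnlyPredict(), [1, 2])
classifier_like = predict_on_batch_sketch(Both(), [1, 2])
```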

predict(X: Dataset, transformers: List[Transformer] = []) ndarray | Sequence[ndarray][source]

Makes predictions on dataset.

Parameters:
  • dataset (Dataset) – Dataset to make prediction on.

  • transformers (List[Transformer]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

save()[source]

Saves scikit-learn model to disk using joblib.

reload()[source]

Loads scikit-learn model from joblib file on disk.

Gradient Boosting Models

Gradient Boosting Models (LightGBM and XGBoost) can be wrapped so they can interact with DeepChem.

GBDTModel

class GBDTModel(model: BaseEstimator, model_dir: str | None = None, early_stopping_rounds: int = 50, eval_metric: Callable | str | None = None, **kwargs)[source]

Wrapper class that wraps GBDT models as DeepChem models.

This class supports LightGBM/XGBoost models.

__init__(model: BaseEstimator, model_dir: str | None = None, early_stopping_rounds: int = 50, eval_metric: Callable | str | None = None, **kwargs)[source]
Parameters:
  • model (BaseEstimator) – The model instance of scikit-learn wrapper LightGBM/XGBoost models.

  • model_dir (str, optional (default None)) – Path to directory where model will be stored.

  • early_stopping_rounds (int, optional (default 50)) – Activates early stopping. Validation metric needs to improve at least once in every early_stopping_rounds round(s) to continue training.

  • eval_metric (Union[str, Callable]) – If string, it should be a built-in evaluation metric to use. If callable, it should be a custom evaluation metric, see official note for more details.
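The early stopping rule described for early_stopping_rounds can be sketched in plain Python. The helper `best_round` is hypothetical and assumes a validation score where lower is better; the wrapped LightGBM or XGBoost library performs the real check during training.

```python
def best_round(valid_scores, early_stopping_rounds):
    """Return the index of the best round.

    Training stops once the validation score (lower is better) has not
    improved for `early_stopping_rounds` consecutive rounds.
    """
    best_idx, best_score = 0, float("inf")
    for i, score in enumerate(valid_scores):
        if score < best_score:
            best_idx, best_score = i, score
        elif i - best_idx >= early_stopping_rounds:
            break  # no improvement within the allowed window
    return best_idx


scores = [0.9, 0.8, 0.85, 0.84, 0.83, 0.7]
stopped_early = best_round(scores, early_stopping_rounds=3)
ran_to_end = best_round(scores, early_stopping_rounds=10)
```

With a window of 3 the loop stops before reaching the late improvement at the last round, which is the usual trade-off of a short patience window.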

fit(dataset: Dataset)[source]

Fits GBDT model with all data.

This function first splits the data into training and validation sets (8:2) and uses the validation set to find the best n_estimators. It then retrains on all of the data using the best n_estimators * 1.25.

Parameters:

dataset (Dataset) – The Dataset to train this model on.
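The arithmetic behind the two-stage fit (an 8:2 split, then retraining with the best n_estimators scaled by 1.25) can be sketched as below. The helper `plan_gbdt_fit` is hypothetical, and rounding the scaled value to the nearest integer is an assumption of this sketch.

```python
def plan_gbdt_fit(n_samples, best_n_estimators, train_frac=0.8, scale=1.25):
    """Return (n_train, n_valid, final_n_estimators) for the two-stage fit."""
    n_train = int(n_samples * train_frac)
    n_valid = n_samples - n_train
    # Assumption: the scaled estimator count is rounded to an integer.
    final_n_estimators = int(round(best_n_estimators * scale))
    return n_train, n_valid, final_n_estimators


# 1000 samples, and early stopping found 40 estimators to be best.
plan = plan_gbdt_fit(1000, 40)
```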

fit_with_eval(train_dataset: Dataset, valid_dataset: Dataset)[source]

Fits GBDT model with valid data.

Parameters:
  • train_dataset (Dataset) – The Dataset to train this model on.

  • valid_dataset (Dataset) – The Dataset to validate this model on.

Deep Learning Infrastructure

DeepChem maintains a lightweight layer of common deep learning model infrastructure that can be used for models built with different underlying frameworks. The losses and optimizers can be used for both TensorFlow and PyTorch models.

Losses

class Loss[source]

A loss function for use in training models.

class L1Loss[source]

The absolute difference between the true and predicted values.

class HuberLoss[source]

Modified version of L1 Loss, also known as Smooth L1 loss. Less sensitive to small errors than L1 loss, and linear for larger errors. Huber loss is generally a better choice than L1 loss when the data contains both large and small outliers. By default, Delta = 1.0 and reduction = ‘none’.
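A minimal sketch of the piecewise form usually meant by Huber loss, for a single error value. The helper name `huber` and the formula are assumptions based on the standard definition (quadratic up to delta, linear beyond it) and are not taken from the DeepChem source.

```python
def huber(error, delta=1.0):
    """Standard Huber loss for one residual."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a * a  # quadratic region: small errors
    return delta * (a - 0.5 * delta)  # linear region: large errors


small = huber(0.5)     # quadratic branch
boundary = huber(1.0)  # both branches agree at |error| == delta
large = huber(3.0)     # linear branch
```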

class L2Loss[source]

The squared difference between the true and predicted values.

class HingeLoss[source]

The hinge loss function.

The ‘output’ argument should contain logits, and all elements of ‘labels’ should equal 0 or 1.

class SquaredHingeLoss[source]

The Squared Hinge loss function.

Defined as the square of the hinge loss between y_true and y_pred. The Squared Hinge Loss is differentiable.
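The two hinge variants can be sketched for a single (label, logit) pair as follows. The mapping of 0/1 labels to -1/+1 before applying the margin is an assumption of this sketch (it follows the common Keras convention), and the helper names are hypothetical.

```python
def hinge(label, logit):
    """Hinge loss for one sample with a 0/1 label and a raw logit."""
    sign = 2.0 * label - 1.0  # assumption: map {0, 1} to {-1, +1}
    return max(0.0, 1.0 - sign * logit)


def squared_hinge(label, logit):
    """Square of the hinge loss; smooth where the hinge has a kink."""
    return hinge(label, logit) ** 2


confident_correct = hinge(1, 2.0)   # beyond the margin, so no loss
wrong_side = hinge(0, 0.5)          # label 0 but a positive logit
wrong_side_sq = squared_hinge(0, 0.5)
```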

class PoissonLoss[source]

The Poisson loss function is defined as the mean of the elements of y_pred - (y_true * log(y_pred)) for an input of (y_true, y_pred). Poisson loss is generally used for regression tasks where the data follows the Poisson distribution.
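The definition above translates directly into a few lines of plain Python. The helper `poisson_loss` is a hypothetical name, and the sketch assumes strictly positive predictions so that the logarithm is defined.

```python
import math


def poisson_loss(y_true, y_pred):
    """Mean of y_pred - y_true * log(y_pred) over all elements."""
    terms = [p - t * math.log(p) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


exact = poisson_loss([1.0, 2.0], [1.0, 2.0])
at_target = poisson_loss([2.0], [2.0])
off_target = poisson_loss([2.0], [3.0])
```

For a fixed target, the loss is smallest when the prediction equals the target, which is why `at_target` is lower than `off_target`.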

class BinaryCrossEntropy[source]

The cross entropy between pairs of probabilities.

The arguments should each have shape (batch_size) or (batch_size, tasks) and contain probabilities.

class CategoricalCrossEntropy[source]

The cross entropy between two probability distributions.

The arguments should each have shape (batch_size, classes) or (batch_size, tasks, classes), and represent a probability distribution over classes.

class SigmoidCrossEntropy[source]

The cross entropy between pairs of probabilities.

The arguments should each have shape (batch_size) or (batch_size, tasks). The labels should be probabilities, while the outputs should be logits that are converted to probabilities using a sigmoid function.
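To make the relationship between the probability-based and logit-based binary losses concrete, the sketch below checks that cross entropy on logits equals binary cross entropy applied after a sigmoid. All helper names are hypothetical, and the numerically stable logit form is the standard identity rather than code from DeepChem.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(label, prob):
    """Cross entropy when the output is already a probability."""
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def sigmoid_cross_entropy(label, logit):
    """Cross entropy on a raw logit, in a numerically stable form:
    max(z, 0) - z * y + log(1 + exp(-|z|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


from_logit = sigmoid_cross_entropy(1.0, 0.3)
from_prob = binary_cross_entropy(1.0, sigmoid(0.3))
```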

class SoftmaxCrossEntropy[source]

The cross entropy between two probability distributions.

The arguments should each have shape (batch_size, classes) or (batch_size, tasks, classes). The labels should be probabilities, while the outputs should be logits that are converted to probabilities using a softmax function.

class SparseSoftmaxCrossEntropy[source]

The cross entropy between two probability distributions.

The labels should have shape (batch_size) or (batch_size, tasks) and be integer class labels. The outputs should have shape (batch_size, classes) or (batch_size, tasks, classes) and be logits that are converted to probabilities using a softmax function.
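The dense and sparse softmax variants differ only in how the label is encoded. The sketch below, with hypothetical helper names, shows for a single sample that an integer class label gives the same loss as the matching one-hot distribution.

```python
import math


def log_softmax(logits):
    """Log of the softmax probabilities, using the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def softmax_cross_entropy(label_probs, logits):
    """Labels are a probability distribution over classes."""
    return -sum(p * lp for p, lp in zip(label_probs, log_softmax(logits)))


def sparse_softmax_cross_entropy(label_index, logits):
    """Label is the integer index of the true class."""
    return -log_softmax(logits)[label_index]


logits = [2.0, 0.5, -1.0]
dense = softmax_cross_entropy([0.0, 1.0, 0.0], logits)
sparse = sparse_softmax_cross_entropy(1, logits)
```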

class VAE_ELBO[source]

The Variational AutoEncoder loss: a KL divergence regularizer plus the marginal log-likelihood.

This loss is based on [1]. ELBO (evidence lower bound) is another name for the variational lower bound. BCE denotes the marginal log-likelihood term, and KLD denotes the KL divergence from a normal distribution. The hyperparameter ‘kl_scale’ scales the KLD term.

The logvar and mu should have shape (batch_size, hidden_space). The x and reconstruction_x should have shape (batch_size, attribute). The kl_scale should be a float.
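The following plain-Python sketch is one way to combine the two terms for each sample. It is not the library code: the formulas (a binary cross entropy averaged over attributes, plus a KL term averaged over hidden units that treats logvar as a standard deviation) are assumptions, chosen because they reproduce the values shown in the Examples for this class to within about 1e-6. All helper names are hypothetical.

```python
import math


def kl_term(std, mu):
    """KL divergence from a standard normal, averaged over hidden units."""
    terms = [m * m + s * s - math.log(s * s) - 1.0 for s, m in zip(std, mu)]
    return 0.5 * sum(terms) / len(terms)


def bce_term(x, recon, eps=1e-7):
    """Binary cross entropy averaged over attributes."""
    total = 0.0
    for t, r in zip(x, recon):
        r = min(max(r, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(r) + (1.0 - t) * math.log(1.0 - r)
    return total / len(x)


def vae_elbo(logvar, mu, x, recon, kl_scale=1.0):
    return bce_term(x, recon) + kl_scale * kl_term(logvar, mu)


# Same inputs as the Examples for this class (batch_size = 2).
logvar = [[1.0, 1.3], [0.6, 1.2]]
mu = [[0.2, 0.7], [1.2, 0.4]]
x = [[0.9, 0.4, 0.8], [0.3, 0.0, 1.0]]
recon = [[0.8, 0.3, 0.7], [0.2, 0.0, 0.9]]

elbo = [vae_elbo(lv, m, xi, ri) for lv, m, xi, ri in zip(logvar, mu, x, recon)]
```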

Examples

Examples for calculating loss using constant tensor.

batch_size = 2, hidden_space = 2, number of original attributes = 3

>>> import numpy as np
>>> import torch
>>> import tensorflow as tf
>>> from deepchem.models.losses import VAE_ELBO
>>> logvar = np.array([[1.0,1.3],[0.6,1.2]])
>>> mu = np.array([[0.2,0.7],[1.2,0.4]])
>>> x = np.array([[0.9,0.4,0.8],[0.3,0,1]])
>>> reconstruction_x = np.array([[0.8,0.3,0.7],[0.2,0,0.9]])

Case tensorflow

>>> VAE_ELBO()._compute_tf_loss(tf.constant(logvar), tf.constant(mu), tf.constant(x), tf.constant(reconstruction_x))
<tf.Tensor: shape=(2,), dtype=float64, numpy=array([0.70165154, 0.76238271])>

Case pytorch

>>> (VAE_ELBO()._create_pytorch_loss())(torch.tensor(logvar), torch.tensor(mu), torch.tensor(x), torch.tensor(reconstruction_x))
tensor([0.7017, 0.7624], dtype=torch.float64)

References

class VAE_KLDivergence[source]

The KL_divergence between hidden distribution and normal distribution.

This loss represents the KL divergence between the hidden distribution (given by its parameters) and a normal distribution, based on [1].

The logvar should have shape (batch_size, hidden_space), and each term represents the standard deviation of the hidden distribution. The mean should have shape (batch_size, hidden_space), and each term represents the mean of the hidden distribution.

Examples

Examples for calculating loss using constant tensor.

batch_size = 2, hidden_space = 2

>>> import numpy as np
>>> import torch
>>> import tensorflow as tf
>>> from deepchem.models.losses import VAE_KLDivergence
>>> logvar = np.array([[1.0,1.3],[0.6,1.2]])
>>> mu = np.array([[0.2,0.7],[1.2,0.4]])

Case tensorflow

>>> VAE_KLDivergence()._compute_tf_loss(tf.constant(logvar), tf.constant(mu))
<tf.Tensor: shape=(2,), dtype=float64, numpy=array([0.17381787, 0.51425203])>

Case pytorch

>>> (VAE_KLDivergence()._create_pytorch_loss())(torch.tensor(logvar), torch.tensor(mu))
tensor([0.1738, 0.5143], dtype=torch.float64)

References

class ShannonEntropy[source]

The Shannon entropy of a discrete distribution.

This loss represents Shannon entropy based on [1].

The inputs should have shape (batch size, num of variable) and represent a probability distribution.
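A plain-Python sketch of the quantity computed for one row of probabilities. Averaging -p * log(p) over the variables (rather than summing, as in the textbook entropy) is an assumption of this sketch, chosen because it reproduces the values shown in the Examples for this class; the helper name is hypothetical.

```python
import math


def shannon_entropy(probs):
    """Mean of -p * log(p) over the entries of one distribution."""
    return sum(-p * math.log(p) for p in probs) / len(probs)


# Same inputs as the Examples for this class.
inputs = [[0.7, 0.3], [0.9, 0.1]]
entropies = [shannon_entropy(row) for row in inputs]
```

The more peaked distribution (0.9, 0.1) has the lower value, as expected for an entropy-like measure.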

Examples

Examples for calculating loss using constant tensor.

batch_size = 2, number of variables = 2

>>> import numpy as np
>>> import torch
>>> import tensorflow as tf
>>> from deepchem.models.losses import ShannonEntropy
>>> inputs = np.array([[0.7,0.3],[0.9,0.1]])

Case tensorflow

>>> ShannonEntropy()._compute_tf_loss(tf.constant(inputs))
<tf.Tensor: shape=(2,), dtype=float64, numpy=array([0.30543215, 0.16254149])>

Case pytorch

>>> (ShannonEntropy()._create_pytorch_loss())(torch.tensor(inputs))
tensor([0.3054, 0.1625], dtype=torch.float64)

References

class GlobalMutualInformationLoss[source]

Global-global encoding loss (comparing two full graphs).

Compares the encodings of two molecular graphs and returns the loss between them based on the measure specified. The encodings are generated by two separate encoders in order to maximize the mutual information between the two encodings.

Parameters:
  • global_enc (torch.Tensor) – Features from a graph convolutional encoder.

  • global_enc2 (torch.Tensor) – Another set of features from a graph convolutional encoder.

  • measure (str) – The divergence measure to use for the unsupervised loss. Options are ‘GAN’, ‘JSD’, ‘KL’, ‘RKL’, ‘X2’, ‘DV’, ‘H2’, or ‘W1’.

  • average_loss (bool) – Whether to average the loss over the batch

Returns:

loss – Measure of mutual information between the encodings of the two graphs.

Return type:

torch.Tensor

References

Examples

>>> import numpy as np
>>> import deepchem.models.losses as losses
>>> from deepchem.feat.graph_data import BatchGraphData, GraphData
>>> from deepchem.models.torch_models.infograph import InfoGraphEncoder
>>> from deepchem.models.torch_models.layers import MultilayerPerceptron
>>> graph_list = []
>>> for i in range(3):
...     node_features = np.random.rand(5, 10)
...     edge_index = np.array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]], dtype=np.int64)
...     edge_features = np.random.rand(5, 5)
...     graph_list.append(GraphData(node_features, edge_index, edge_features))
>>> batch = BatchGraphData(graph_list).numpy_to_torch()
>>> num_feat = 10
>>> edge_dim = 5
>>> dim = 4
>>> encoder = InfoGraphEncoder(num_feat, edge_dim, dim)
>>> encoding, feature_map = encoder(batch)
>>> g_enc = MultilayerPerceptron(2 * dim, dim)(encoding)
>>> g_enc2 = MultilayerPerceptron(2 * dim, dim)(encoding)
>>> globalloss = losses.GlobalMutualInformationLoss()
>>> loss = globalloss._create_pytorch_loss()(g_enc, g_enc2).detach().numpy()

class LocalMutualInformationLoss[source]

Local-global encoding loss (comparing a subgraph to the full graph).

Compares the encodings of two molecular graphs and returns the loss between them based on the measure specified. The encodings are generated by two separate encoders in order to maximize the mutual information between the two encodings.

Parameters:
  • local_enc (torch.Tensor) – Features from a graph convolutional encoder.

  • global_enc (torch.Tensor) – Another set of features from a graph convolutional encoder.

  • batch_graph_index (graph_index: np.ndarray or torch.tensor, dtype int) – This vector indicates which graph each node belongs to, with shape [num_nodes,]. Only present in BatchGraphData, not in GraphData objects.

  • measure (str) – The divergence measure to use for the unsupervised loss. Options are ‘GAN’, ‘JSD’, ‘KL’, ‘RKL’, ‘X2’, ‘DV’, ‘H2’, or ‘W1’.

  • average_loss (bool) – Whether to average the loss over the batch

Returns:

loss – Measure of mutual information between the encodings of the two graphs.

Return type:

torch.Tensor

References

Example

>>> import numpy as np
>>> import deepchem.models.losses as losses
>>> from deepchem.feat.graph_data import BatchGraphData, GraphData
>>> from deepchem.models.torch_models.infograph import InfoGraphEncoder
>>> from deepchem.models.torch_models.layers import MultilayerPerceptron
>>> graph_list = []
>>> for i in range(3):
...     node_features = np.random.rand(5, 10)
...     edge_index = np.array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]], dtype=np.int64)
...     edge_features = np.random.rand(5, 5)
...     graph_list.append(GraphData(node_features, edge_index, edge_features))
>>> batch = BatchGraphData(graph_list).numpy_to_torch()
>>> num_feat = 10
>>> edge_dim = 5
>>> dim = 4
>>> encoder = InfoGraphEncoder(num_feat, edge_dim, dim)
>>> encoding, feature_map = encoder(batch)
>>> g_enc = MultilayerPerceptron(2 * dim, dim)(encoding)
>>> l_enc = MultilayerPerceptron(dim, dim)(feature_map)
>>> localloss = losses.LocalMutualInformationLoss()
>>> loss = localloss._create_pytorch_loss()(l_enc, g_enc, batch.graph_index).detach().numpy()

class GroverPretrainLoss[source]

Grover pretraining consists of learning atom embeddings and bond embeddings for a molecule. To this end, the learning consists of three tasks:

  1. Learning of atom vocabulary from atom embeddings and bond embeddings

  2. Learning of bond vocabulary from atom embeddings and bond embeddings

  3. Learning to predict functional groups from atom embeddings readout and bond embeddings readout

The loss function accepts, as a dictionary, the atom vocabulary labels, bond vocabulary labels and functional group predictions produced by the Grover model during pretraining. It applies negative log-likelihood loss to the atom vocabulary and bond vocabulary predictions and Binary Cross Entropy loss to the functional group predictions, and sums these to get the overall loss.

Example

>>> import torch
>>> from deepchem.models.losses import GroverPretrainLoss
>>> loss = GroverPretrainLoss()
>>> loss_fn = loss._create_pytorch_loss()
>>> batch_size = 3
>>> output_dim = 10
>>> fg_size = 8
>>> atom_vocab_task_target = torch.ones(batch_size).type(torch.int64)
>>> bond_vocab_task_target = torch.ones(batch_size).type(torch.int64)
>>> fg_task_target = torch.ones(batch_size, fg_size)
>>> atom_vocab_task_atom_pred = torch.zeros(batch_size, output_dim)
>>> bond_vocab_task_atom_pred = torch.zeros(batch_size, output_dim)
>>> atom_vocab_task_bond_pred = torch.zeros(batch_size, output_dim)
>>> bond_vocab_task_bond_pred = torch.zeros(batch_size, output_dim)
>>> fg_task_atom_from_atom = torch.zeros(batch_size, fg_size)
>>> fg_task_atom_from_bond = torch.zeros(batch_size, fg_size)
>>> fg_task_bond_from_atom = torch.zeros(batch_size, fg_size)
>>> fg_task_bond_from_bond = torch.zeros(batch_size, fg_size)
>>> result = loss_fn(atom_vocab_task_atom_pred, atom_vocab_task_bond_pred,
...     bond_vocab_task_atom_pred, bond_vocab_task_bond_pred, fg_task_atom_from_atom,
...     fg_task_atom_from_bond, fg_task_bond_from_atom, fg_task_bond_from_bond,
...     atom_vocab_task_target, bond_vocab_task_target, fg_task_target)

Reference

class EdgePredictionLoss[source]

EdgePredictionLoss is an unsupervised graph edge prediction loss function that calculates the loss based on the similarity between node embeddings for positive and negative edge pairs. This loss function is designed for graph neural networks and is particularly useful for pre-training tasks.

This loss function encourages the model to learn node embeddings that can effectively distinguish between true edges (positive samples) and false edges (negative samples) in the graph.

The loss is computed by comparing the similarity scores (dot product) of node embeddings for positive and negative edge pairs. The goal is to maximize the similarity for positive pairs and minimize it for negative pairs.

To use this loss function, the input must be a BatchGraphData object transformed by the negative_edge_sampler. The loss function takes the node embeddings and the input graph data (with positive and negative edge pairs) as inputs and returns the edge prediction loss.
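The idea of scoring edges by the dot product of their node embeddings can be sketched in plain Python. This is not the DeepChem implementation: the choice of a sigmoid cross entropy on the scores (target 1 for positive edges, target 0 for negative edges) and all helper names are assumptions of this sketch.

```python
import math


def softplus(z):
    """log(1 + exp(z)), computed stably."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def edge_prediction_loss(emb, pos_edges, neg_edges):
    """Sigmoid cross entropy on dot-product scores of edge endpoints."""
    # Target 1: -log(sigmoid(s)) == softplus(-s)
    pos = [softplus(-dot(emb[i], emb[j])) for i, j in pos_edges]
    # Target 0: -log(1 - sigmoid(s)) == softplus(s)
    neg = [softplus(dot(emb[i], emb[j])) for i, j in neg_edges]
    return sum(pos) / len(pos) + sum(neg) / len(neg)


# Nodes 0 and 1 have similar embeddings; node 2 points the opposite way.
emb = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
good = edge_prediction_loss(emb, pos_edges=[(0, 1)], neg_edges=[(0, 2)])
bad = edge_prediction_loss(emb, pos_edges=[(0, 2)], neg_edges=[(0, 1)])
```

Embeddings that place true edges close together and false edges far apart give the lower loss, which is the behaviour the description above asks for.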

Examples

>>> from deepchem.models.losses import EdgePredictionLoss
>>> from deepchem.feat.graph_data import BatchGraphData, GraphData
>>> from deepchem.models.torch_models.gnn import negative_edge_sampler
>>> import torch
>>> import numpy as np
>>> emb_dim = 8
>>> num_nodes_list, num_edge_list = [3, 4, 5], [2, 4, 5]
>>> num_node_features, num_edge_features = 32, 32
>>> edge_index_list = [
...     np.array([[0, 1], [1, 2]]),
...     np.array([[0, 1, 2, 3], [1, 2, 0, 2]]),
...     np.array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]]),
... ]
>>> graph_list = [
...     GraphData(node_features=np.random.random_sample(
...         (num_nodes_list[i], num_node_features)),
...               edge_index=edge_index_list[i],
...               edge_features=np.random.random_sample(
...                   (num_edge_list[i], num_edge_features)),
...               node_pos_features=None) for i in range(len(num_edge_list))
... ]
>>> batched_graph = BatchGraphData(graph_list)
>>> batched_graph = batched_graph.numpy_to_torch()
>>> neg_sampled = negative_edge_sampler(batched_graph)
>>> embedding = np.random.random((sum(num_nodes_list), emb_dim))
>>> embedding = torch.from_numpy(embedding)
>>> loss_func = EdgePredictionLoss()._create_pytorch_loss()
>>> loss = loss_func(embedding, neg_sampled)

References

class GraphNodeMaskingLoss[source]

GraphNodeMaskingLoss is an unsupervised graph node masking loss function that calculates the loss based on the predicted node labels and true node labels. This loss function is designed for graph neural networks and is particularly useful for pre-training tasks.

This loss function encourages the model to learn node embeddings that can effectively predict the masked node labels in the graph.

The loss is computed using the CrossEntropyLoss between the predicted node labels and the true node labels.

To use this loss function, the input must be a BatchGraphData object transformed by the mask_nodes function. The loss function takes the predicted node labels, predicted edge labels, and the input graph data (with masked node labels) as inputs and returns the node masking loss.

Parameters:
  • pred_node (torch.Tensor) – Predicted node labels

  • pred_edge (Optional(torch.Tensor)) – Predicted edge labels

  • inputs (BatchGraphData) – Input graph data with masked node and edge labels

Examples

>>> from deepchem.models.losses import GraphNodeMaskingLoss
>>> from deepchem.feat.graph_data import BatchGraphData, GraphData
>>> from deepchem.models.torch_models.gnn import mask_nodes
>>> import torch
>>> import numpy as np
>>> num_nodes_list, num_edge_list = [3, 4, 5], [2, 4, 5]
>>> num_node_features, num_edge_features = 32, 32
>>> edge_index_list = [
...     np.array([[0, 1], [1, 2]]),
...     np.array([[0, 1, 2, 3], [1, 2, 0, 2]]),
...     np.array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]]),
... ]
>>> graph_list = [
...     GraphData(node_features=np.random.random_sample(
...         (num_nodes_list[i], num_node_features)),
...               edge_index=edge_index_list[i],
...               edge_features=np.random.random_sample(
...                   (num_edge_list[i], num_edge_features)),
...               node_pos_features=None) for i in range(len(num_edge_list))
... ]
>>> batched_graph = BatchGraphData(graph_list)
>>> batched_graph = batched_graph.numpy_to_torch()
>>> masked_graph = mask_nodes(batched_graph, 0.1)
>>> pred_node = torch.randn((sum(num_nodes_list), num_node_features))
>>> pred_edge = torch.randn((sum(num_edge_list), num_edge_features))
>>> loss_func = GraphNodeMaskingLoss()._create_pytorch_loss()
>>> loss = loss_func(pred_node[masked_graph.masked_node_indices],
...                  pred_edge[masked_graph.connected_edge_indices], masked_graph)

class GraphEdgeMaskingLoss[source]

GraphEdgeMaskingLoss is an unsupervised graph edge masking loss function that calculates the loss based on the predicted edge labels and true edge labels. This loss function is designed for graph neural networks and is particularly useful for pre-training tasks.

This loss function encourages the model to learn node embeddings that can effectively predict the masked edge labels in the graph.

The loss is computed using the CrossEntropyLoss between the predicted edge labels and the true edge labels.

To use this loss function, the input must be a BatchGraphData object transformed by the mask_edges function. The loss function takes the predicted edge labels and the true edge labels as inputs and returns the edge masking loss.

Parameters:
  • pred_edge (torch.Tensor) – Predicted edge labels.

  • inputs (BatchGraphData) – Input graph data (with masked edge labels).

Examples

>>> from deepchem.models.losses import GraphEdgeMaskingLoss
>>> from deepchem.feat.graph_data import BatchGraphData, GraphData
>>> from deepchem.models.torch_models.gnn import mask_edges
>>> import torch
>>> import numpy as np
>>> num_nodes_list, num_edge_list = [3, 4, 5], [2, 4, 5]
>>> num_node_features, num_edge_features = 32, 32
>>> edge_index_list = [
...     np.array([[0, 1], [1, 2]]),
...     np.array([[0, 1, 2, 3], [1, 2, 0, 2]]),
...     np.array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]]),
... ]
>>> graph_list = [
...     GraphData(node_features=np.random.random_sample(
...         (num_nodes_list[i], num_node_features)),
...               edge_index=edge_index_list[i],
...               edge_features=np.random.random_sample(
...                   (num_edge_list[i], num_edge_features)),
...               node_pos_features=None) for i in range(len(num_edge_list))
... ]
>>> batched_graph = BatchGraphData(graph_list)
>>> batched_graph = batched_graph.numpy_to_torch()
>>> masked_graph = mask_edges(batched_graph, .1)
>>> pred_edge = torch.randn((sum(num_edge_list), num_edge_features))
>>> loss_func = GraphEdgeMaskingLoss()._create_pytorch_loss()
>>> loss = loss_func(pred_edge[masked_graph.masked_edge_idx], masked_graph)

class DeepGraphInfomaxLoss[source]

Loss that maximizes mutual information between local node representations and a pooled global graph representation. This is to encourage nearby nodes to have similar embeddings.

Parameters:
  • positive_score (torch.Tensor) – Positive score. This score measures the similarity between the local node embeddings (node_emb) and the global graph representation (positive_expanded_summary_emb) derived from the same graph. The goal is to maximize this score, as it indicates that the local node embeddings and the global graph representation are highly correlated, capturing the mutual information between them.

  • negative_score (torch.Tensor) – Negative score. This score measures the similarity between the local node embeddings (node_emb) and the global graph representation (negative_expanded_summary_emb) derived from a different graph (shifted by one position in this case). The goal is to minimize this score, as it indicates that the local node embeddings and the global graph representation from different graphs are not correlated, ensuring that the model learns meaningful representations that are specific to each graph.
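To make the roles of the two scores concrete, here is a minimal pure-Python sketch. It assumes the scores are treated as logits in a binary cross-entropy objective (target 1 for positive pairs, target 0 for negative pairs), as in the Deep Graph Infomax formulation. The names `softplus` and `infomax_loss` are illustrative, and DeepChem's exact reduction may differ.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def infomax_loss(positive_scores, negative_scores):
    """Binary cross-entropy on raw scores (assumed form, not DeepChem's code).

    Positive pairs are pushed towards label 1 and negative pairs towards
    label 0, so the loss falls as positive scores rise and negative scores drop.
    """
    # -log(sigmoid(s)) == softplus(-s): penalty for a positive pair.
    pos = sum(softplus(-s) for s in positive_scores) / len(positive_scores)
    # -log(1 - sigmoid(s)) == softplus(s): penalty for a negative pair.
    neg = sum(softplus(s) for s in negative_scores) / len(negative_scores)
    return pos + neg


uninformative = infomax_loss([0.0], [0.0])   # scores carry no signal
well_separated = infomax_loss([5.0], [-5.0])  # positive high, negative low
```

With uninformative scores each term equals log 2. Separating the scores lowers the loss, which matches the stated goal of maximizing the positive score and minimizing the negative one.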

Examples

>>> import torch
>>> import numpy as np
>>> from deepchem.feat.graph_data import GraphData
>>> from torch_geometric.nn import global_mean_pool
>>> from deepchem.models.losses import DeepGraphInfomaxLoss
>>> x = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
>>> edge_index = np.array([[0, 1, 2, 0, 3], [1, 0, 1, 3, 2]])
>>> graph_index = np.array([0, 0, 1, 1])
>>> data = GraphData(node_features=x, edge_index=edge_index, graph_index=graph_index).numpy_to_torch()
>>> graph_infomax_loss = DeepGraphInfomaxLoss()._create_pytorch_loss()
>>> # Initialize node_emb randomly
>>> num_nodes = data.num_nodes
>>> embedding_dim = 8
>>> node_emb = torch.randn(num_nodes, embedding_dim)
>>> # Compute the global graph representation
>>> summary_emb = global_mean_pool(node_emb, data.graph_index)
>>> # Compute positive and negative scores
>>> positive_score = torch.matmul(node_emb, summary_emb.t())
>>> negative_score = torch.matmul(node_emb, summary_emb.roll(1, dims=0).t())
>>> loss = graph_infomax_loss(positive_score, negative_score)

class GraphContextPredLoss[source]

GraphContextPredLoss is a loss function designed for graph neural networks that aims to predict the context of a node given its substructure. The context of a node is essentially the ring of nodes around it outside of an inner k1-hop diameter and inside an outer k2-hop diameter.

This loss compares the representation of a node’s neighborhood with the representation of the node’s context. It then uses negative sampling to compare the representation of the node’s neighborhood with the representation of a random node’s context.

Parameters:
  • mode (str) – The mode of the model. It can be either “cbow” (continuous bag of words) or “skipgram”.

  • neg_samples (int) – The number of negative samples to use for negative sampling.

Examples

>>> import torch
>>> from deepchem.models.losses import GraphContextPredLoss
>>> substruct_rep = torch.randn(4, 8)
>>> overlapped_node_rep = torch.randn(8, 8)
>>> context_rep = torch.randn(4, 8)
>>> neg_context_rep = torch.randn(2 * 4, 8)
>>> overlapped_context_size = torch.tensor([2, 2, 2, 2])
>>> mode = "cbow"
>>> neg_samples = 2
>>> graph_context_pred_loss = GraphContextPredLoss()._create_pytorch_loss(mode, neg_samples)
>>> loss = graph_context_pred_loss(substruct_rep, overlapped_node_rep, context_rep, neg_context_rep, overlapped_context_size)

class DensityProfileLoss[source]

Loss for the density profile entry type for Quantum Chemistry calculations. It is the integral of the squared difference between ground truth and calculated values, taken over all points in the integration grid.
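As a rough illustration of that integral, the sketch below approximates it as a sum of squared differences, each weighted by the volume element of its grid point. The function name `density_profile_loss` and the discrete-sum form are assumptions made for illustration, not DeepChem's implementation.

```python
def density_profile_loss(output, labels, dvolume):
    """Approximate the integral of (labels - output)^2 over the grid.

    Each grid point contributes its squared error multiplied by the
    volume element associated with that point.
    """
    if not (len(output) == len(labels) == len(dvolume)):
        raise ValueError("all inputs must have one entry per grid point")
    return sum(dv * (y - o) ** 2 for o, y, dv in zip(output, labels, dvolume))


# One grid point with volume element 2.0, prediction 3.0 and reference 4.0.
loss = density_profile_loss([3.0], [4.0], [2.0])
```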

Examples

>>> from deepchem.models.losses import DensityProfileLoss
>>> import torch
>>> volume = torch.Tensor([2.0])
>>> output = torch.Tensor([3.0])
>>> labels = torch.Tensor([4.0])
>>> loss = (DensityProfileLoss()._create_pytorch_loss(volume))(output, labels)
>>> # Generating volume tensor for an entry object:
>>> from deepchem.feat.dft_data import DFTEntry
>>> e_type = 'dens'
>>> true_val = 0
>>> systems = [{'moldesc': 'H 0.86625 0 0; F -0.86625 0 0', 'basis': '6-311++G(3df,3pd)'}]
>>> dens_entry_for_HF = DFTEntry.create(e_type, true_val, systems)
>>> grid = dens_entry_for_HF.get_integration_grid()

>>> volume = grid.get_dvolume()

References

Kasim, Muhammad F., and Sam M. Vinko. “Learning the exchange-correlation functional from nature with fully differentiable density functional theory.” Physical Review Letters 127.12 (2021): 126403. https://github.com/deepchem/deepchem/blob/0bc3139bb99ae7700ba2325a6756e33b6c327842/deepchem/models/dft/dftxc.py

class NTXentMultiplePositives(norm: bool = True, tau: float = 0.5, uniformity_reg=0, variance_reg=0, covariance_reg=0, conformer_variance_reg=0)[source]

This is a modification of the NTXent loss function from Chen [1]. This loss is designed for contrastive learning of molecular representations, comparing the similarity of a molecule’s latent representation to positive and negative samples.

The modifications proposed in [2] enable multiple conformers to be used as positive samples.

This loss function is designed for graph neural networks and is particularly useful for unsupervised pre-training tasks.

Parameters:
  • norm (bool, optional (default=True)) – Whether to normalize the similarity matrix.

  • tau (float, optional (default=0.5)) – Temperature parameter for the similarity matrix.

  • uniformity_reg (float, optional (default=0)) – Regularization weight for the uniformity loss.

  • variance_reg (float, optional (default=0)) – Regularization weight for the variance loss.

  • covariance_reg (float, optional (default=0)) – Regularization weight for the covariance loss.

  • conformer_variance_reg (float, optional (default=0)) – Regularization weight for the conformer variance loss.

Examples

>>> import torch
>>> from deepchem.models.losses import NTXentMultiplePositives
>>> z1 = torch.randn(4, 8)
>>> z2 = torch.randn(4 * 3, 8)
>>> ntxent_loss = NTXentMultiplePositives(norm=True, tau=0.5)
>>> loss_fn = ntxent_loss._create_pytorch_loss()
>>> loss = loss_fn(z1, z2)

References

__init__(norm: bool = True, tau: float = 0.5, uniformity_reg=0, variance_reg=0, covariance_reg=0, conformer_variance_reg=0) None[source]

Optimizers

class Optimizer(learning_rate: float | LearningRateSchedule)[source]

An algorithm for optimizing a model.

This is an abstract class. Subclasses represent specific optimization algorithms.

__init__(learning_rate: float | LearningRateSchedule)[source]

This constructor should only be called by subclasses.

Parameters:

learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization

class LearningRateSchedule[source]

A schedule for changing the learning rate over the course of optimization.

This is an abstract class. Subclasses represent specific schedules.

class AdaGrad(learning_rate: float | LearningRateSchedule = 0.001, initial_accumulator_value: float = 0.1, epsilon: float = 1e-07)[source]

The AdaGrad optimization algorithm.

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the updates. See [1]_ for a full reference for the algorithm.

References

__init__(learning_rate: float | LearningRateSchedule = 0.001, initial_accumulator_value: float = 0.1, epsilon: float = 1e-07)[source]

Construct an AdaGrad optimizer.

Parameters:
  • learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization

  • initial_accumulator_value (float) – a parameter of the AdaGrad algorithm

  • epsilon (float) – a parameter of the AdaGrad algorithm
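The effect of the accumulator can be seen in a small scalar sketch of the AdaGrad update. This uses the textbook form of the rule with hypothetical names. Frameworks differ in details such as whether epsilon sits inside or outside the square root, so it should not be read as the backend's exact arithmetic.

```python
import math


def adagrad_step(param, grad, accumulator, learning_rate=0.001, epsilon=1e-7):
    """One scalar AdaGrad update (textbook form, illustrative only)."""
    # Accumulate the squared gradient seen so far for this parameter.
    accumulator = accumulator + grad * grad
    # The effective step shrinks as the accumulator grows.
    param = param - learning_rate * grad / (math.sqrt(accumulator) + epsilon)
    return param, accumulator


param, acc = 1.0, 0.1  # 0.1 mirrors the default initial_accumulator_value
step_sizes = []
for _ in range(3):
    new_param, acc = adagrad_step(param, 1.0, acc, learning_rate=0.1)
    step_sizes.append(param - new_param)
    param = new_param
```

Even though the gradient is the same on every iteration, each successive entry of `step_sizes` is smaller than the last. That is the "more updates, smaller updates" behaviour described above.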

class Adam(learning_rate: float | LearningRateSchedule = 0.001, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08, weight_decay: float = 0)[source]

The Adam optimization algorithm.

__init__(learning_rate: float | LearningRateSchedule = 0.001, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08, weight_decay: float = 0)[source]

Construct an Adam optimizer.

Parameters:
  • learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization

  • beta1 (float) – a parameter of the Adam algorithm

  • beta2 (float) – a parameter of the Adam algorithm

  • epsilon (float) – a parameter of the Adam algorithm

  • weight_decay (float) – L2 penalty - a parameter of the Adam algorithm

class AdamW(learning_rate: float | LearningRateSchedule = 0.001, weight_decay: float | LearningRateSchedule = 0.01, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08, amsgrad: bool = False)[source]

The AdamW optimization algorithm.

AdamW is a variant of Adam with improved weight decay. The two optimizers interpret the weight_decay argument differently:

  • In Adam: weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

  • In AdamW: weight_decay (float, optional) – weight decay coefficient (default: 1e-2)

__init__(learning_rate: float | LearningRateSchedule = 0.001, weight_decay: float | LearningRateSchedule = 0.01, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08, amsgrad: bool = False)[source]

Construct an AdamW optimizer.

Parameters:
  • learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization

  • weight_decay (float or LearningRateSchedule) – weight decay coefficient for AdamW

  • beta1 (float) – a parameter of the Adam algorithm

  • beta2 (float) – a parameter of the Adam algorithm

  • epsilon (float) – a parameter of the Adam algorithm

  • amsgrad (bool) – If True, will use the AMSGrad variant of AdamW (from “On the Convergence of Adam and Beyond”), else will use the original algorithm.
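The difference between an L2 penalty and a weight decay coefficient is easiest to see on a single scalar weight. The sketch below is a simplified illustration based on the standard Adam and AdamW update rules. The function `adam_like_step` and its `decoupled` flag are hypothetical and are not part of DeepChem's API.

```python
import math


def adam_like_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
                   epsilon=1e-8, weight_decay=0.0, decoupled=False):
    """One scalar Adam (decoupled=False) or AdamW (decoupled=True) step."""
    if not decoupled:
        # Adam-style L2 penalty: the decay term is folded into the gradient,
        # so it is later rescaled by the adaptive denominator.
        grad = grad + weight_decay * param
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    update = lr * m_hat / (math.sqrt(v_hat) + epsilon)
    if decoupled:
        # AdamW-style decay: shrink the weight directly, independently of
        # the adaptive scaling applied to the gradient.
        param = param - lr * weight_decay * param
    return param - update, m, v


# Zero data gradient, so only the weight decay term moves the weight.
adam_param, _, _ = adam_like_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1,
                                  weight_decay=0.01)
adamw_param, _, _ = adam_like_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1,
                                   weight_decay=0.01, decoupled=True)
```

In this setting the decoupled variant shrinks the weight by exactly `lr * weight_decay`. The L2 variant instead takes a step of roughly `lr`, because the penalty is normalized like any other gradient.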

class SparseAdam(learning_rate: float | LearningRateSchedule = 0.001, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08)[source]

The Sparse Adam optimization algorithm, also known as Lazy Adam. Sparse Adam is suitable for sparse tensors. It handles sparse updates more efficiently. It only updates moving-average accumulators for sparse variable indices that appear in the current batch, rather than updating the accumulators for all indices.

__init__(learning_rate: float | LearningRateSchedule = 0.001, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08)[source]

Construct a SparseAdam optimizer.

Parameters:
  • learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization

  • beta1 (float) – a parameter of the SparseAdam algorithm

  • beta2 (float) – a parameter of the SparseAdam algorithm

  • epsilon (float) – a parameter of the SparseAdam algorithm

class RMSProp(learning_rate: float | LearningRateSchedule = 0.001, momentum: float = 0.0, decay: float = 0.9, epsilon: float = 1e-10)[source]

The RMSProp optimization algorithm.

__init__(learning_rate: float | LearningRateSchedule = 0.001, momentum: float = 0.0, decay: float = 0.9, epsilon: float = 1e-10)[source]

Construct an RMSProp optimizer.

Parameters:
  • learning_rate (float or LearningRateSchedule) – the learning_rate used for optimization

  • momentum (float, default 0.0) – a parameter of the RMSProp algorithm

  • decay (float, default 0.9) – a parameter of the RMSProp algorithm

  • epsilon (float, default 1e-10) – a parameter of the RMSProp algorithm

class GradientDescent(learning_rate: float | LearningRateSchedule = 0.001)[source]

The gradient descent optimization algorithm.

__init__(learning_rate: float | LearningRateSchedule = 0.001)[source]

Construct a gradient descent optimizer.

Parameters:

learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization

class ExponentialDecay(initial_rate: float, decay_rate: float, decay_steps: int, staircase: bool = True)[source]

A learning rate that decreases exponentially with the number of training steps.

__init__(initial_rate: float, decay_rate: float, decay_steps: int, staircase: bool = True)[source]

Create an exponentially decaying learning rate.

The learning rate starts as initial_rate. Every decay_steps training steps, it is multiplied by decay_rate.

Parameters:
  • initial_rate (float) – the initial learning rate

  • decay_rate (float) – the base of the exponential

  • decay_steps (int) – the number of training steps over which the rate decreases by decay_rate

  • staircase (bool) – if True, the learning rate decreases by discrete jumps every decay_steps. If False, the learning rate decreases smoothly every step.
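The schedule described above can be written as a one-line formula. The following sketch uses a hypothetical helper `exponential_decay` to evaluate it directly, assuming the usual definition `initial_rate * decay_rate ** (step / decay_steps)`, with the exponent floored when `staircase` is True.

```python
def exponential_decay(step, initial_rate, decay_rate, decay_steps,
                      staircase=True):
    """Learning rate after `step` training steps (illustrative formula)."""
    if staircase:
        # Discrete jumps: the exponent only changes every decay_steps steps.
        exponent = step // decay_steps
    else:
        # Smooth decay: the exponent grows a little on every step.
        exponent = step / decay_steps
    return initial_rate * decay_rate ** exponent


rate_before_jump = exponential_decay(99, 0.1, 0.5, 100)
rate_after_jump = exponential_decay(100, 0.1, 0.5, 100)
rate_smooth = exponential_decay(50, 0.1, 0.5, 100, staircase=False)
```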

class PolynomialDecay(initial_rate: float, final_rate: float, decay_steps: int, power: float = 1.0)[source]

A learning rate that decreases from an initial value to a final value over a fixed number of training steps.

__init__(initial_rate: float, final_rate: float, decay_steps: int, power: float = 1.0)[source]

Create a smoothly decaying learning rate.

The learning rate starts as initial_rate. It smoothly decreases to final_rate over decay_steps training steps. It decays as a function of (1-step/decay_steps)**power. Once the final rate is reached, it remains there for the rest of optimization.

Parameters:
  • initial_rate (float) – the initial learning rate

  • final_rate (float) – the final learning rate

  • decay_steps (int) – the number of training steps over which the rate decreases from initial_rate to final_rate

  • power (float) – the exponent controlling the shape of the decay
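A minimal sketch of the decay described above, using a hypothetical helper `polynomial_decay`. It assumes the rate is interpolated between `initial_rate` and `final_rate` with the factor `(1 - step/decay_steps)**power`, and held at `final_rate` afterwards.

```python
def polynomial_decay(step, initial_rate, final_rate, decay_steps, power=1.0):
    """Learning rate after `step` training steps (illustrative formula)."""
    # Clamp so the rate stays at final_rate once decay_steps is reached.
    step = min(step, decay_steps)
    fraction = (1 - step / decay_steps) ** power
    return (initial_rate - final_rate) * fraction + final_rate


start = polynomial_decay(0, 0.1, 0.01, 1000)
halfway = polynomial_decay(500, 0.1, 0.01, 1000)
end = polynomial_decay(1000, 0.1, 0.01, 1000)
later = polynomial_decay(5000, 0.1, 0.01, 1000)
```

With `power=1.0` the decay is linear, so the halfway value is the midpoint of the two rates. Larger powers make the rate drop faster early in training.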

class LinearCosineDecay(initial_rate: float, decay_steps: int, alpha: float = 0.0, beta: float = 0.001, num_periods: float = 0.5)[source]

Applies linear cosine decay to the learning rate.

__init__(initial_rate: float, decay_steps: int, alpha: float = 0.0, beta: float = 0.001, num_periods: float = 0.5)[source]

Parameters:
  • initial_rate (float) – initial learning rate

  • decay_steps (int) – number of steps to decay over

  • num_periods (float) – number of periods in the cosine part of the decay
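For orientation, the sketch below evaluates the linear cosine decay formula as it is commonly published (for example in TensorFlow's schedule of the same name). Whether DeepChem clamps and scales in exactly this way is an assumption, and `linear_cosine_decay` is a hypothetical helper name.

```python
import math


def linear_cosine_decay(step, initial_rate, decay_steps, alpha=0.0,
                        beta=0.001, num_periods=0.5):
    """Learning rate after `step` steps (commonly published formula)."""
    # Fraction of the decay completed, clamped to [0, 1].
    t = min(step, decay_steps) / decay_steps
    linear_decay = 1.0 - t
    cosine_decay = 0.5 * (1.0 + math.cos(2.0 * math.pi * num_periods * t))
    return initial_rate * ((alpha + linear_decay) * cosine_decay + beta)


rate_start = linear_cosine_decay(0, 0.1, 1000)
rate_mid = linear_cosine_decay(500, 0.1, 1000)
rate_end = linear_cosine_decay(1000, 0.1, 1000)
```

Under these assumptions, `beta` sets a small floor that the rate approaches at the end of the decay, and `alpha` offsets the linear term.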

class LambdaLRWithWarmup(initial_rate: float, num_warmup_steps: int, num_training_steps: int | None = None, warmup_type: str = 'linear')[source]

A learning rate scheduler supporting warmup followed by cool down.

Example

>>> import torch
>>> from deepchem.models.optimizers import Adam, LambdaLRWithWarmup
>>> opt = Adam(learning_rate=5e-5)
>>> lr_schedule = LambdaLRWithWarmup(initial_rate=5e-5,
...     num_training_steps=100, num_warmup_steps=10)
>>> params = [torch.nn.Parameter(torch.Tensor([1.0]))]
>>> optimizer = opt._create_pytorch_optimizer(params)
>>> scheduler = lr_schedule._create_pytorch_schedule(optimizer)

__init__(initial_rate: float, num_warmup_steps: int, num_training_steps: int | None = None, warmup_type: str = 'linear')[source]
Parameters:
  • initial_rate (float) – Initial learning rate

  • num_warmup_steps (int) – Number of warmup steps

  • num_training_steps (int) – Number of training steps - required for linear schedule.

  • warmup_type (str, optional, default: linear) – When linear, creates a learning rate schedule that decreases linearly from the initial lr in the optimizer to 0. When constant, creates a constant learning rate preceded by a warmup period during which the learning rate increases linearly between 0 and the initial lr set in the optimizer.
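The two warmup types can be pictured as a multiplier applied to the initial learning rate at each step. The sketch below is an assumption-laden illustration: it supposes that the linear type also begins with a linear warmup, which is how such schedules are usually defined. The helper name `warmup_multiplier` is hypothetical.

```python
def warmup_multiplier(step, num_warmup_steps, num_training_steps=None,
                      warmup_type="linear"):
    """Factor by which the initial learning rate is scaled at `step`."""
    if step < num_warmup_steps:
        # Warmup: ramp linearly from 0 up to the initial learning rate.
        return step / max(1, num_warmup_steps)
    if warmup_type == "constant":
        # After warmup the rate stays at its initial value.
        return 1.0
    if warmup_type == "linear":
        if num_training_steps is None:
            raise ValueError("num_training_steps is required for 'linear'")
        # Cool down: decrease linearly to 0 at num_training_steps.
        remaining = num_training_steps - step
        return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))
    raise ValueError(f"unknown warmup_type: {warmup_type!r}")


initial_rate = 5e-5
linear_rates = [initial_rate * warmup_multiplier(s, 10, 100)
                for s in (0, 5, 10, 55, 100)]
constant_rate = initial_rate * warmup_multiplier(50, 10, warmup_type="constant")
```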

Keras Models

DeepChem extensively uses Keras to build deep learning models.

KerasModel

Training loss and validation metrics can be automatically logged to Weights & Biases with the following commands:

# Install wandb in shell
pip install wandb

# Login in shell (required only once)
wandb login
# Login in notebook (required only once)
import wandb
wandb.login()

# Initialize a WandbLogger
logger = WandbLogger(…)

# Set `wandb_logger` when creating `KerasModel`
import deepchem as dc
# Log training loss to wandb
model = dc.models.KerasModel(…, wandb_logger=logger)
model.fit(…)

# Log validation metrics to wandb using ValidationCallback
vc = dc.models.ValidationCallback(…)
model = dc.models.KerasModel(…, wandb_logger=logger)
model.fit(…, callbacks=[vc])
logger.finish()

class KerasModel(model: Model, loss: Loss | Callable[[List, List, List], Any], output_types: List[str] | None = None, batch_size: int = 100, model_dir: str | None = None, learning_rate: float | LearningRateSchedule = 0.001, optimizer: Optimizer | None = None, tensorboard: bool = False, wandb: bool = False, log_frequency: int = 100, wandb_logger: WandbLogger | None = None, **kwargs)[source]

This is a DeepChem model implemented by a Keras model.

This class provides several advantages over using the Keras model’s fitting and prediction methods directly.

  1. It provides better integration with the rest of DeepChem, such as direct support for Datasets and Transformers.

  2. It defines the loss in a more flexible way. In particular, Keras does not support multidimensional weight matrices, which makes it impossible to implement most multitask models with Keras.

  3. It provides various additional features not found in the Keras model class, such as uncertainty prediction and saliency mapping.

Here is a simple example of code that uses KerasModel to train a Keras model on a DeepChem dataset.

>> import tensorflow as tf
>> import deepchem as dc
>> from deepchem.models import KerasModel
>> keras_model = tf.keras.Sequential([
>>     tf.keras.layers.Dense(1000, activation='tanh'),
>>     tf.keras.layers.Dense(1)
>> ])
>> model = KerasModel(keras_model, loss=dc.models.losses.L2Loss())
>> model.fit(dataset)

The loss function for a model can be defined in two different ways. For models that have only a single output and use a standard loss function, you can simply provide a dc.models.losses.Loss object. This defines the loss for each sample or sample/task pair. The result is automatically multiplied by the weights and averaged over the batch. Any additional losses computed by model layers, such as weight decay penalties, are also added.

For more complicated cases, you can instead provide a function that directly computes the total loss. It must be of the form f(outputs, labels, weights), taking the list of outputs from the model, the expected values, and any weight matrices. It should return a scalar equal to the value of the loss function for the batch. No additional processing is done to the result; it is up to you to do any weighting, averaging, adding of penalty terms, etc.
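To show the shape such a function takes, here is a framework-free sketch of a custom loss with the signature f(outputs, labels, weights). Plain Python lists stand in for tensors, and `weighted_l2_loss` is a hypothetical name. A real loss passed to KerasModel would perform the same arithmetic on the backend's tensor type.

```python
def weighted_l2_loss(outputs, labels, weights):
    """Total loss for one batch, computed entirely by the caller.

    `outputs`, `labels` and `weights` are lists with one entry per model
    output; in this sketch each entry is a flat list of floats.
    """
    predictions, targets, sample_weights = outputs[0], labels[0], weights[0]
    # Per-sample squared error.
    per_sample = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    # Weighting and averaging are the function's own responsibility.
    weighted = [w * err for w, err in zip(sample_weights, per_sample)]
    return sum(weighted) / len(weighted)


batch_loss = weighted_l2_loss(outputs=[[1.0, 2.0]],
                              labels=[[0.0, 0.0]],
                              weights=[[1.0, 0.5]])
```

Because no further processing is applied to the returned scalar, any penalty terms would also need to be added inside the function.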

You can optionally provide an output_types argument, which describes how to interpret the model’s outputs. This should be a list of strings, one for each output. You can use an arbitrary output_type for an output, but some output_types are special and will undergo extra processing:

  • ‘prediction’: This is a normal output, and will be returned by predict(). If output types are not specified, all outputs are assumed to be of this type.

  • ‘loss’: This output will be used in place of the normal outputs for computing the loss function. For example, models that output probability distributions usually do it by computing unbounded numbers (the logits), then passing them through a softmax function to turn them into probabilities. When computing the cross entropy, it is more numerically stable to use the logits directly rather than the probabilities. You can do this by having the model produce both probabilities and logits as outputs, then specifying output_types=['prediction', 'loss']. When predict() is called, only the first output (the probabilities) will be returned. But during training, it is the second output (the logits) that will be passed to the loss function.

  • ‘variance’: This output is used for estimating the uncertainty in another output. To create a model that can estimate uncertainty, there must be the same number of ‘prediction’ and ‘variance’ outputs. Each variance output must have the same shape as the corresponding prediction output, and each element is an estimate of the variance in the corresponding prediction. Also be aware that if a model supports uncertainty, it MUST use dropout on every layer, and dropout must be enabled during uncertainty prediction. Otherwise, the uncertainties it computes will be inaccurate.

  • other: Arbitrary output_types can be used to extract outputs produced by the model, but will have no additional processing performed.
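The numerical-stability argument behind the ‘loss’ output type can be reproduced in a few lines of plain Python. The helper names below are illustrative. The sketch compares cross entropy computed from softmax probabilities with cross entropy computed directly from logits via the log-sum-exp identity.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for safety)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_probs(probs, target):
    """Cross entropy from probabilities; breaks if probs[target] is 0."""
    return -math.log(probs[target])


def cross_entropy_from_logits(logits, target):
    """Cross entropy from logits using log-sum-exp; no log of a tiny number."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[target]


logits = [1000.0, 0.0]
probs = softmax(logits)  # the second probability underflows to 0.0
stable_loss = cross_entropy_from_logits(logits, target=1)
```

Calling `cross_entropy_from_probs(probs, 1)` on these values fails because the probability has underflowed to zero, whereas the logit-based version returns a finite loss.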

__init__(model: Model, loss: Loss | Callable[[List, List, List], Any], output_types: List[str] | None = None, batch_size: int = 100, model_dir: str | None = None, learning_rate: float | LearningRateSchedule = 0.001, optimizer: Optimizer | None = None, tensorboard: bool = False, wandb: bool = False, log_frequency: int = 100, wandb_logger: WandbLogger | None = None, **kwargs) None[source]

Create a new KerasModel.

Parameters:
  • model (tf.keras.Model) – the Keras model implementing the calculation

  • loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above

  • output_types (list of strings) – the type of each output from the model, as described above

  • batch_size (int) – default batch size for training and evaluating

  • model_dir (str) – the directory on disk where the model will be stored. If this is None, a temporary directory is created.

  • learning_rate (float or LearningRateSchedule) – the learning rate to use for fitting. If optimizer is specified, this is ignored.

  • optimizer (Optimizer) – the optimizer to use for fitting. If this is specified, learning_rate is ignored.

  • tensorboard (bool) – whether to log progress to TensorBoard during training

  • wandb (bool) – whether to log progress to Weights & Biases during training (deprecated)

  • log_frequency (int) – The frequency at which to log data. Data is logged using logging by default. If tensorboard is set, data is also logged to TensorBoard. If wandb is set, data is also logged to Weights & Biases. Logging happens at global steps. Roughly, a global step corresponds to one batch of training. If you’d like a printout every 10 batch steps, you’d set log_frequency=10 for example.

  • wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics

fit(dataset: Dataset, nb_epoch: int = 10, max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, deterministic: bool = False, restore: bool = False, variables: List[Variable] | None = None, loss: Callable[[List, List, List], Any] | None = None, callbacks: Callable | List[Callable] = [], all_losses: List[float] | None = None) float[source]

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on

  • nb_epoch (int) – the number of epochs to train for

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.

  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.

  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.

  • variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in the model are used.

  • loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.

  • callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.

  • all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.

Returns:

The average loss over the most recent checkpoint interval

Return type:

float

fit_generator(generator: Iterable[Tuple[Any, Any, Any]], max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, restore: bool = False, variables: List[Variable] | None = None, loss: Callable[[List, List, List], Any] | None = None, callbacks: Callable | List[Callable] = [], all_losses: List[float] | None = None) float[source]

Train this model on data from a generator.

Parameters:
  • generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.

  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.

  • variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in the model are used.

  • loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.

  • callbacks (function or list of functions) – one or more functions of the form f(model, step, **kwargs) that will be invoked after every step. This can be used to perform validation, logging, etc.

  • all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.

Returns:

The average loss over the most recent checkpoint interval

Return type:

float
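The generator argument of fit_generator can be any iterable of (inputs, labels, weights) tuples. The sketch below builds such batches from plain Python lists. Wrapping each element in a one-item list, so that there is one entry per model input or output, is an assumption about the expected layout, and `batch_generator` is a hypothetical helper.

```python
def batch_generator(X, y, w, batch_size, epochs=1):
    """Yield (inputs, labels, weights) tuples, one per batch."""
    for _ in range(epochs):
        for start in range(0, len(X), batch_size):
            stop = start + batch_size
            # One-element lists: one entry per model input/output (assumed).
            yield ([X[start:stop]], [y[start:stop]], [w[start:stop]])


batches = list(batch_generator(X=[0.1, 0.2, 0.3, 0.4, 0.5],
                               y=[1, 0, 1, 0, 1],
                               w=[1.0] * 5,
                               batch_size=2))
```

In practice the slices would be NumPy arrays rather than lists. The last batch may be smaller than `batch_size`, as it is here.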

fit_on_batch(X: Sequence, y: Sequence, w: Sequence, variables: List[Variable] | None = None, loss: Callable[[List, List, List], Any] | None = None, callbacks: Callable | List[Callable] = [], checkpoint: bool = True, max_checkpoints_to_keep: int = 5) float[source]

Perform a single step of training.

Parameters:
  • X (ndarray) – the inputs for the batch

  • y (ndarray) – the labels for the batch

  • w (ndarray) – the weights for the batch

  • variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in the model are used.

  • loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.

  • callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.

  • checkpoint (bool) – if true, save a checkpoint after performing the training step

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

Returns:

the loss on the batch

Return type:

float

predict_on_generator(generator: Iterable[Tuple[Any, Any, Any]], transformers: List[Transformer] = [], outputs: Tensor | Sequence[Tensor] | None = None, output_types: str | Sequence[str] | None = None) ndarray | Sequence[ndarray][source]
Parameters:
  • generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).

  • transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • outputs (Tensor or list of Tensors) – The outputs to return. If this is None, the model’s standard prediction outputs will be returned. Alternatively one or more Tensors within the model may be specified, in which case the output of those Tensors will be returned. If outputs is specified, output_types must be None.

  • output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved from the model. If output_types is specified, outputs must be None.

Returns:

a NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs

Return type:

OneOrMany[np.ndarray]

predict_on_batch(X: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], transformers: List[Transformer] = [], outputs: Tensor | Sequence[Tensor] | None = None) ndarray | Sequence[ndarray][source]

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.

  • transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • outputs (Tensor or list of Tensors) – The outputs to return. If this is None, the model’s standard prediction outputs will be returned. Alternatively one or more Tensors within the model may be specified, in which case the output of those Tensors will be returned.

Returns:

a NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs

Return type:

OneOrMany[np.ndarray]
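The transformers argument can be pictured with a small, self-contained sketch. It does not use DeepChem itself: ToyNormalizer and undo_transforms are hypothetical stand-ins written for this illustration, and the sketch assumes that undoing a chain of transformations means calling each transformer's untransform in the reverse of the order in which they were applied.

```python
class ToyNormalizer:
    """Hypothetical stand-in for a transformer that normalizes labels."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def transform(self, y):
        # Map labels to zero mean and unit scale.
        return [(v - self.mean) / self.std for v in y]

    def untransform(self, y):
        # Map normalized values back to the original units.
        return [v * self.std + self.mean for v in y]


def undo_transforms(predictions, transformers):
    """Undo each transformation, last applied first (assumption of this sketch)."""
    for transformer in reversed(transformers):
        predictions = transformer.untransform(predictions)
    return predictions


normalizer = ToyNormalizer(mean=10.0, std=2.0)
raw_labels = [8.0, 10.0, 14.0]
scaled = normalizer.transform(raw_labels)          # the scale a model would be trained on
restored = undo_transforms(scaled, [normalizer])   # back in the original units
```

A model trained on the scaled labels predicts on that scale; passing the same transformers at prediction time is what returns the output to the original units.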

predict_uncertainty_on_batch(X: Sequence, masks: int = 50) Tuple[ndarray, ndarray] | Sequence[Tuple[ndarray, ndarray]][source]

Predict the model’s outputs, along with the uncertainty in each one.

The uncertainty is computed as described in https://arxiv.org/abs/1703.04977. It involves repeating the prediction many times with different dropout masks. The prediction is computed as the average over all the predictions. The uncertainty includes both the variation among the predicted values (epistemic uncertainty) and the model’s own estimates for how well it fits the data (aleatoric uncertainty). Not all models support uncertainty prediction.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.

  • masks (int) – the number of dropout masks to average over

Returns:

  • OneOrMany[Tuple[y_pred, y_std]]

  • y_pred (np.ndarray) – predicted value of the output

  • y_std (np.ndarray) – standard deviation of the corresponding element of y_pred
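As a rough illustration of how the two sources of uncertainty can be combined, the sketch below works on a single output element in plain Python. The function name combine_mc_dropout and its arguments are hypothetical, and the formula (variance of the per-mask predictions plus the mean of the per-mask variance estimates) is one common reading of the cited paper rather than a statement of this method's exact implementation.

```python
import math


def combine_mc_dropout(mean_samples, variance_samples):
    """Combine per-mask results into (y_pred, y_std) for one output element.

    mean_samples: the prediction obtained under each dropout mask
    variance_samples: the model's own variance estimate under each mask
    """
    n = len(mean_samples)
    y_pred = sum(mean_samples) / n
    # Epistemic part: spread of the predictions across dropout masks.
    epistemic = sum(m * m for m in mean_samples) / n - y_pred ** 2
    # Aleatoric part: average of the variances the model predicts for itself.
    aleatoric = sum(variance_samples) / n
    # Clamp at zero to guard against tiny negative values from rounding.
    y_std = math.sqrt(max(epistemic, 0.0) + aleatoric)
    return y_pred, y_std


y_pred, y_std = combine_mc_dropout([1.0, 3.0], [1.0, 1.0])
```

With more masks the estimate of the epistemic term becomes less noisy, at the cost of one extra forward pass per mask.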

predict(dataset: Dataset, transformers: List[Transformer] = [], outputs: Tensor | Sequence[Tensor] | None = None, output_types: List[str] | None = None) ndarray | Sequence[ndarray][source]

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on

  • transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • outputs (Tensor or list of Tensors) – The outputs to return. If this is None, the model’s standard prediction outputs will be returned. Alternatively one or more Tensors within the model may be specified, in which case the output of those Tensors will be returned.

  • output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved from the model. If output_types is specified, outputs must be None.

Returns:

a NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs

predict_embedding(dataset: Dataset) ndarray | Sequence[ndarray][source]

Predicts embeddings created by underlying model if any exist. An embedding must be specified to have output_type of ‘embedding’ in the model definition.

Parameters:

dataset (dc.data.Dataset) – Dataset to make prediction on

Returns:

a NumPy array of the embeddings the model produces, or a list of arrays if it produces multiple embeddings

predict_uncertainty(dataset: Dataset, masks: int = 50) Tuple[ndarray, ndarray] | Sequence[Tuple[ndarray, ndarray]][source]

Predict the model’s outputs, along with the uncertainty in each one.

The uncertainty is computed as described in https://arxiv.org/abs/1703.04977. It involves repeating the prediction many times with different dropout masks. The prediction is computed as the average over all the predictions. The uncertainty includes both the variation among the predicted values (epistemic uncertainty) and the model’s own estimates for how well it fits the data (aleatoric uncertainty). Not all models support uncertainty prediction.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on

  • masks (int) – the number of dropout masks to average over

Returns:

for each output, a tuple (y_pred, y_std) where y_pred is the predicted value of the output, and each element of y_std estimates the standard deviation of the corresponding element of y_pred

evaluate_generator(generator: Iterable[Tuple[Any, Any, Any]], metrics: List[Metric], transformers: List[Transformer] = [], per_task_metrics: bool = False)[source]

Evaluate the performance of this model on the data produced by a generator.

Parameters:
  • generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).

  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics

  • transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • per_task_metrics (bool) – If True, return per-task scores.

Returns:

Maps tasks to scores under metric.

Return type:

dict

compute_saliency(X: ndarray) ndarray | Sequence[ndarray][source]

Compute the saliency map for an input sample.

This computes the Jacobian matrix with the derivative of each output element with respect to each input element. More precisely,

  • If this model has a single output, it returns a matrix of shape (output_shape, input_shape) with the derivatives.

  • If this model has multiple outputs, it returns a list of matrices, one for each output.

This method cannot be used on models that take multiple inputs.

Parameters:

X (ndarray) – the input data for a single sample

Return type:

the Jacobian matrix, or a list of matrices
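To make the shape convention concrete, here is a minimal sketch that approximates a Jacobian with central finite differences. It is not how the method is implemented (a deep learning backend would normally use automatic differentiation); numerical_jacobian and toy_model are hypothetical names used only for this illustration.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x.

    f maps a list of floats to a list of floats. The result has one row per
    output element and one column per input element.
    """
    n_outputs = len(f(x))
    jacobian = [[0.0] * len(x) for _ in range(n_outputs)]
    for j in range(len(x)):
        plus = list(x)
        plus[j] += eps
        minus = list(x)
        minus[j] -= eps
        f_plus, f_minus = f(plus), f(minus)
        for i in range(n_outputs):
            jacobian[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jacobian


def toy_model(x):
    # Two outputs computed from two inputs.
    return [x[0] * x[1], x[0] + x[1]]


saliency = numerical_jacobian(toy_model, [2.0, 3.0])
```

Row i of the result says how strongly output element i responds to each input element, which is what a saliency map visualizes.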

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) Iterable[Tuple[List, List, List]][source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])
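The batching behaviour described by these parameters can be sketched without DeepChem. The generator below is a hypothetical illustration: toy_batch_generator, its seed argument, and the padding strategy (repeating the first sample of a short batch with weight zero) are assumptions of this sketch, not the library's implementation.

```python
import random


def toy_batch_generator(X, y, w, batch_size, epochs=1,
                        deterministic=True, pad_batches=True, seed=0):
    """Yield ([inputs], [outputs], [weights]) tuples, one per batch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(len(X)))
        if not deterministic:
            rng.shuffle(order)  # a fresh random order for each epoch
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            X_b = [X[i] for i in idx]
            y_b = [y[i] for i in idx]
            w_b = [w[i] for i in idx]
            if pad_batches and len(idx) < batch_size:
                n_missing = batch_size - len(idx)
                # Assumption: padded samples get weight 0 so they do not
                # contribute to the loss.
                X_b += [X_b[0]] * n_missing
                y_b += [y_b[0]] * n_missing
                w_b += [0.0] * n_missing
            yield ([X_b], [y_b], [w_b])


batches = list(toy_batch_generator(
    X=[[0.1], [0.2], [0.3], [0.4], [0.5]],
    y=[0, 1, 0, 1, 0],
    w=[1.0] * 5,
    batch_size=2,
))
```

Five samples with a batch size of two give three batches; the last one is padded so that every batch has the same size.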

save_checkpoint(max_checkpoints_to_keep: int = 5, model_dir: str | None = None) None[source]

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • model_dir (str, default None) – Model directory to save checkpoint to. If None, revert to self.model_dir
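The effect of max_checkpoints_to_keep can be illustrated with a small file-rotation sketch. Everything here is hypothetical: save_toy_checkpoint, the checkpoint-NNNNNN.txt naming scheme, and the text payload are stand-ins chosen for the example, and real checkpoints are written by the backend in its own format.

```python
import os
import tempfile


def save_toy_checkpoint(model_dir, step, max_checkpoints_to_keep=5):
    """Write a checkpoint file, then delete the oldest ones beyond the limit."""
    path = os.path.join(model_dir, f"checkpoint-{step:06d}.txt")
    with open(path, "w") as handle:
        handle.write(f"weights at step {step}\n")
    # Zero-padded step numbers make lexicographic order match age.
    checkpoints = sorted(
        name for name in os.listdir(model_dir) if name.startswith("checkpoint-")
    )
    n_stale = max(len(checkpoints) - max_checkpoints_to_keep, 0)
    for stale in checkpoints[:n_stale]:
        os.remove(os.path.join(model_dir, stale))
    return path


with tempfile.TemporaryDirectory() as model_dir:
    for step in range(1000, 8000, 1000):  # seven saves in total
        save_toy_checkpoint(model_dir, step)
    remaining = sorted(os.listdir(model_dir))
```

After seven saves with the default limit of five, only the five most recent files are left in the directory.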

get_checkpoints(model_dir: str | None = None)[source]

Get a list of all available checkpoint files.

Parameters:

model_dir (str, default None) – Directory to get list of checkpoints from. Reverts to self.model_dir if None

restore(checkpoint: str | None = None, model_dir: str | None = None) None[source]

Reload the values of all variables from a checkpoint file.

Parameters:
  • checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.

  • model_dir (str, default None) – Directory to restore checkpoint from. If None, use self.model_dir.

get_global_step() int[source]

Get the number of steps of fitting that have been performed.

load_from_pretrained(source_model: KerasModel, assignment_map: Dict[Any, Any] | None = None, value_map: Dict[Any, Any] | None = None, checkpoint: str | None = None, model_dir: str | None = None, include_top: bool = True, inputs: Sequence[Any] | None = None, **kwargs) None[source]

Copies variable values from a pretrained model. source_model can either be a pretrained model or a model with the same architecture. value_map is a variable-value dictionary. If no value_map is provided, the variable values are restored to the source_model from a checkpoint and a default value_map is created. assignment_map is a dictionary mapping variables from the source_model to the current model. If no assignment_map is provided, one is made from scratch and assumes the model is composed of several different layers, with the final one being a dense layer. include_top is used to control whether or not the final dense layer is used. The default assignment map is useful in cases where the type of task is different (classification vs regression) and/or number of tasks in the setting.

Parameters:
  • source_model (dc.KerasModel, required) – source_model can either be the pretrained model or a dc.KerasModel with the same architecture as the pretrained model. It is used to restore from a checkpoint if value_map is None, and to create a default assignment map if assignment_map is None.

  • assignment_map (Dict, default None) – Dictionary mapping the source_model variables and current model variables

  • value_map (Dict, default None) – Dictionary containing source_model trainable variables mapped to numpy arrays. If value_map is None, the values are restored and a default variable map is created using the restored values

  • checkpoint (str, default None) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints

  • model_dir (str, default None) – Restore model from custom model directory if needed

  • include_top (bool, default True) – if True, copies the weights and bias associated with the final dense layer. Used only when assignment map is None

  • inputs (List, input tensors for model) – if not None, then the weights are built for both the source and self. This option is useful only for models that are built by subclassing tf.keras.Model, and not using the functional API by tf.keras
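The roles of assignment_map and include_top can be sketched with plain dictionaries standing in for model variables. This is only an illustration: copy_pretrained is a hypothetical helper, the layer names are invented, and the sketch assumes the last entry of the source dictionary plays the role of the final dense layer.

```python
def copy_pretrained(source_weights, target_weights,
                    assignment_map=None, include_top=True):
    """Copy values from source to target according to an assignment map.

    Both weight arguments are dicts mapping a variable name to a list of floats.
    """
    if assignment_map is None:
        names = list(source_weights)
        if not include_top:
            # Assumption: the last entry is the final dense layer.
            names = names[:-1]
        # Default map: each source variable goes to the same-named target variable.
        assignment_map = {name: name for name in names}
    for source_name, target_name in assignment_map.items():
        target_weights[target_name] = list(source_weights[source_name])
    return target_weights


# A pretrained two-task model and a fresh single-task model (hypothetical values).
source = {"dense_1": [0.5, -0.2], "dense_2": [0.1], "output": [0.9, 0.8]}
target = {"dense_1": [0.0, 0.0], "dense_2": [0.0], "output": [0.0]}
copy_pretrained(source, target, include_top=False)
```

Skipping the top layer is what makes the transfer work when the source and target differ in task type or number of tasks, since only the final layer's shape depends on those.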

TensorflowMultitaskIRVClassifier

class TensorflowMultitaskIRVClassifier(*args, **kwargs)[source]
__init__(*args, **kwargs)[source]

Initialize MultitaskIRVClassifier

Parameters:
  • n_tasks (int) – Number of tasks

  • K (int) – Number of nearest neighbours used in classification

  • penalty (float) – Amount of penalty (l2 or l1 applied)

RobustMultitaskClassifier

class RobustMultitaskClassifier(n_tasks, n_features, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, n_classes=2, bypass_layer_sizes=[100], bypass_weight_init_stddevs=[0.02], bypass_bias_init_consts=[1.0], bypass_dropouts=[0.5], **kwargs)[source]

Implements a neural network for robust multitasking.

The key idea of this model is to have bypass layers that feed directly from features to task output. This might provide some flexibility to route around challenges in multitasking with destructive interference.

References

This technique was introduced in [1]_

__init__(n_tasks, n_features, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, n_classes=2, bypass_layer_sizes=[100], bypass_weight_init_stddevs=[0.02], bypass_bias_init_consts=[1.0], bypass_dropouts=[0.5], **kwargs)[source]

Create a RobustMultitaskClassifier.

Parameters:
  • n_tasks (int) – number of tasks

  • n_features (int) – number of features

  • layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.

  • weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • weight_decay_penalty (float) – the magnitude of the weight decay penalty to use

  • weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’

  • dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • n_classes (int) – the number of classes

  • bypass_layer_sizes (list) – the size of each dense layer in the bypass network. The length of this list determines the number of bypass layers.

  • bypass_weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of bypass layers. Same requirements as weight_init_stddevs.

  • bypass_bias_init_consts (list or float) – the value to initialize the biases in bypass layers to. Same requirements as bias_init_consts.

  • bypass_dropouts (list or float) – the dropout probability to use for bypass layers. Same requirements as dropouts.
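Several of the parameters above accept either a single value or one value per layer. A minimal sketch of that convention follows; per_layer is a hypothetical helper written for this example and is not part of the library's API.

```python
from collections.abc import Sequence


def per_layer(value, n_layers, name="value"):
    """Expand a scalar to one entry per layer, or validate a per-layer list."""
    if isinstance(value, Sequence) and not isinstance(value, str):
        if len(value) != n_layers:
            raise ValueError(
                f"{name} has {len(value)} entries but there are {n_layers} layers"
            )
        return list(value)
    # A single value is reused for every layer.
    return [value] * n_layers


layer_sizes = [1000, 500]
dropouts = per_layer(0.5, len(layer_sizes), "dropouts")
weight_init_stddevs = per_layer([0.02, 0.01], len(layer_sizes), "weight_init_stddevs")
```

Passing a list whose length does not match len(layer_sizes) raises an error in this sketch, which mirrors the length requirement stated for each parameter.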

default_generator(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])

RobustMultitaskRegressor

class RobustMultitaskRegressor(n_tasks, n_features, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, bypass_layer_sizes=[100], bypass_weight_init_stddevs=[0.02], bypass_bias_init_consts=[1.0], bypass_dropouts=[0.5], **kwargs)[source]

Implements a neural network for robust multitasking.

The key idea of this model is to have bypass layers that feed directly from features to task output. This might provide some flexibility to route around challenges in multitasking with destructive interference.

__init__(n_tasks, n_features, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, bypass_layer_sizes=[100], bypass_weight_init_stddevs=[0.02], bypass_bias_init_consts=[1.0], bypass_dropouts=[0.5], **kwargs)[source]

Create a RobustMultitaskRegressor.

Parameters:
  • n_tasks (int) – number of tasks

  • n_features (int) – number of features

  • layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.

  • weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • weight_decay_penalty (float) – the magnitude of the weight decay penalty to use

  • weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’

  • dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bypass_layer_sizes (list) – the size of each dense layer in the bypass network. The length of this list determines the number of bypass layers.

  • bypass_weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of bypass layers. Same requirements as weight_init_stddevs.

  • bypass_bias_init_consts (list or float) – the value to initialize the biases in bypass layers to. Same requirements as bias_init_consts.

  • bypass_dropouts (list or float) – the dropout probability to use for bypass layers. Same requirements as dropouts.
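The bypass idea can be shown with a tiny forward pass in plain Python. This is a simplified sketch rather than the model's actual code: relu_dense, linear, and robust_forward are hypothetical helpers, dropout and weight initialization are omitted, and the sketch assumes each task's output layer reads the shared representation concatenated with that task's own bypass representation.

```python
def relu_dense(x, weights, biases):
    """Dense layer with ReLU; weights has one row per output unit."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def linear(x, weights, bias):
    """Single linear output unit."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def robust_forward(x, shared, bypasses, heads):
    """One prediction per task from shared plus task-specific bypass features."""
    shared_out = relu_dense(x, *shared)  # computed once, used by every task
    outputs = []
    for bypass, head in zip(bypasses, heads):
        bypass_out = relu_dense(x, *bypass)  # fed directly from the raw features
        outputs.append(linear(shared_out + bypass_out, *head))
    return outputs


x = [1.0, 2.0]
shared = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])              # 2 features -> 2 units
bypasses = [([[1.0, 1.0]], [0.0]), ([[1.0, -1.0]], [0.0])]   # one small layer per task
heads = [([1.0, 1.0, 1.0], 0.0), ([1.0, 1.0, 1.0], 0.0)]     # 3 inputs -> 1 output
predictions = robust_forward(x, shared, bypasses, heads)
```

Because each bypass layer has its own weights, a task can still fit its labels when the shared representation is dominated by other tasks.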

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) Iterable[Tuple[List, List, List]][source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])

ProgressiveMultitaskClassifier

class ProgressiveMultitaskClassifier(n_tasks, n_features, alpha_init_stddevs=0.02, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, **kwargs)[source]

Implements a progressive multitask neural network for classification.

Progressive Networks: https://arxiv.org/pdf/1606.04671v3.pdf

Progressive networks allow for multitask learning where each task gets a new column of weights. As a result, there is no catastrophic forgetting, in which training on a new task overwrites what was learned for previous tasks.
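The column structure can be sketched in a few lines of plain Python. This is a deliberately reduced illustration with one hidden layer per column and no biases: column_hidden and column_output are hypothetical names, and the lateral weights stand in for the adapter connections without reproducing their exact form.

```python
def column_hidden(x, column):
    """Hidden activations (ReLU) of a single column."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in column["hidden"]]


def column_output(x, columns, task):
    """Output for one task: its own column plus lateral input from earlier columns."""
    own_hidden = column_hidden(x, columns[task])
    out = sum(w * h for w, h in zip(columns[task]["out"], own_hidden))
    for previous, lateral in zip(columns[:task], columns[task]["lateral"]):
        previous_hidden = column_hidden(x, previous)
        out += sum(u * h for u, h in zip(lateral, previous_hidden))
    return out


x = [1.0, 3.0]
# Column for the first task; it has no earlier columns to draw on.
columns = [{"hidden": [[1.0, 0.0]], "out": [2.0], "lateral": []}]
before = column_output(x, columns, task=0)

# Adding a column for a second task leaves the first column's weights untouched.
columns.append({"hidden": [[0.0, 1.0]], "out": [1.0], "lateral": [[0.5]]})
after = column_output(x, columns, task=0)
second = column_output(x, columns, task=1)
```

The first task's output is identical before and after the second column is added, while the second task reuses the first column's features through its lateral weights.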

__init__(n_tasks, n_features, alpha_init_stddevs=0.02, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, **kwargs)[source]

Creates a progressive network.

Only listing parameters specific to progressive networks here.

Parameters:
  • n_tasks (int) – Number of tasks

  • n_features (int) – Number of input features

  • alpha_init_stddevs (list) – List of standard-deviations for alpha in adapter layers.

  • layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.

  • weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • weight_decay_penalty (float) – the magnitude of the weight decay penalty to use

  • weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’

  • dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

ProgressiveMultitaskRegressor

class ProgressiveMultitaskRegressor(n_tasks, n_features, alpha_init_stddevs=0.02, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, n_outputs=1, **kwargs)[source]

Implements a progressive multitask neural network for regression.

Progressive networks allow for multitask learning where each task gets a new column of weights. As a result, there is no catastrophic forgetting, in which training on a new task overwrites what was learned for previous tasks.

References

See [1]_ for a full description of the progressive architecture

__init__(n_tasks, n_features, alpha_init_stddevs=0.02, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, n_outputs=1, **kwargs)[source]

Creates a progressive network.

Only listing parameters specific to progressive networks here.

Parameters:
  • n_tasks (int) – Number of tasks

  • n_features (int) – Number of input features

  • alpha_init_stddevs (list) – List of standard-deviations for alpha in adapter layers.

  • layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.

  • weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • weight_decay_penalty (float) – the magnitude of the weight decay penalty to use

  • weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’

  • dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

add_adapter(all_layers, task, layer_num)[source]

Add an adapter connection for given task/layer combo

fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, **kwargs)[source]

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on

  • nb_epoch (int) – the number of epochs to train for

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.

  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.

  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.

  • variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in the model are used.

  • loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.

  • callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.

  • all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.

Returns:

The average loss over the most recent checkpoint interval

Return type:

float

fit_task(dataset, task, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, **kwargs)[source]

Fit one task.

WeaveModel

class WeaveModel(n_tasks: int, n_atom_feat: int | ~typing.Sequence[int] = 75, n_pair_feat: int | ~typing.Sequence[int] = 14, n_hidden: int = 50, n_graph_feat: int = 128, n_weave: int = 2, fully_connected_layer_sizes: ~typing.List[int] = [2000, 100], conv_weight_init_stddevs: float | ~typing.Sequence[float] = 0.03, weight_init_stddevs: float | ~typing.Sequence[float] = 0.01, bias_init_consts: float | ~typing.Sequence[float] = 0.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | ~typing.Sequence[float] = 0.25, final_conv_activation_fn: ~typing.Callable | str | None = <function tanh>, activation_fns: ~typing.Callable | str | ~typing.Sequence[~typing.Callable | str] = 'relu', batch_normalize: bool = True, gaussian_expand: bool = True, compress_post_gaussian_expansion: bool = False, mode: str = 'classification', n_classes: int = 2, batch_size: int = 100, **kwargs)[source]

Implements Google-style Weave Graph Convolutions

This model implements the Weave style graph convolutions from [1]_.

The biggest difference between WeaveModel style convolutions and GraphConvModel style convolutions is that Weave convolutions model bond features explicitly. This has the side effect that it needs to construct a NxN matrix explicitly to model bond interactions. This may cause scaling issues, but may possibly allow for better modeling of subtle bond effects.

Note that [1]_ introduces a whole variety of different architectures for Weave models. The default settings in this class correspond to the W2N2 variant from [1]_, which is the most commonly used variant.

Examples

Here’s an example of how to fit a WeaveModel on a tiny sample dataset.

>>> import numpy as np
>>> import deepchem as dc
>>> featurizer = dc.feat.WeaveFeaturizer()
>>> X = featurizer(["C", "CC"])
>>> y = np.array([1, 0])
>>> dataset = dc.data.NumpyDataset(X, y)
>>> model = dc.models.torch_models.WeaveModel(n_tasks=1, n_weave=2, fully_connected_layer_sizes=[2000, 1000], mode="classification")
>>> loss = model.fit(dataset)

__init__(n_tasks: int, n_atom_feat: int | ~typing.Sequence[int] = 75, n_pair_feat: int | ~typing.Sequence[int] = 14, n_hidden: int = 50, n_graph_feat: int = 128, n_weave: int = 2, fully_connected_layer_sizes: ~typing.List[int] = [2000, 100], conv_weight_init_stddevs: float | ~typing.Sequence[float] = 0.03, weight_init_stddevs: float | ~typing.Sequence[float] = 0.01, bias_init_consts: float | ~typing.Sequence[float] = 0.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | ~typing.Sequence[float] = 0.25, final_conv_activation_fn: ~typing.Callable | str | None = <function tanh>, activation_fns: ~typing.Callable | str | ~typing.Sequence[~typing.Callable | str] = 'relu', batch_normalize: bool = True, gaussian_expand: bool = True, compress_post_gaussian_expansion: bool = False, mode: str = 'classification', n_classes: int = 2, batch_size: int = 100, **kwargs)[source]
Parameters:
  • n_tasks (int) – Number of tasks

  • n_atom_feat (int, optional (default 75)) – Number of features per atom. Note this is 75 by default and should be 78 if chirality is used by WeaveFeaturizer.

  • n_pair_feat (int, optional (default 14)) – Number of features per pair of atoms.

  • n_hidden (int, optional (default 50)) – Number of units (convolution depths) in the corresponding hidden layer

  • n_graph_feat (int, optional (default 128)) – Number of output features for each molecule (graph)

  • n_weave (int, optional (default 2)) – The number of weave layers in this model.

  • fully_connected_layer_sizes (list (default [2000, 100])) – The size of each dense layer in the network. The length of this list determines the number of layers.

  • conv_weight_init_stddevs (list or float (default 0.03)) – The standard deviation of the distribution to use for weight initialization of each convolutional layer. The length of this list should equal n_weave. Alternatively, this may be a single value instead of a list, in which case the same value is used for each layer.

  • weight_init_stddevs (list or float (default 0.01)) – The standard deviation of the distribution to use for weight initialization of each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float (default 0.0)) – The value to initialize the biases in each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • weight_decay_penalty (float (default 0.0)) – The magnitude of the weight decay penalty to use

  • weight_decay_penalty_type (str (default "l2")) – The type of penalty to use for weight decay, either ‘l1’ or ‘l2’

  • dropouts (list or float (default 0.25)) – The dropout probability to use for each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • final_conv_activation_fn (Optional[ActivationFn] (default F.tanh)) – The activation function to apply to the final convolution at the end of the weave convolutions. If None, then no activation is applied (hence linear).

  • activation_fns (str (default relu)) – The activation function to apply to each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • batch_normalize (bool, optional (default True)) – If this is turned on, apply batch normalization before applying activation functions on convolutional and fully connected layers.

  • gaussian_expand (boolean, optional (default True)) – Whether to expand each dimension of atomic features by gaussian histogram

  • compress_post_gaussian_expansion (bool, optional (default False)) – If True, compress the results of the Gaussian expansion back to the original dimensions of the input.

  • mode (str (default "classification")) – Either “classification” or “regression” for type of model.

  • n_classes (int (default 2)) – Number of classes to predict (only used in classification mode)

  • batch_size (int (default 100)) – Batch size used by this model for training.

compute_features_on_batch(X_b)[source]

Compute tensors that will be input into the model from featurized representation.

The featurized input to WeaveModel is instances of WeaveMol created by WeaveFeaturizer. This method converts input WeaveMol objects into tensors used by the Keras implementation to compute WeaveModel outputs.

Parameters:

X_b (np.ndarray) – A numpy array with dtype=object where elements are WeaveMol objects.

Returns:

  • atom_feat (np.ndarray) – Of shape (N_atoms, N_atom_feat).

  • pair_feat (np.ndarray) – Of shape (N_pairs, N_pair_feat). Note that N_pairs will depend on the number of pairs being considered. If max_pair_distance is None, then this will be N_atoms**2. Else it will be the number of pairs within the specified graph distance.

  • pair_split (np.ndarray) – Of shape (N_pairs,). The i-th entry in this array will tell you the originating atom for this pair (the “source”). Note that pairs are symmetric so for a pair (a, b), both a and b will separately be sources at different points in this array.

  • atom_split (np.ndarray) – Of shape (N_atoms,). The i-th entry in this array will be the index of the molecule that the i-th atom belongs to.

  • atom_to_pair (np.ndarray) – Of shape (N_pairs, 2). The i-th row in this array will be the array [a, b] if (a, b) is a pair to be considered. (Note that, by symmetry, this implies some other row will contain [b, a].)
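The index arrays described above can be sketched for a small batch as follows. This is a minimal illustration rather than the library's code: it assumes the max_pair_distance=None case (every ordered pair within a molecule is kept) and assumes atoms are numbered consecutively across the batch. The function name build_index_arrays is hypothetical.

```python
def build_index_arrays(atoms_per_mol):
    """Build atom_split, pair_split and atom_to_pair for a batch.

    atoms_per_mol lists the number of atoms in each molecule.
    """
    atom_split, pair_split, atom_to_pair = [], [], []
    start = 0  # global index of the first atom of the current molecule
    for mol_index, n_atoms in enumerate(atoms_per_mol):
        atom_split.extend([mol_index] * n_atoms)
        for a in range(n_atoms):
            for b in range(n_atoms):
                atom_to_pair.append([start + a, start + b])
                pair_split.append(start + a)  # the "source" atom of the pair
        start += n_atoms
    return atom_split, pair_split, atom_to_pair


# A batch with a two-atom molecule followed by a one-atom molecule.
atom_split, pair_split, atom_to_pair = build_index_arrays([2, 1])
```

With this input there are 2**2 + 1**2 = 5 pairs, and both [0, 1] and [1, 0] appear in atom_to_pair, which is the symmetry noted above.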

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) → Iterable[Tuple[List, List, List]][source]

Convert a dataset into the tensors needed for learning.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to convert

  • epochs (int, optional (Default 1)) – Number of times to walk over dataset

  • mode (str, optional (Default 'fit')) – Ignored in this implementation.

  • deterministic (bool, optional (Default True)) – Whether the dataset should be walked in a deterministic fashion

  • pad_batches (bool, optional (Default True)) – If true, each returned batch will have size self.batch_size.

Return type:

Iterator which walks over the batches

DTNNModel

class DTNNModel(n_tasks: int, n_embedding: int = 30, n_hidden: int = 100, n_distance: int = 100, distance_min: float = -1, distance_max: float = 18, output_activation: bool = True, mode: str = 'regression', dropout: float = 0.0, n_steps: int = 2, **kwargs)[source]

Implements DTNN models for regression.

DTNN is based on the many-body Hamiltonian concept, which is a fundamental principle in quantum mechanics. DTNN receives a molecule’s distance matrix and the membership of its atoms from its Coulomb Matrix representation. Then, it iteratively refines the representation of each atom by considering its interactions with neighboring atoms. Finally, it predicts the energy of the molecule by summing up the energies of the individual atoms.

This class implements the Deep Tensor Neural Network (DTNN) [1]_.
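The final step described above, summing per-atom contributions into one value per molecule, amounts to a segment sum over the atom membership array. The sketch below shows only that bookkeeping with made-up numbers; the function name sum_by_molecule is hypothetical and the per-atom values would come from the network in practice.

```python
def sum_by_molecule(atom_energies, atom_membership, n_molecules):
    """Add each atom's contribution to the molecule it belongs to."""
    totals = [0.0] * n_molecules
    for energy, mol_index in zip(atom_energies, atom_membership):
        totals[mol_index] += energy
    return totals


# Three atoms: the first two belong to molecule 0, the last to molecule 1.
totals = sum_by_molecule([-1.0, -0.5, -2.0], [0, 0, 1], n_molecules=2)
```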

Examples

>>> import os
>>> from deepchem.data import SDFLoader
>>> from deepchem.feat import CoulombMatrix
>>> from deepchem.models.torch_models import DTNNModel
>>> model_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
>>> dataset_file = os.path.join(model_dir, 'tests/assets/qm9_mini.sdf')
>>> TASKS = ["alpha", "homo"]
>>> loader = SDFLoader(tasks=TASKS, featurizer=CoulombMatrix(29), sanitize=True)
>>> data = loader.create_dataset(dataset_file, shard_size=100)
>>> n_tasks = data.y.shape[1]
>>> model = DTNNModel(n_tasks,
...                   n_embedding=20,
...                   n_distance=100,
...                   learning_rate=1.0,
...                   mode="regression")
>>> loss = model.fit(data, nb_epoch=250)
>>> pred = model.predict(data)

References

__init__(n_tasks: int, n_embedding: int = 30, n_hidden: int = 100, n_distance: int = 100, distance_min: float = -1, distance_max: float = 18, output_activation: bool = True, mode: str = 'regression', dropout: float = 0.0, n_steps: int = 2, **kwargs)[source]
Parameters:
  • n_tasks (int) – Number of tasks

  • n_embedding (int (default 30)) – Number of features per atom.

  • n_hidden (int (default 100)) – Number of features for each molecule after DTNNStep

  • n_distance (int (default 100)) – Granularity of the distance matrix. The step size will be (distance_max - distance_min) / n_distance.

  • distance_min (float (default -1)) – minimum distance of atom pairs (in Angstrom)

  • distance_max (float (default = 18)) – maximum distance of atom pairs (in Angstrom)

  • output_activation (bool (default True)) – determines whether an activation function should be applied to the output.

  • mode (str (default "regression")) – Only “regression” is currently supported.

  • dropout (float (default 0.0)) – the dropout probability to use.

  • n_steps (int (default 2)) – Number of DTNNStep Layers to use.
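To make the relationship between distance_min, distance_max and n_distance concrete, the sketch below computes the step size and the resulting grid of distances. The Gaussian expansion around each grid point is an assumption made for illustration (one bump per grid point, with width equal to the step size); the function names are hypothetical.

```python
import math


def distance_grid(distance_min=-1.0, distance_max=18.0, n_distance=100):
    """Return the step size and the n_distance grid points."""
    step = (distance_max - distance_min) / n_distance
    centers = [distance_min + i * step for i in range(n_distance)]
    return step, centers


def gaussian_expand(distance, centers, step):
    """Assumed expansion: one Gaussian per grid point, width equal to step."""
    return [math.exp(-((distance - c) ** 2) / (2 * step ** 2)) for c in centers]


step, centers = distance_grid()
features = gaussian_expand(1.09, centers, step)  # 1.09 Angstrom as an example
```

With the defaults the step size is (18 - (-1)) / 100 = 0.19 Angstrom, so each interatomic distance is described by 100 values.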

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True)[source]

Create a generator that iterates batches for a dataset. It processes inputs through the _compute_features_on_batch function to calculate required features of input.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

  • a generator that iterates batches, each represented as a tuple of lists

  • ([inputs], [outputs], [weights])
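The shape of what such a generator yields can be sketched without any model code. The example below is an illustration under stated assumptions, not the library's implementation: padding repeats the first sample of the batch and gives the padded entries zero weight, and the function name batch_generator and the seed argument are hypothetical.

```python
import random


def batch_generator(X, y, w, batch_size, epochs=1, deterministic=True,
                    pad_batches=True, seed=0):
    """Yield ([inputs], [outputs], [weights]) tuples for each batch."""
    rng = random.Random(seed)
    n = len(X)
    for _ in range(epochs):
        order = list(range(n))
        if not deterministic:
            rng.shuffle(order)  # new random order every epoch
        for i in range(0, n, batch_size):
            idx = order[i:i + batch_size]
            n_pad = batch_size - len(idx) if pad_batches else 0
            Xb = [X[j] for j in idx] + [X[idx[0]]] * n_pad
            yb = [y[j] for j in idx] + [y[idx[0]]] * n_pad
            wb = [w[j] for j in idx] + [0.0] * n_pad  # padding has no weight
            yield ([Xb], [yb], [wb])


batches = list(batch_generator([10, 11, 12], [0, 1, 0], [1.0, 1.0, 1.0],
                               batch_size=2, epochs=2))
```

Three samples with a batch size of 2 give two batches per epoch; the second batch of each epoch holds one real sample and one padded sample.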

DAGModel

class DAGModel(n_tasks, max_atoms=50, n_atom_feat=75, n_graph_feat=30, n_outputs=30, layer_sizes=[100], layer_sizes_gather=[100], dropout=None, mode='classification', n_classes=2, uncertainty=False, batch_size=100, **kwargs)[source]

Directed Acyclic Graph models for molecular property prediction.

This model is based on the following paper:

Lusci, Alessandro, Gianluca Pollastri, and Pierre Baldi. “Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules.” Journal of chemical information and modeling 53.7 (2013): 1563-1575.

The basic idea for this paper is that a molecule is usually viewed as an undirected graph. However, you can convert it to a series of directed graphs. The idea is that for each atom, you make a DAG using that atom as the vertex of the DAG and edges pointing “inwards” to it. This transformation is implemented in dc.trans.transformers.DAGTransformer.UG_to_DAG.
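The conversion from an undirected molecular graph to one DAG per atom can be sketched with a breadth-first search. This is a simplified illustration and not the DAGTransformer code: each atom keeps only the edge through which the search first reached it, so ring-closing bonds are dropped. The function name ug_to_dag is hypothetical.

```python
from collections import deque


def ug_to_dag(adjacency, root):
    """Return directed edges (child, parent) that point toward root.

    adjacency maps each atom index to a list of bonded neighbours.
    """
    edges = []
    visited = {root}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in adjacency[parent]:
            if child not in visited:
                visited.add(child)
                edges.append((child, parent))  # edge points "inwards"
                queue.append(child)
    return edges


# A three-atom chain 0-1-2: build one DAG rooted at each atom.
chain = {0: [1], 1: [0, 2], 2: [1]}
dags = {atom: ug_to_dag(chain, atom) for atom in chain}
```

For a molecule with N atoms this produces N directed graphs, one per choice of root.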

This model accepts ConvMols as input, just as GraphConvModel does, but these ConvMol objects must be transformed by dc.trans.DAGTransformer.

As a note, performance of this model can be a little sensitive to initialization. It might be worth training a few different instantiations to get a stable set of parameters.

__init__(n_tasks, max_atoms=50, n_atom_feat=75, n_graph_feat=30, n_outputs=30, layer_sizes=[100], layer_sizes_gather=[100], dropout=None, mode='classification', n_classes=2, uncertainty=False, batch_size=100, **kwargs)[source]
Parameters:
  • n_tasks (int) – Number of tasks.

  • max_atoms (int, optional) – Maximum number of atoms in a molecule, should be defined based on dataset.

  • n_atom_feat (int, optional) – Number of features per atom.

  • n_graph_feat (int, optional) – Number of features for atom in the graph.

  • n_outputs (int, optional) – Number of features for each molecule.

  • layer_sizes (list of int, optional) – List of hidden layer size(s) in the propagation step: length of this list represents the number of hidden layers, and each element is the width of corresponding hidden layer.

  • layer_sizes_gather (list of int, optional) – List of hidden layer size(s) in the gather step.

  • dropout (None or float, optional) – Dropout probability, applied after each propagation step and gather step.

  • mode (str, optional) – Either “classification” or “regression” for type of model.

  • n_classes (int) – the number of classes to predict (only used in classification mode)

  • uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted

default_generator(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]

Convert a dataset into the tensors needed for learning

GraphConvModel

class GraphConvModel(n_tasks: int, graph_conv_layers: List[int] = [64, 64], dense_layer_size: int = 128, dropout: float = 0.0, mode: str = 'classification', number_atom_features: int = 75, n_classes: int = 2, batch_size: int = 100, batch_normalize: bool = True, uncertainty: bool = False, **kwargs)[source]

Graph Convolutional Models.

This class implements the graph convolutional model from [1]_. These graph convolutions start with a per-atom set of descriptors for each atom in a molecule, then combine and recombine these descriptors over convolutional layers.

References

__init__(n_tasks: int, graph_conv_layers: List[int] = [64, 64], dense_layer_size: int = 128, dropout: float = 0.0, mode: str = 'classification', number_atom_features: int = 75, n_classes: int = 2, batch_size: int = 100, batch_normalize: bool = True, uncertainty: bool = False, **kwargs)[source]

The wrapper class for graph convolutions.

Note that since the underlying _GraphConvKerasModel class is specified using imperative subclassing style, this model cannot make predictions for arbitrary outputs.

Parameters:
  • n_tasks (int) – Number of tasks

  • graph_conv_layers (list of int) – Width of channels for the Graph Convolution Layers

  • dense_layer_size (int) – Width of channels for Atom Level Dense Layer after GraphPool

  • dropout (list or float) – the dropout probability to use for each layer. The length of this list should equal len(graph_conv_layers)+1 (one value for each convolution layer, and one for the dense layer). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • mode (str) – Either “classification” or “regression”

  • number_atom_features (int) – 75 is the default number of atom features created, but this can vary if various options are passed to the function atom_features in graph_features

  • n_classes (int) – the number of classes to predict (only used in classification mode)

  • batch_normalize (bool) – if True, apply batch normalization to the model

  • uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted

default_generator(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

  • a generator that iterates batches, each represented as a tuple of lists

  • ([inputs], [outputs], [weights])

MPNNModel

class MPNNModel(n_tasks, n_atom_feat=70, n_pair_feat=8, n_hidden=100, T=5, M=10, mode='regression', dropout=0.0, n_classes=2, uncertainty=False, batch_size=100, **kwargs)[source]

Message Passing Neural Network.

Message Passing Neural Networks [1]_ treat graph convolutional operations as an instantiation of a more general message passing scheme. Recall that message passing in a graph is when nodes in a graph send each other “messages” and update their internal state as a consequence of these messages.
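The message passing idea can be illustrated with a toy example on scalar node states. This is a sketch only: the message function (sum of neighbour states) and the update function (add the message to the node's own state) are fixed here for illustration, whereas in a trained MPNN both are learned. The function name message_passing is hypothetical.

```python
def message_passing(states, neighbours, n_steps):
    """Run n_steps rounds of sum-aggregation message passing."""
    for _ in range(n_steps):
        # Every node collects the current states of its neighbours.
        messages = [sum(states[j] for j in neighbours[i])
                    for i in range(len(states))]
        # Every node then updates its own state with the received message.
        states = [s + m for s, m in zip(states, messages)]
    return states


# Three nodes in a chain 0-1-2; only node 0 starts with a non-zero state.
final = message_passing([1.0, 0.0, 0.0], {0: [1], 1: [0, 2], 2: [1]}, n_steps=2)
```

After two steps the information that started at node 0 has reached node 2, which is two bonds away.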

Ordering structures in this model are built according to [2]_

References

__init__(n_tasks, n_atom_feat=70, n_pair_feat=8, n_hidden=100, T=5, M=10, mode='regression', dropout=0.0, n_classes=2, uncertainty=False, batch_size=100, **kwargs)[source]
Parameters:
  • n_tasks (int) – Number of tasks

  • n_atom_feat (int, optional) – Number of features per atom.

  • n_pair_feat (int, optional) – Number of features per pair of atoms.

  • n_hidden (int, optional) – Number of units (convolution depths) in corresponding hidden layer

  • n_graph_feat (int, optional) – Number of output features for each molecule (graph)

  • dropout (float) – the dropout probability to use.

  • n_classes (int) – the number of classes to predict (only used in classification mode)

  • uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted

default_generator(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

  • a generator that iterates batches, each represented as a tuple of lists

  • ([inputs], [outputs], [weights])

BasicMolGANModel

class BasicMolGANModel(edges: int = 5, vertices: int = 9, nodes: int = 5, embedding_dim: int = 10, dropout_rate: float = 0.0, **kwargs)[source]

Model for de-novo generation of small molecules based on work of Nicola De Cao et al. [1]_. It uses a GAN directly on graph data and a reinforcement learning objective to induce the network to generate molecules with certain chemical properties. Utilizes WGAN infrastructure; uses adjacency matrix and node features as inputs. Inputs need to be one-hot representation.

Examples

>>> import deepchem as dc
>>> from deepchem.models import BasicMolGANModel as MolGAN
>>> from deepchem.models.optimizers import ExponentialDecay
>>> from tensorflow import one_hot
>>> smiles = ['CCC', 'C1=CC=CC=C1', 'CNC']
>>> # create featurizer
>>> feat = dc.feat.MolGanFeaturizer()
>>> # featurize molecules
>>> features = feat.featurize(smiles)
>>> # remove empty objects
>>> features = list(filter(lambda x: x is not None, features))
>>> # create model
>>> gan = MolGAN(learning_rate=ExponentialDecay(0.001, 0.9, 5000))
>>> dataset = dc.data.NumpyDataset([x.adjacency_matrix for x in features], [x.node_features for x in features])
>>> def iterbatches(epochs):
...     for i in range(epochs):
...         for batch in dataset.iterbatches(batch_size=gan.batch_size, pad_batches=True):
...             adjacency_tensor = one_hot(batch[0], gan.edges)
...             node_tensor = one_hot(batch[1], gan.nodes)
...             yield {gan.data_inputs[0]: adjacency_tensor, gan.data_inputs[1]: node_tensor}
>>> gan.fit_gan(iterbatches(8), generator_steps=0.2, checkpoint_interval=5000)
>>> generated_data = gan.predict_gan_generator(1000)
>>> # convert graphs to RDKit molecules
>>> nmols = feat.defeaturize(generated_data)
>>> print("{} molecules generated".format(len(nmols)))
>>> # remove invalid molecules
>>> nmols = list(filter(lambda x: x is not None, nmols))
>>> # currently training is unstable so 0 is a common outcome
>>> print("{} valid molecules".format(len(nmols)))

References

__init__(edges: int = 5, vertices: int = 9, nodes: int = 5, embedding_dim: int = 10, dropout_rate: float = 0.0, **kwargs)[source]

Initialize the model

Parameters:
  • edges (int, default 5) – Number of bond types includes BondType.Zero

  • vertices (int, default 9) – Max number of atoms in adjacency and node features matrices

  • nodes (int, default 5) – Number of atom types in node features matrix

  • embedding_dim (int, default 10) – Size of noise input array

  • dropout_rate (float, default = 0.) – Rate of dropout used across whole model

  • name (str, default '') – Name of the model

get_noise_input_shape() → Tuple[int][source]

Return shape of the noise input used in generator

Returns:

Shape of the noise input

Return type:

Tuple

get_data_input_shapes() → List[source]

Return input shape of the discriminator

Returns:

List of shapes used as an input for the discriminator.

Return type:

List

create_generator() → Model[source]

Create generator model. Takes noise data as an input and processes it through a number of dense and dropout layers. The data is then converted into two forms, one used for training and the other for generation of compounds. The model has two outputs:

  1. edges

  2. nodes

The format differs depending on intended use (training or sample generation). For sample generation, pass sample_generation=True when calling the generator, i.e. gan.generators[0](noise_input, training=False, sample_generation=True). For training the model, set sample_generation=False.

create_discriminator() → Model[source]

Create discriminator model based on MolGAN layers. Takes two inputs:

  1. adjacency tensor, containing bond information

  2. nodes tensor, containing atom information

The input vectors need to be in one-hot encoding format. Use MolGAN featurizer for that purpose. It will be simplified in the future release.
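As a reminder of what the one-hot format means for these two inputs, the sketch below encodes a toy two-atom graph in plain Python. The integer codes used here (0 for "no bond", the atom-type indices) are illustrative assumptions, and the helper names are hypothetical; in practice the featurizer and a tensor library's one-hot function would be used.

```python
def one_hot(index, depth):
    """Return a list of length depth with a 1.0 at position index."""
    return [1.0 if k == index else 0.0 for k in range(depth)]


def one_hot_matrix(matrix, depth):
    """One-hot encode every entry of a 2D integer matrix."""
    return [[one_hot(v, depth) for v in row] for row in matrix]


# Toy graph: atoms 0 and 1 joined by a bond of type 1; 0 is assumed to
# mean "no bond". Atom-type indices are made up for the example.
adjacency = [[0, 1], [1, 0]]
node_types = [1, 2]

adjacency_tensor = one_hot_matrix(adjacency, depth=5)    # depth = edges
node_tensor = [one_hot(t, depth=5) for t in node_types]  # depth = nodes
```

The adjacency input gains a trailing axis of length edges and the node input a trailing axis of length nodes.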

predict_gan_generator(batch_size: int = 1, noise_input: List | None = None, conditional_inputs: List = [], generator_index: int = 0) → List[GraphMatrix][source]

Use the GAN to generate a batch of samples.

Parameters:
  • batch_size (int) – the number of samples to generate. If either noise_input or conditional_inputs is specified, this argument is ignored since the batch size is then determined by the size of that argument.

  • noise_input (array) – the value to use for the generator’s noise input. If None (the default), get_noise_batch() is called to generate a random input, so each call will produce a new set of samples.

  • conditional_inputs (list of arrays) – NOT USED. the values to use for all conditional inputs. This must be specified if the GAN has any conditional inputs.

  • generator_index (int) – NOT USED. the index of the generator (between 0 and n_generators-1) to use for generating the samples.

Returns:

Returns a list of GraphMatrix objects that can be converted into RDKit molecules using the MolGANFeaturizer defeaturize function.

Return type:

List[GraphMatrix]

ScScoreModel

class ScScoreModel(n_features, layer_sizes=[300, 300, 300], dropouts=0.0, **kwargs)[source]

The SCScore model is a neural network model based on the work of Coley et al. [1]_ that predicts the synthetic complexity score (SCScore) of molecules and correlates it with the expected number of reaction steps required to produce the given target molecule. It is trained on a dataset of over 12 million reactions from the Reaxys database to impose a pairwise inequality constraint enforcing that on average the products of published chemical reactions should be more synthetically complex than their corresponding reactants. The learned metric (SCScore) exhibits highly desirable nonlinear behavior, particularly in recognizing increases in synthetic complexity throughout a number of linear synthetic routes. The SCScore model can accurately predict the synthetic complexity of a variety of molecules, including both drug-like and natural product molecules. SCScore has the potential to be a valuable tool for chemists who are working on drug discovery and other areas of chemistry.

Our model uses a hinge loss instead of the shifted ReLU loss described in the supplementary material [2]_ provided by the author. This could cause differentiation issues with compounds that are “close” to each other in “complexity”.
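The pairwise inequality constraint (a product should score higher than its reactant) can be written as a hinge loss on the score difference. The sketch below is a generic pairwise hinge loss, not necessarily the exact form used by this class; the margin value and the function name pairwise_hinge_loss are assumptions.

```python
def pairwise_hinge_loss(product_score, reactant_score, margin=1.0):
    """Penalize pairs where the product does not exceed the reactant by margin."""
    return max(0.0, margin - (product_score - reactant_score))


well_separated = pairwise_hinge_loss(3.0, 1.0)   # constraint satisfied
close_pair = pairwise_hinge_loss(1.25, 1.0)      # inside the margin
wrong_order = pairwise_hinge_loss(1.0, 2.0)      # reactant scored higher
```

Pairs that are already separated by at least the margin contribute no loss and therefore no gradient, while pairs inside the margin are still pushed apart.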

References

__init__(n_features, layer_sizes=[300, 300, 300], dropouts=0.0, **kwargs)[source]
Parameters:
  • n_features (int) – number of features per molecule

  • layer_sizes (list of int) – size of each hidden layer

  • dropouts (float) – dropout to apply to each hidden layer

  • kwargs – accepts all keyword arguments that TensorGraph accepts

default_generator(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

  • a generator that iterates batches, each represented as a tuple of lists

  • ([inputs], [outputs], [weights])

SeqToSeq

class SeqToSeq(input_tokens, output_tokens, max_output_length, encoder_layers=4, decoder_layers=4, embedding_dimension=512, dropout=0.0, reverse_input=True, variational=False, annealing_start_step=5000, annealing_final_step=10000, **kwargs)[source]

Implements sequence to sequence translation models.

The model is based on the description in Sutskever et al., “Sequence to Sequence Learning with Neural Networks” (https://arxiv.org/abs/1409.3215), although this implementation uses GRUs instead of LSTMs. The goal is to take sequences of tokens as input, and translate each one into a different output sequence. The input and output sequences can both be of variable length, and an output sequence need not have the same length as the input sequence it was generated from. For example, these models were originally developed for use in natural language processing. In that context, the input might be a sequence of English words, and the output might be a sequence of French words. The goal would be to train the model to translate sentences from English to French.

The model consists of two parts called the “encoder” and “decoder”. Each one consists of a stack of recurrent layers. The job of the encoder is to transform the input sequence into a single, fixed length vector called the “embedding”. That vector contains all relevant information from the input sequence. The decoder then transforms the embedding vector into the output sequence.

These models can be used for various purposes. First and most obviously, they can be used for sequence to sequence translation. In any case where you have sequences of tokens, and you want to translate each one into a different sequence, a SeqToSeq model can be trained to perform the translation.

Another possible use case is transforming variable length sequences into fixed length vectors. Many types of models require their inputs to have a fixed shape, which makes it difficult to use them with variable sized inputs (for example, when the input is a molecule, and different molecules have different numbers of atoms). In that case, you can train a SeqToSeq model as an autoencoder, so that it tries to make the output sequence identical to the input one. That forces the embedding vector to contain all information from the original sequence. You can then use the encoder for transforming sequences into fixed length embedding vectors, suitable to use as inputs to other types of models.

Another use case is to train the decoder for use as a generative model. Here again you begin by training the SeqToSeq model as an autoencoder. Once training is complete, you can supply arbitrary embedding vectors, and transform each one into an output sequence. When used in this way, you typically train it as a variational autoencoder. This adds random noise to the encoder, and also adds a constraint term to the loss that forces the embedding vector to have a unit Gaussian distribution. You can then pick random vectors from a Gaussian distribution, and the output sequences should follow the same distribution as the training data.

When training as a variational autoencoder, it is best to use KL cost annealing, as described in https://arxiv.org/abs/1511.06349. The constraint term in the loss is initially set to 0, so the optimizer just tries to minimize the reconstruction loss. Once it has made reasonable progress toward that, the constraint term can be gradually turned back on. The range of steps over which this happens is configurable.
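The annealing schedule can be pictured as a weight on the constraint term that is 0 before annealing_start_step and 1 after annealing_final_step. The sketch below assumes a linear ramp between the two steps purely for illustration; the actual shape of the ramp used by the library may differ. The function names are hypothetical.

```python
def kl_weight(step, annealing_start_step=5000, annealing_final_step=10000):
    """Weight of the constraint (KL) term at a given training step."""
    if step <= annealing_start_step:
        return 0.0
    if step >= annealing_final_step:
        return 1.0
    span = annealing_final_step - annealing_start_step
    return (step - annealing_start_step) / span  # assumed linear ramp


def total_loss(reconstruction_loss, kl_term, step):
    """Reconstruction loss plus the annealed constraint term."""
    return reconstruction_loss + kl_weight(step) * kl_term


weights = [kl_weight(s) for s in (0, 5000, 7500, 10000, 20000)]
```

Early in training only the reconstruction loss is minimized; from annealing_final_step onward the constraint term is applied at full strength.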

__init__(input_tokens, output_tokens, max_output_length, encoder_layers=4, decoder_layers=4, embedding_dimension=512, dropout=0.0, reverse_input=True, variational=False, annealing_start_step=5000, annealing_final_step=10000, **kwargs)[source]

Construct a SeqToSeq model.

In addition to the following arguments, this class also accepts all the keyword arguments from TensorGraph.

Parameters:
  • input_tokens (list) – a list of all tokens that may appear in input sequences

  • output_tokens (list) – a list of all tokens that may appear in output sequences

  • max_output_length (int) – the maximum length of output sequence that may be generated

  • encoder_layers (int) – the number of recurrent layers in the encoder

  • decoder_layers (int) – the number of recurrent layers in the decoder

  • embedding_dimension (int) – the width of the embedding vector. This also is the width of all recurrent layers.

  • dropout (float) – the dropout probability to use during training

  • reverse_input (bool) – if True, reverse the order of input sequences before sending them into the encoder. This can improve performance when working with long sequences.

  • variational (bool) – if True, train the model as a variational autoencoder. This adds random noise to the encoder, and also constrains the embedding to follow a unit Gaussian distribution.

  • annealing_start_step (int) – the step (that is, batch) at which to begin turning on the constraint term for KL cost annealing

  • annealing_final_step (int) – the step (that is, batch) at which to finish turning on the constraint term for KL cost annealing

fit_sequences(sequences, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False)[source]

Train this model on a set of sequences

Parameters:
  • sequences (iterable) – the training samples to fit to. Each sample should be represented as a tuple of the form (input_sequence, output_sequence).

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.

  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
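When the model is trained as an autoencoder, each training sample is simply a sequence paired with itself. The sketch below builds such an iterable and a token list from two SMILES strings using plain Python; it does not call DeepChem, and the helper name autoencoder_pairs is hypothetical.

```python
def autoencoder_pairs(sequences, epochs=1):
    """Yield (input_sequence, output_sequence) tuples with identical halves."""
    for _ in range(epochs):
        for s in sequences:
            yield (s, s)


smiles = ["CCO", "c1ccccc1"]

# Every character that occurs in the data is treated as a token.
tokens = sorted({ch for s in smiles for ch in s})

pairs = list(autoencoder_pairs(smiles, epochs=2))
```

An iterable like pairs matches the (input_sequence, output_sequence) form described for the sequences parameter, and a list like tokens is the kind of value described for input_tokens and output_tokens.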

predict_from_sequences(sequences, beam_width=5)[source]

Given a set of input sequences, predict the output sequences.

The prediction is done using a beam search with length normalization.

Parameters:
  • sequences (iterable) – the input sequences to generate a prediction for

  • beam_width (int) – the beam width to use for searching. Set to 1 to use a simple greedy search.
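Beam search with length normalization can be sketched on a toy next-token model. This is a generic illustration, not the library's decoder: the normalization used here is the average log-probability per generated token, and beam_search, toy_model and the token names are all hypothetical.

```python
import math


def beam_search(step_fn, start, end, beam_width=5, max_len=10):
    """step_fn(tokens) returns {next_token: probability}."""
    beams = [([start], 0.0)]  # (tokens so far, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            for tok, p in step_fn(tokens).items():
                if p <= 0.0:
                    continue
                cand = (tokens + [tok], logp + math.log(p))
                (finished if tok == end else candidates).append(cand)
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # keep only the best partial outputs
    pool = finished or beams
    # Length normalization: compare average log-probability per token.
    best = max(pool, key=lambda c: c[1] / max(1, len(c[0]) - 1))
    return best[0][1:]


def toy_model(tokens):
    """A tiny hand-written distribution keyed on the last token."""
    table = {
        "<s>": {"a": 0.6, "b": 0.4},
        "a": {"x": 0.5, "y": 0.5},
        "b": {"z": 1.0},
        "x": {"</s>": 1.0},
        "y": {"</s>": 1.0},
        "z": {"</s>": 1.0},
    }
    return table[tokens[-1]]


greedy = beam_search(toy_model, "<s>", "</s>", beam_width=1)
wider = beam_search(toy_model, "<s>", "</s>", beam_width=2)
```

With a beam width of 1 the search commits to "a" because it is the most likely first token, even though the best complete sequence starts with "b"; a beam width of 2 keeps both options alive and recovers it.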

predict_from_embeddings(embeddings, beam_width=5)[source]

Given a set of embedding vectors, predict the output sequences.

The prediction is done using a beam search with length normalization.

Parameters:
  • embeddings (iterable) – the embedding vectors to generate predictions for

  • beam_width (int) – the beam width to use for searching. Set to 1 to use a simple greedy search.

predict_embeddings(sequences)[source]

Given a set of input sequences, compute the embedding vectors.

Parameters:

sequences (iterable) – the input sequences to generate an embedding vector for

GAN

class GAN(n_generators=1, n_discriminators=1, **kwargs)[source]

Implements Generative Adversarial Networks.

A Generative Adversarial Network (GAN) is a type of generative model. It consists of two parts called the “generator” and the “discriminator”. The generator takes random noise as input and transforms it into an output that (hopefully) resembles the training data. The discriminator takes a set of samples as input and tries to distinguish the real training samples from the ones created by the generator. Both of them are trained together. The discriminator tries to get better and better at telling real from false data, while the generator tries to get better and better at fooling the discriminator.

In many cases there also are additional inputs to the generator and discriminator. In that case it is known as a Conditional GAN (CGAN), since it learns a distribution that is conditional on the values of those inputs. They are referred to as “conditional inputs”.

Many variations on this idea have been proposed, and new varieties of GANs are constantly being proposed. This class tries to make it very easy to implement straightforward GANs of the most conventional types. At the same time, it tries to be flexible enough that it can be used to implement many (but certainly not all) variations on the concept.

To define a GAN, you must create a subclass that provides implementations of the following methods:

get_noise_input_shape() get_data_input_shapes() create_generator() create_discriminator()

If you want your GAN to have any conditional inputs you must also implement:

get_conditional_input_shapes()

The following methods have default implementations that are suitable for most conventional GANs. You can override them if you want to customize their behavior:

create_generator_loss() create_discriminator_loss() get_noise_batch()

This class allows a GAN to have multiple generators and discriminators, a model known as MIX+GAN. It is described in Arora et al., “Generalization and Equilibrium in Generative Adversarial Nets (GANs)” (https://arxiv.org/abs/1703.00573). This can lead to better models, and is especially useful for reducing mode collapse, since different generators can learn different parts of the distribution. To use this technique, simply specify the number of generators and discriminators when calling the constructor. You can then tell predict_gan_generator() which generator to use for predicting samples.

__init__(n_generators=1, n_discriminators=1, **kwargs)[source]

Construct a GAN.

In addition to the parameters listed below, this class accepts all the keyword arguments from KerasModel.

Parameters:
  • n_generators (int) – the number of generators to include

  • n_discriminators (int) – the number of discriminators to include

get_noise_input_shape()[source]

Get the shape of the generator’s noise input layer.

Subclasses must override this to return a tuple giving the shape of the noise input. The actual Input layer will be created automatically. The dimension corresponding to the batch size should be omitted.

get_data_input_shapes()[source]

Get the shapes of the inputs for training data.

Subclasses must override this to return a list of tuples, each giving the shape of one of the inputs. The actual Input layers will be created automatically. This list of shapes must also match the shapes of the generator’s outputs. The dimension corresponding to the batch size should be omitted.

get_conditional_input_shapes()[source]

Get the shapes of any conditional inputs.

Subclasses may override this to return a list of tuples, each giving the shape of one of the conditional inputs. The actual Input layers will be created automatically. The dimension corresponding to the batch size should be omitted.

The default implementation returns an empty list, meaning there are no conditional inputs.

get_noise_batch(batch_size)[source]

Get a batch of random noise to pass to the generator.

This should return a NumPy array whose shape matches the one returned by get_noise_input_shape(). The default implementation returns normally distributed values. Subclasses can override this to implement a different distribution.

create_generator()[source]

Create and return a generator.

Subclasses must override this to construct the generator. The returned value should be a tf.keras.Model whose inputs are a batch of noise, followed by any conditional inputs. The number and shapes of its outputs must match the return value from get_data_input_shapes(), since generated data must have the same form as training data.

create_discriminator()[source]

Create and return a discriminator.

Subclasses must override this to construct the discriminator. The returned value should be a tf.keras.Model whose inputs are all data inputs, followed by any conditional inputs. Its output should be a one dimensional tensor containing the probability of each sample being a training sample.

create_generator_loss(discrim_output)[source]

Create the loss function for the generator.

The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

Parameters:

discrim_output (Tensor) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.

Return type:

A Tensor equal to the loss function to use for optimizing the generator.

create_discriminator_loss(discrim_output_train, discrim_output_gen)[source]

Create the loss function for the discriminator.

The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

Parameters:
  • discrim_output_train (Tensor) – the output from the discriminator on a batch of training data. This is its estimate of the probability that each sample is training data.

  • discrim_output_gen (Tensor) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.

Return type:

A Tensor equal to the loss function to use for optimizing the discriminator.
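As a rough illustration of what these two loss functions might compute, the sketch below writes out the standard cross-entropy GAN losses in plain Python. It is an assumption that the defaults resemble this form. The function names and the `EPS` constant are hypothetical and are not part of the DeepChem API.

```python
import math

EPS = 1e-10  # hypothetical constant that guards against log(0)


def generator_loss(discrim_output_gen):
    """Non-saturating generator loss: -mean(log D(G(z)))."""
    n = len(discrim_output_gen)
    return -sum(math.log(p + EPS) for p in discrim_output_gen) / n


def discriminator_loss(discrim_output_train, discrim_output_gen):
    """Cross-entropy loss: -mean(log D(x)) - mean(log(1 - D(G(z))))."""
    real_term = sum(math.log(p + EPS) for p in discrim_output_train) / len(discrim_output_train)
    fake_term = sum(math.log(1.0 - p + EPS) for p in discrim_output_gen) / len(discrim_output_gen)
    return -(real_term + fake_term)


# A discriminator that cannot tell the two sources apart outputs 0.5 everywhere.
undecided = [0.5, 0.5, 0.5]
g_loss = generator_loss(undecided)
d_loss = discriminator_loss(undecided, undecided)

# A confident, correct discriminator has a lower loss.
d_loss_confident = discriminator_loss([0.9, 0.9], [0.1, 0.1])
```

A subclass that overrides these methods would express the same arithmetic with tensor operations instead of Python lists.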

fit_gan(batches, generator_steps=1.0, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False)[source]

Train this model on data.

Parameters:
  • batches (iterable) – batches of data to train the discriminator on, each represented as a dict that maps Inputs to values. It should specify values for all members of data_inputs and conditional_inputs.

  • generator_steps (float) – the number of training steps to perform for the generator for each batch. This can be used to adjust the ratio of training steps for the generator and discriminator. For example, 2.0 will perform two training steps for every batch, while 0.5 will only perform one training step for every two batches.

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in batches. Set this to 0 to disable automatic checkpointing.

  • restore (bool) – if True, restore the model from the most recent checkpoint before training it.
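The effect of a fractional generator_steps value can be pictured with a small accumulator loop. This is only a sketch that is consistent with the description above, not the library's actual training loop. The function name `count_generator_steps` is hypothetical.

```python
def count_generator_steps(n_batches, generator_steps):
    """Count generator updates when the discriminator trains once per batch."""
    credit = 0.0  # fractional generator steps carried over between batches
    total = 0
    for _ in range(n_batches):
        # (the discriminator would be trained on the batch here)
        credit += generator_steps
        while credit >= 1.0:
            # (the generator would be trained here)
            credit -= 1.0
            total += 1
    return total


two_per_batch = count_generator_steps(n_batches=4, generator_steps=2.0)
one_per_two_batches = count_generator_steps(n_batches=4, generator_steps=0.5)
```

With four batches, the first call counts eight generator steps and the second counts two, matching the ratios described for 2.0 and 0.5.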

predict_gan_generator(batch_size=1, noise_input=None, conditional_inputs=[], generator_index=0)[source]

Use the GAN to generate a batch of samples.

Parameters:
  • batch_size (int) – the number of samples to generate. If either noise_input or conditional_inputs is specified, this argument is ignored since the batch size is then determined by the size of that argument.

  • noise_input (array) – the value to use for the generator’s noise input. If None (the default), get_noise_batch() is called to generate a random input, so each call will produce a new set of samples.

  • conditional_inputs (list of arrays) – the values to use for all conditional inputs. This must be specified if the GAN has any conditional inputs.

  • generator_index (int) – the index of the generator (between 0 and n_generators-1) to use for generating the samples.

Returns:

An array (if the generator has only one output) or list of arrays (if it has multiple outputs) containing the generated samples.

WGAN

class WGAN(gradient_penalty=10.0, **kwargs)[source]

Implements Wasserstein Generative Adversarial Networks.

This class implements Wasserstein Generative Adversarial Networks (WGANs) as described in Arjovsky et al., “Wasserstein GAN” (https://arxiv.org/abs/1701.07875). A WGAN is conceptually rather different from a conventional GAN, but in practical terms very similar. It reinterprets the discriminator (often called the “critic” in this context) as learning an approximation to the Earth Mover distance between the training and generated distributions. The generator is then trained to minimize that distance. In practice, this just means using slightly different loss functions for training the generator and discriminator.

WGANs have theoretical advantages over conventional GANs, and they often work better in practice. In addition, the discriminator’s loss function can be directly interpreted as a measure of the quality of the model. That is an advantage over conventional GANs, where the loss does not directly convey information about the quality of the model.

The theory WGANs are based on requires the discriminator’s gradient to be bounded. The original paper achieved this by clipping its weights. This class instead does it by adding a penalty term to the discriminator’s loss, as described in https://arxiv.org/abs/1704.00028. This is sometimes found to produce better results.

There are a few other practical differences between GANs and WGANs. In a conventional GAN, the discriminator’s output must be between 0 and 1 so it can be interpreted as a probability. In a WGAN, it should produce an unbounded output that can be interpreted as a distance.
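To make the difference in loss functions concrete, here is a minimal sketch of the Wasserstein losses with a gradient penalty, following the formulation in the two papers cited above. Sign conventions vary between implementations, so this should not be read as the exact code behind this class. All function names are hypothetical, and the gradient norms are assumed to have been computed elsewhere.

```python
def critic_loss(critic_train, critic_gen):
    """Wasserstein critic loss: mean critic(generated) - mean critic(training)."""
    return sum(critic_gen) / len(critic_gen) - sum(critic_train) / len(critic_train)


def generator_loss(critic_gen):
    """The generator tries to raise the critic's score on generated samples."""
    return -sum(critic_gen) / len(critic_gen)


def gradient_penalty_term(grad_norms, gradient_penalty=10.0):
    """Penalty term: gradient_penalty * mean((||grad|| - 1)^2)."""
    return gradient_penalty * sum((g - 1.0) ** 2 for g in grad_norms) / len(grad_norms)


# Unbounded critic scores: they are not probabilities.
critic_train = [2.0, 4.0]
critic_gen = [-1.0, 3.0]

c_loss = critic_loss(critic_train, critic_gen)
g_loss = generator_loss(critic_gen)
penalty = gradient_penalty_term([1.0, 2.0], gradient_penalty=10.0)
total_critic_loss = c_loss + penalty
```

The penalty is zero when every gradient norm equals 1 and grows quadratically as norms move away from 1, which is how the bounded-gradient requirement is encouraged rather than enforced by clipping.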

When training a WGAN, you also should usually use a smaller value for generator_steps. Conventional GANs rely on keeping the generator and discriminator “in balance” with each other. If the discriminator ever gets too good, it becomes impossible for the generator to fool it and training stalls. WGANs do not have this problem, and in fact the better the discriminator is, the easier it is for the generator to improve. It therefore usually works best to perform several training steps on the discriminator for each training step on the generator.

__init__(gradient_penalty=10.0, **kwargs)[source]

Construct a WGAN.

In addition to the following, this class accepts all the keyword arguments from GAN and KerasModel.

Parameters:

gradient_penalty (float) – the magnitude of the gradient penalty loss

create_generator_loss(discrim_output)[source]

Create the loss function for the generator.

The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

Parameters:

discrim_output (Tensor) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.

Return type:

A Tensor equal to the loss function to use for optimizing the generator.

create_discriminator_loss(discrim_output_train, discrim_output_gen)[source]

Create the loss function for the discriminator.

The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

Parameters:
  • discrim_output_train (Tensor) – the output from the discriminator on a batch of training data. This is its estimate of the probability that each sample is training data.

  • discrim_output_gen (Tensor) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.

Return type:

A Tensor equal to the loss function to use for optimizing the discriminator.

TextCNNModel

class TextCNNModel(n_tasks, char_dict, seq_length, n_embedding=75, kernel_sizes=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20], num_filters=[100, 200, 200, 200, 200, 100, 100, 100, 100, 100, 160, 160], dropout=0.25, mode='classification', **kwargs)[source]

A convolutional neural network on SMILES strings.

Reimplementation of the discriminator module in ORGAN [1]. Originated from [2].

This model applies multiple 1D convolutional filters to the padded strings, then max-over-time pooling is applied on all filters, extracting one feature per filter. All features are concatenated and transformed through several hidden layers to form predictions.
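The convolution and max-over-time pooling steps can be sketched on a toy input in plain Python. This is an illustration of the technique only: it omits biases, activations and the hidden layers, and the helper names `conv1d_valid` and `max_over_time` are hypothetical.

```python
def conv1d_valid(sequence, kernel):
    """Slide `kernel` over `sequence`; both are lists of per-position vectors."""
    k = len(kernel)
    outputs = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        outputs.append(sum(x * w
                           for vec, wvec in zip(window, kernel)
                           for x, w in zip(vec, wvec)))
    return outputs


def max_over_time(feature_map):
    """Keep only the strongest response of one filter along the sequence."""
    return max(feature_map)


# Toy padded sequence: 5 positions, embedding size 2.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]

# Two filters with different kernel sizes (1 and 2).
filters = [
    [[1.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]

# One feature per filter, concatenated into a single feature vector.
features = [max_over_time(conv1d_valid(sequence, f)) for f in filters]
```

Because pooling takes the maximum over all positions, the length of the feature vector depends only on the number of filters, not on the sequence length.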

This model was originally developed for sentence-level classification tasks, with words represented as vectors. In this implementation, SMILES strings are split into characters and transformed to one-hot vectors in a similar way. The model can be used for general molecular-level classification or regression tasks. It is also used as the discriminator in the ORGAN model.

Training the model requires only SMILES strings as input, so any featurized dataset that includes SMILES in its ids attribute is accepted. PDBbind, QM7 and QM7b are not supported. Before defining the model, call build_char_dict to build the character dictionary for the input dataset; an example can be found in examples/delaney/delaney_textcnn.py

References

__init__(n_tasks, char_dict, seq_length, n_embedding=75, kernel_sizes=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20], num_filters=[100, 200, 200, 200, 200, 100, 100, 100, 100, 100, 160, 160], dropout=0.25, mode='classification', **kwargs)[source]
Parameters:
  • n_tasks (int) – Number of tasks

  • char_dict (dict) – Mapping from characters in SMILES strings to integers

  • seq_length (int) – Length of sequences (after padding)

  • n_embedding (int, optional) – Length of embedding vector

  • kernel_sizes (list of int, optional) – Kernel sizes of the filters used in the conv net

  • num_filters (list of int, optional) – Number of filters for each kernel size in the conv net

  • dropout (float, optional) – Dropout rate

  • mode (str) – Either “classification” or “regression” for type of model.

static build_char_dict(dataset, default_dict={'#': 1, '(': 2, ')': 3, '+': 4, '-': 5, '/': 6, '1': 7, '2': 8, '3': 9, '4': 10, '5': 11, '6': 12, '7': 13, '8': 14, '=': 15, 'Br': 30, 'C': 16, 'Cl': 29, 'F': 17, 'H': 18, 'I': 19, 'N': 20, 'O': 21, 'P': 22, 'S': 23, '[': 24, '\\': 25, ']': 26, '_': 27, 'c': 28, 'n': 31, 'o': 32, 's': 33})[source]

Collect all unique characters (in SMILES strings) from the dataset. This method should be called before defining the model to build an appropriate char_dict.

smiles_to_seq_batch(ids_b)[source]

Converts SMILES strings to np.array sequence.

A tf.py_func wrapper is written around this when creating the input_fn for make_estimator

default_generator(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]

Transform SMILES strings into fixed-length integer vectors

smiles_to_seq(smiles)[source]

Tokenize characters in SMILES strings to integers
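The kind of mapping these methods perform can be sketched as follows. This is not DeepChem's implementation: the function `smiles_to_ids`, the greedy two-character matching and the `pad_id` value are assumptions made for illustration. The dictionary is a subset of the default shown in the build_char_dict signature.

```python
def smiles_to_ids(smiles, char_dict, seq_length, pad_id=0):
    """Greedy tokenizer that prefers two-character tokens such as 'Cl'."""
    ids = []
    i = 0
    while i < len(smiles):
        pair = smiles[i:i + 2]
        if len(pair) == 2 and pair in char_dict:
            ids.append(char_dict[pair])
            i += 2
        else:
            # Raises KeyError for characters missing from the dictionary.
            ids.append(char_dict[smiles[i]])
            i += 1
    if len(ids) > seq_length:
        raise ValueError("SMILES string is longer than seq_length")
    # Pad to a fixed length so that all sequences share one shape.
    return ids + [pad_id] * (seq_length - len(ids))


char_dict = {'(': 2, ')': 3, '=': 15, 'C': 16, 'Cl': 29, 'O': 21}
encoded = smiles_to_ids("CC(=O)Cl", char_dict, seq_length=10)
```

The example string has eight characters but only seven tokens, because "Cl" is treated as a single symbol.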

AtomicConvModel

AtomicConvModel[source]

alias of AtomConvModel

Smiles2Vec

class Smiles2Vec(char_to_idx, n_tasks=10, max_seq_len=270, embedding_dim=50, n_classes=2, use_bidir=True, use_conv=True, filters=192, kernel_size=3, strides=1, rnn_sizes=[224, 384], rnn_types=['GRU', 'GRU'], mode='regression', **kwargs)[source]

Implements the Smiles2Vec model, which learns neural representations of SMILES strings that can be used for downstream tasks.

The model is based on the description in Goh et al., “SMILES2vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties” (https://arxiv.org/pdf/1712.02034.pdf). The goal here is to take SMILES strings as inputs, turn them into vector representations which can then be used in predicting molecular properties.

The model consists of an Embedding layer that retrieves embeddings for each character in the SMILES string. These embeddings are learnt jointly with the rest of the model. The output from the embedding layer is a tensor of shape (batch_size, seq_len, embedding_dim). This tensor can optionally be fed through a 1D convolutional layer, before being passed to a series of RNN cells (optionally bidirectional). The final output from the RNN cells aims to have learnt the temporal dependencies in the SMILES string, and in turn information about the structure of the molecule, which is then used for molecular property prediction.
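The flow of tensor shapes through this architecture can be traced with a short helper. This is a sketch under several stated assumptions: the padding mode of the convolution, the concatenation of forward and backward RNN outputs, and the choice that only the last RNN layer returns its final state are all guesses that should be checked against the source. The function names are hypothetical.

```python
import math


def conv1d_output_length(seq_len, kernel_size, strides, padding):
    """Standard 1D convolution output length for 'same' or 'valid' padding."""
    if padding == "same":
        return math.ceil(seq_len / strides)
    if padding == "valid":
        return (seq_len - kernel_size) // strides + 1
    raise ValueError("padding must be 'same' or 'valid'")


def smiles2vec_shapes(batch_size, max_seq_len=270, embedding_dim=50,
                      use_conv=True, filters=192, kernel_size=3, strides=1,
                      rnn_sizes=(224, 384), use_bidir=True, padding="same"):
    """Return (layer name, output shape) pairs for one batch."""
    shapes = [("embedding", (batch_size, max_seq_len, embedding_dim))]
    seq_len = max_seq_len
    if use_conv:
        seq_len = conv1d_output_length(seq_len, kernel_size, strides, padding)
        shapes.append(("conv1d", (batch_size, seq_len, filters)))
    # Assumption: a bidirectional layer concatenates both directions.
    width = 2 if use_bidir else 1
    for i, size in enumerate(rnn_sizes):
        is_last = i == len(rnn_sizes) - 1
        # Assumption: intermediate layers return sequences, the last one does not.
        if is_last:
            shape = (batch_size, size * width)
        else:
            shape = (batch_size, seq_len, size * width)
        shapes.append((f"rnn_{i}", shape))
    return shapes


shapes = smiles2vec_shapes(batch_size=32)
```

Only the embedding shape (batch_size, seq_len, embedding_dim) is stated in the description above; the remaining shapes follow from the assumptions listed.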

In the paper, the authors also train an explanation mask to endow the model with interpretability and gain insights into its decision making. This segment is currently not a part of this implementation as this was developed for the purpose of investigating a transfer learning protocol, ChemNet (which can be found at https://arxiv.org/abs/1712.02734).

__init__(char_to_idx, n_tasks=10, max_seq_len=270, embedding_dim=50, n_classes=2, use_bidir=True, use_conv=True, filters=192, kernel_size=3, strides=1, rnn_sizes=[224, 384], rnn_types=['GRU', 'GRU'], mode='regression', **kwargs)[source]
Parameters:
  • char_to_idx (dict) – Character-to-index mapping for SMILES characters

  • embedding_dim (int, default 50) – Size of character embeddings used.

  • use_bidir (bool, default True) – Whether to use BiDirectional RNN Cells

  • use_conv (bool, default True) – Whether to use a 1D conv-layer

  • kernel_size (int, default 3) – Kernel size for convolutions

  • filters (int, default 192) – Number of filters

  • strides (int, default 1) – Strides used in convolution

  • rnn_sizes (list[int], default [224, 384]) – Number of hidden units in the RNN cells

  • mode (str, default regression) – Whether to use model for regression or classification

default_generator(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])

ChemCeption

class ChemCeption(img_spec: str = 'std', img_size: int = 80, base_filters: int = 16, inception_blocks: Dict = {'A': 3, 'B': 3, 'C': 3}, n_tasks: int = 10, n_classes: int = 2, augment: bool = False, mode: str = 'regression', **kwargs)[source]

Implements the ChemCeption model that leverages the representational capacities of convolutional neural networks (CNNs) to predict molecular properties.

The model is based on the description in Goh et al., “Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models” (https://arxiv.org/pdf/1706.06689.pdf). The authors use an image based representation of the molecule, where pixels encode different atomic and bond properties. More details on the image representations can be found at https://arxiv.org/abs/1710.02238

The model consists of a Stem Layer that reduces the image resolution for the layers to follow. The output of the Stem Layer is followed by a series of Inception-Resnet blocks & a Reduction layer. Layers in the Inception-Resnet blocks process image tensors at multiple resolutions and use a ResNet style skip-connection, combining features from different resolutions. The Reduction layers reduce the spatial extent of the image by max-pooling and 2-strided convolutions. More details on these layers can be found in the ChemCeption paper referenced above. The output of the final Reduction layer is subject to a Global Average Pooling, and a fully-connected layer maps the features to downstream outputs.

In the ChemCeption paper, the authors perform real-time image augmentation by rotating images between 0 and 180 degrees. This can be done during model training by setting the augment argument to True.

__init__(img_spec: str = 'std', img_size: int = 80, base_filters: int = 16, inception_blocks: Dict = {'A': 3, 'B': 3, 'C': 3}, n_tasks: int = 10, n_classes: int = 2, augment: bool = False, mode: str = 'regression', **kwargs)[source]
Parameters:
  • img_spec (str, default std) – Image specification used

  • img_size (int, default 80) – Image size used

  • base_filters (int, default 16) – Base filters used for the different inception and reduction layers

  • inception_blocks (dict) – Dictionary containing number of blocks for every inception layer

  • n_tasks (int, default 10) – Number of classification or regression tasks

  • n_classes (int, default 2) – Number of classes (used only for classification)

  • augment (bool, default False) – Whether to augment images

  • mode (str, default regression) – Whether the model is used for regression or classification

build_inception_module(inputs, type='A')[source]

An inception module is a series of inception layers of the same type. This function builds one such module.

default_generator(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])

NormalizingFlowModel

The purpose of a normalizing flow is to map a simple distribution (that is easy to sample from and evaluate probability densities for) to a more complex distribution that is learned from data. Normalizing flows combine the advantages of autoregressive models (which provide likelihood estimation but do not learn features) and variational autoencoders (which learn feature representations but do not provide marginal likelihoods). They are effective for any application requiring a probabilistic model with these capabilities, e.g. generative modeling, unsupervised learning, or probabilistic inference.

class NormalizingFlowModel(model: NormalizingFlow, **kwargs)[source]

A base distribution and normalizing flow for applying transformations.

Normalizing flows are effective for any application requiring a probabilistic model that can both sample from a distribution and compute marginal likelihoods, e.g. generative modeling, unsupervised learning, or probabilistic inference. For a thorough review of normalizing flows, see [1].

A distribution implements two main operations:
  1. Sampling from the transformed distribution

  2. Calculating log probabilities

A normalizing flow implements three main operations:
  1. Forward transformation

  2. Inverse transformation

  3. Calculating the Jacobian

Deep Normalizing Flow models require normalizing flow layers where input and output dimensions are the same, the transformation is invertible, and the determinant of the Jacobian is efficient to compute and differentiable. The determinant of the Jacobian gives the factor that rescales the density so that the total probability remains 1 when transforming between the probability densities of different random variables.
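The role of the Jacobian determinant can be seen in the simplest possible flow, a one-dimensional affine map applied to a standard normal base distribution. The sketch below is an illustration of the change-of-variables rule only. It does not use TensorFlow Probability, and all function names are hypothetical.

```python
import math


def standard_normal_log_prob(z):
    """Log density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_forward(z, scale, shift):
    """Forward transformation x = scale * z + shift."""
    return scale * z + shift


def affine_inverse(x, scale, shift):
    """Inverse transformation z = (x - shift) / scale."""
    return (x - shift) / scale


def affine_log_prob(x, scale, shift):
    """log p_X(x) = log p_Z(f^-1(x)) - log|det J_f|; in 1D, det J_f = scale."""
    z = affine_inverse(x, scale, shift)
    return standard_normal_log_prob(z) - math.log(abs(scale))


log_p = affine_log_prob(3.0, scale=2.0, shift=1.0)

# The transformed variable is N(shift, scale^2), so the closed form must agree.
expected = -0.5 * ((3.0 - 1.0) / 2.0) ** 2 - math.log(2.0) - 0.5 * math.log(2.0 * math.pi)
```

Stretching the base distribution by a factor of 2 halves the density everywhere, which is exactly the `- log|scale|` correction in the log probability.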

References

__init__(model: NormalizingFlow, **kwargs) → None[source]

Creates a new NormalizingFlowModel.

In addition to the following arguments, this class also accepts all the keyword arguments from KerasModel.

Parameters:

model (NormalizingFlow) – An instance of NormalizingFlow.

Examples

>>> import numpy as np
>>> import tensorflow_probability as tfp
>>> from deepchem.data import NumpyDataset
>>> from deepchem.models.normalizing_flows import NormalizingFlow, NormalizingFlowModel
>>> tfd = tfp.distributions
>>> tfb = tfp.bijectors
>>> flow_layers = [
...     tfb.RealNVP(
...         num_masked=2,
...         shift_and_log_scale_fn=tfb.real_nvp_default_template(
...             hidden_layers=[8, 8]))
... ]
>>> base_distribution = tfd.MultivariateNormalDiag(loc=[0., 0., 0.])
>>> nf = NormalizingFlow(base_distribution, flow_layers)
>>> nfm = NormalizingFlowModel(nf)
>>> dataset = NumpyDataset(
...     X=np.random.rand(5, 3).astype(np.float32),
...     y=np.random.rand(5,),
...     ids=np.arange(5))
>>> nfm.fit(dataset)

create_nll(input: Tensor | Sequence[Tensor]) → Tensor[source]

Create the negative log likelihood loss function.

The default implementation is appropriate for most cases. Subclasses can override this if there is a need to customize it.

Parameters:

input (OneOrMany[tf.Tensor]) – A batch of data.

Return type:

A Tensor equal to the loss function to use for optimization.

save()[source]

Saves model to disk using joblib.

reload()[source]

Loads model from joblib file on disk.

PyTorch Models

DeepChem supports the use of PyTorch to build deep learning models.

TorchModel

You can wrap an arbitrary torch.nn.Module in a TorchModel object.

class TorchModel(model: Module, loss: Loss | Callable[[List, List, List], Any], output_types: List[str] | None = None, batch_size: int = 100, model_dir: str | None = None, learning_rate: float | LearningRateSchedule = 0.001, optimizer: Optimizer | None = None, tensorboard: bool = False, wandb: bool = False, log_frequency: int = 100, device: device | None = None, regularization_loss: Callable | None = None, wandb_logger: WandbLogger | None = None, **kwargs)[source]

This is a DeepChem model implemented by a PyTorch model.

Here is a simple example of code that uses TorchModel to train a PyTorch model on a DeepChem dataset.

>>> import torch
>>> import deepchem as dc
>>> import numpy as np
>>> X, y = np.random.random((10, 100)), np.random.random((10, 1))
>>> dataset = dc.data.NumpyDataset(X=X, y=y)
>>> pytorch_model = torch.nn.Sequential(
...   torch.nn.Linear(100, 1000),
...   torch.nn.Tanh(),
...   torch.nn.Linear(1000, 1))
>>> model = dc.models.TorchModel(pytorch_model, loss=dc.models.losses.L2Loss())
>>> loss = model.fit(dataset, nb_epoch=5)

The loss function for a model can be defined in two different ways. For models that have only a single output and use a standard loss function, you can simply provide a dc.models.losses.Loss object. This defines the loss for each sample or sample/task pair. The result is automatically multiplied by the weights and averaged over the batch.

For more complicated cases, you can instead provide a function that directly computes the total loss. It must be of the form f(outputs, labels, weights), taking the list of outputs from the model, the expected values, and any weight matrices. It should return a scalar equal to the value of the loss function for the batch. No additional processing is done to the result; it is up to you to do any weighting, averaging, adding of penalty terms, etc.
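A function of the form f(outputs, labels, weights) might look like the sketch below, which computes a weighted mean absolute error over the first output. The name `weighted_l1_loss` is hypothetical. NumPy arrays stand in for the tensors that would be passed during training; the body uses only arithmetic operators, `abs()` and `.sum()`, which PyTorch tensors also support.

```python
import numpy as np


def weighted_l1_loss(outputs, labels, weights):
    """Weighted mean absolute error over the first output of the model."""
    error = abs(outputs[0] - labels[0])
    w = weights[0]
    # All weighting and averaging is done here, since no further
    # processing is applied to the returned value.
    return (w * error).sum() / w.sum()


# Three samples, one task; the third sample has weight 0 and is ignored.
outputs = [np.array([[1.0], [2.0], [4.0]])]
labels = [np.array([[1.0], [0.0], [0.0]])]
weights = [np.array([[1.0], [1.0], [0.0]])]

batch_loss = weighted_l1_loss(outputs, labels, weights)
```

Such a function would be passed as the loss argument of the constructor in place of a Loss object. Note that this sketch divides by the sum of the weights, so a batch whose weights are all zero would need separate handling.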

You can optionally provide an output_types argument, which describes how to interpret the model’s outputs. This should be a list of strings, one for each output. You can use an arbitrary output_type for an output, but some output_types are special and will undergo extra processing:

  • ‘prediction’: This is a normal output, and will be returned by predict(). If output types are not specified, all outputs are assumed to be of this type.

  • ‘loss’: This output will be used in place of the normal outputs for computing the loss function. For example, models that output probability distributions usually do it by computing unbounded numbers (the logits), then passing them through a softmax function to turn them into probabilities. When computing the cross entropy, it is more numerically stable to use the logits directly rather than the probabilities. You can do this by having the model produce both probabilities and logits as outputs, then specifying output_types=[‘prediction’, ‘loss’]. When predict() is called, only the first output (the probabilities) will be returned. But during training, it is the second output (the logits) that will be passed to the loss function.

  • ‘variance’: This output is used for estimating the uncertainty in another output. To create a model that can estimate uncertainty, there must be the same number of ‘prediction’ and ‘variance’ outputs. Each variance output must have the same shape as the corresponding prediction output, and each element is an estimate of the variance in the corresponding prediction. Also be aware that if a model supports uncertainty, it MUST use dropout on every layer, and dropout must be enabled during uncertainty prediction. Otherwise, the uncertainties it computes will be inaccurate.

  • other: Arbitrary output_types can be used to extract outputs produced by the model, but will have no additional processing performed.
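The numerical-stability argument for the ‘loss’ output type can be demonstrated without any deep learning framework. The sketch below compares cross entropy computed from softmax probabilities with cross entropy computed directly from logits using the log-sum-exp trick. The function names are hypothetical and the code is an illustration, not what a Loss object does internally.

```python
import math


def softmax(logits):
    """Naive softmax; math.exp overflows for large logits."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_probs(probs, label):
    """Cross entropy computed from probabilities."""
    return -math.log(probs[label])


def cross_entropy_from_logits(logits, label):
    """Cross entropy via log-sum-exp, subtracting the maximum for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


# For moderate logits both routes agree.
moderate = [2.0, 1.0, 0.1]
via_probs = cross_entropy_from_probs(softmax(moderate), 0)
via_logits = cross_entropy_from_logits(moderate, 0)

# For extreme logits only the logit-based route is usable:
# softmax(extreme) would raise OverflowError inside math.exp.
extreme = [1000.0, 0.0]
stable = cross_entropy_from_logits(extreme, 1)
```

This is why a model can usefully expose both outputs: probabilities for predict(), and logits for the loss.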

__init__(model: Module, loss: Loss | Callable[[List, List, List], Any], output_types: List[str] | None = None, batch_size: int = 100, model_dir: str | None = None, learning_rate: float | LearningRateSchedule = 0.001, optimizer: Optimizer | None = None, tensorboard: bool = False, wandb: bool = False, log_frequency: int = 100, device: device | None = None, regularization_loss: Callable | None = None, wandb_logger: WandbLogger | None = None, **kwargs) → None[source]

Create a new TorchModel.

Parameters:
  • model (torch.nn.Module) – the PyTorch model implementing the calculation

  • loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above

  • output_types (list of strings, optional (default None)) – the type of each output from the model, as described above

  • batch_size (int, optional (default 100)) – default batch size for training and evaluating

  • model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None, a temporary directory is created.

  • learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is ignored.

  • optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is ignored.

  • tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training

  • wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training

  • log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using logging by default. If tensorboard is set, data is also logged to TensorBoard. If wandb is set, data is also logged to Weights & Biases. Logging happens at global steps. Roughly, a global step corresponds to one batch of training. If you’d like a printout every 10 batch steps, you’d set log_frequency=10 for example.

  • device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is chosen automatically.

  • regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add to the loss function

  • wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics

fit(dataset: Dataset, nb_epoch: int = 10, max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, deterministic: bool = False, restore: bool = False, variables: List[Parameter] | None = None, loss: Callable[[List, List, List], Any] | None = None, callbacks: Callable | List[Callable] = [], all_losses: List[float] | None = None) → float[source]

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on

  • nb_epoch (int) – the number of epochs to train for

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.

  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.

  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.

  • variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in the model are used.

  • loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.

  • callbacks (function or list of functions) – one or more functions of the form f(model, step, **kwargs) that will be invoked after every step. This can be used to perform validation, logging, etc.

  • all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.

Return type:

The average loss over the most recent checkpoint interval
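A callback of the form f(model, step, **kwargs) can be any callable, including an object that keeps state between calls. The sketch below records every `interval`-th step. The class name `StepRecorder` is hypothetical, and the loop at the end only simulates the calls a training loop would make.

```python
class StepRecorder:
    """Callable that records every `interval`-th training step it is given."""

    def __init__(self, interval):
        self.interval = interval
        self.steps = []

    def __call__(self, model, step, **kwargs):
        # A real callback might run validation or write logs here.
        if step % self.interval == 0:
            self.steps.append(step)


recorder = StepRecorder(interval=50)

# Simulated training loop; None stands in for the model.
for step in range(1, 201):
    recorder(None, step)
```

In real use, an object like this would be supplied through the callbacks argument, for example as callbacks=[recorder].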

fit_generator(generator: Iterable[Tuple[Any, Any, Any]], max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, restore: bool = False, variables: List[Parameter] | ParameterList | None = None, loss: Callable[[List, List, List], Any] | None = None, callbacks: Callable | List[Callable] = [], all_losses: List[float] | None = None) → float[source]

Train this model on data from a generator.

Parameters:
  • generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.

  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.

  • variables (list of torch.nn.Parameter or torch.nn.ParameterList) – the variables to train. If None (the default), all trainable variables in the model are used. ParameterList can be used like a regular Python list, but Tensors that are Parameter are properly registered, and will be visible by all Module methods.

  • loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.

  • callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.

  • all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.

Return type:

The average loss over the most recent checkpoint interval
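A generator suitable for this method yields one (inputs, labels, weights) tuple per batch. The sketch below shows the general shape of such a generator. The name `batch_generator` is hypothetical, each entry is wrapped in a list on the assumption that the model has a single input and a single output, and plain Python lists are used only so the example runs without dependencies; in practice the entries would be NumPy arrays.

```python
def batch_generator(X, y, w, batch_size, epochs=1):
    """Yield (inputs, labels, weights) tuples, each entry wrapped in a list."""
    for _ in range(epochs):
        for start in range(0, len(X), batch_size):
            stop = start + batch_size
            yield ([X[start:stop]], [y[start:stop]], [w[start:stop]])


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
y = [[0.0], [1.0], [0.0], [1.0], [0.0]]
w = [[1.0]] * 5

batches = list(batch_generator(X, y, w, batch_size=2, epochs=2))
```

Five samples with a batch size of 2 give three batches per epoch, the last of which contains a single sample. Unlike default_generator with pad_batches=True, this sketch does not pad the final batch.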

fit_on_batch(X: Sequence, y: Sequence, w: Sequence, variables: List[Parameter] | None = None, loss: Callable[[List, List, List], Any] | None = None, callbacks: Callable | List[Callable] = [], checkpoint: bool = True, max_checkpoints_to_keep: int = 5) → float[source]

Perform a single step of training.

Parameters:
  • X (ndarray) – the inputs for the batch

  • y (ndarray) – the labels for the batch

  • w (ndarray) – the weights for the batch

  • variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in the model are used.

  • loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.

  • callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.

  • checkpoint (bool) – if true, save a checkpoint after performing the training step

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

Return type:

the loss on the batch

predict_on_generator(generator: Iterable[Tuple[Any, Any, Any]], transformers: List[Transformer] = [], output_types: str | Sequence[str] | None = None) ndarray | Sequence[ndarray][source]
Parameters:
  • generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).

  • transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved from the model. If output_types is specified, outputs must be None.

  • Returns – a NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs

predict_on_batch(X: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], transformers: List[Transformer] = []) ndarray | Sequence[ndarray][source]

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.

  • transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

Returns:

a NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs
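To make the role of the transformers argument concrete, here is a minimal sketch of what "undoing" a transformation means. ToyNormalizer and undo_transforms are hypothetical stand-ins written for this illustration, not DeepChem classes; the sketch assumes transformations are reversed in the opposite order to the one in which they were applied.

```python
from typing import List


class ToyNormalizer:
    """Hypothetical stand-in for a label transformer (not a DeepChem class)."""

    def __init__(self, mean: float, std: float) -> None:
        self.mean = mean
        self.std = std

    def transform(self, y: List[float]) -> List[float]:
        return [(v - self.mean) / self.std for v in y]

    def untransform(self, y: List[float]) -> List[float]:
        return [v * self.std + self.mean for v in y]


def undo_transforms(y_pred: List[float], transformers: list) -> List[float]:
    # Undo in reverse order so the last transformation applied is removed first.
    for t in reversed(transformers):
        y_pred = t.untransform(y_pred)
    return y_pred


norm = ToyNormalizer(mean=10.0, std=2.0)
# A model trained on normalized labels emits values on the normalized scale.
model_output = norm.transform([8.0, 10.0, 14.0])
restored = undo_transforms(model_output, [norm])
print(restored)  # back on the original label scale
```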

predict_uncertainty_on_batch(X: Sequence, masks: int = 50) Tuple[ndarray, ndarray] | Sequence[Tuple[ndarray, ndarray]][source]

Predict the model’s outputs, along with the uncertainty in each one.

The uncertainty is computed as described in https://arxiv.org/abs/1703.04977. It involves repeating the prediction many times with different dropout masks. The prediction is computed as the average over all the predictions. The uncertainty includes both the variation among the predicted values (epistemic uncertainty) and the model’s own estimates for how well it fits the data (aleatoric uncertainty). Not all models support uncertainty prediction.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.

  • masks (int) – the number of dropout masks to average over

Returns:

for each output, a tuple (y_pred, y_std) where y_pred is the predicted value of the output, and each element of y_std estimates the standard deviation of the corresponding element of y_pred
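The way the two kinds of uncertainty can be combined is sketched below for a single output element. This follows the general recipe of the cited paper under the assumption that the model emits a variance estimate on every stochastic pass; combine_mc_dropout is a hypothetical helper, not the library's implementation.

```python
import math
from typing import Sequence, Tuple


def combine_mc_dropout(preds: Sequence[float],
                       pred_vars: Sequence[float]) -> Tuple[float, float]:
    """Combine repeated stochastic predictions for one output element.

    preds: predicted values, one per dropout mask.
    pred_vars: the model's own variance estimate for each pass (aleatoric).
    Returns (y_pred, y_std).
    """
    n = len(preds)
    y_pred = sum(preds) / n
    # Epistemic part: spread of the predictions across dropout masks.
    epistemic_var = sum(p * p for p in preds) / n - y_pred ** 2
    # Aleatoric part: average of the predicted variances.
    aleatoric_var = sum(pred_vars) / n
    # Clamp at zero to guard against tiny negative values from rounding.
    y_std = math.sqrt(max(epistemic_var, 0.0) + aleatoric_var)
    return y_pred, y_std


y_pred, y_std = combine_mc_dropout([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(y_pred, round(y_std, 4))
```

With more masks the epistemic term becomes a less noisy estimate, which is why the masks argument trades accuracy against prediction time.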

predict(dataset: Dataset, transformers: List[Transformer] = [], output_types: List[str] | None = None) ndarray | Sequence[ndarray][source]

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on

  • transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved from the model. If output_types is specified, outputs must be None.

Returns:

a NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs

predict_embedding(dataset: Dataset) ndarray | Sequence[ndarray][source]

Predicts embeddings created by underlying model if any exist. An embedding must be specified to have output_type of ‘embedding’ in the model definition.

Parameters:

dataset (dc.data.Dataset) – Dataset to make prediction on

Returns:

a NumPy array of the embeddings the model produces, or a list of arrays if it produces multiple embeddings

predict_uncertainty(dataset: Dataset, masks: int = 50) Tuple[ndarray, ndarray] | Sequence[Tuple[ndarray, ndarray]][source]

Predict the model’s outputs, along with the uncertainty in each one.

The uncertainty is computed as described in https://arxiv.org/abs/1703.04977. It involves repeating the prediction many times with different dropout masks. The prediction is computed as the average over all the predictions. The uncertainty includes both the variation among the predicted values (epistemic uncertainty) and the model’s own estimates for how well it fits the data (aleatoric uncertainty). Not all models support uncertainty prediction.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on

  • masks (int) – the number of dropout masks to average over

Returns:

for each output, a tuple (y_pred, y_std) where y_pred is the predicted value of the output, and each element of y_std estimates the standard deviation of the corresponding element of y_pred

evaluate_generator(generator: Iterable[Tuple[Any, Any, Any]], metrics: List[Metric], transformers: List[Transformer] = [], per_task_metrics: bool = False)[source]

Evaluate the performance of this model on the data produced by a generator.

Parameters:
  • generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).

  • metrics (list of deepchem.metrics.Metric) – the evaluation metrics to compute

  • transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • per_task_metrics (bool) – If True, return per-task scores.

Returns:

Maps tasks to scores under metric.

Return type:

dict

compute_saliency(X: ndarray) ndarray | Sequence[ndarray][source]

Compute the saliency map for an input sample.

This computes the Jacobian matrix with the derivative of each output element with respect to each input element. More precisely,

  • If this model has a single output, it returns a matrix of shape (output_shape, input_shape) with the derivatives.

  • If this model has multiple outputs, it returns a list of matrices, one for each output.

This method cannot be used on models that take multiple inputs.

Parameters:

X (ndarray) – the input data for a single sample

Return type:

the Jacobian matrix, or a list of matrices
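The shape convention (outputs along rows, inputs along columns) can be illustrated with a small finite-difference sketch. This is only meant to show what the Jacobian contains; numerical_jacobian and toy_model are hypothetical, and a real model would normally obtain these derivatives through automatic differentiation rather than finite differences.

```python
from typing import Callable, List


def numerical_jacobian(f: Callable[[List[float]], List[float]],
                       x: List[float], eps: float = 1e-6) -> List[List[float]]:
    """Approximate J[i][j] = d f(x)[i] / d x[j] with central differences."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def toy_model(x: List[float]) -> List[float]:
    # Hypothetical model with 3 inputs and 2 outputs.
    return [2.0 * x[0] + x[1], x[1] * x[2]]


J = numerical_jacobian(toy_model, [1.0, 2.0, 3.0])
print(len(J), len(J[0]))  # 2 rows (outputs) by 3 columns (inputs)
```

Large entries in a row mark the input elements to which that output is most sensitive at the given sample.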

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) Iterable[Tuple[List, List, List]][source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])
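The pad_batches option can be pictured with the sketch below, which pads a short final batch up to a fixed size. It assumes, purely for illustration, that padding repeats existing samples and gives the repeated entries zero weight so they do not affect a weighted loss; the pad_batch function here is a hypothetical helper and is not claimed to match the library's exact behavior.

```python
from typing import List, Tuple


def pad_batch(X_b: List[list], y_b: List[list], w_b: List[list],
              batch_size: int) -> Tuple[List[list], List[list], List[list]]:
    """Pad a short batch to batch_size by cycling through its samples.

    Assumption for illustration: padded entries get zero weight so they do
    not contribute to a weighted loss.
    """
    n = len(X_b)
    if n == 0:
        raise ValueError("cannot pad an empty batch")
    X_out, y_out, w_out = list(X_b), list(y_b), list(w_b)
    for k in range(batch_size - n):
        X_out.append(X_b[k % n])
        y_out.append(y_b[k % n])
        w_out.append([0.0] * len(w_b[k % n]))
    return X_out, y_out, w_out


X_pad, y_pad, w_pad = pad_batch([[1.0], [2.0], [3.0]], [[0.0], [1.0], [0.0]],
                                [[1.0], [1.0], [1.0]], batch_size=5)
print(len(X_pad), w_pad)
```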

save_checkpoint(max_checkpoints_to_keep: int = 5, model_dir: str | None = None) None[source]

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded. If set to zero, the function will simply return as no checkpoint is saved.

  • model_dir (str, default None) – Model directory to save checkpoint to. If None, revert to self.model_dir
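The effect of max_checkpoints_to_keep can be pictured with a small file-rotation sketch. The file naming (ckpt-<step>.txt), the helper names, and the dummy file contents are all hypothetical; the sketch only demonstrates the "keep the newest N, discard the rest" policy described above.

```python
import os
import tempfile
from typing import List


def list_checkpoints(model_dir: str) -> List[str]:
    # Zero-padded step numbers make lexicographic order match age.
    return sorted(f for f in os.listdir(model_dir) if f.startswith("ckpt-"))


def save_with_rotation(model_dir: str, step: int, max_to_keep: int) -> List[str]:
    """Write a dummy checkpoint, then delete the oldest files beyond the limit."""
    if max_to_keep == 0:
        return list_checkpoints(model_dir)  # nothing is saved
    path = os.path.join(model_dir, f"ckpt-{step:06d}.txt")
    with open(path, "w") as fh:
        fh.write(f"state at step {step}\n")
    for old in list_checkpoints(model_dir)[:-max_to_keep]:
        os.remove(os.path.join(model_dir, old))
    return list_checkpoints(model_dir)


with tempfile.TemporaryDirectory() as tmp:
    for step in (100, 200, 300, 400):
        kept = save_with_rotation(tmp, step, max_to_keep=2)

print(kept)  # only the two most recent dummy checkpoints remain
```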

get_checkpoints(model_dir: str | None = None)[source]

Get a list of all available checkpoint files.

Parameters:

model_dir (str, default None) – Directory to get list of checkpoints from. Reverts to self.model_dir if None

restore(checkpoint: str | None = None, model_dir: str | None = None, strict: bool | None = True) None[source]

Reload the values of all variables from a checkpoint file.

Parameters:
  • checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.

  • model_dir (str, default None) – Directory to restore checkpoint from. If None, use self.model_dir. If checkpoint is not None, this is ignored.

  • strict (bool, default True) – Whether or not to strictly enforce that the keys in checkpoint match the keys returned by this model’s get_variable_scope() method.

compile(fullgraph: bool = False, dynamic: bool | None = None, backend: str = 'inductor', mode: str = 'default', **kwargs) None[source]

Compiles the model using torch.compile for faster training and inference. Visit https://pytorch.org/docs/stable/generated/torch.compile.html for more information.

Parameters:
  • fullgraph (bool, default False) – If True, torch.compile will require that the entire function be capturable into a single graph. If this is not possible (that is, if there are graph breaks), then the function will raise an error.

  • dynamic (bool, default None) – Use dynamic shape tracing. When this is True, the function will up-front attempt to generate a kernel that is as dynamic as possible to avoid recompilations when sizes change. This may not always work as some operations/optimizations will force specialization. When set to False, torch.compile will never generate dynamic kernels. By default, the function automatically detects if dynamism has occurred and will compile a more dynamic kernel upon recompile.

  • backend (str, default 'inductor') – The backend to use for compilation. Currently, only ‘inductor’ is supported.

  • mode (str, default 'default') – The mode to use for compilation. See the torch.compile documentation for available modes and detailed descriptions.

  • kwargs (dict) – Additional arguments to pass to torch.compile.

get_global_step() int[source]

Get the number of steps of fitting that have been performed.

load_from_pretrained(source_model: TorchModel, assignment_map: Dict[Any, Any] | None = None, value_map: Dict[Any, Any] | None = None, checkpoint: str | None = None, model_dir: str | None = None, include_top: bool = True, inputs: Sequence[Any] | None = None, **kwargs) None[source]

Copies parameter values from a pretrained model. source_model can either be a pretrained model or a model with the same architecture. value_map is a parameter-value dictionary. If no value_map is provided, the parameter values are restored to the source_model from a checkpoint and a default value_map is created. assignment_map is a dictionary mapping parameters from the source_model to the current model. If no assignment_map is provided, one is made from scratch and assumes the model is composed of several different layers, with the final one being a dense layer. include_top is used to control whether or not the final dense layer is used. The default assignment map is useful in cases where the type of task is different (classification vs regression) and/or number of tasks in the setting.

Parameters:
  • source_model (dc.TorchModel, required) – source_model can either be the pretrained model or a dc.TorchModel with the same architecture as the pretrained model. It is used to restore from a checkpoint, if value_map is None and to create a default assignment map if assignment_map is None

  • assignment_map (Dict, default None) – Dictionary mapping the source_model parameters and current model parameters

  • value_map (Dict, default None) – Dictionary containing source_model trainable parameters mapped to numpy arrays. If value_map is None, the values are restored and a default parameter map is created using the restored values

  • checkpoint (str, default None) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints

  • model_dir (str, default None) – Restore source model from custom model directory if needed

  • include_top (bool, default True) – if True, copies the weights and bias associated with the final dense layer. Used only when assignment map is None

  • inputs (List, input tensors for model) – if not None, then the weights are built for both the source and self.

ModularTorchModel

You can modify networks for different tasks by using a ModularTorchModel.

class ModularTorchModel(model: Module, components: dict, **kwargs)[source]

ModularTorchModel is a subclass of TorchModel that allows for components to be pretrained and then combined into a final model. It is designed to be subclassed for specific models and is not intended to be used directly. There are 3 main differences between ModularTorchModel and TorchModel:

  • The build_components() method is used to define the components of the model.

  • The components are combined into a final model with the build_model() method.

  • The loss function is defined with the loss_func method. This may access the components to compute the loss using intermediate values from the network, rather than just the full forward pass output.

Here is an example of how to use ModularTorchModel to pretrain a linear layer, load it into another network and then finetune that network:

>>> import numpy as np
>>> import deepchem as dc
>>> import torch
>>> n_samples = 6
>>> n_feat = 3
>>> n_hidden = 2
>>> n_tasks = 6
>>> pt_tasks = 3
>>> X = np.random.rand(n_samples, n_feat)
>>> y_pretrain = np.zeros((n_samples, pt_tasks)).astype(np.float32)
>>> dataset_pt = dc.data.NumpyDataset(X, y_pretrain)
>>> y_finetune = np.zeros((n_samples, n_tasks)).astype(np.float32)
>>> dataset_ft = dc.data.NumpyDataset(X, y_finetune)
>>> components = {'linear': torch.nn.Linear(n_feat, n_hidden),
... 'activation': torch.nn.ReLU(), 'head': torch.nn.Linear(n_hidden, n_tasks)}
>>> model = torch.nn.Sequential(components['linear'], components['activation'],
... components['head'])
>>> modular_model = dc.models.torch_models.modular.ModularTorchModel(model, components)
>>> def example_loss_func(inputs, labels, weights):
...     return (torch.nn.functional.mse_loss(model(inputs), labels[0]) * weights[0]).mean()
>>> modular_model.loss_func = example_loss_func
>>> def example_model_build():
...     return torch.nn.Sequential(components['linear'], components['activation'],
... components['head'])
>>> modular_model.build_model = example_model_build
>>> pretrain_components = {'linear': torch.nn.Linear(n_feat, n_hidden),
... 'activation': torch.nn.ReLU(), 'head': torch.nn.Linear(n_hidden, pt_tasks)}
>>> pretrain_model = torch.nn.Sequential(pretrain_components['linear'],
... pretrain_components['activation'], pretrain_components['head'])
>>> pretrain_modular_model = dc.models.torch_models.modular.ModularTorchModel(pretrain_model,
... pretrain_components)
>>> def example_pt_loss_func(inputs, labels, weights):
...     return (torch.nn.functional.mse_loss(pretrain_model(inputs), labels[0]) * weights[0]).mean()
>>> pretrain_modular_model.loss_func = example_pt_loss_func
>>> pt_loss = pretrain_modular_model.fit(dataset_pt, nb_epoch=1)
>>> modular_model.load_from_pretrained(pretrain_modular_model, components=['linear'])
>>> ft_loss = modular_model.fit(dataset_ft, nb_epoch=1)
__init__(model: Module, components: dict, **kwargs)[source]

Create a ModularTorchModel.

Parameters:
  • model (nn.Module) – The model to be trained.

  • components (dict) – A dictionary of the components of the model. The keys are the names of the components and the values are the components themselves.

build_model() Module[source]

Builds the final model from the components.

build_components() dict[source]

Creates the components dictionary, with the keys being the names of the components and the values being torch.nn.Module objects.

loss_func(inputs: Tensor | Sequence[Tensor], labels: Sequence, weights: Sequence) Tensor[source]

Defines the loss function for the model which can access the components using self.components. The loss function should take the inputs, labels, and weights as arguments and return the loss.

freeze_components(components: List[str])[source]

Freezes the parameters of the specified components.

Components string refers to keys in self.components.

Parameters:

components (List[str]) – The components to freeze.

unfreeze_components(components: List[str])[source]

Unfreezes the parameters of the specified components.

Components string refers to keys in self.components.

Parameters:

components (List[str]) – The components to unfreeze.

fit_generator(generator: Iterable[Tuple[Any, Any, Any]], max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, restore: bool = False, variables: List[Parameter] | ParameterList | None = None, loss: Callable[[List, List, List], Any] | None = None, callbacks: Callable | List[Callable] = [], all_losses: List[float] | None = None) float[source]

Train this model on data from a generator. This method is similar to the TorchModel implementation, but it passes the inputs directly to the loss function, rather than passing them through the model first. This enables the loss to be calculated from intermediate steps of the model and not just the final output.

Parameters:
  • generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.

  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.

  • variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in the model are used.

  • loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.

  • callbacks (function or list of functions) – one or more functions of the form f(model, step, **kwargs) that will be invoked after every step. This can be used to perform validation, logging, etc.

  • all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.

Return type:

The average loss over the most recent checkpoint interval

load_from_pretrained(source_model: ModularTorchModel | None = None, components: List[str] | None = None, checkpoint: str | None = None, model_dir: str | None = None, inputs: Sequence[Any] | None = None, **kwargs) None[source]

Copies parameter values from a pretrained model. The pretrained model can be supplied as a source_model (a ModularTorchModel object), a checkpoint (a PyTorch .ckpt file), or a model_dir (a directory containing .ckpt files). To load only specific components, pass a list of their names. If both a source_model and a checkpoint or model_dir are given, the source_model weights are used.

Parameters:
  • source_model (dc.ModularTorchModel, default None) – the pretrained model from which parameter values are copied. If None, the values are loaded from checkpoint or model_dir instead.

  • components (List[str], default None) – the names of the components to load, given as keys of the components dictionary.

  • checkpoint (str, default None) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints

  • model_dir (str, default None) – Restore source model from custom model directory if needed

  • inputs (List, input tensors for model) – if not None, then the weights are built for both the source and self.

save_checkpoint(max_checkpoints_to_keep=5, model_dir=None)[source]

Saves the current state of the model and its components as a checkpoint file in the specified model directory. It maintains a maximum number of checkpoint files, deleting the oldest one when the limit is reached.

Parameters:
  • max_checkpoints_to_keep (int, default 5) – Maximum number of checkpoint files to keep.

  • model_dir (str, default None) – The directory to save the checkpoint file in. If None, the model_dir specified in the constructor is used.

restore(components: List[str] | None = None, checkpoint: str | None = None, model_dir: str | None = None) None[source]

Restores the state of a ModularTorchModel from a checkpoint file.

If no checkpoint file is provided, it will use the latest checkpoint found in the model directory. If a list of component names is provided, only the state of those components will be restored.

Parameters:
  • components (Optional[List[str]]) – A list of component names to restore. If None, all components will be restored.

  • checkpoint (Optional[str]) – The path to the checkpoint file. If None, the latest checkpoint in the model directory will be used.

  • model_dir (Optional[str]) – The path to the model directory. If None, the model directory used to initialize the model will be used.
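Restoring only some components can be pictured with ordinary dictionaries. In this sketch each component's state is a small dict of numbers keyed by the component name; restore_components and the toy values are hypothetical, and a real checkpoint would hold tensors rather than floats.

```python
from typing import Dict, List, Optional

State = Dict[str, float]


def restore_components(current: Dict[str, State], checkpoint: Dict[str, State],
                       components: Optional[List[str]] = None) -> Dict[str, State]:
    """Return a copy of `current` with the chosen components taken from `checkpoint`."""
    # With no explicit list, every component in the checkpoint is restored.
    names = list(checkpoint) if components is None else components
    restored = {name: dict(state) for name, state in current.items()}
    for name in names:
        restored[name] = dict(checkpoint[name])
    return restored


pretrained = {"linear": {"w": 0.7}, "head": {"w": -0.2}}
fresh = {"linear": {"w": 0.0}, "head": {"w": 0.0}}
partly = restore_components(fresh, pretrained, components=["linear"])
fully = restore_components(fresh, pretrained)
print(partly)
```

Restoring only the 'linear' entry mirrors the pretrain-then-finetune pattern, where a shared layer is reused and a task-specific head keeps its fresh initialization.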

CNN

class CNN(n_tasks: int, n_features: int, dims: int, layer_filters: List[int] = [100], kernel_size: int | Sequence[int] = 5, strides: int | Sequence[int] = 1, weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = 'relu', pool_type: str = 'max', mode: str = 'classification', n_classes: int = 2, uncertainty: bool = False, residual: bool = False, padding: int | str = 'valid', **kwargs)[source]

A 1, 2, or 3 dimensional convolutional network for either regression or classification.

The network consists of the following sequence of layers:

  • A configurable number of convolutional layers

  • A global pooling layer (either max pool or average pool)

  • A final fully connected layer to compute the output

It optionally can compose the model from pre-activation residual blocks, as described in https://arxiv.org/abs/1603.05027, rather than a simple stack of convolution layers. This often leads to easier training, especially when using a large number of layers. Note that residual blocks can only be used when successive layers have the same output shape. Wherever the output shape changes, a simple convolution layer will be used even if residual=True.

Examples

>>> import numpy as np
>>> import deepchem as dc
>>> from deepchem.models import CNN
>>> n_samples = 10
>>> n_features = 3
>>> n_tasks = 1
>>> np.random.seed(123)
>>> X = np.random.rand(n_samples, 10, n_features)
>>> y = np.random.randint(2, size=(n_samples, n_tasks)).astype(np.float32)
>>> dataset: dc.data.Dataset = dc.data.NumpyDataset(X, y)
>>> regression_metric = dc.metrics.Metric(dc.metrics.mean_squared_error)
>>> model = CNN(n_tasks, n_features, dims=1, kernel_size=3, mode='regression')
>>> avg_loss = model.fit(dataset, nb_epoch=10)
__init__(n_tasks: int, n_features: int, dims: int, layer_filters: List[int] = [100], kernel_size: int | Sequence[int] = 5, strides: int | Sequence[int] = 1, weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = 'relu', pool_type: str = 'max', mode: str = 'classification', n_classes: int = 2, uncertainty: bool = False, residual: bool = False, padding: int | str = 'valid', **kwargs) None[source]

TorchModel wrapper for CNN

Parameters:
  • n_tasks (int) – number of tasks

  • n_features (int) – number of features

  • dims (int) – the number of dimensions to apply convolutions over (1, 2, or 3)

  • layer_filters (list) – the number of output filters for each convolutional layer in the network. The length of this list determines the number of layers.

  • kernel_size (int, tuple, or list) – a list giving the shape of the convolutional kernel for each layer. Each element may be either an int (use the same kernel width for every dimension) or a tuple (the kernel width along each dimension). Alternatively this may be a single int or tuple instead of a list, in which case the same kernel shape is used for every layer.

  • strides (int, tuple, or list) – a list giving the stride between applications of the kernel for each layer. Each element may be either an int (use the same stride for every dimension) or a tuple (the stride along each dimension). Alternatively this may be a single int or tuple instead of a list, in which case the same stride is used for every layer.

  • weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_filters)+1, where the final element corresponds to the dense layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_filters)+1, where the final element corresponds to the dense layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • weight_decay_penalty (float) – the magnitude of the weight decay penalty to use

  • weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’

  • dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_filters). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer

  • activation_fns (str or list) – the torch activation function to apply to each layer. The length of this list should equal len(layer_filters). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer. The default is ‘relu’.

  • pool_type (str) – the type of pooling layer to use, either ‘max’ or ‘average’

  • mode (str) – Either ‘classification’ or ‘regression’

  • n_classes (int) – the number of classes to predict (only used in classification mode)

  • uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted

  • residual (bool) – if True, the model will be composed of pre-activation residual blocks instead of a simple stack of convolutional layers.

  • padding (str, int or tuple) – the padding to use for convolutional layers, either ‘valid’ or ‘same’
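When choosing kernel_size, strides, and padding, it can help to work out how the spatial size changes from layer to layer. The sketch below applies the standard convolution size formula (dilation 1) along one dimension; conv_output_length is a hypothetical helper, and the 'same' case is only sketched for stride 1.

```python
from typing import Union


def conv_output_length(length: int, kernel_size: int, stride: int = 1,
                       padding: Union[int, str] = "valid") -> int:
    """Output length along one dimension of a convolution (dilation 1)."""
    if padding == "same":
        if stride != 1:
            raise ValueError("'same' padding is sketched here for stride 1 only")
        return length
    # 'valid' means no padding; an int means that many zeros on each side.
    pad = 0 if padding == "valid" else int(padding)
    return (length + 2 * pad - kernel_size) // stride + 1


length = 10
for _ in range(3):
    # Three stacked layers with kernel_size=3 and 'valid' padding.
    length = conv_output_length(length, kernel_size=3)
print(length)  # 10 -> 8 -> 6 -> 4
```

With 'valid' padding and a kernel wider than 1, each layer shrinks the spatial size, whereas 'same' padding keeps it constant across layers.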

default_generator(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])

MultitaskRegressor

class MultitaskRegressor(n_tasks: int, n_features: int, layer_sizes: Sequence[int] = [1000], weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = 'relu', uncertainty: bool = False, residual: bool = False, **kwargs)[source]

A fully connected network for multitask regression.

This class provides lots of options for customizing aspects of the model: the number and widths of layers, the activation functions, regularization methods, etc.

It optionally can compose the model from pre-activation residual blocks, as described in https://arxiv.org/abs/1603.05027, rather than a simple stack of dense layers. This often leads to easier training, especially when using a large number of layers. Note that residual blocks can only be used when successive layers have the same width. Wherever the layer width changes, a simple dense layer will be used even if residual=True.

__init__(n_tasks: int, n_features: int, layer_sizes: Sequence[int] = [1000], weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = 'relu', uncertainty: bool = False, residual: bool = False, **kwargs) None[source]

Create a MultitaskRegressor.

In addition to the following arguments, this class also accepts all the keyword arguments from TorchModel.

Parameters:
  • n_tasks (int) – number of tasks

  • n_features (int) – number of features

  • layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.

  • weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • weight_decay_penalty (float) – the magnitude of the weight decay penalty to use

  • weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’

  • dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • activation_fns (list or object) – the PyTorch activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer. Standard activation functions from torch.nn.functional can be specified by name.

  • uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted

  • residual (bool) – if True, the model will be composed of pre-activation residual blocks instead of a simple stack of dense layers.
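The scalar-or-list convention shared by weight_init_stddevs, bias_init_consts, dropouts and activation_fns can be sketched in plain Python. The helper name expand_per_layer is hypothetical and is not part of DeepChem; it only illustrates the length rules documented above (len(layer_sizes)+1 for the weight and bias initializers, len(layer_sizes) for dropouts and activation functions).

```python
from collections.abc import Sequence


def expand_per_layer(value, n_entries, name="value"):
    """Broadcast a single value to a list, or validate the length of a list.

    Hypothetical helper for illustration only; not a DeepChem function.
    """
    # Strings such as 'relu' are sequences too, so treat them as single values.
    if isinstance(value, Sequence) and not isinstance(value, str):
        values = list(value)
        if len(values) != n_entries:
            raise ValueError(
                f"{name} needs {n_entries} entries, got {len(values)}")
        return values
    return [value] * n_entries


layer_sizes = [1000, 500]

# One entry per hidden layer plus one for the output layer.
weight_init_stddevs = expand_per_layer(0.02, len(layer_sizes) + 1,
                                       "weight_init_stddevs")
# One entry per hidden layer.
dropouts = expand_per_layer([0.5, 0.25], len(layer_sizes), "dropouts")
activation_fns = expand_per_layer("relu", len(layer_sizes), "activation_fns")
```

Passing a list of the wrong length to this sketch raises a ValueError; whether DeepChem itself validates lengths in the same way is not stated on this page.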

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) Iterable[Tuple[List, List, List]][source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

  • a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])
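The roles of epochs, deterministic and pad_batches can be illustrated with a small stand-alone generator over Python lists. This is a sketch under stated assumptions, not DeepChem's implementation: the function name batch_generator is hypothetical, and padding by repeating the first sample of the batch with a weight of zero is an assumed strategy chosen only to keep the example short.

```python
import random


def batch_generator(X, y, w, batch_size, epochs=1, deterministic=True,
                    pad_batches=True, seed=0):
    """Yield ([inputs], [outputs], [weights]) tuples, one per batch."""
    rng = random.Random(seed)
    n = len(X)
    for _ in range(epochs):
        order = list(range(n))
        if not deterministic:
            # Reshuffle the sample order at the start of every epoch.
            rng.shuffle(order)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            n_pad = batch_size - len(idx) if pad_batches else 0
            # Assumed padding: repeat the first sample and give the padded
            # rows zero weight so they would not contribute to a loss.
            X_b = [X[i] for i in idx] + [X[idx[0]]] * n_pad
            y_b = [y[i] for i in idx] + [y[idx[0]]] * n_pad
            w_b = [w[i] for i in idx] + [0.0] * n_pad
            yield ([X_b], [y_b], [w_b])


X = [[0.1], [0.2], [0.3], [0.4], [0.5]]
y = [0.0, 1.0, 0.0, 1.0, 0.0]
w = [1.0] * 5
batches = list(batch_generator(X, y, w, batch_size=2, epochs=2))
```

With five samples and a batch size of two, each epoch yields three batches, and the last one is padded up to two rows.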

MultitaskFitTransformRegressor

class MultitaskFitTransformRegressor(n_tasks: int, n_features: int, fit_transformers: Sequence[Transformer] = [], batch_size: int = 50, **kwargs)[source]

Implements a MultitaskRegressor that performs on-the-fly transformation during fit/predict.

Examples

>>> import numpy as np
>>> import deepchem as dc
>>> n_samples = 10
>>> n_features = 3
>>> n_tasks = 1
>>> ids = np.arange(n_samples)
>>> X = np.random.rand(n_samples, n_features, n_features)
>>> y = np.zeros((n_samples, n_tasks))
>>> w = np.ones((n_samples, n_tasks))
>>> dataset = dc.data.NumpyDataset(X, y, w, ids)
>>> fit_transformers = [dc.trans.CoulombFitTransformer(dataset)]
>>> model = dc.models.MultitaskFitTransformRegressor(n_tasks, [n_features, n_features],
...     dropouts=[0.], learning_rate=0.003, weight_init_stddevs=[np.sqrt(6)/np.sqrt(1000)],
...     batch_size=n_samples, fit_transformers=fit_transformers)
>>> model.n_features
12
__init__(n_tasks: int, n_features: int, fit_transformers: Sequence[Transformer] = [], batch_size: int = 50, **kwargs)[source]

Create a MultitaskFitTransformRegressor.

In addition to the following arguments, this class also accepts all the keyword arguments from MultitaskRegressor.

Parameters:
  • n_tasks (int) – number of tasks

  • n_features (list or int) – number of features

  • fit_transformers (list) – List of dc.trans.FitTransformer objects

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) Iterable[Tuple[List, List, List]][source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

  • a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])

predict_on_generator(generator: Iterable[Tuple[Any, Any, Any]], transformers: List[Transformer] = [], output_types: str | Sequence[str] | None = None) ndarray | Sequence[ndarray][source]
Parameters:
  • generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).

  • transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved from the model. If output_types is specified, outputs must be None.

Returns:

  • a NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs
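The idea of passing predictions back through the transformers to undo them can be sketched without DeepChem. The classes ShiftY and ScaleY and the function undo_all are hypothetical stand-ins; the sketch assumes that transformations are undone in the reverse of the order in which they were applied.

```python
class ShiftY:
    """Hypothetical label transformer that subtracts a constant offset."""

    def __init__(self, offset):
        self.offset = offset

    def transform(self, y):
        return [v - self.offset for v in y]

    def untransform(self, y):
        return [v + self.offset for v in y]


class ScaleY:
    """Hypothetical label transformer that divides by a constant scale."""

    def __init__(self, scale):
        self.scale = scale

    def transform(self, y):
        return [v / self.scale for v in y]

    def untransform(self, y):
        return [v * self.scale for v in y]


def undo_all(y, transformers):
    """Undo transformations, last applied first."""
    for transformer in reversed(transformers):
        y = transformer.untransform(y)
    return y


transformers = [ShiftY(10.0), ScaleY(5.0)]
labels = [10.0, 20.0]

# Forward pass: apply the transformers in order, as done before training.
transformed = labels
for transformer in transformers:
    transformed = transformer.transform(transformed)

# Model outputs live in the transformed space; map them back.
recovered = undo_all(transformed, transformers)
```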

MultitaskClassifier

class MultitaskClassifier(n_tasks: int, n_features: int, layer_sizes: Sequence[int] = [1000], weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = 'relu', n_classes: int = 2, residual: bool = False, **kwargs)[source]

A fully connected network for multitask classification.

This class provides lots of options for customizing aspects of the model: the number and widths of layers, the activation functions, regularization methods, etc.

It optionally can compose the model from pre-activation residual blocks, as described in https://arxiv.org/abs/1603.05027, rather than a simple stack of dense layers. This often leads to easier training, especially when using a large number of layers. Note that residual blocks can only be used when successive layers have the same width. Wherever the layer width changes, a simple dense layer will be used even if residual=True.
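The rule that residual blocks apply only where successive layers have the same width can be written down as a small planning function. This is a minimal sketch of the documented rule, not the class's actual construction code; plan_layers is a hypothetical name.

```python
def plan_layers(n_features, layer_sizes, residual):
    """Return (input_width, output_width, kind) for each hidden layer.

    A layer is planned as a residual block only when residual=True and its
    width equals the width of the layer feeding into it.
    """
    plan = []
    previous_width = n_features
    for size in layer_sizes:
        same_width = size == previous_width
        kind = "residual" if residual and same_width else "dense"
        plan.append((previous_width, size, kind))
        previous_width = size
    return plan


plan = plan_layers(n_features=1024, layer_sizes=[512, 512, 512, 128],
                   residual=True)
```

Here the first and last layers change the width and fall back to plain dense layers, while the two middle layers can be residual blocks.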

__init__(n_tasks: int, n_features: int, layer_sizes: Sequence[int] = [1000], weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = 'relu', n_classes: int = 2, residual: bool = False, **kwargs) None[source]

Create a MultitaskClassifier.

In addition to the following arguments, this class also accepts all the keyword arguments from TensorGraph.

Parameters:
  • n_tasks (int) – number of tasks

  • n_features (int) – number of features

  • layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.

  • weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • weight_decay_penalty (float) – the magnitude of the weight decay penalty to use

  • weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’

  • dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • activation_fns (list or object) – the PyTorch activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer. Standard activation functions from torch.nn.functional can be specified by name.

  • n_classes (int) – the number of classes

  • residual (bool) – if True, the model will be composed of pre-activation residual blocks instead of a simple stack of dense layers.
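As a rough illustration of weight_decay_penalty and weight_decay_penalty_type, the penalty added to the loss can be sketched as follows. The function name and the exact form (a plain sum of absolute values for ‘l1’, a plain sum of squares for ‘l2’, with no extra scaling factor) are assumptions made for this sketch and may differ from what the model computes internally.

```python
def weight_decay_term(weights, penalty, penalty_type="l2"):
    """Assumed form of the regularization term added to the loss."""
    if penalty_type == "l1":
        return penalty * sum(abs(w) for w in weights)
    if penalty_type == "l2":
        return penalty * sum(w * w for w in weights)
    raise ValueError("penalty_type must be 'l1' or 'l2'")


weights = [1.0, -2.0, 3.0]
l1_term = weight_decay_term(weights, penalty=0.5, penalty_type="l1")
l2_term = weight_decay_term(weights, penalty=0.5, penalty_type="l2")
```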

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) Iterable[Tuple[List, List, List]][source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

  • a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])

CGCNNModel

class CGCNNModel(in_node_dim: int = 92, hidden_node_dim: int = 64, in_edge_dim: int = 41, num_conv: int = 3, predictor_hidden_feats: int = 128, n_tasks: int = 1, mode: str = 'regression', n_classes: int = 2, **kwargs)[source]

Crystal Graph Convolutional Neural Network (CGCNN).

Here is a simple example of code that uses the CGCNNModel with a materials dataset.

Examples

>>> import deepchem as dc
>>> dataset_config = {"reload": False, "featurizer": dc.feat.CGCNNFeaturizer(), "transformers": []}
>>> tasks, datasets, transformers = dc.molnet.load_perovskite(**dataset_config)
>>> train, valid, test = datasets
>>> model = dc.models.CGCNNModel(mode='regression', batch_size=32, learning_rate=0.001)
>>> avg_loss = model.fit(train, nb_epoch=50)

This model takes arbitrary crystal structures as input and predicts material properties from the element information and the connectivity of atoms in the crystal. It may be useful when you want material properties that are expensive to compute, such as band gaps from DFT. The model is a variant of Graph Convolutional Networks; it differs from other GCN models in how it constructs graphs and how it updates node representations. It builds the crystal graph from the structure using distances between atoms. The crystal graph is an undirected multigraph whose nodes represent atom properties and whose edges represent connections between atoms in the crystal. Node representations are updated using both neighboring node and edge representations. See [1]_ for the details of the algorithm.
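To make the phrase "undirected multigraph" concrete, the sketch below builds distance-based edges for a toy cubic cell with periodic images. It is a simplified illustration only: the function crystal_edges, the restriction to a cubic cell and the cutoff value are assumptions for this example, and CGCNNFeaturizer may construct its graphs differently.

```python
import itertools
import math


def crystal_edges(frac_coords, cell_length, cutoff):
    """Return (i, j, distance) edges for a cubic cell with periodic images.

    Every image of atom j within `cutoff` of atom i yields its own edge, so
    the same pair of atoms can be connected by several edges.
    """
    edges = []
    shifts = list(itertools.product((-1, 0, 1), repeat=3))
    for i, a in enumerate(frac_coords):
        for j, b in enumerate(frac_coords):
            for shift in shifts:
                if i == j and shift == (0, 0, 0):
                    continue  # skip the atom itself
                image = tuple(b[k] + shift[k] for k in range(3))
                distance = cell_length * math.dist(tuple(a), image)
                if distance <= cutoff:
                    edges.append((i, j, round(distance, 3)))
    return edges


# Two atoms in a CsCl-like arrangement: cell corner and body center.
frac_coords = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
edges = crystal_edges(frac_coords, cell_length=4.0, cutoff=3.5)
n_edges_0_to_1 = sum(1 for i, j, _ in edges if (i, j) == (0, 1))
```

Atom 0 reaches eight periodic images of atom 1 at the same distance, so the pair is joined by eight parallel edges in each direction, which is why a multigraph is needed.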

References

Notes

This class requires DGL and PyTorch to be installed.

__init__(in_node_dim: int = 92, hidden_node_dim: int = 64, in_edge_dim: int = 41, num_conv: int = 3, predictor_hidden_feats: int = 128, n_tasks: int = 1, mode: str = 'regression', n_classes: int = 2, **kwargs)[source]

This class accepts all the keyword arguments from TorchModel.

Parameters:
  • in_node_dim (int, default 92) – The length of the initial node feature vectors. The 92 is based on length of vectors in the atom_init.json.

  • hidden_node_dim (int, default 64) – The length of the hidden node feature vectors.

  • in_edge_dim (int, default 41) – The length of the initial edge feature vectors. The 41 is based on default setting of CGCNNFeaturizer.

  • num_conv (int, default 3) – The number of convolutional layers.

  • predictor_hidden_feats (int, default 128) – The size for hidden representations in the output MLP predictor.

  • n_tasks (int, default 1) – The number of tasks, i.e. the size of the output.

  • mode (str, default 'regression') – The model type, ‘classification’ or ‘regression’.

  • n_classes (int, default 2) – The number of classes to predict (only used in classification mode).

  • kwargs (Dict) – This class accepts all the keyword arguments from TorchModel.

GATModel

class GATModel(n_tasks: int, graph_attention_layers: list | None = None, n_attention_heads: int = 8, agg_modes: list | None = None, activation=<function elu>, residual: bool = True, dropout: float = 0.0, alpha: float = 0.2, predictor_hidden_feats: int = 128, predictor_dropout: float = 0.0, mode: str = 'regression', number_atom_features: int = 30, n_classes: int = 2, self_loop: bool = True, **kwargs)[source]

Model for Graph Property Prediction Based on Graph Attention Networks (GAT).

This model proceeds as follows:

  • Update node representations in graphs with a variant of GAT

  • For each graph, compute its representation by 1) a weighted sum of the node representations in the graph, where the weights are computed by applying a gating function to the node representations, 2) a max pooling of the node representations, and 3) concatenating the output of 1) and 2)

  • Perform the final prediction using an MLP
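The graph readout in the second step can be sketched in plain Python. This is an illustrative sketch, not the DGL-LifeSci implementation: the function name is hypothetical, and the gating function is assumed to be a single linear map followed by a sigmoid.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def weighted_sum_and_max(node_feats, gate_weights, gate_bias=0.0):
    """Concatenate a gated weighted sum and a max pooling over nodes."""
    dim = len(node_feats[0])
    # 1) One scalar gate per node, computed from that node's features.
    gates = [
        sigmoid(sum(w * h for w, h in zip(gate_weights, feats)) + gate_bias)
        for feats in node_feats
    ]
    weighted_sum = [
        sum(g * feats[k] for g, feats in zip(gates, node_feats))
        for k in range(dim)
    ]
    # 2) Element-wise maximum over all nodes.
    max_pool = [max(feats[k] for feats in node_feats) for k in range(dim)]
    # 3) Concatenation doubles the feature size.
    return weighted_sum + max_pool


node_feats = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
# Zero gate weights give every node a gate of sigmoid(0) = 0.5.
graph_repr = weighted_sum_and_max(node_feats, gate_weights=[0.0, 0.0])
```

The resulting vector is what a final MLP would consume in the third step.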

Examples

>>> import deepchem as dc
>>> from deepchem.models import GATModel
>>> # preparing dataset
>>> smiles = ["C1CCC1", "C1=CC=CN=C1"]
>>> labels = [0., 1.]
>>> featurizer = dc.feat.MolGraphConvFeaturizer()
>>> X = featurizer.featurize(smiles)
>>> dataset = dc.data.NumpyDataset(X=X, y=labels)
>>> # training model
>>> model = GATModel(mode='classification', n_tasks=1,
...                  batch_size=16, learning_rate=0.001)
>>> loss = model.fit(dataset, nb_epoch=5)

Notes

This class requires DGL (https://github.com/dmlc/dgl) and DGL-LifeSci (https://github.com/awslabs/dgl-lifesci) to be installed.

__init__(n_tasks: int, graph_attention_layers: list | None = None, n_attention_heads: int = 8, agg_modes: list | None = None, activation=<function elu>, residual: bool = True, dropout: float = 0.0, alpha: float = 0.2, predictor_hidden_feats: int = 128, predictor_dropout: float = 0.0, mode: str = 'regression', number_atom_features: int = 30, n_classes: int = 2, self_loop: bool = True, **kwargs)[source]
Parameters:
  • n_tasks (int) – Number of tasks.

  • graph_attention_layers (list of int) – Width of channels per attention head for GAT layers. graph_attention_layers[i] gives the width of channel for each attention head for the i-th GAT layer. If both graph_attention_layers and agg_modes are specified, they should have equal length. If not specified, the default value will be [8, 8].

  • n_attention_heads (int) – Number of attention heads in each GAT layer.

  • agg_modes (list of str) – The way to aggregate multi-head attention results for each GAT layer, which can be either ‘flatten’ for concatenating all-head results or ‘mean’ for averaging all-head results. agg_modes[i] gives the way to aggregate multi-head attention results for the i-th GAT layer. If both graph_attention_layers and agg_modes are specified, they should have equal length. If not specified, the model will flatten multi-head results for intermediate GAT layers and compute mean of multi-head results for the last GAT layer.

  • activation (activation function or None) – The activation function to apply to the aggregated multi-head results for each GAT layer. If not specified, the default value will be ELU.

  • residual (bool) – Whether to add a residual connection within each GAT layer. Default to True.

  • dropout (float) – The dropout probability within each GAT layer. Default to 0.

  • alpha (float) – A hyperparameter in LeakyReLU, which is the slope for negative values. Default to 0.2.

  • predictor_hidden_feats (int) – The size for hidden representations in the output MLP predictor. Default to 128.

  • predictor_dropout (float) – The dropout probability in the output MLP predictor. Default to 0.

  • mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.

  • number_atom_features (int) – The length of the initial atom feature vectors. Default to 30.

  • n_classes (int) – The number of classes to predict per task (only used when mode is ‘classification’). Default to 2.

  • self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves. When input graphs have isolated nodes, self loops allow preserving the original feature of them in message passing. Default to True.

  • kwargs – This can include any keyword argument of TorchModel.
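The defaults described for graph_attention_layers and agg_modes can be summarized in a short sketch. The function resolve_gat_layers is hypothetical, and the per-layer output width (width times number of heads for ‘flatten’, width alone for ‘mean’) is inferred from the descriptions of concatenating versus averaging head results rather than taken from the implementation.

```python
def resolve_gat_layers(graph_attention_layers=None, agg_modes=None,
                       n_attention_heads=8):
    """Fill in documented defaults and compute assumed output widths."""
    if graph_attention_layers is None:
        layers = [8, 8]
    else:
        layers = list(graph_attention_layers)
    if agg_modes is None:
        # Flatten intermediate layers, average the heads of the last layer.
        modes = ["flatten"] * (len(layers) - 1) + ["mean"]
    else:
        modes = list(agg_modes)
    if len(modes) != len(layers):
        raise ValueError(
            "graph_attention_layers and agg_modes must have equal length")
    out_widths = [
        width * n_attention_heads if mode == "flatten" else width
        for width, mode in zip(layers, modes)
    ]
    return layers, modes, out_widths


layers, modes, out_widths = resolve_gat_layers()
```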

GCNModel

class GCNModel(n_tasks: int, graph_conv_layers: list | None = None, activation=None, residual: bool = True, batchnorm: bool = False, dropout: float = 0.0, predictor_hidden_feats: int = 128, predictor_dropout: float = 0.0, mode: str = 'regression', number_atom_features=30, n_classes: int = 2, self_loop: bool = True, **kwargs)[source]

Model for Graph Property Prediction Based on Graph Convolution Networks (GCN).

This model proceeds as follows:

  • Update node representations in graphs with a variant of GCN

  • For each graph, compute its representation by 1) a weighted sum of the node representations in the graph, where the weights are computed by applying a gating function to the node representations, 2) a max pooling of the node representations, and 3) concatenating the output of 1) and 2)

  • Perform the final prediction using an MLP

Examples

>>> import deepchem as dc
>>> from deepchem.models import GCNModel
>>> # preparing dataset
>>> smiles = ["C1CCC1", "CCC"]
>>> labels = [0., 1.]
>>> featurizer = dc.feat.MolGraphConvFeaturizer()
>>> X = featurizer.featurize(smiles)
>>> dataset = dc.data.NumpyDataset(X=X, y=labels)
>>> # training model
>>> model = GCNModel(mode='classification', n_tasks=1,
...                  batch_size=16, learning_rate=0.001)
>>> loss = model.fit(dataset, nb_epoch=5)

Notes

This class requires DGL (https://github.com/dmlc/dgl) and DGL-LifeSci (https://github.com/awslabs/dgl-lifesci) to be installed.

This model is different from deepchem.models.GraphConvModel as follows:

  • For each graph convolution, the learnable weight in this model is shared across all nodes. GraphConvModel employs separate learnable weights for nodes of different degrees. A learnable weight is shared across all nodes of a particular degree.

  • For GraphConvModel, there is an additional GraphPool operation after each graph convolution. The operation updates the representation of a node by applying an element-wise maximum over the representations of its neighbors and itself.

  • For computing graph-level representations, this model computes a weighted sum and an element-wise maximum of the representations of all nodes in a graph and concatenates them. The node weights are obtained by using a linear/dense layer followed by a sigmoid function. For GraphConvModel, the sum over node representations is unweighted.

  • There are various minor differences in using dropout, skip connection and batch normalization.

__init__(n_tasks: int, graph_conv_layers: list | None = None, activation=None, residual: bool = True, batchnorm: bool = False, dropout: float = 0.0, predictor_hidden_feats: int = 128, predictor_dropout: float = 0.0, mode: str = 'regression', number_atom_features=30, n_classes: int = 2, self_loop: bool = True, **kwargs)[source]
Parameters:
  • n_tasks (int) – Number of tasks.

  • graph_conv_layers (list of int) – Width of channels for GCN layers. graph_conv_layers[i] gives the width of channel for the i-th GCN layer. If not specified, the default value will be [64, 64].

  • activation (callable) – The activation function to apply to the output of each GCN layer. By default, no activation function will be applied.

  • residual (bool) – Whether to add a residual connection within each GCN layer. Default to True.

  • batchnorm (bool) – Whether to apply batch normalization to the output of each GCN layer. Default to False.

  • dropout (float) – The dropout probability for the output of each GCN layer. Default to 0.

  • predictor_hidden_feats (int) – The size for hidden representations in the output MLP predictor. Default to 128.

  • predictor_dropout (float) – The dropout probability in the output MLP predictor. Default to 0.

  • mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.

  • number_atom_features (int) – The length of the initial atom feature vectors. Default to 30.

  • n_classes (int) – The number of classes to predict per task (only used when mode is ‘classification’). Default to 2.

  • self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves. When input graphs have isolated nodes, self loops allow preserving the original feature of them in message passing. Default to True.

  • kwargs – This can include any keyword argument of TorchModel.
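The effect of self_loop on isolated nodes can be seen in a toy message-passing step. This sketch uses a plain mean over incoming messages as a stand-in for the GCN update, which is an assumption made for brevity; the helper names are hypothetical.

```python
def add_self_loops(num_nodes, edges):
    """Append an edge (i, i) for every node that lacks one."""
    existing = set(edges)
    loops = [(i, i) for i in range(num_nodes) if (i, i) not in existing]
    return list(edges) + loops


def mean_aggregate(num_nodes, edges, feats):
    """One simplified round of message passing along (src, dst) edges."""
    updated = []
    for dst in range(num_nodes):
        incoming = [feats[src] for src, d in edges if d == dst]
        # A node with no incoming edge receives nothing and is zeroed out.
        updated.append(sum(incoming) / len(incoming) if incoming else 0.0)
    return updated


edges = [(0, 1), (1, 0)]          # node 2 is isolated
feats = [1.0, 3.0, 5.0]           # one scalar feature per node

without_loops = mean_aggregate(3, edges, feats)
with_loops = mean_aggregate(3, add_self_loops(3, edges), feats)
```

Without self loops the isolated node loses its feature after one round; with self loops it keeps its original value.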

AttentiveFPModel

class AttentiveFPModel(n_tasks: int, num_layers: int = 2, num_timesteps: int = 2, graph_feat_size: int = 200, dropout: float = 0.0, mode: str = 'regression', number_atom_features: int = 30, number_bond_features: int = 11, n_classes: int = 2, self_loop: bool = True, **kwargs)[source]

Model for Graph Property Prediction.

This model proceeds as follows:

  • Combine node features and edge features for initializing node representations, which involves a round of message passing

  • Update node representations with multiple rounds of message passing

  • For each graph, compute its representation by combining the representations of all nodes in it, which involves a gated recurrent unit (GRU).

  • Perform the final prediction using a linear layer

Examples

>>> import deepchem as dc
>>> from deepchem.models import AttentiveFPModel
>>> # preparing dataset
>>> smiles = ["C1CCC1", "C1=CC=CN=C1"]
>>> labels = [0., 1.]
>>> featurizer = dc.feat.MolGraphConvFeaturizer(use_edges=True)
>>> X = featurizer.featurize(smiles)
>>> dataset = dc.data.NumpyDataset(X=X, y=labels)
>>> # training model
>>> model = AttentiveFPModel(mode='classification', n_tasks=1,
...    batch_size=16, learning_rate=0.001)
>>> loss = model.fit(dataset, nb_epoch=5)

Notes

This class requires DGL (https://github.com/dmlc/dgl) and DGL-LifeSci (https://github.com/awslabs/dgl-lifesci) to be installed.

__init__(n_tasks: int, num_layers: int = 2, num_timesteps: int = 2, graph_feat_size: int = 200, dropout: float = 0.0, mode: str = 'regression', number_atom_features: int = 30, number_bond_features: int = 11, n_classes: int = 2, self_loop: bool = True, **kwargs)[source]
Parameters:
  • n_tasks (int) – Number of tasks.

  • num_layers (int) – Number of graph neural network layers, i.e. number of rounds of message passing. Default to 2.

  • num_timesteps (int) – Number of time steps for updating graph representations with a GRU. Default to 2.

  • graph_feat_size (int) – Size for graph representations. Default to 200.

  • dropout (float) – Dropout probability. Default to 0.

  • mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.

  • number_atom_features (int) – The length of the initial atom feature vectors. Default to 30.

  • number_bond_features (int) – The length of the initial bond feature vectors. Default to 11.

  • n_classes (int) – The number of classes to predict per task (only used when mode is ‘classification’). Default to 2.

  • self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves. When input graphs have isolated nodes, self loops allow preserving the original feature of them in message passing. Default to True.

  • kwargs – This can include any keyword argument of TorchModel.

PagtnModel

class PagtnModel(n_tasks: int, number_atom_features: int = 94, number_bond_features: int = 42, mode: str = 'regression', n_classes: int = 2, output_node_features: int = 256, hidden_features: int = 32, num_layers: int = 5, num_heads: int = 1, dropout: float = 0.1, pool_mode: str = 'sum', **kwargs)[source]

Model for Graph Property Prediction.

This model proceeds as follows:

  • Update node representations in graphs with a variant of GAT, where a linear additive form of attention is applied. Attention weights are derived by concatenating the node and edge features for each bond.

  • Update node representations with multiple rounds of message passing.

  • Each layer has a residual connection with its previous layer.

  • The final molecular representation is computed by combining the representations of all nodes in the molecule.

  • Perform the final prediction using a linear layer

Examples

>>> import deepchem as dc
>>> from deepchem.models import PagtnModel
>>> # preparing dataset
>>> smiles = ["C1CCC1", "CCC"]
>>> labels = [0., 1.]
>>> featurizer = dc.feat.PagtnMolGraphFeaturizer(max_length=5)
>>> X = featurizer.featurize(smiles)
>>> dataset = dc.data.NumpyDataset(X=X, y=labels)
>>> # training model
>>> model = PagtnModel(mode='classification', n_tasks=1,
...                    batch_size=16, learning_rate=0.001)
>>> loss = model.fit(dataset, nb_epoch=5)

Notes

This class requires DGL (https://github.com/dmlc/dgl) and DGL-LifeSci (https://github.com/awslabs/dgl-lifesci) to be installed.

__init__(n_tasks: int, number_atom_features: int = 94, number_bond_features: int = 42, mode: str = 'regression', n_classes: int = 2, output_node_features: int = 256, hidden_features: int = 32, num_layers: int = 5, num_heads: int = 1, dropout: float = 0.1, pool_mode: str = 'sum', **kwargs)[source]
Parameters:
  • n_tasks (int) – Number of tasks.

  • number_atom_features (int) – Size for the input node features. Default to 94.

  • number_bond_features (int) – Size for the input edge features. Default to 42.

  • mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.

  • n_classes (int) – The number of classes to predict per task (only used when mode is ‘classification’). Default to 2.

  • output_node_features (int) – Size for the output node features in PAGTN layers. Default to 256.

  • hidden_features (int) – Size for the hidden node features in PAGTN layers. Default to 32.

  • num_layers (int) – Number of graph neural network layers, i.e. number of rounds of message passing. Default to 5.

  • num_heads (int) – Number of attention heads. Default to 1.

  • dropout (float) – Dropout probability. Default to 0.1.

  • pool_mode ('max' or 'mean' or 'sum') – Whether to compute elementwise maximum, mean or sum of the node representations.

  • kwargs – This can include any keyword argument of TorchModel.
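The three pool_mode options reduce the per-node representations to one vector per molecule. A minimal sketch of the element-wise reductions, with a hypothetical function name:

```python
def pool_nodes(node_feats, pool_mode="sum"):
    """Reduce a list of node feature vectors to one graph-level vector."""
    columns = list(zip(*node_feats))  # group values by feature dimension
    if pool_mode == "sum":
        return [sum(col) for col in columns]
    if pool_mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if pool_mode == "max":
        return [max(col) for col in columns]
    raise ValueError("pool_mode must be 'max', 'mean' or 'sum'")


node_feats = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
pooled_sum = pool_nodes(node_feats, "sum")
pooled_mean = pool_nodes(node_feats, "mean")
pooled_max = pool_nodes(node_feats, "max")
```

Note that ‘sum’ grows with molecule size while ‘mean’ and ‘max’ do not, which can matter for size-dependent properties.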

AtomConvModel

class AtomConvModel(n_tasks: int, frag1_num_atoms: int = 70, frag2_num_atoms: int = 634, complex_num_atoms: int = 701, max_num_neighbors: int = 12, batch_size: int = 24, atom_types: Sequence[float] = [6, 7.0, 8.0, 9.0, 11.0, 12.0, 15.0, 16.0, 17.0, 20.0, 25.0, 30.0, 35.0, 53.0, -1.0], radial: Sequence[Sequence[float]] = [[1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0], [0.0, 4.0, 8.0], [0.4]], layer_sizes=[100], weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = ['relu'], residual: bool = False, learning_rate=0.001, **kwargs)[source]

An Atomic Convolutional Neural Network (ACNN) for energy score prediction.

The network follows the design of a graph convolutional network, but here the graph is built from the 3D structure of the molecule. The model aims to predict energetic states starting from the spatial geometry of the molecular system [1].

References

Examples

>>> import numpy as np
>>> from deepchem.models.torch_models import AtomConvModel
>>> from deepchem.data import NumpyDataset
>>> frag1_num_atoms = 100 # atoms for ligand
>>> frag2_num_atoms = 1200 # atoms for protein
>>> complex_num_atoms = frag1_num_atoms + frag2_num_atoms
>>> batch_size = 1
>>> # Initialize the model
>>> atomic_convnet = AtomConvModel(n_tasks=1,
...                                batch_size=batch_size,
...                                layer_sizes=[
...                                    10,
...                                ],
...                                frag1_num_atoms=frag1_num_atoms,
...                                frag2_num_atoms=frag2_num_atoms,
...                                complex_num_atoms=complex_num_atoms)
>>> # Create a set of dummy features that contain the coordinate and
>>> # neighbor-list features required by the AtomConvModel.
>>> # Preparing the dataset
>>> features = []
>>> frag1_coords = np.random.rand(frag1_num_atoms, 3)
>>> frag1_nbr_list = {i: [] for i in range(frag1_num_atoms)}
>>> frag1_z = np.random.randint(10, size=(frag1_num_atoms))
>>> frag2_coords = np.random.rand(frag2_num_atoms, 3)
>>> frag2_nbr_list = {i: [] for i in range(frag2_num_atoms)}
>>> frag2_z = np.random.randint(10, size=(frag2_num_atoms))
>>> system_coords = np.random.rand(complex_num_atoms, 3)
>>> system_nbr_list = {i: [] for i in range(complex_num_atoms)}
>>> system_z = np.random.randint(10, size=(complex_num_atoms))
>>> features.append((frag1_coords, frag1_nbr_list, frag1_z, frag2_coords, frag2_nbr_list, frag2_z, system_coords, system_nbr_list, system_z))
>>> features = np.asarray(features, dtype=object)
>>> labels = np.zeros(batch_size)
>>> train = NumpyDataset(features, labels)
>>> _ = atomic_convnet.fit(train, nb_epoch=1)
>>> preds = atomic_convnet.predict(train)
__init__(n_tasks: int, frag1_num_atoms: int = 70, frag2_num_atoms: int = 634, complex_num_atoms: int = 701, max_num_neighbors: int = 12, batch_size: int = 24, atom_types: Sequence[float] = [6, 7.0, 8.0, 9.0, 11.0, 12.0, 15.0, 16.0, 17.0, 20.0, 25.0, 30.0, 35.0, 53.0, -1.0], radial: Sequence[Sequence[float]] = [[1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0], [0.0, 4.0, 8.0], [0.4]], layer_sizes=[100], weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = ['relu'], residual: bool = False, learning_rate=0.001, **kwargs) None[source]

TorchModel wrapper for ACNN

Parameters:
  • n_tasks (int) – number of tasks

  • frag1_num_atoms (int) – Number of atoms in first fragment.

  • frag2_num_atoms (int) – Number of atoms in second fragment.

  • complex_num_atoms (int) – Number of atoms in complex.

  • max_num_neighbors (int) – Maximum number of neighbors possible for an atom. Recall neighbors are spatial neighbors.

  • batch_size (int) – Size of the batch.

  • atom_types (list) – List of atoms recognized by model. Atoms are indicated by their atomic numbers.

  • radial (list) – Radial parameters used in the atomic convolution transformation.

  • layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.

  • weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes). Alternatively, this may be a single value instead of a list, where the same value is used for every layer.

  • bias_init_consts (list or float) – the value to initialize the biases in each layer. The length of this list should equal len(layer_sizes). Alternatively, this may be a single value instead of a list, where the same value is used for every layer.

  • dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively, this may be a single value instead of a list, where the same value is used for every layer.

  • activation_fns (list or object) – the PyTorch activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively, this may be a single value instead of a list, where the same value is used for every layer.

  • residual (bool) – Whether to use residual connections.

  • learning_rate (float) – the learning rate to use for fitting.
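The shape of the default radial argument suggests how the radial parameters might be combined. The sketch below is an interpretation, not a statement about the implementation: it assumes the three inner lists hold cutoff radii, Gaussian centers and a Gaussian width, that every combination of the three is used as one radial filter, and that each filter is a Gaussian multiplied by a cosine cutoff as in the atomic convolution literature. The names radial_filter, r_cut, r_shift and eta are hypothetical.

```python
import itertools
import math

# Same values as the default `radial` argument in the signature above.
radial = [[1.5 + 0.5 * i for i in range(22)], [0.0, 4.0, 8.0], [0.4]]

# Assumed: one filter per (cutoff, center, width) combination.
radial_params = list(itertools.product(*radial))


def radial_filter(r, r_cut, r_shift, eta):
    """Assumed filter: Gaussian in distance times a smooth cosine cutoff."""
    if r >= r_cut:
        return 0.0  # neighbors beyond the cutoff do not contribute
    cutoff = 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)
    return math.exp(-eta * (r - r_shift) ** 2) * cutoff


n_filters = len(radial_params)
value_at_zero = radial_filter(0.0, r_cut=12.0, r_shift=0.0, eta=0.4)
value_at_cutoff = radial_filter(12.0, r_cut=12.0, r_shift=0.0, eta=0.4)
```

Under these assumptions the default settings yield 22 × 3 × 1 = 66 parameter combinations.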

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) Iterable[Tuple[List, List, List]][source]

Convert a dataset into the tensors needed for learning.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to convert

  • epochs (int, optional (Default 1)) – Number of times to walk over dataset

  • mode (str, optional (Default 'fit')) – Ignored in this implementation.

  • deterministic (bool, optional (Default True)) – Whether the dataset should be walked in a deterministic fashion

  • pad_batches (bool, optional (Default True)) – If true, each returned batch will have size self.batch_size.

Return type:

Iterator which walks over the batches
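To illustrate what pad_batches means, here is a small sketch in plain Python. The padding scheme (repeat a sample and give it weight 0.0) is an assumption made for illustration, and iter_padded_batches is a hypothetical helper rather than part of DeepChem.

```python
from typing import Iterator, List, Tuple


def iter_padded_batches(
        samples: List[float], batch_size: int,
        pad_batches: bool = True) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield (values, weights) batches; padded entries get weight 0.0.

    Hypothetical helper: the zero-weight padding scheme is an assumption.
    """
    for start in range(0, len(samples), batch_size):
        values = samples[start:start + batch_size]
        weights = [1.0] * len(values)
        if pad_batches and len(values) < batch_size:
            n_missing = batch_size - len(values)
            # Repeat the first sample of the batch as filler ...
            values = values + [values[0]] * n_missing
            # ... and mask the filler out with zero weights.
            weights = weights + [0.0] * n_missing
        yield values, weights


batches = list(iter_padded_batches([1.0, 2.0, 3.0, 4.0, 5.0], batch_size=2))
```

With five samples and a batch size of two, the last batch is filled up to size two so that every batch has the same shape.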

MPNNModel

Note that this is an alternative implementation for MPNN and currently you can only import it from deepchem.models.torch_models.

class MPNNModel(n_tasks: int, node_out_feats: int = 64, edge_hidden_feats: int = 128, num_step_message_passing: int = 3, num_step_set2set: int = 6, num_layer_set2set: int = 3, mode: str = 'regression', number_atom_features: int = 30, number_bond_features: int = 11, n_classes: int = 2, self_loop: bool = False, **kwargs)[source]

Model for graph property prediction

This model proceeds as follows:

  • Combine the latest node representations and edge features to update the node representations, which involves multiple rounds of message passing.

  • For each graph, compute its representation by combining the representations of all nodes in it, which involves a Set2Set layer.

  • Perform the final prediction using an MLP
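The three steps above can be sketched on a toy graph. This is a simplified illustration under stated assumptions: messages are edge-weighted sums of neighbour features, and a plain mean over nodes stands in for the Set2Set readout. None of the function names below are DeepChem or DGL APIs.

```python
from typing import Dict, List, Tuple

# Toy graph: a three-atom chain 0 - 1 - 2, stored as directed edges.
node_feats: List[List[float]] = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges: List[Tuple[int, int]] = [(0, 1), (1, 0), (1, 2), (2, 1)]
edge_feats: Dict[Tuple[int, int], float] = {e: 0.5 for e in edges}


def message_passing_round(h, edges, edge_feats):
    """One round: each node adds edge-weighted messages from its neighbours."""
    new_h = [list(row) for row in h]
    for src, dst in edges:
        for k, value in enumerate(h[src]):
            new_h[dst][k] += edge_feats[(src, dst)] * value
    return new_h


def mean_readout(h):
    """Graph representation as the mean over nodes (stand-in for Set2Set)."""
    n = len(h)
    return [sum(row[k] for row in h) / n for k in range(len(h[0]))]


h = node_feats
for _ in range(3):  # mirrors num_step_message_passing=3
    h = message_passing_round(h, edges, edge_feats)
graph_repr = mean_readout(h)
# The real model would now pass graph_repr to an MLP for the prediction.
```

The actual model learns the message and update functions; the fixed weighted sum here only shows how information spreads one hop per round.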

Examples

>>> import deepchem as dc
>>> from deepchem.models.torch_models import MPNNModel
>>> # preparing dataset
>>> smiles = ["C1CCC1", "CCC"]
>>> labels = [0., 1.]
>>> featurizer = dc.feat.MolGraphConvFeaturizer(use_edges=True)
>>> X = featurizer.featurize(smiles)
>>> dataset = dc.data.NumpyDataset(X=X, y=labels)
>>> # training model
>>> model = MPNNModel(mode='classification', n_tasks=1,
...                  batch_size=16, learning_rate=0.001)
>>> loss = model.fit(dataset, nb_epoch=5)

Notes

This class requires DGL (https://github.com/dmlc/dgl) and DGL-LifeSci (https://github.com/awslabs/dgl-lifesci) to be installed.

The featurizer used with MPNNModel must produce a GraphData object which should have both ‘edge’ and ‘node’ features.

__init__(n_tasks: int, node_out_feats: int = 64, edge_hidden_feats: int = 128, num_step_message_passing: int = 3, num_step_set2set: int = 6, num_layer_set2set: int = 3, mode: str = 'regression', number_atom_features: int = 30, number_bond_features: int = 11, n_classes: int = 2, self_loop: bool = False, **kwargs)[source]
Parameters:
  • n_tasks (int) – Number of tasks.

  • node_out_feats (int) – The length of the final node representation vectors. Default to 64.

  • edge_hidden_feats (int) – The length of the hidden edge representation vectors. Default to 128.

  • num_step_message_passing (int) – The number of rounds of message passing. Default to 3.

  • num_step_set2set (int) – The number of set2set steps. Default to 6.

  • num_layer_set2set (int) – The number of set2set layers. Default to 3.

  • mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.

  • number_atom_features (int) – The length of the initial atom feature vectors. Default to 30.

  • number_bond_features (int) – The length of the initial bond feature vectors. Default to 11.

  • n_classes (int) – The number of classes to predict per task (only used when mode is ‘classification’). Default to 2.

  • self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves. Generally, an MPNNModel does not require self loops. Default to False.

  • kwargs – This can include any keyword argument of TorchModel.
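As an illustration of what self_loop refers to, the sketch below adds an edge from every node to itself in a plain edge list. add_self_loops is a hypothetical helper for this example only; the real model operates on DGL graphs.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def add_self_loops(edges: List[Edge], n_nodes: int) -> List[Edge]:
    """Return the edge list plus one (i, i) edge per node that lacks it."""
    existing = set(edges)
    loops = [(i, i) for i in range(n_nodes) if (i, i) not in existing]
    return list(edges) + loops


# Propane (CCC) as a directed edge list over atoms 0, 1, 2.
propane_edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
with_loops = add_self_loops(propane_edges, n_nodes=3)
```

In a featurized graph each added edge would also need a bond feature vector; that detail is omitted here.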

InfoGraphModel

class InfoGraphModel(num_features, embedding_dim, num_gc_layers=5, prior=True, gamma=0.1, measure='JSD', average_loss=True, task='pretraining', n_tasks: int | None = None, n_classes: int | None = None, **kwargs)[source]

InfoGraphModel is a model that learns graph-level representations via unsupervised learning. To this end, the model aims to maximize the mutual information between the representations of entire graphs and the representations of substructures of different granularity (e.g., nodes, edges, triangles).

The unsupervised training of InfoGraph involves two encoders: one that encodes the entire graph and another that encodes substructures of different sizes. The mutual information between the two encoder outputs is maximized using a contrastive loss function. The model randomly samples pairs of graphs and substructures, and then maximizes their mutual information by minimizing their distance in a learned embedding space.

The learned representations can be used for downstream tasks such as graph classification and molecular property prediction. The model is implemented as a ModularTorchModel in order to facilitate transfer learning.
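The contrastive objective described above can be sketched with the Jensen-Shannon (JSD) estimator, which is the default value of the measure parameter. This is a minimal sketch in plain Python, assuming the discriminator has already produced one score per (graph, substructure) pair; jsd_mi_loss is a hypothetical name, and the actual implementation works on batched tensors.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def jsd_mi_loss(pos_scores: List[float], neg_scores: List[float]) -> float:
    """Negative JSD mutual-information estimate (lower is better)."""
    log2 = math.log(2.0)
    # Expectation over matching (graph, substructure) pairs.
    e_pos = sum(log2 - softplus(-s) for s in pos_scores) / len(pos_scores)
    # Expectation over mismatched pairs.
    e_neg = sum(softplus(s) - log2 for s in neg_scores) / len(neg_scores)
    return e_neg - e_pos


# High scores for matching pairs and low scores for mismatched pairs
# give a lower loss than scores that carry no information.
well_separated = jsd_mi_loss(pos_scores=[4.0, 5.0], neg_scores=[-4.0, -5.0])
uninformative = jsd_mi_loss(pos_scores=[0.0, 0.0], neg_scores=[0.0, 0.0])
```

Minimizing this loss pushes scores of matching pairs up and scores of mismatched pairs down, which is the sense in which the mutual information estimate is maximized.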

References

  1. Sun, F.-Y., Hoffmann, J., Verma, V. & Tang, J. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. Preprint at http://arxiv.org/abs/1908.01000 (2020).

Parameters:
  • num_features (int) – Number of node features for each input

  • edge_features (int) – Number of edge features for each input

  • embedding_dim (int) – Dimension of the embedding

  • num_gc_layers (int) – Number of graph convolutional layers

  • prior (bool) – Whether to use a prior expectation in the loss term

  • gamma (float) – Weight of the prior expectation in the loss term

  • measure (str) – The divergence measure to use for the unsupervised loss. Options are ‘GAN’, ‘JSD’, ‘KL’, ‘RKL’, ‘X2’, ‘DV’, ‘H2’, or ‘W1’.

  • average_loss (bool) – Whether to average the loss over the batch

  • n_classes (int) – Number of classes

Example

>>> from deepchem.models.torch_models.infograph import InfoGraphModel
>>> from deepchem.feat import MolGraphConvFeaturizer
>>> from deepchem.data import NumpyDataset
>>> import torch
>>> import numpy as np
>>> import tempfile
>>> tempdir = tempfile.TemporaryDirectory()
>>> smiles = ["C1CCC1", "C1=CC=CN=C1"]
>>> featurizer = MolGraphConvFeaturizer(use_edges=True)
>>> X = featurizer.featurize(smiles)
>>> y = torch.randint(0, 2, size=(2, 1)).float()
>>> w = torch.ones(size=(2, 1)).float()
>>> dataset = NumpyDataset(X, y, w)
>>> num_feat, edge_dim = 30, 11  # num feat and edge dim by molgraph conv featurizer
>>> pretrain_model = InfoGraphModel(num_feat, edge_dim, num_gc_layers=1, task='pretraining', model_dir=tempdir.name)
>>> pretraining_loss = pretrain_model.fit(dataset, nb_epoch=1)
>>> pretrain_model.save_checkpoint()
>>> finetune_model = InfoGraphModel(num_feat, edge_dim, num_gc_layers=1, task='regression', n_tasks=1, model_dir=tempdir.name)
>>> finetune_model.restore(components=['encoder'])
>>> finetuning_loss = finetune_model.fit(dataset)
>>>
>>> # classification example
>>> n_classes, n_tasks = 2, 1
>>> classification_model = InfoGraphModel(num_feat, edge_dim, num_gc_layers=1, task='classification', n_tasks=1, n_classes=2)
>>> y = np.random.randint(n_classes, size=(len(smiles), n_tasks)).astype(np.float64)
>>> dataset = NumpyDataset(X, y, w)
>>> loss = classification_model.fit(dataset, nb_epoch=1)

__init__(num_features, embedding_dim, num_gc_layers=5, prior=True, gamma=0.1, measure='JSD', average_loss=True, task='pretraining', n_tasks: int | None = None, n_classes: int | None = None, **kwargs)[source]

Create a ModularTorchModel.

Parameters:
  • model (nn.Module) – The model to be trained.

  • components (dict) – A dictionary of the components of the model. The keys are the names of the components and the values are the components themselves.

build_components() → dict[source]

Build the components of the model. InfoGraph is an unsupervised molecular graph representation learning model. It consists of an encoder, a local discriminator, a global discriminator, and a prior discriminator.

The unsupervised loss is calculated by the mutual information in embedding representations at all layers.

Components list, type and description:

encoder: GINEncoder, graph convolutional encoder

local_d: MultilayerPerceptron, local discriminator

global_d: MultilayerPerceptron, global discriminator

prior_d: MultilayerPerceptron, prior discriminator

fc1: MultilayerPerceptron, dense layer used during finetuning

fc2: MultilayerPerceptron, dense layer used during finetuning

build_model() → Module[source]

Builds the final model from the components.

loss_func(inputs, labels, weights)[source]

Defines the loss function for the model which can access the components using self.components. The loss function should take the inputs, labels, and weights as arguments and return the loss.

restore(components: List[str] | None = None, checkpoint: str | None = None, model_dir: str | None = None, map_location: device | None = None) → None[source]

Restores the state of a ModularTorchModel from a checkpoint file.

If no checkpoint file is provided, it will use the latest checkpoint found in the model directory. If a list of component names is provided, only the state of those components will be restored.

Parameters:
  • components (Optional[List[str]]) – A list of component names to restore. If None, all components will be restored.

  • checkpoint (Optional[str]) – The path to the checkpoint file. If None, the latest checkpoint in the model directory will be used.

  • model_dir (Optional[str]) – The path to the model directory. If None, the model directory used to initialize the model will be used.
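The effect of the components argument can be sketched with plain dictionaries. This is an illustration only: restore_components is a hypothetical function, and the nested dictionaries stand in for the saved state of each component.

```python
from typing import Dict, List, Optional

State = Dict[str, float]


def restore_components(current: Dict[str, State],
                       checkpoint: Dict[str, State],
                       components: Optional[List[str]] = None) -> None:
    """Copy saved state into `current` for the selected components only."""
    # With components=None, everything stored in the checkpoint is restored.
    names = list(checkpoint) if components is None else components
    for name in names:
        current[name] = dict(checkpoint[name])


pretrained = {"encoder": {"w": 0.9}, "local_d": {"w": 0.3}}
finetune = {"encoder": {"w": 0.0}, "fc1": {"w": 0.0}}
# Only the encoder is carried over; the finetuning head keeps its own state.
restore_components(finetune, pretrained, components=["encoder"])
```

This mirrors the transfer-learning pattern in the example above, where only the pretrained encoder is restored before finetuning.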

InfoGraphStarModel

class InfoGraphStarModel(num_features, edge_features, embedding_dim, task: Literal['supervised', 'semisupervised'] = 'supervised', mode: Literal['regression', 'classification'] = 'regression', num_classes=2, num_tasks=1, measure='JSD', average_loss=True, num_gc_layers=5, **kwargs)[source]

InfoGraphStar is a semi-supervised graph convolutional network for predicting molecular properties. It aims to maximize the mutual information between the graph-level representation and the representations of substructures of different scales. It does this by producing graph-level encodings and substructure encodings, and then using a discriminator to classify if they are from the same molecule or not.

Supervised training is done by using the graph-level encodings to predict the target property. Semi-supervised training is done by adding a loss term that maximizes the mutual information between the graph-level encodings and the substructure encodings to the supervised loss. These modes can be chosen by setting the task parameter.

To conduct training in unsupervised mode, use InfoGraphModel.

Parameters:
  • num_features (int) – Number of node features for each input

  • edge_features (int) – Number of edge features for each input

  • embedding_dim (int) – Dimension of the embedding

  • task (str) – The training mode to use. Options are ‘supervised’ and ‘semisupervised’. For unsupervised training, use InfoGraphModel.

  • measure (str) – The divergence measure to use for the unsupervised loss. Options are ‘GAN’, ‘JSD’, ‘KL’, ‘RKL’, ‘X2’, ‘DV’, ‘H2’, or ‘W1’.

  • average_loss (bool) – Whether to average the loss over the batch

Examples

>>> from deepchem.models.torch_models import InfoGraphStarModel
>>> from deepchem.feat import MolGraphConvFeaturizer
>>> from deepchem.data import NumpyDataset
>>> import torch
>>> smiles = ["C1CCC1", "C1=CC=CN=C1"]
>>> featurizer = MolGraphConvFeaturizer(use_edges=True)
>>> X = featurizer.featurize(smiles)
>>> y = torch.randint(0, 2, size=(2, 1)).float()
>>> w = torch.ones(size=(2, 1)).float()
>>> ds = NumpyDataset(X, y, w)
>>> num_feat = max([ds.X[i].num_node_features for i in range(len(ds))])
>>> edge_dim = max([ds.X[i].num_edge_features for i in range(len(ds))])
>>> model = InfoGraphStarModel(num_feat, edge_dim, 15, task='semisupervised')
>>> loss = model.fit(ds, nb_epoch=1)

__init__(num_features, edge_features, embedding_dim, task: Literal['supervised', 'semisupervised'] = 'supervised', mode: Literal['regression', 'classification'] = 'regression', num_classes=2, num_tasks=1, measure='JSD', average_loss=True, num_gc_layers=5, **kwargs)[source]

Create a ModularTorchModel.

Parameters:
  • model (nn.Module) – The model to be trained.

  • components (dict) – A dictionary of the components of the model. The keys are the names of the components and the values are the components themselves.

build_components()[source]

Builds the components of the InfoGraphStar model. InfoGraphStar works by maximizing the mutual information between the graph-level representation and the representations of substructures of different scales.

It does this by producing graph-level encodings and substructure encodings, and then using a discriminator to classify if they are from the same molecule or not.

The encoder is a graph convolutional network that produces the graph-level encodings and substructure encodings.

In supervised training mode, only one encoder is used and the encodings are not compared. In semi-supervised training mode, separate encoders are used for the supervised and unsupervised objectives in order to prevent negative transfer from the pretraining stage.

The local discriminator is a multilayer perceptron that classifies if the substructure encodings are from the same molecule or not while the global discriminator classifies if the graph-level encodings are from the same molecule or not.

Components list, type and description:

encoder: InfoGraphEncoder

unsup_encoder: InfoGraphEncoder for supervised or GINEncoder for unsupervised training

ff1: MultilayerPerceptron, feedforward network

ff2: MultilayerPerceptron, feedforward network

fc1: torch.nn.Linear, fully connected layer

fc2: torch.nn.Linear, fully connected layer

local_d: MultilayerPerceptron, local discriminator

global_d: MultilayerPerceptron, global discriminator

build_model()[source]

Builds the InfoGraph model by unpacking the components dictionary and passing its entries to the InfoGraph nn.Module.

loss_func(inputs, labels, weights)[source]

Defines the loss function for the model which can access the components using self.components. The loss function should take the inputs, labels, and weights as arguments and return the loss.

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) → Iterable[Tuple[List, List, List]][source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])
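The shape of the yielded tuples can be sketched with a toy generator. This is an illustration in plain Python, assuming one input, one output, and one weight array per batch; toy_generator and its seed argument are hypothetical and not part of the DeepChem API.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Batch = Tuple[List[list], List[list], List[list]]


def toy_generator(X: Sequence[float], y: Sequence[float], w: Sequence[float],
                  batch_size: int, epochs: int = 1,
                  deterministic: bool = True,
                  seed: int = 0) -> Iterator[Batch]:
    """Yield ([inputs], [outputs], [weights]) tuples, epoch after epoch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(len(X)))
        if not deterministic:
            # Reshuffle the sample order at the start of every epoch.
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            yield ([[X[i] for i in idx]],
                   [[y[i] for i in idx]],
                   [[w[i] for i in idx]])


batches = list(toy_generator([0.1, 0.2, 0.3], [1.0, 0.0, 1.0],
                             [1.0, 1.0, 1.0], batch_size=2, epochs=2))
```

Each element of the tuple is itself a list so that models with several inputs or outputs can return more than one array per batch.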

GNNModular

class GNNModular(gnn_type: str = 'gin', num_layer: int = 3, emb_dim: int = 64, num_tasks: int = 1, num_classes: int = 2, graph_pooling: str = 'mean', dropout: int = 0, jump_knowledge: str = 'last', task: str = 'edge_pred', mask_rate: float = 0.1, mask_edge: bool = True, context_size: int = 1, neighborhood_size: int = 3, context_mode: str = 'cbow', neg_samples: int = 1, **kwargs)[source]

Modular GNN which allows for easy swapping of GNN layers.

Parameters:
  • gnn_type (str) – The type of GNN layer to use. Must be one of “gin”, “gcn”, “graphsage”, or “gat”.

  • num_layer (int) – The number of GNN layers to use.

  • emb_dim (int) – The dimensionality of the node embeddings.

  • num_tasks (int) – The number of tasks.

  • graph_pooling (str) – The type of graph pooling to use. Must be one of “sum”, “mean”, “max”, “attention” or “set2set”. “sum” may cause issues with positive prediction loss.

  • dropout (float, optional (default 0)) – The dropout probability.

  • jump_knowledge (str, optional (default "last")) – The type of jump knowledge to use. [1] Must be one of “last”, “sum”, “max”, or “concat”.

    “last”: Use the node representation from the last GNN layer.

    “concat”: Concatenate the node representations from all GNN layers. This will increase the dimensionality of the node representations by a factor of num_layer.

    “max”: Take the element-wise maximum of the node representations from all GNN layers.

    “sum”: Take the element-wise sum of the node representations from all GNN layers. This may cause issues with positive prediction loss.

  • task (str, optional (default "edge_pred")) – The type of task.

    Unsupervised tasks:

    edge_pred: Edge prediction. Predicts whether an edge exists between two nodes.

    mask_nodes: Masking nodes. Predicts the masked node.

    mask_edges: Masking edges. Predicts the masked edge.

    infomax: Infomax. Maximizes mutual information between local node representations and a pooled global graph representation.

    context_pred: Context prediction. Predicts the surrounding context of a node.

    Supervised tasks: “regression” or “classification”.

  • mask_rate (float, optional (default 0.1)) – The rate at which to mask nodes or edges for mask_nodes and mask_edges tasks.

  • mask_edge (bool, optional (default True)) – Whether to also mask connecting edges for mask_nodes tasks.

  • context_size (int, optional (default 1)) – The size of the context to use for context prediction tasks.

  • neighborhood_size (int, optional (default 3)) – The size of the neighborhood to use for context prediction tasks.

  • context_mode (str, optional (default "cbow")) – The context mode to use for context prediction tasks. Must be one of “cbow” or “skipgram”.

  • neg_samples (int, optional (default 1)) – The number of negative samples to use for context prediction.
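The four jump_knowledge options can be sketched for a single node as follows. This is a plain-Python illustration of the element-wise operations described above; the function below is hypothetical and not the DeepChem implementation, which works on tensors.

```python
from typing import List


def jump_knowledge(layer_outputs: List[List[float]],
                   mode: str = "last") -> List[float]:
    """Combine the per-layer representations of one node."""
    if mode == "last":
        return list(layer_outputs[-1])
    if mode == "concat":
        # Output length grows by a factor of the number of layers.
        return [v for layer in layer_outputs for v in layer]
    if mode == "max":
        return [max(vals) for vals in zip(*layer_outputs)]
    if mode == "sum":
        return [sum(vals) for vals in zip(*layer_outputs)]
    raise ValueError(f"unknown jump_knowledge mode: {mode!r}")


# Representations of one node after each of three GNN layers.
layers = [[1.0, -2.0], [0.5, 4.0], [3.0, 0.0]]
last = jump_knowledge(layers, "last")
concat = jump_knowledge(layers, "concat")
maximum = jump_knowledge(layers, "max")
total = jump_knowledge(layers, "sum")
```

Only "concat" changes the embedding size, so downstream layers must expect emb_dim times num_layer features in that mode.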

Examples

>>> import numpy as np
>>> import deepchem as dc
>>> from deepchem.feat.molecule_featurizers import SNAPFeaturizer
>>> from deepchem.models.torch_models.gnn import GNNModular
>>> featurizer = SNAPFeaturizer()
>>> smiles = ["C1=CC=CC=C1", "C1=CC=CC=C1C=O", "C1=CC=CC=C1C(=O)O"]
>>> features = featurizer.featurize(smiles)
>>> dataset = dc.data.NumpyDataset(features, np.zeros(len(features)))
>>> model = GNNModular(task="edge_pred")
>>> loss = model.fit(dataset, nb_epoch=1)

References

__init__(gnn_type: str = 'gin', num_layer: int = 3, emb_dim: int = 64, num_tasks: int = 1, num_classes: int = 2, graph_pooling: str = 'mean', dropout: int = 0, jump_knowledge: str = 'last', task: str = 'edge_pred', mask_rate: float = 0.1, mask_edge: bool = True, context_size: int = 1, neighborhood_size: int = 3, context_mode: str = 'cbow', neg_samples: int = 1, **kwargs)[source]

Create a ModularTorchModel.

Parameters:
  • model (nn.Module) – The model to be trained.

  • components (dict) – A dictionary of the components of the model. The keys are the names of the components and the values are the components themselves.

build_components()[source]

Builds the components of the GNNModular model. It initializes the encoders, batch normalization layers, pooling layers, and head layers based on the provided configuration. The method returns a dictionary containing the following components:

Components list, type and description:

node_type_embedding: torch.nn.Embedding, an embedding layer for node types.

chirality_embedding: torch.nn.Embedding, an embedding layer for chirality tags.

gconvs: torch_geometric.nn.conv.MessagePassing, a list of graph convolutional layers (encoders) based on the specified GNN type (GIN, GCN, or GAT).

batch_norms: torch.nn.BatchNorm1d, a list of batch normalization layers corresponding to the encoders.

pool: Union[function,torch_geometric.nn.aggr.Aggregation], a pooling layer based on the specified graph pooling type (sum, mean, max, attention, or set2set).

head: nn.Linear, a linear layer for the head of the model.

These components are then used to construct the GNN and GNN_head modules for the GNNModular model.

build_gnn(num_layer)[source]

Build graph neural network encoding layers by specifying the number of GNN layers.

Parameters:

num_layer (int) – The number of GNN layers to be created.

Returns:

A tuple containing two ModuleLists:

  1. encoders: A ModuleList of GNN layers (currently only GIN is supported).

  2. batch_norms: A ModuleList of batch normalization layers corresponding to each GNN layer.

Return type:

tuple of (torch.nn.ModuleList, torch.nn.ModuleList)

build_model()[source]

Builds the appropriate model based on the specified task.

For the edge prediction task, the model is simply the GNN module because it is an unsupervised task and does not require a prediction head.

Supervised tasks such as node classification and graph regression require a prediction head, so the model is a sequential module consisting of the GNN module followed by the GNN_head module.

loss_func(inputs, labels, weights)[source]

The loss function executed in the training loop, which is based on the specified task.

masked_node_loss_loader(inputs)[source]

Produces the loss between the predicted node features and the true node features for masked nodes. Set mask_edge to True to also predict the edge types for masked edges.

masked_edge_loss_loader(inputs)[source]

Produces the loss between the predicted edge types and the true edge types for masked edges.
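The masking step behind these losses can be sketched as follows. This is a simplified illustration: the rule for how many nodes to mask, the MASK_TOKEN value, and the mask_nodes helper are all assumptions made for this example and may differ from the DeepChem implementation.

```python
import random
from typing import List, Tuple

MASK_TOKEN = -1  # hypothetical placeholder id for a masked node type


def mask_nodes(node_types: List[int], mask_rate: float,
               seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (masked copy of node_types, sorted indices that were masked)."""
    rng = random.Random(seed)
    # Assumption: mask at least one node, otherwise round down.
    n_masked = max(1, int(len(node_types) * mask_rate))
    masked_idx = sorted(rng.sample(range(len(node_types)), n_masked))
    masked = list(node_types)
    for i in masked_idx:
        masked[i] = MASK_TOKEN
    return masked, masked_idx


atom_types = [6, 6, 8, 7, 6, 6, 6, 8]
masked, idx = mask_nodes(atom_types, mask_rate=0.25)
# Training targets are the original types at the masked positions.
targets = [atom_types[i] for i in idx]
```

The model sees only the masked list and is trained to recover the targets; masking edges follows the same pattern with edge types instead of node types.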

infomax_loss_loader(inputs)[source]

Loss that maximizes mutual information between local node representations and a pooled global graph representation. The positive and negative scores represent the similarity between local node representations and global graph representations of similar and dissimilar graphs, respectively.

Parameters:

inputs (BatchedGraphData) – BatchedGraphData object containing the node features, edge indices, and graph indices for the batch of graphs.

context_pred_loss_loader(inputs)[source]

Loads the context prediction loss for the given input by taking the batched subgraph and context graphs and computing the context prediction loss for each subgraph and context graph pair.

Parameters:

inputs (tuple) – A tuple containing the following elements:

  • substruct_batch (BatchedGraphData): Batched subgraph, or neighborhood, graphs.

  • s_overlap (List[int]): List of overlapping subgraph node indices between the subgraph and context graphs.

  • context_graphs (BatchedGraphData): Batched context graphs.

  • c_overlap (List[int]): List of overlapping context node indices between the subgraph and context graphs.

  • overlap_size (List[int]): List of the number of overlapping nodes between the subgraph and context graphs.

Returns:

context_pred_loss – The context prediction loss

Return type:

torch.Tensor

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) → Iterable[Tuple[List, List, List]][source]

This default generator is modified from the default generator in dc.models.tensorgraph.tensor_graph.py to support multitask classification. If the task is classification, the labels y_b are converted to a one-hot encoding and reshaped according to the number of tasks and classes.
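The label conversion described above can be sketched in plain Python. This is a minimal illustration, assuming integer class labels of shape (batch, n_tasks); one_hot_labels is a hypothetical helper and the real generator operates on NumPy arrays.

```python
from typing import List


def one_hot_labels(y_b: List[List[int]],
                   n_classes: int) -> List[List[List[float]]]:
    """Map labels of shape (batch, n_tasks) to (batch, n_tasks, n_classes)."""
    return [[[1.0 if label == c else 0.0 for c in range(n_classes)]
             for label in sample]
            for sample in y_b]


# Three samples, two tasks, two classes per task.
y_b = [[0, 1], [1, 1], [0, 0]]
encoded = one_hot_labels(y_b, n_classes=2)
```

Each task keeps its own one-hot vector, so a batch of 3 samples with 2 tasks and 2 classes ends up with shape (3, 2, 2).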

InfoMax3DModular

class InfoMax3DModular(task: Literal['pretraining', 'regression', 'classification'], hidden_dim: int = 64, target_dim: int = 10, aggregators: List[str] = ['mean'], readout_aggregators: List[str] = ['mean'], scalers: List[str] = ['identity'], residual: bool = True, node_wise_output_layers: int = 2, pairwise_distances: bool = False, activation: Callable | str = 'relu', reduce_func: str = 'sum', batch_norm: bool = True, batch_norm_momentum: float = 0.1, propagation_depth: int = 5, dropout: float = 0.0, readout_layers: int = 2, readout_hidden_dim: int = 1, fourier_encodings: int = 4, update_net_layers: int = 2, message_net_layers: int = 2, use_node_features: bool = False, posttrans_layers: int = 1, pretrans_layers: int = 1, n_tasks: int = 1, n_classes: bool | None = None, **kwargs)[source]

InfoMax3DModular is a modular torch model that uses a 2D PNA model and a 3D Net3D model to maximize the mutual information between their representations. The 2D model can then be used for downstream tasks without the need for 3D coordinates. This is based off the work in [1].

This class expects data featurized by the RDKitConformerFeaturizer. This featurizer produces features of the type Array[Array[List[GraphData]]]. The outermost array is the dataset array, the second array is the molecule, the list contains the conformers for that molecule, and the GraphData object is the featurized graph for that conformer, with node_pos_features holding the 3D coordinates. If you are not using RDKitConformerFeaturizer, your input data features should look like this: Dataset[Molecule[Conformers[GraphData]]].

For pretraining, the original paper used a learning rate of 8e-5 with a batch size of 500. For finetuning on quantum mechanical datasets, a learning rate of 7e-5 with a batch size of 128 was used. For finetuning on non-quantum mechanical datasets, a learning rate of 1e-3 with a batch size of 32 was used in the original implementation.
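The settings quoted above can be collected in a small lookup table. The dictionary name, its keys, and the settings_for helper are illustrative; learning_rate and batch_size are assumed to be forwarded through **kwargs, as in the other examples on this page.

```python
# Values reported for the original paper and implementation, as quoted above.
INFOMAX3D_SETTINGS = {
    "pretraining": {"learning_rate": 8e-5, "batch_size": 500},
    "finetune_qm": {"learning_rate": 7e-5, "batch_size": 128},
    "finetune_non_qm": {"learning_rate": 1e-3, "batch_size": 32},
}


def settings_for(stage: str) -> dict:
    """Return a copy so callers can adjust values without side effects."""
    return dict(INFOMAX3D_SETTINGS[stage])


pretrain_kwargs = settings_for("pretraining")
# Assumed usage: InfoMax3DModular(task='pretraining', **pretrain_kwargs)
```

These are starting points taken from the text above, not tuned recommendations for other datasets.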

Parameters:
  • task (Literal['pretraining', 'regression', 'classification']) – The task of the model

  • hidden_dim (int, optional, default = 64) – The dimension of the hidden layers.

  • target_dim (int, optional, default = 10) – The dimension of the output layer.

  • aggregators (List[str]) – A list of aggregator functions for the PNA model. Options are ‘mean’, ‘sum’, ‘min’, ‘max’, ‘std’, ‘var’, ‘moment3’, ‘moment4’, ‘moment5’.

  • readout_aggregators (List[str]) – A list of aggregator functions for the readout layer. Options are ‘sum’, ‘max’, ‘min’, ‘mean’.

  • scalers (List[str]) – A list of scaler functions for the PNA model. Options are ‘identity’, ‘amplification’, ‘attenuation’.

  • residual (bool, optional (default=True)) – Whether to use residual connections in the PNA model.

  • node_wise_output_layers (int, optional (default=2)) – The number of output layers for each node in the Net3D model.

  • pairwise_distances (bool, optional (default=False)) – Whether to use pairwise distances in the PNA model.

  • activation (Union[Callable, str], optional (default="relu")) – The activation function to use in the PNA model.

  • reduce_func (str, optional (default='sum')) – The reduce function to use for aggregating messages in the Net3D model.

  • batch_norm (bool, optional (default=True)) – Whether to use batch normalization in the PNA model.

  • batch_norm_momentum (float, optional (default=0.1)) – The momentum for the batch normalization layers.

  • propagation_depth (int, optional (default=5)) – The number of propagation layers in the PNA and Net3D models.

  • dropout (float, optional (default=0.0)) – The dropout rate for the layers in the PNA and Net3D models.

  • readout_layers (int, optional (default=2)) – The number of readout layers in the PNA and Net3D models.

  • readout_hidden_dim (int, optional (default=1)) – The dimension of the hidden layers in the readout network.

  • fourier_encodings (int, optional (default=4)) – The number of Fourier encodings to use in the Net3D model.

  • update_net_layers (int, optional (default=2)) – The number of update network layers in the Net3D model.

  • message_net_layers (int, optional (default=2)) – The number of message network layers in the Net3D model.

  • use_node_features (bool, optional (default=False)) – Whether to use node features as input in the Net3D model.

  • posttrans_layers (int, optional (default=1)) – The number of post-transformation layers in the PNA model.

  • pretrans_layers (int, optional (default=1)) – The number of pre-transformation layers in the PNA model.

  • kwargs (dict) – Additional keyword arguments.

References

Examples

>>> from deepchem.feat.graph_data import BatchGraphData
>>> from deepchem.feat.molecule_featurizers.conformer_featurizer import RDKitConformerFeaturizer
>>> from deepchem.models.torch_models.gnn3d import InfoMax3DModular
>>> import numpy as np
>>> import deepchem as dc
>>> from deepchem.data.datasets import NumpyDataset
>>> smiles = ["C[C@H](F)Cl", "C[C@@H](F)Cl"]
>>> featurizer = RDKitConformerFeaturizer()
>>> data = featurizer.featurize(smiles)
>>> dataset = NumpyDataset(X=data)
>>> model = InfoMax3DModular(task='pretraining',
...                          hidden_dim=64,
...                          target_dim=10,
...                          aggregators=['max'],
...                          readout_aggregators=['mean'],
...                          scalers=['identity'])
>>> loss = model.fit(dataset, nb_epoch=1)

__init__(task: Literal['pretraining', 'regression', 'classification'], hidden_dim: int = 64, target_dim: int = 10, aggregators: List[str] = ['mean'], readout_aggregators: List[str] = ['mean'], scalers: List[str] = ['identity'], residual: bool = True, node_wise_output_layers: int = 2, pairwise_distances: bool = False, activation: Callable | str = 'relu', reduce_func: str = 'sum', batch_norm: bool = True, batch_norm_momentum: float = 0.1, propagation_depth: int = 5, dropout: float = 0.0, readout_layers: int = 2, readout_hidden_dim: int = 1, fourier_encodings: int = 4, update_net_layers: int = 2, message_net_layers: int = 2, use_node_features: bool = False, posttrans_layers: int = 1, pretrans_layers: int = 1, n_tasks: int = 1, n_classes: bool | None = None, **kwargs)[source]

Create a ModularTorchModel.

Parameters:
  • model (nn.Module) – The model to be trained.

  • components (dict) – A dictionary of the components of the model. The keys are the names of the components and the values are the components themselves.

build_components()[source]

Build the components of the InfoMax3DModular model.

Returns:

A dictionary containing the ‘2d’ PNA model and the ‘3d’ Net3D model.

Return type:

dict

build_model()[source]

Build the InfoMax3DModular model. This is the 2D network which is meant to be used for inference.

Returns:

The 2D PNA model component.

Return type:

PNA

loss_func(inputs, labels, weights)[source]

Compute the loss function for the InfoMax3DModular model.

Parameters:
  • inputs (dgl.DGLGraph) – The input graph with node features stored under the key ‘x’ and edge distances stored under the key ‘d’.

  • labels (torch.Tensor) – The ground truth labels.

  • weights (torch.Tensor) – The weights for each sample.

Returns:

The computed loss value.

Return type:

torch.Tensor

LCNNModel

class LCNNModel(n_occupancy: int = 3, n_neighbor_sites_list: int = 19, n_permutation_list: int = 6, n_task: int = 1, dropout_rate: float = 0.4, n_conv: int = 2, n_features: int = 44, sitewise_n_feature: int = 25, **kwargs)[source]

Lattice Convolutional Neural Network (LCNN). The example below uses LCNNModel with the Platinum 2D Adsorption dataset.

This model takes arbitrary configurations of molecules on an adsorbate and predicts their formation energy. These formation energies are normally obtained from DFT calculations, and LCNNModel aims to automate that process. The model defines a crystal graph using the distances between atoms. The crystal graph is an undirected regular graph (every node has the same number of neighbours), and the different permutations of the neighbours are pre-computed by the LCNNFeaturizer. For each node and each permutation, the features of the neighbouring nodes are concatenated and then processed further. The model has only a node representation. For the details of the algorithm, see [1].

Examples

>>> import deepchem as dc
>>> from pymatgen.core import Structure
>>> import numpy as np
>>> from deepchem.feat import LCNNFeaturizer
>>> from deepchem.models import LCNNModel
>>> from deepchem.molnet import load_Platinum_Adsorption
>>> # primitive cell of the lattice, in pymatgen Structure keyword format
>>> PRIMITIVE_CELL = {
...   "lattice": [[2.818528, 0.0, 0.0],
...               [-1.409264, 2.440917, 0.0],
...               [0.0, 0.0, 25.508255]],
...   "coords": [[0.66667, 0.33333, 0.090221],
...              [0.33333, 0.66667, 0.18043936],
...              [0.0, 0.0, 0.27065772],
...              [0.66667, 0.33333, 0.36087608],
...              [0.33333, 0.66667, 0.45109444],
...              [0.0, 0.0, 0.49656991]],
...   "species": ['H', 'H', 'H', 'H', 'H', 'He'],
...   "site_properties": {'SiteTypes': ['S1', 'S1', 'S1', 'S1', 'S1', 'A1']}
... }
>>> # settings passed to the LCNNFeaturizer
>>> PRIMITIVE_CELL_INF0 = {
...    "cutoff": np.around(6.00),
...    "structure": Structure(**PRIMITIVE_CELL),
...    "aos": ['1', '0', '2'],
...    "pbc": [True, True, False],
...    "ns": 1,
...    "na": 1
... }
>>> tasks, datasets, transformers = load_Platinum_Adsorption(
...    featurizer=LCNNFeaturizer(**PRIMITIVE_CELL_INF0)
... )
>>> train, val, test = datasets
>>> # training model
>>> model = LCNNModel(mode='regression',
...                   batch_size=8,
...                   learning_rate=0.001)
>>> loss = model.fit(train, nb_epoch=10)

References

Notes

This class requires DGL and PyTorch to be installed.

__init__(n_occupancy: int = 3, n_neighbor_sites_list: int = 19, n_permutation_list: int = 6, n_task: int = 1, dropout_rate: float = 0.4, n_conv: int = 2, n_features: int = 44, sitewise_n_feature: int = 25, **kwargs)[source]

This class accepts all the keyword arguments from TorchModel.

Parameters:
  • n_occupancy (int, default 3) – Number of possible occupancies.

  • n_neighbor_sites_list (int, default 19) – Number of neighbors of each site.

  • n_permutation_list (int, default 6) – Number of different permutations taken along different directions.

  • n_task (int, default 1) – Number of tasks.

  • dropout_rate (float, default 0.4) – Dropout probability p, between 0.0 and 1.0.

  • n_conv (int, default 2) – Number of convolutions performed.

  • n_features (int, default 44) – Number of features for each site.

  • sitewise_n_feature (int, default 25) – Number of features for atoms for site-wise activation.

  • kwargs (Dict) – This class accepts all the keyword arguments from TorchModel.

MEGNetModel

class MEGNetModel(n_node_features: int = 32, n_edge_features: int = 32, n_global_features: int = 32, n_blocks: int = 1, is_undirected: bool = True, residual_connection: bool = True, mode: str = 'regression', n_classes: int = 2, n_tasks: int = 1, **kwargs)[source]

MatErials Graph Network for Molecules and Crystals

MatErials Graph Networks [1]_ are Graph Networks [2]_ used for property prediction in molecules and crystals. The model implements multiple layers of Graph Network as MEGNetBlocks and then combines the node properties and edge properties of all nodes and edges via a Set2Set layer. The combined information is used with the global features of the material/molecule for property prediction tasks.

Example

>>> import deepchem as dc
>>> from deepchem.models import MEGNetModel
>>> from deepchem.utils.fake_data_generator import FakeGraphGenerator as FGG
>>> graphs = FGG(global_features=4, num_classes=10).sample(n_graphs=20)
>>> model = dc.models.MEGNetModel(n_node_features=5, n_edge_features=3, n_global_features=4, n_blocks=3, is_undirected=True, residual_connection=True, mode='classification', n_classes=10, batch_size=16)
>>> training_loss = model.fit(graphs)

References

Note

The model requires PyTorch-Geometric to be installed.

__init__(n_node_features: int = 32, n_edge_features: int = 32, n_global_features: int = 32, n_blocks: int = 1, is_undirected: bool = True, residual_connection: bool = True, mode: str = 'regression', n_classes: int = 2, n_tasks: int = 1, **kwargs)[source]
Parameters:
  • n_node_features (int) – Number of features in a node

  • n_edge_features (int) – Number of features in an edge

  • n_global_features (int) – Number of global features

  • n_blocks (int) – Number of Graph Network blocks to use in the update

  • is_undirected (bool, optional (default True)) – True when the model is used on undirected graphs otherwise false

  • residual_connection (bool, optional (default True)) – If True, the layer uses a residual connection during training

  • n_tasks (int, default 1) – The number of tasks

  • mode (str, default 'regression') – The model type - classification or regression

  • n_classes (int, default 2) – The number of classes to predict (used only in classification mode).

  • kwargs (Dict) – kwargs supported by TorchModel

MATModel

class MATModel(dist_kernel: str = 'softmax', n_encoders=8, lambda_attention: float = 0.33, lambda_distance: float = 0.33, h: int = 16, sa_hsize: int = 1024, sa_dropout_p: float = 0.0, output_bias: bool = True, d_input: int = 1024, d_hidden: int = 1024, d_output: int = 1024, activation: str = 'leakyrelu', n_layers: int = 1, ff_dropout_p: float = 0.0, encoder_hsize: int = 1024, encoder_dropout_p: float = 0.0, embed_input_hsize: int = 36, embed_dropout_p: float = 0.0, gen_aggregation_type: str = 'mean', gen_dropout_p: float = 0.0, gen_n_layers: int = 1, gen_attn_hidden: int = 128, gen_attn_out: int = 4, gen_d_output: int = 1, **kwargs)[source]

Molecular Attention Transformer.

This class implements the Molecular Attention Transformer [1]_. The MATFeaturizer (deepchem.feat.MATFeaturizer) is intended to work with this class. The model takes a batch of MATEncodings (from MATFeaturizer) as input, and returns an array of size Nx1, where N is the number of molecules in the batch. Each molecule is broken down into its Node Features matrix, adjacency matrix and distance matrix. A mask tensor is calculated for the batch. All of this goes as input to the MATEmbedding, MATEncoder and MATGenerator layers, which are defined in deepchem.models.torch_models.layers.py

Currently, MATModel is intended to be a regression model for the freesolv dataset.

References

Examples

>>> import deepchem as dc
>>> import pandas as pd
>>> smiles = ['CC', 'CCC',  'CCCC', 'CCCCC', 'CCCCCCC']
>>> vals = [1.35, 6.72, 5.67, 1.23, 1.76]
>>> df = pd.DataFrame(list(zip(smiles, vals)), columns = ['smiles', 'y'])
>>> loader = dc.data.CSVLoader(tasks=['y'], feature_field='smiles', featurizer=dc.feat.MATFeaturizer())
>>> df.to_csv('test.csv')
>>> dataset = loader.create_dataset('test.csv')
>>> model = dc.models.torch_models.MATModel(batch_size = 2)
>>> out = model.fit(dataset, nb_epoch = 1)
__init__(dist_kernel: str = 'softmax', n_encoders=8, lambda_attention: float = 0.33, lambda_distance: float = 0.33, h: int = 16, sa_hsize: int = 1024, sa_dropout_p: float = 0.0, output_bias: bool = True, d_input: int = 1024, d_hidden: int = 1024, d_output: int = 1024, activation: str = 'leakyrelu', n_layers: int = 1, ff_dropout_p: float = 0.0, encoder_hsize: int = 1024, encoder_dropout_p: float = 0.0, embed_input_hsize: int = 36, embed_dropout_p: float = 0.0, gen_aggregation_type: str = 'mean', gen_dropout_p: float = 0.0, gen_n_layers: int = 1, gen_attn_hidden: int = 128, gen_attn_out: int = 4, gen_d_output: int = 1, **kwargs)[source]

The wrapper class for the Molecular Attention Transformer.

Since we are using a custom data class as input (MATEncoding), we have overridden the default_generator function from DiskDataset and customized it to work with a batch of MATEncoding classes.

Parameters:
  • dist_kernel (str) – Kernel activation to be used. Can be either ‘softmax’ for softmax or ‘exp’ for exponential, for the self-attention layer.

  • n_encoders (int) – Number of encoder layers in the encoder block.

  • lambda_attention (float) – Constant to be multiplied with the attention matrix in the self-attention layer.

  • lambda_distance (float) – Constant to be multiplied with the distance matrix in the self-attention layer.

  • h (int) – Number of attention heads for the self-attention layer.

  • sa_hsize (int) – Size of dense layer in the self-attention layer.

  • sa_dropout_p (float) – Dropout probability for the self-attention layer.

  • output_bias (bool) – If True, dense layers will use bias vectors in the self-attention layer.

  • d_input (int) – Size of input layer in the feed-forward layer.

  • d_hidden (int) – Size of hidden layer in the feed-forward layer. Will also be used as d_output for the MATEmbedding layer.

  • d_output (int) – Size of output layer in the feed-forward layer.

  • activation (str) – Activation function to be used in the feed-forward layer. Can choose between ‘relu’ for ReLU, ‘leakyrelu’ for LeakyReLU, ‘prelu’ for PReLU, ‘tanh’ for TanH, ‘selu’ for SELU, ‘elu’ for ELU and ‘linear’ for linear activation.

  • n_layers (int) – Number of layers in the feed-forward layer.

  • ff_dropout_p (float) – Dropout probability in the feed-forward layer.

  • encoder_hsize (int) – Size of Dense layer for the encoder itself.

  • encoder_dropout_p (float) – Dropout probability for connections in the encoder layer.

  • embed_input_hsize (int) – Size of input layer for the MATEmbedding layer.

  • embed_dropout_p (float) – Dropout probability for the MATEmbedding layer.

  • gen_aggregation_type (str) – Type of aggregation to be used. Can be ‘grover’, ‘mean’ or ‘contextual’.

  • gen_dropout_p (float) – Dropout probability for the MATGenerator layer.

  • gen_n_layers (int) – Number of layers in MATGenerator.

  • gen_attn_hidden (int) – Size of hidden attention layer in the MATGenerator layer.

  • gen_attn_out (int) – Size of output attention layer in the MATGenerator layer.

  • gen_d_output (int) – Size of output layer in the MATGenerator layer.
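
The roles of dist_kernel, lambda_attention and lambda_distance can be illustrated with a minimal pure-Python sketch. It follows the combination rule described in the Molecular Attention Transformer paper, where the attention weights, a kernel of the distance matrix and the adjacency matrix are blended into one weight matrix. The adjacency weight 1 - lambda_attention - lambda_distance, the helper names and the toy numbers are assumptions made for illustration; this is not DeepChem's implementation.

```python
import math


def softmax_rows(matrix):
    """Apply a numerically stable softmax to each row of a nested list."""
    out = []
    for row in matrix:
        shift = max(row)
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def distance_kernel(dist, kind="softmax"):
    """Turn distances into weights: closer atoms receive larger weights."""
    negated = [[-d for d in row] for row in dist]
    if kind == "softmax":
        return softmax_rows(negated)
    if kind == "exp":
        return [[math.exp(v) for v in row] for row in negated]
    raise ValueError(f"unknown kernel: {kind}")


def mix_attention(scores, dist, adjacency, lambda_attention=0.33,
                  lambda_distance=0.33, dist_kernel="softmax"):
    """Blend attention, distance and adjacency into one weight matrix.

    Assumption: the adjacency term receives the remaining weight.
    """
    lambda_adjacency = 1.0 - lambda_attention - lambda_distance
    attn = softmax_rows(scores)
    kern = distance_kernel(dist, dist_kernel)
    n = len(scores)
    return [[lambda_attention * attn[i][j] + lambda_distance * kern[i][j] +
             lambda_adjacency * adjacency[i][j] for j in range(n)]
            for i in range(n)]


# Toy 3-atom chain (hypothetical numbers).
scores = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
dist = [[0.0, 1.5, 3.0], [1.5, 0.0, 1.5], [3.0, 1.5, 0.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
weights = mix_attention(scores, dist, adjacency)
```

With the default values, roughly a third of each weight comes from learned attention, a third from inter-atomic distance and the rest from bond connectivity.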

pad_array(array: ndarray, shape: Any) ndarray[source]

Pads an array to the desired shape.

Parameters:
  • array (np.ndarray) – Array to be padded.

  • shape (int or Tuple) – Shape the array is padded to.

Returns:

array (np.ndarray) – Array padded to input shape.

pad_sequence(sequence: ndarray) ndarray[source]

Pads a given sequence using the pad_array function.

Parameters:
  • sequence (np.ndarray) – Arrays in this sequence are padded to the largest shape in the sequence.

Returns:

array (np.ndarray) – Sequence with padded arrays.
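
The padding idea behind these two helpers can be sketched with NumPy as follows. The function names pad_to_shape and pad_all are hypothetical, and zero-padding at the end of each axis is an assumption; the sketch illustrates the technique rather than reproducing the DeepChem source.

```python
import numpy as np


def pad_to_shape(array, shape):
    """Zero-pad `array` at the end of each axis until it has `shape`."""
    if isinstance(shape, int):
        shape = (shape,) * array.ndim
    # Assumption: the target shape is at least as large along every axis.
    widths = [(0, target - size) for size, target in zip(array.shape, shape)]
    return np.pad(array, widths, mode="constant", constant_values=0)


def pad_all(arrays):
    """Pad every array to the largest extent found along each axis."""
    largest = tuple(max(a.shape[i] for a in arrays)
                    for i in range(arrays[0].ndim))
    return np.stack([pad_to_shape(a, largest) for a in arrays])


# Two adjacency-like matrices for molecules with 2 and 3 atoms.
small = np.ones((2, 2))
large = np.ones((3, 3))
batch = pad_all([small, large])
```

After padding, matrices that belong to molecules of different sizes can be stacked into a single batch array.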

default_generator(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True, **kwargs)[source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])
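
A minimal sketch of such a generator is shown below, using plain Python lists in place of a Dataset. The function name iterate_batches, the explicit batch_size and seed arguments, and the zero-weight padding strategy are assumptions for illustration only.

```python
import random


def iterate_batches(X, y, w, batch_size, epochs=1, deterministic=True,
                    pad_batches=True, seed=0):
    """Yield ([inputs], [outputs], [weights]) tuples, one per batch."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    for _ in range(epochs):
        if not deterministic:
            rng.shuffle(indices)  # new random order each epoch
        for start in range(0, len(indices), batch_size):
            chunk = indices[start:start + batch_size]
            xb = [X[i] for i in chunk]
            yb = [y[i] for i in chunk]
            wb = [w[i] for i in chunk]
            if pad_batches and len(chunk) < batch_size:
                # Assumption: padding repeats the first sample with zero
                # weight, so the extra rows do not affect the loss.
                missing = batch_size - len(chunk)
                xb += [xb[0]] * missing
                yb += [yb[0]] * missing
                wb += [0.0] * missing
            yield ([xb], [yb], [wb])


X = ["CC", "CCC", "CCCC", "CCCCC", "CCCCCCC"]
y = [1.35, 6.72, 5.67, 1.23, 1.76]
w = [1.0] * 5
batches = list(iterate_batches(X, y, w, batch_size=2, epochs=2))
```

A subclass that needs custom inputs would keep the same tuple structure and change only how each batch's inputs are built.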

NormalizingFlowModel

class NormalizingFlowModel(dim: int, num_layers: int, flowList: List[Flow] | None = None, device: device | None = None)[source]

Normalizing Flow Model

The Normalizing Flow model is a generative model that learns a target distribution by transforming a simple base distribution through a series of invertible transformations. The target distribution is then defined as the composition of the base distribution and the flow transformations.
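
The change-of-variables rule behind this construction can be sketched in one dimension. The AffineFlow class below is a hypothetical stand-in for an invertible layer and is unrelated to DeepChem's Flow class; the base distribution is assumed to be a standard normal.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


class AffineFlow:
    """Invertible map x = scale * z + shift (scale must be non-zero)."""

    def __init__(self, scale, shift):
        self.scale = scale
        self.shift = shift

    def forward(self, z):
        return self.scale * z + self.shift

    def inverse(self, x):
        return (x - self.shift) / self.scale

    def log_abs_det_inverse(self):
        # d(inverse)/dx = 1 / scale, constant for an affine map
        return -math.log(abs(self.scale))


def log_prob(x, flows):
    """log p(x) = log p_base(z) + sum of log|det| terms of the inverses."""
    total = 0.0
    for flow in reversed(flows):  # undo the last transformation first
        total += flow.log_abs_det_inverse()
        x = flow.inverse(x)
    return standard_normal_logpdf(x) + total


flows = [AffineFlow(2.0, 0.0), AffineFlow(1.0, 3.0)]  # z -> 2z -> 2z + 3
value = log_prob(3.0, flows)
```

Here the two affine layers turn N(0, 1) into a normal distribution with mean 3 and standard deviation 2, so value is the log-density of that distribution at its mean. Training a flow amounts to maximizing this log-probability over the data.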

Examples

>>> import torch
>>> from deepchem.models.torch_models.flows import NormalizingFlowModel
>>> nfmodel = NormalizingFlowModel(4, 2)
>>> onehots = [[0, 1, 1, 0], [0, 1, 0, 1]]
>>> input_tensor = torch.tensor(onehots)
>>> noise_tensor = torch.rand(input_tensor.shape)
>>> data = torch.add(input_tensor, noise_tensor)
>>> nfmodel.fit(data, epochs=10, learning_rate=0.001, weight_decay=0.0001)
>>> gen_mols, _ = nfmodel.nfm.sample(10)
>>> len(gen_mols)
10
__init__(dim: int, num_layers: int, flowList: List[Flow] | None = None, device: device | None = None)[source]

Initializes the Normalizing Flow model

Parameters:
  • dim (int) – Dimension of the dataset

  • num_layers (int) – Number of layers in the model

  • flowList (Optional[List[Flow]], optional) – List of Flow layers, by default None, that can be provided to the model to override the default layers

fit(dataset: torch.Tensor, epochs: int = 10, learning_rate: float = 0.001, weight_decay: float = 0.0001, optimizer: Any = torch.optim.Adam, logging: bool = False) None[source]

Fit the Normalizing Flow

Parameters:
  • dataset (torch.Tensor) – The dequantized dataset

  • epochs (int, optional) – number of epochs for training, by default 10

  • learning_rate (float, optional) – Learning rate for the optimizer, by default 0.001

  • weight_decay (float, optional) – Weight decay for the optimizer, by default 0.0001

  • optimizer (torch.optim.Optimizer, optional) – Optimizer for the model, by default torch.optim.Adam

DMPNNModel

class DMPNNModel(mode: str = 'regression', n_classes: int = 3, n_tasks: int = 1, batch_size: int = 1, global_features_size: int = 0, use_default_fdim: bool = True, atom_fdim: int = 133, bond_fdim: int = 14, enc_hidden: int = 300, depth: int = 3, bias: bool = False, enc_activation: str = 'relu', enc_dropout_p: float = 0.0, aggregation: str = 'mean', aggregation_norm: int | float = 100, ffn_hidden: int = 300, ffn_activation: str = 'relu', ffn_layers: int = 3, ffn_dropout_p: float = 0.0, ffn_dropout_at_input_no_act: bool = True, **kwargs)[source]

Directed Message Passing Neural Network

This class implements the Directed Message Passing Neural Network (D-MPNN) [1]_.

The DMPNN model has 2 phases, message-passing phase and read-out phase.

  • The goal of the message-passing phase is to generate ‘hidden states of all the atoms in the molecule’ using encoders.

  • Next in read-out phase, the features are passed into feed-forward neural network to get the task-based prediction.
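
The two phases can be sketched on a toy graph in pure Python. Scalars stand in for feature vectors, every learned weight is fixed to 1, the "feed-forward network" is a single multiplication, and depth is taken to mean depth - 1 update rounds; all of these are simplifying assumptions, and the function names are hypothetical.

```python
def relu(x):
    return max(0.0, x)


def message_passing(atom_feats, bonds, depth=3):
    """Return one hidden state per atom for an undirected bond list.

    Hidden states live on directed bonds (v -> w). Scalars stand in for
    feature vectors and all learned weights are fixed to 1 (assumption).
    """
    directed = [(a, b) for a, b in bonds] + [(b, a) for a, b in bonds]
    # Initial state of bond v -> w is built from the source atom's feature.
    h0 = {(v, w): relu(atom_feats[v]) for v, w in directed}
    h = dict(h0)
    for _ in range(depth - 1):
        new_h = {}
        for v, w in directed:
            # Sum states of bonds k -> v, skipping the reverse bond w -> v.
            msg = sum(h[(k, t)] for k, t in directed if t == v and k != w)
            new_h[(v, w)] = relu(h0[(v, w)] + msg)
        h = new_h
    # Atom state: own feature plus all incoming bond states.
    return [relu(atom_feats[v] + sum(h[(k, t)] for k, t in directed if t == v))
            for v in range(len(atom_feats))]


def readout(atom_states, aggregation="mean"):
    """Aggregate atom states into a molecule value, then 'predict'."""
    total = sum(atom_states)
    pooled = total / len(atom_states) if aggregation == "mean" else total
    return 0.5 * pooled  # stand-in for the feed-forward network


# Propane-like chain: atoms 0-1-2, every atom feature set to 1.0.
states = message_passing([1.0, 1.0, 1.0], bonds=[(0, 1), (1, 2)], depth=3)
prediction = readout(states)
```

Excluding the reverse bond when summing messages is what makes the scheme directed: information sent along a bond is not immediately echoed back to its sender.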

For additional information:

Example

>>> import deepchem as dc
>>> import os
>>> from deepchem.models.torch_models import DMPNNModel
>>> # __file__ assumes this runs from a module inside the deepchem package
>>> model_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
>>> input_file = os.path.join(model_dir, 'tests/assets/freesolv_sample_5.csv')
>>> loader = dc.data.CSVLoader(tasks=['y'], feature_field='smiles', featurizer=dc.feat.DMPNNFeaturizer())
>>> dataset = loader.create_dataset(input_file)
>>> model = DMPNNModel()
>>> out = model.fit(dataset, nb_epoch=1)

References

__init__(mode: str = 'regression', n_classes: int = 3, n_tasks: int = 1, batch_size: int = 1, global_features_size: int = 0, use_default_fdim: bool = True, atom_fdim: int = 133, bond_fdim: int = 14, enc_hidden: int = 300, depth: int = 3, bias: bool = False, enc_activation: str = 'relu', enc_dropout_p: float = 0.0, aggregation: str = 'mean', aggregation_norm: int | float = 100, ffn_hidden: int = 300, ffn_activation: str = 'relu', ffn_layers: int = 3, ffn_dropout_p: float = 0.0, ffn_dropout_at_input_no_act: bool = True, **kwargs)[source]

Initialize the DMPNNModel class.

Parameters:
  • mode (str, default 'regression') – The model type - classification or regression.

  • n_classes (int, default 3) – The number of classes to predict (used only in classification mode).

  • n_tasks (int, default 1) – The number of tasks.

  • batch_size (int, default 1) – The number of datapoints in a batch.

  • global_features_size (int, default 0) – Size of the global features vector, based on the global featurizers used during featurization.

  • use_default_fdim (bool) – If True, self.atom_fdim and self.bond_fdim are initialized using values from the GraphConvConstants class. If False, self.atom_fdim and self.bond_fdim are initialized from the values provided.

  • atom_fdim (int) – Dimension of atom feature vector.

  • bond_fdim (int) – Dimension of bond feature vector.

  • enc_hidden (int) – Size of hidden layer in the encoder layer.

  • depth (int) – Number of message passing steps.

  • bias (bool) – If True, dense layers will use bias vectors.

  • enc_activation (str) – Activation function to be used in the encoder layer. Can choose between ‘relu’ for ReLU, ‘leakyrelu’ for LeakyReLU, ‘prelu’ for PReLU, ‘tanh’ for TanH, ‘selu’ for SELU, and ‘elu’ for ELU.

  • enc_dropout_p (float) – Dropout probability for the encoder layer.

  • aggregation (str) – Aggregation type to be used in the encoder layer. Can choose between ‘mean’, ‘sum’, and ‘norm’.

  • aggregation_norm (Union[int, float]) – Value required if aggregation type is ‘norm’.

  • ffn_hidden (int) – Size of hidden layer in the feed-forward network layer.

  • ffn_activation (str) – Activation function to be used in feed-forward network layer. Can choose between ‘relu’ for ReLU, ‘leakyrelu’ for LeakyReLU, ‘prelu’ for PReLU, ‘tanh’ for TanH, ‘selu’ for SELU, and ‘elu’ for ELU.

  • ffn_layers (int) – Number of layers in the feed-forward network layer.

  • ffn_dropout_p (float) – Dropout probability for the feed-forward network layer.

  • ffn_dropout_at_input_no_act (bool) – If true, dropout is applied on the input tensor. For single layer, it is not passed to an activation function.

  • kwargs (Dict) – kwargs supported by TorchModel

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = False, **kwargs) Iterable[Tuple[List, List, List]][source]

Create a generator that iterates batches for a dataset.

Overrides the existing default_generator method to customize how model inputs are generated from the data.

Here, the _MapperDMPNN helper class is used, for each molecule in a batch, to get required input parameters:

  • atom_features

  • f_ini_atoms_bonds

  • atom_to_incoming_bonds

  • mapping

  • global_features

The data from each molecule is then converted to a _ModData object and stored in a list of graphs. The graphs are modified so that all tensors have the same size in the 0th dimension, which is required for batching.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights]). Here, [inputs] is a list of graphs.

GroverModel

class GroverModel(node_fdim: int, edge_fdim: int, hidden_size: int, self_attention=False, features_only=False, atom_vocab: GroverAtomVocabularyBuilder | None = None, bond_vocab: GroverBondVocabularyBuilder | None = None, functional_group_size: int | None = 85, features_dim: int = 128, dropout: float = 0.2, activation: str = 'relu', task: str = 'pretraining', ffn_num_layers: int = 1, ffn_hidden_size: int = 64, attn_out_size: int = 16, num_attn_heads: int = 4, depth: int = 1, mode: str | None = None, model_dir=None, n_tasks: int = 1, n_classes: int | None = None, **kwargs)[source]

GROVER model

The GROVER model employs a self-supervised message passing transformer architecture for learning molecular representations. The pretraining task can learn rich structural and semantic information about molecules from unlabelled molecular data, which can be leveraged by finetuning for downstream applications. To this end, GROVER integrates message passing networks into a transformer-style architecture.

Parameters:
  • node_fdim (int) – the dimension of additional feature for node/atom.

  • edge_fdim (int) – the dimension of additional feature for edge/bond.

  • atom_vocab (GroverAtomVocabularyBuilder) – Grover atom vocabulary builder required during pretraining.

  • bond_vocab (GroverBondVocabularyBuilder) – Grover bond vocabulary builder required during pretraining.

  • hidden_size (int) – Size of hidden layers

  • features_only (bool) – Uses only additional features in the feed-forward network, no graph network

  • self_attention (bool, default False) – When set to True, a self-attention layer is used during graph readout operation.

  • functional_group_size (int (default: 85)) – Size of functional group used in grover.

  • features_dim (int) – Size of additional molecular features, like fingerprints.

  • ffn_num_layers (int (default: 1)) – Number of linear layers to use for feature extraction from embeddings

  • ffn_hidden_size (int (default: 64)) – Hidden size of feed forward network

  • attn_out_size (int (default: 16)) – Size of attention heads

  • num_attn_heads (int (default: 4)) – Number of attention heads

  • task (str (pretraining or finetuning)) – Pretraining or finetuning tasks.

  • mode (str (classification or regression)) – Training mode (used only for finetuning)

  • n_tasks (int, optional (default: 1)) – Number of tasks

  • n_classes (int, optional (default: 2)) – Number of target classes in classification mode

  • model_dir (str) – Directory to save model checkpoints

  • dropout (float, optional (default: 0.2)) – dropout value

  • activation (str, optional (default: 'relu')) – supported activation function

  • depth (int (default: 1)) – Dynamic message passing depth for use in MPNEncoder

Example

>>> import deepchem as dc
>>> from deepchem.models.torch_models.grover import GroverModel
>>> from deepchem.feat.vocabulary_builders import (GroverAtomVocabularyBuilder, GroverBondVocabularyBuilder)
>>> import pandas as pd
>>> import os
>>> import tempfile
>>> tmpdir = tempfile.mkdtemp()
>>> df = pd.DataFrame({'smiles': ['CC', 'CCC'], 'preds': [0, 0]})
>>> filepath = os.path.join(tmpdir, 'example.csv')
>>> df.to_csv(filepath, index=False)
>>> dataset_path = os.path.join(filepath)
>>> loader = dc.data.CSVLoader(tasks=['preds'], featurizer=dc.feat.DummyFeaturizer(), feature_field=['smiles'])
>>> dataset = loader.create_dataset(filepath)
>>> av = GroverAtomVocabularyBuilder()
>>> av.build(dataset)
>>> bv = GroverBondVocabularyBuilder()
>>> bv.build(dataset)
>>> fg = dc.feat.CircularFingerprint()
>>> loader2 = dc.data.CSVLoader(tasks=['preds'], featurizer=dc.feat.GroverFeaturizer(features_generator=fg), feature_field='smiles')
>>> graph_data = loader2.create_dataset(filepath)
>>> model = GroverModel(node_fdim=151, edge_fdim=165, atom_vocab=av, bond_vocab=bv, features_dim=2048, hidden_size=128, functional_group_size=85, mode='regression', task='finetuning', model_dir='gm')
>>> loss = model.fit(graph_data, nb_epoch=1)

Reference

__init__(node_fdim: int, edge_fdim: int, hidden_size: int, self_attention=False, features_only=False, atom_vocab: GroverAtomVocabularyBuilder | None = None, bond_vocab: GroverBondVocabularyBuilder | None = None, functional_group_size: int | None = 85, features_dim: int = 128, dropout: float = 0.2, activation: str = 'relu', task: str = 'pretraining', ffn_num_layers: int = 1, ffn_hidden_size: int = 64, attn_out_size: int = 16, num_attn_heads: int = 4, depth: int = 1, mode: str | None = None, model_dir=None, n_tasks: int = 1, n_classes: int | None = None, **kwargs)[source]

Create a GroverModel.

GroverModel is built as a ModularTorchModel. The constructor arguments are the ones listed in the class-level parameter list above.

build_components()[source]

Builds components for grover pretraining and finetuning model.

Components of pretraining model:

  • embedding (Graph message passing network) – A layer which accepts a molecular graph and produces an embedding for grover pretraining task

  • atom_vocab_task_atom (Feed forward layer) – A layer which accepts an embedding generated from atom hidden states and predicts atom vocabulary for grover pretraining task

  • atom_vocab_task_bond (Feed forward layer) – A layer which accepts an embedding generated from bond hidden states and predicts atom vocabulary for grover pretraining task

  • bond_vocab_task_atom (Feed forward layer) – A layer which accepts an embedding generated from atom hidden states and predicts bond vocabulary for grover pretraining task

  • bond_vocab_task_bond (Feed forward layer) – A layer which accepts an embedding generated from bond hidden states and predicts bond vocabulary for grover pretraining task

  • functional_group_predictor (Feed forward layer) – A layer which accepts an embedding generated from a graph readout and predicts functional group for grover pretraining task

Components of finetuning model:

  • embedding (Graph message passing network) – An embedding layer to generate embedding from input molecular graph

  • readout (Feed forward layer) – A readout layer to perform readout on atom and bond hidden states

  • mol_atom_from_atom_ffn (Feed forward layer) – A feed forward network which learns representation from atom messages generated via atom hidden states of a molecular graph

  • mol_atom_from_bond_ffn (Feed forward layer) – A feed forward network which learns representation from atom messages generated via bond hidden states of a molecular graph

build_model()[source]

Builds grover pretrain or finetune model based on task

get_loss_func()[source]

Returns loss function based on task

loss_func(inputs, labels, weights)[source]

Returns loss function which performs forward iteration based on task type

static atom_vocab_random_mask(atom_vocab: GroverAtomVocabularyBuilder, smiles: List[str]) List[int][source]

Random masking of atom labels from vocabulary

For every atom in the list of SMILES strings, the algorithm fetches the atom's context (vocab label) from the vocabulary provided and returns the vocabulary labels with a random masking (probability of masking = 0.15).

Parameters:
  • atom_vocab (GroverAtomVocabularyBuilder) – atom vocabulary

  • smiles (List[str]) – a list of SMILES strings

Returns:

vocab_label – atom vocab label with random masking

Return type:

List[int]
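
The masking step can be sketched independently of any vocabulary. The helper random_mask below is hypothetical, and its conventions are assumptions: about 15% of positions (rounded up) are selected, selected positions keep their label as a prediction target, and all other positions are set to a padding value of 0.

```python
import math
import random


def random_mask(labels, mask_prob=0.15, pad_value=0, seed=None):
    """Keep a random subset of labels as targets and blank out the rest.

    Assumption: positions that are not selected are set to `pad_value`,
    so only the selected positions contribute to a prediction loss.
    """
    rng = random.Random(seed)
    n_mask = math.ceil(len(labels) * mask_prob)
    chosen = set(rng.sample(range(len(labels)), n_mask))
    return [lab if i in chosen else pad_value for i, lab in enumerate(labels)]


# Hypothetical vocabulary ids for the atoms of a 10-atom molecule.
labels = list(range(1, 11))
masked = random_mask(labels, seed=0)
```

The output has the same length as the input, so it can be aligned position by position with the atoms or bonds of the featurized molecules.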

Example

>>> import numpy as np
>>> import deepchem as dc
>>> from deepchem.models.torch_models.grover import GroverModel
>>> from deepchem.feat.vocabulary_builders import GroverAtomVocabularyBuilder
>>> smiles = np.array(['CC', 'CCC'])
>>> dataset = dc.data.NumpyDataset(X=smiles)
>>> atom_vocab = GroverAtomVocabularyBuilder()
>>> atom_vocab.build(dataset)
>>> vocab_labels = GroverModel.atom_vocab_random_mask(atom_vocab, smiles)
static bond_vocab_random_mask(bond_vocab: GroverBondVocabularyBuilder, smiles: List[str]) List[int][source]

Random masking of bond labels from bond vocabulary

For every bond in the list of SMILES strings, the algorithm fetches the bond context (vocab label) from the vocabulary provided and returns the vocabulary labels with a random masking (probability of masking = 0.15).

Parameters:
  • bond_vocab (GroverBondVocabularyBuilder) – bond vocabulary

  • smiles (List[str]) – a list of SMILES strings

Returns:

vocab_label – bond vocab label with random masking

Return type:

List[int]

Example

>>> import numpy as np
>>> import deepchem as dc
>>> from deepchem.models.torch_models.grover import GroverModel
>>> from deepchem.feat.vocabulary_builders import GroverBondVocabularyBuilder
>>> smiles = np.array(['CC', 'CCC'])
>>> dataset = dc.data.NumpyDataset(X=smiles)
>>> bond_vocab = GroverBondVocabularyBuilder()
>>> bond_vocab.build(dataset)
>>> vocab_labels = GroverModel.bond_vocab_random_mask(bond_vocab, smiles)
restore(checkpoint: str | None = None, model_dir: str | None = None) None[source]

Reload the values of all variables from a checkpoint file.

Parameters:
  • checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.

  • model_dir (str, default None) – Directory to restore checkpoint from. If None, use self.model_dir. If checkpoint is not None, this is ignored.

DTNNModel

class DTNNModel(n_tasks: int, n_embedding: int = 30, n_hidden: int = 100, n_distance: int = 100, distance_min: float = -1, distance_max: float = 18, output_activation: bool = True, mode: str = 'regression', dropout: float = 0.0, n_steps: int = 2, **kwargs)[source]

Implements DTNN models for regression.

DTNN is based on the many-body Hamiltonian concept, which is a fundamental principle in quantum mechanics. DTNN receives a molecule’s distance matrix and the membership of its atoms from its Coulomb Matrix representation. It then iteratively refines the representation of each atom by considering its interactions with neighboring atoms. Finally, it predicts the energy of the molecule by summing up the energies of the individual atoms.

This class implements the Deep Tensor Neural Network (DTNN) [1]_.

Examples

>>> import os
>>> from deepchem.data import SDFLoader
>>> from deepchem.feat import CoulombMatrix
>>> from deepchem.models.torch_models import DTNNModel
>>> model_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
>>> dataset_file = os.path.join(model_dir, 'tests/assets/qm9_mini.sdf')
>>> TASKS = ["alpha", "homo"]
>>> loader = SDFLoader(tasks=TASKS, featurizer=CoulombMatrix(29), sanitize=True)
>>> data = loader.create_dataset(dataset_file, shard_size=100)
>>> n_tasks = data.y.shape[1]
>>> model = DTNNModel(n_tasks,
...                   n_embedding=20,
...                   n_distance=100,
...                   learning_rate=1.0,
...                   mode="regression")
>>> loss = model.fit(data, nb_epoch=250)
>>> pred = model.predict(data)

References

__init__(n_tasks: int, n_embedding: int = 30, n_hidden: int = 100, n_distance: int = 100, distance_min: float = -1, distance_max: float = 18, output_activation: bool = True, mode: str = 'regression', dropout: float = 0.0, n_steps: int = 2, **kwargs)[source]
Parameters:
  • n_tasks (int) – Number of tasks

  • n_embedding (int (default 30)) – Number of features per atom.

  • n_hidden (int (default 100)) – Number of features for each molecule after DTNNStep

  • n_distance (int (default 100)) – Granularity of the distance matrix. The step size will be (distance_max - distance_min) / n_distance.

  • distance_min (float (default -1)) – minimum distance of atom pairs (in Angstrom)

  • distance_max (float (default 18)) – maximum distance of atom pairs (in Angstrom)

  • output_activation (bool (default True)) – determines whether an activation function should be applied to the output.

  • mode (str (default "regression")) – Only “regression” is currently supported.

  • dropout (float (default 0.0)) – the dropout probability to use.

  • n_steps (int (default 2)) – Number of DTNNStep Layers to use.
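
The relationship between n_distance, distance_min and distance_max can be made concrete with a short sketch. The step-size formula comes from the parameter description above; the Gaussian expansion of a distance over the grid, and both function names, are assumptions about how such a grid is typically used.

```python
import math


def distance_grid(n_distance=100, distance_min=-1.0, distance_max=18.0):
    """Return the step size and the evenly spaced grid points."""
    step = (distance_max - distance_min) / n_distance
    centers = [distance_min + i * step for i in range(n_distance)]
    return step, centers


def expand_distance(d, centers, step):
    """Assumed Gaussian expansion: one smooth feature per grid point."""
    return [math.exp(-((d - c) ** 2) / (2.0 * step ** 2)) for c in centers]


step, centers = distance_grid()
features = expand_distance(1.09, centers, step)  # e.g. a C-H bond length
```

With the default arguments the step size is 0.19 Angstrom, and each inter-atomic distance becomes a vector of 100 features that peaks at the nearest grid point.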

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True)[source]

Create a generator that iterates batches for a dataset. It processes inputs through the _compute_features_on_batch function to calculate required features of input.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

  • a generator that iterates batches, each represented as a tuple of lists

  • ([inputs], [outputs], [weights])

SeqToSeqModel

class SeqToSeqModel(input_tokens: List, output_tokens: List, max_output_length: int, encoder_layers: int = 4, decoder_layers: int = 4, batch_size: int = 100, embedding_dimension: int = 512, dropout: float = 0.0, reverse_input: bool = True, variational: bool = False, annealing_start_step: int = 5000, annealing_final_step: int = 10000, **kwargs)[source]

Implements sequence to sequence translation models.

The model is based on the description in Sutskever et al., “Sequence to Sequence Learning with Neural Networks” (https://arxiv.org/abs/1409.3215), although this implementation uses GRUs instead of LSTMs. The goal is to take sequences of tokens as input, and translate each one into a different output sequence. The input and output sequences can both be of variable length, and an output sequence need not have the same length as the input sequence it was generated from. For example, these models were originally developed for use in natural language processing. In that context, the input might be a sequence of English words, and the output might be a sequence of French words. The goal would be to train the model to translate sentences from English to French.

The model consists of two parts called the “encoder” and “decoder”. Each one consists of a stack of recurrent layers. The job of the encoder is to transform the input sequence into a single, fixed length vector called the “embedding”. That vector contains all relevant information from the input sequence. The decoder then transforms the embedding vector into the output sequence.

These models can be used for various purposes. First and most obviously, they can be used for sequence to sequence translation. In any case where you have sequences of tokens, and you want to translate each one into a different sequence, a SeqToSeq model can be trained to perform the translation.

Another possible use case is transforming variable length sequences into fixed length vectors. Many types of models require their inputs to have a fixed shape, which makes it difficult to use them with variable sized inputs (for example, when the input is a molecule, and different molecules have different numbers of atoms). In that case, you can train a SeqToSeq model as an autoencoder, so that it tries to make the output sequence identical to the input one. That forces the embedding vector to contain all information from the original sequence. You can then use the encoder for transforming sequences into fixed length embedding vectors, suitable to use as inputs to other types of models.

Another use case is to train the decoder for use as a generative model. Here again you begin by training the SeqToSeq model as an autoencoder. Once training is complete, you can supply arbitrary embedding vectors, and transform each one into an output sequence. When used in this way, you typically train it as a variational autoencoder. This adds random noise to the encoder, and also adds a constraint term to the loss that forces the embedding vector to have a unit Gaussian distribution. You can then pick random vectors from a Gaussian distribution, and the output sequences should follow the same distribution as the training data.
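The two ingredients of the variational setup mentioned above, noise added to the encoder output and a constraint pulling the embedding toward a unit Gaussian, can be sketched in plain Python. This is a minimal illustration of the standard variational autoencoder formulation, assuming the encoder produces a mean and a log-variance per embedding dimension; the function names are hypothetical and the text above does not state that SeqToSeqModel is implemented this way.

```python
import math
import random
from typing import List


def sample_embedding(mean: List[float], log_var: List[float],
                     rng: random.Random) -> List[float]:
    """Add Gaussian noise to the encoder output (reparameterization trick)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


def kl_to_unit_gaussian(mean: List[float], log_var: List[float]) -> float:
    """KL divergence from N(mean, exp(log_var)) to N(0, I), summed over dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mean, log_var))


rng = random.Random(0)
noisy = sample_embedding([0.5, -0.5], [0.0, 0.0], rng)

# After training, a new embedding can be drawn directly from the unit Gaussian
# and handed to the decoder.
random_embedding = [rng.gauss(0.0, 1.0) for _ in range(2)]
```

The constraint term is zero exactly when the mean is 0 and the log-variance is 0, and grows as the embedding distribution drifts away from the unit Gaussian.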

When training as a variational autoencoder, it is best to use KL cost annealing, as described in https://arxiv.org/abs/1511.06349. The constraint term in the loss is initially set to 0, so the optimizer just tries to minimize the reconstruction loss. Once it has made reasonable progress toward that, the constraint term can be gradually turned back on. The range of steps over which this happens is configurable.
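The annealing schedule can be pictured as a weight on the constraint term that depends on the training step. The sketch below assumes a linear ramp between annealing_start_step and annealing_final_step; the text above only says the term is turned on gradually, so the linear shape and the helper name kl_weight are assumptions.

```python
def kl_weight(step: int,
              annealing_start_step: int = 5000,
              annealing_final_step: int = 10000) -> float:
    """Weight applied to the constraint term at a given training step."""
    if step < annealing_start_step:
        return 0.0  # only the reconstruction loss is optimized
    if step >= annealing_final_step:
        return 1.0  # constraint term fully turned on
    # Assumption: linear ramp between the two configured steps.
    span = annealing_final_step - annealing_start_step
    return (step - annealing_start_step) / span


# total_loss = reconstruction_loss + kl_weight(step) * constraint_term
weights = [kl_weight(s) for s in (0, 5000, 7500, 10000, 20000)]
```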

This class implements the sequence to sequence model described above as a PyTorch model.

Examples

>>> import torch
>>> from deepchem.models.torch_models.seqtoseq import SeqToSeqModel
>>> data = [
...     ("Cc1cccc(N2CCN(C(=O)C34CC5CC(CC(C5)C3)C4)CC2)c1C",
...      "Cc1cccc(N2CCN(C(=O)C34CC5CC(CC(C5)C3)C4)CC2)c1C"),
...     ("Cn1ccnc1SCC(=O)Nc1ccc(Oc2ccccc2)cc1",
...      "Cn1ccnc1SCC(=O)Nc1ccc(Oc2ccccc2)cc1"),
...     ("COc1cc2c(cc1NC(=O)CN1C(=O)NC3(CCc4ccccc43)C1=O)oc1ccccc12",
...      "COc1cc2c(cc1NC(=O)CN1C(=O)NC3(CCc4ccccc43)C1=O)oc1ccccc12"),
...     ("O=C1/C(=C/NC2CCS(=O)(=O)C2)c2ccccc2C(=O)N1c1ccccc1",
...      "O=C1/C(=C/NC2CCS(=O)(=O)C2)c2ccccc2C(=O)N1c1ccccc1"),
...     ("NC(=O)NC(Cc1ccccc1)C(=O)O",
...     "NC(=O)NC(Cc1ccccc1)C(=O)O")]
>>> train_smiles = [s[0] for s in data]
>>> tokens = set()
>>> for s in train_smiles:
...     tokens = tokens.union(set(c for c in s))
>>> tokens = sorted(list(tokens))
>>> from deepchem.models.optimizers import Adam, ExponentialDecay
>>> max_length = max(len(s) for s in train_smiles)
>>> batch_size = 100
>>> batches_per_epoch = len(train_smiles) / batch_size
>>> model = SeqToSeqModel(
...     tokens,
...     tokens,
...     max_length,
...     encoder_layers=2,
...     decoder_layers=2,
...     embedding_dimension=256,
...     model_dir="fingerprint",
...     batch_size=batch_size,
...     learning_rate=ExponentialDecay(0.001, 0.9, batches_per_epoch))
>>> for i in range(20):
...     loss = model.fit_sequences(data)
>>> prediction = model.predict_from_sequences(train_smiles, 5)

__init__(input_tokens: List, output_tokens: List, max_output_length: int, encoder_layers: int = 4, decoder_layers: int = 4, batch_size: int = 100, embedding_dimension: int = 512, dropout: float = 0.0, reverse_input: bool = True, variational: bool = False, annealing_start_step: int = 5000, annealing_final_step: int = 10000, **kwargs)[source]

Construct a SeqToSeq model.

Parameters:
  • input_tokens (list) – List of all tokens that may appear in input sequences.

  • output_tokens (list) – List of all tokens that may appear in output sequences

  • max_output_length (int) – Maximum length of output sequence that may be generated

  • encoder_layers (int (default 4)) – Number of recurrent layers in the encoder

  • decoder_layers (int (default 4)) – Number of recurrent layers in the decoder

  • embedding_dimension (int (default 512)) – Width of the embedding vector. This also is the width of all recurrent layers.

  • dropout (float (default 0.0)) – Dropout probability to use during training.

  • reverse_input (bool (default True)) – If True, reverse the order of input sequences before sending them into the encoder. This can improve performance when working with long sequences.

  • variational (bool (default False)) – If True, train the model as a variational autoencoder. This adds random noise to the encoder, and also constrains the embedding to follow a unit Gaussian distribution.

  • annealing_start_step (int (default 5000)) – Step (that is, batch) at which to begin turning on the constraint term for KL cost annealing.

  • annealing_final_step (int (default 10000)) – Step (that is, batch) at which to finish turning on the constraint term for KL cost annealing.
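To make the roles of input_tokens and reverse_input concrete, here is a minimal sketch of turning a sequence into integer indices before it reaches an encoder. The helper encode_sequence is hypothetical and only illustrates the idea; the model's internal preprocessing (for example padding or end markers) may differ.

```python
from typing import List, Sequence


def encode_sequence(sequence: Sequence[str],
                    tokens: List[str],
                    reverse_input: bool = True) -> List[int]:
    """Map each token to its position in the token list, optionally reversed."""
    index = {token: i for i, token in enumerate(tokens)}
    ids = [index[token] for token in sequence]  # KeyError on unknown tokens
    return ids[::-1] if reverse_input else ids


# Character-level tokens, built the same way as in the example above.
tokens = sorted(set("CCO"))
encoded = encode_sequence("CCO", tokens)
```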

fit_sequences(sequences: List[str], max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, restore: bool = False)[source]

Train this model on a set of sequences

Parameters:
  • sequences (List[str]) – Training samples to fit to. Each sample should be represented as a tuple of the form (input_sequence, output_sequence).

  • max_checkpoints_to_keep (int) – Maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – Frequency at which to write checkpoints, measured in training steps.

  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.

predict_from_sequences(sequences: List[str], beam_width=5)[source]

Given a set of input sequences, predict the output sequences.

The prediction is done using a beam search with length normalization.

Parameters:
  • sequences (List[str]) – Input sequences to generate a prediction for

  • beam_width (int (default 5)) – Beam width to use for searching. Set to 1 to use a simple greedy search.
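Beam search with length normalization can be sketched independently of any model. In the sketch below, next_token_probs stands in for the decoder and END is a hypothetical end-of-sequence marker; dividing the summed log probability by the sequence length is one common normalization and is assumed here, not taken from the DeepChem source.

```python
import math
from typing import Callable, Dict, List, Tuple

END = "<end>"  # hypothetical end-of-sequence marker
Hypothesis = Tuple[List[str], float]  # (tokens so far, summed log probability)


def beam_search(next_token_probs: Callable[[List[str]], Dict[str, float]],
                max_output_length: int,
                beam_width: int = 5) -> List[str]:
    beams: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_output_length):
        candidates: List[Hypothesis] = []
        for tokens, logp in beams:
            for token, p in next_token_probs(tokens).items():
                if p <= 0.0:
                    continue
                entry = (tokens + [token], logp + math.log(p))
                (finished if token == END else candidates).append(entry)
        # Keep only the beam_width most probable unfinished prefixes.
        candidates.sort(key=lambda e: e[1], reverse=True)
        beams = candidates[:beam_width]
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_output_length
    # Length normalization: compare average log probability per token.
    best = max(finished, key=lambda e: e[1] / max(len(e[0]), 1))
    return [t for t in best[0] if t != END]


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Tiny hand-written distribution in which greedy search is suboptimal."""
    if not prefix:
        return {"A": 0.6, "B": 0.4}
    if prefix == ["A"]:
        return {"C": 0.5, "D": 0.5}
    if prefix == ["B"]:
        return {"E": 0.9, END: 0.1}
    return {END: 1.0}


greedy = beam_search(toy_model, max_output_length=3, beam_width=1)
wider = beam_search(toy_model, max_output_length=3, beam_width=2)
```

With beam_width=1 the search commits to "A" and ends up with a sequence of probability 0.3, while a width of 2 keeps "B" alive long enough to find the sequence of probability 0.36.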

predict_embedding(sequences: List[str])[source]

Given a set of input sequences, compute the embedding vectors.

Parameters:

sequences (List[str]) – Input sequences to generate embeddings for.

predict_from_embedding(embeddings: List[ndarray], beam_width=5)[source]

Given a set of embedding vectors, predict the output sequences.

The prediction is done using a beam search with length normalization.

Parameters:
  • embeddings (List[np.ndarray]) – Embedding vectors to generate predictions for

  • beam_width (int) – Beam width to use for searching. Set to 1 to use a simple greedy search.

GAN

class GAN(noise_input_shape: tuple, data_input_shape: list, conditional_input_shape: list, generator_fn: Callable, discriminator_fn: Callable, device: device, n_generators: int = 1, n_discriminators: int = 1, create_discriminator_loss: Callable | None = None, create_generator_loss: Callable | None = None, _call_discriminator: Callable | None = None, **kwargs)[source]

Builder class for Generative Adversarial Networks.

A Generative Adversarial Network (GAN) [gan1] is a type of generative model. It consists of two parts called the “generator” and the “discriminator”. The generator takes random noise as input and transforms it into an output that (hopefully) resembles the training data. The discriminator takes a set of samples as input and tries to distinguish the real training samples from the ones created by the generator. Both of them are trained together. The discriminator tries to get better and better at telling real from false data, while the generator tries to get better and better at fooling the discriminator.

Examples

Importing necessary modules

>>> import deepchem as dc
>>> from deepchem.models.torch_models.gan import GAN
>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F

Creating a Generator

>>> class Generator(nn.Module):
...     def __init__(self, noise_input_shape, conditional_input_shape):
...         super(Generator, self).__init__()
...         self.noise_input_shape = noise_input_shape
...         self.conditional_input_shape = conditional_input_shape
...         self.noise_dim = noise_input_shape[1:]
...         self.conditional_dim = conditional_input_shape[1:]
...         input_dim = sum(self.noise_dim) + sum(self.conditional_dim)
...         self.output = nn.Linear(input_dim, 1)
...     def forward(self, input):
...         noise_input, conditional_input = input
...         inputs = torch.cat((noise_input, conditional_input), dim=1)
...         output = self.output(inputs)
...         return output

Creating a Discriminator

>>> class Discriminator(nn.Module):
...     def __init__(self, data_input_shape, conditional_input_shape):
...         super(Discriminator, self).__init__()
...         self.data_input_shape = data_input_shape
...         self.conditional_input_shape = conditional_input_shape
...         # Extracting the actual data dimension
...         data_dim = data_input_shape[1:]
...         # Extracting the actual conditional dimension
...         conditional_dim = conditional_input_shape[1:]
...         input_dim = sum(data_dim) + sum(conditional_dim)
...         # Define the dense layers
...         self.dense1 = nn.Linear(input_dim, 10)
...         self.dense2 = nn.Linear(10, 1)
...     def forward(self, input):
...         data_input, conditional_input = input
...         # Concatenate data_input and conditional_input along the second dimension
...         discrim_in = torch.cat((data_input, conditional_input), dim=1)
...         # Pass the concatenated input through the dense layers
...         x = F.relu(self.dense1(discrim_in))
...         output = torch.sigmoid(self.dense2(x))
...         return output

Defining an Example GAN class

>>> class ExampleGAN(dc.models.torch_models.GAN):
...    def get_noise_input_shape(self):
...        return (16,2,)
...    def get_data_input_shapes(self):
...        return [(16,1,)]
...    def get_conditional_input_shapes(self):
...        return [(16,1,)]
...    def create_generator(self):
...        noise_dim = self.get_noise_input_shape()
...        conditional_dim = self.get_conditional_input_shapes()[0]
...        return nn.Sequential(Generator(noise_dim, conditional_dim))
...    def create_discriminator(self):
...        data_input_shape = self.get_data_input_shapes()[0]
...        conditional_input_shape = self.get_conditional_input_shapes()[0]
...        return nn.Sequential(
...            Discriminator(data_input_shape, conditional_input_shape))

Defining the GAN

>>> batch_size = 16
>>> noise_shape = (batch_size, 2,)
>>> data_shape = [(batch_size, 1,)]
>>> conditional_shape = [(batch_size, 1,)]
>>> def create_generator(noise_dim, conditional_dim):
...     noise_dim = noise_dim
...     conditional_dim = conditional_dim[0]
...     return nn.Sequential(Generator(noise_dim, conditional_dim))
>>> def create_discriminator(data_input_shape, conditional_input_shape):
...     data_input_shape = data_input_shape[0]
...     conditional_input_shape = conditional_input_shape[0]
...     return nn.Sequential(
...         Discriminator(data_input_shape, conditional_input_shape))
>>> gan = ExampleGAN(noise_shape,
...              data_shape,
...              conditional_shape,
...              create_generator(noise_shape, conditional_shape),
...              create_discriminator(data_shape, conditional_shape),
...              device='cpu')
>>> noise = torch.rand(*gan.noise_input_shape)
>>> real_data = torch.rand(*gan.data_input_shape[0])
>>> conditional = torch.rand(*gan.conditional_input_shape[0])
>>> gen_loss, disc_loss = gan([noise, real_data, conditional])

References

[gan1]

Goodfellow, Ian, et al. “Generative adversarial nets.” Advances in neural information processing systems. 2014.

[gan2]

Arora et al., “Generalization and Equilibrium in Generative Adversarial Nets (GANs)” (https://arxiv.org/abs/1703.00573)

__init__(noise_input_shape: tuple, data_input_shape: list, conditional_input_shape: list, generator_fn: Callable, discriminator_fn: Callable, device: device, n_generators: int = 1, n_discriminators: int = 1, create_discriminator_loss: Callable | None = None, create_generator_loss: Callable | None = None, _call_discriminator: Callable | None = None, **kwargs)[source]

Construct a GAN.

In addition to the parameters listed below, this class accepts all the keyword arguments from KerasModel.

Parameters:
  • noise_input_shape (tuple) – the shape of the noise input to the generator. The first dimension (corresponding to the batch size) should be omitted.

  • data_input_shape (list of tuple) – the shapes of the inputs to the discriminator. The first dimension (corresponding to the batch size) should be omitted.

  • conditional_input_shape (list of tuple) – the shapes of the conditional inputs to the generator and discriminator. The first dimension (corresponding to the batch size) should be omitted. If there are no conditional inputs, this should be an empty list.

  • generator_fn (Callable) – a function that returns a generator. It will be called with no arguments. The returned value should be a nn.Module whose input is a list containing a batch of noise, followed by any conditional inputs. The number and shapes of its outputs must match the return value from get_data_input_shapes(), since generated data must have the same form as training data.

  • discriminator_fn (Callable) – a function that returns a discriminator. It will be called with no arguments. The returned value should be a nn.Module whose input is a list containing a batch of data, followed by any conditional inputs. Its output should be a one dimensional tensor containing the probability of each sample being a training sample.

  • device (torch.device) – the device to use for training

  • n_generators (int) – the number of generators to include

  • n_discriminators (int) – the number of discriminators to include

  • create_discriminator_loss (Callable) – a function that returns the loss function for the discriminator. It will be called with two arguments: the output from the discriminator on a batch of training data, and the output from the discriminator on a batch of generated data. The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

  • create_generator_loss (Callable) – a function that returns the loss function for the generator. It will be called with one argument: the output from the discriminator on a batch of generated data. The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

  • _call_discriminator (Callable) – a function that invokes the discriminator on a set of inputs. It will be called with three arguments: the discriminator to invoke, the list of data inputs, and the list of conditional inputs. The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

forward(inputs) → Tuple[Tensor, Tensor][source]

Compute the output of the GAN.

Parameters:

inputs (list of Tensor) – the inputs to the GAN. The first element must be a batch of noise, followed by data inputs and any conditional inputs.

Returns:

  • total_gen_loss (Tensor) – the total loss for the generator

  • total_discrim_loss (Tensor) – the total loss for the discriminator

get_noise_batch(batch_size: int) → ndarray[source]

Get a batch of random noise to pass to the generator.

This should return a NumPy array whose shape matches the one returned by get_noise_input_shape(). The default implementation returns normally distributed values. Subclasses can override this to implement a different distribution.

Parameters:

batch_size (int) – the number of samples to generate

Returns:

random_noise – a batch of random noise

Return type:

ndarray

create_generator_loss(discrim_output: Tensor) → Tensor[source]

Create the loss function for the generator.

The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

Parameters:

discrim_output (Tensor) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.

Returns:

output – A Tensor equal to the loss function to use for optimizing the generator.

Return type:

Tensor

create_discriminator_loss(discrim_output_train: Tensor, discrim_output_gen: Tensor) → Tensor[source]

Create the loss function for the discriminator.

The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

Parameters:
  • discrim_output_train (Tensor) – the output from the discriminator on a batch of training data. This is its estimate of the probability that each sample is training data.

  • discrim_output_gen (Tensor) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.

Returns:

output – A Tensor equal to the loss function to use for optimizing the discriminator.

Return type:

Tensor
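For orientation, the conventional GAN losses from Goodfellow et al. can be written in a few lines. This sketch uses plain Python lists of discriminator probabilities, the non-saturating form of the generator loss, and a small constant EPS to avoid log(0); these are assumptions for illustration, and the defaults of this class are not spelled out in the text above.

```python
import math
from typing import Sequence

EPS = 1e-10  # illustrative constant that guards against log(0)


def generator_loss(discrim_output_gen: Sequence[float]) -> float:
    """Low when the discriminator rates generated samples as real."""
    logs = [math.log(p + EPS) for p in discrim_output_gen]
    return -sum(logs) / len(logs)


def discriminator_loss(discrim_output_train: Sequence[float],
                       discrim_output_gen: Sequence[float]) -> float:
    """Low when real samples score near 1 and generated samples near 0."""
    terms = [
        math.log(p_train + EPS) + math.log(1.0 - p_gen + EPS)
        for p_train, p_gen in zip(discrim_output_train, discrim_output_gen)
    ]
    return -sum(terms) / len(terms)


# A discriminator that cannot tell the two apart outputs 0.5 everywhere.
undecided_gen = generator_loss([0.5, 0.5])
undecided_disc = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

A custom loss passed as create_generator_loss or create_discriminator_loss would take the same arguments but operate on tensors.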

discrim_loss_fn(outputs: List, labels: List[Tensor], weights: List[Tensor]) → Any[source]

Function to get the discriminator loss from the fit_generator output

Parameters:
  • outputs (list of Tensor) – the output from the discriminator on a batch of training data. This is its estimate of the probability that each sample is training data.

  • labels (Tensor) – the labels for the batch. These are ignored.

  • weights (Tensor) – the weights for the batch. These are ignored.

Returns:

the value of the discriminator loss from the fit_generator output.

gen_loss_fn(outputs: List, labels: List[Tensor], weights: List[Tensor]) → Tensor[source]

Function to get the Generator loss from the fit_generator output

Parameters:
  • outputs (Tensor) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.

  • labels (Tensor) – the labels for the batch. These are ignored.

  • weights (Tensor) – the weights for the batch. These are ignored.

Returns:

the value of the generator loss function for this input.

GANModel

class GANModel(n_generators: int = 1, n_discriminators: int = 1, create_discriminator_loss: Callable | None = None, create_generator_loss: Callable | None = None, _call_discriminator: Callable | None = None, device: device | None = None, **kwargs)[source]

Implements Generative Adversarial Networks.

A Generative Adversarial Network (GAN) is a type of generative model. It consists of two parts called the “generator” and the “discriminator”. The generator takes random noise as input and transforms it into an output that (hopefully) resembles the training data. The discriminator takes a set of samples as input and tries to distinguish the real training samples from the ones created by the generator. Both of them are trained together. The discriminator tries to get better and better at telling real from false data, while the generator tries to get better and better at fooling the discriminator.

In many cases there also are additional inputs to the generator and discriminator. In that case it is known as a Conditional GAN (CGAN), since it learns a distribution that is conditional on the values of those inputs. They are referred to as “conditional inputs”.

Many variations on this idea have been proposed, and new varieties of GANs continue to appear. This class tries to make it easy to implement straightforward GANs of the most conventional types. At the same time, it tries to be flexible enough to implement many (but certainly not all) variations on the concept.

To define a GAN, you must create a subclass that provides implementations of the following methods:

  • get_noise_input_shape()

  • get_data_input_shapes()

  • create_generator()

  • create_discriminator()

If you want your GAN to have any conditional inputs you must also implement:

get_conditional_input_shapes()

The following methods have default implementations that are suitable for most conventional GANs. You can override them if you want to customize their behavior:

  • create_generator_loss()

  • create_discriminator_loss()

  • get_noise_batch()

This class allows a GAN to have multiple generators and discriminators, a model known as MIX+GAN. It is described in Arora et al. [gan2]. This can lead to better models, and is especially useful for reducing mode collapse, since different generators can learn different parts of the distribution. To use this technique, simply specify the number of generators and discriminators when calling the constructor. You can then tell predict_gan_generator() which generator to use for predicting samples.

Examples

Importing necessary modules

>>> import deepchem as dc
>>> from deepchem.models.torch_models.gan import GAN
>>> import numpy as np
>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F

Creating a Generator

>>> class Generator(nn.Module):
...     def __init__(self, noise_input_shape, conditional_input_shape):
...         super(Generator, self).__init__()
...         self.noise_input_shape = noise_input_shape
...         self.conditional_input_shape = conditional_input_shape
...         self.noise_dim = noise_input_shape[1:]
...         self.conditional_dim = conditional_input_shape[1:]
...         input_dim = sum(self.noise_dim) + sum(self.conditional_dim)
...         self.output = nn.Linear(input_dim, 1)
...     def forward(self, input):
...         noise_input, conditional_input = input
...         inputs = torch.cat((noise_input, conditional_input), dim=1)
...         output = self.output(inputs)
...         return output

Creating a Discriminator

>>> class Discriminator(nn.Module):
...     def __init__(self, data_input_shape, conditional_input_shape):
...         super(Discriminator, self).__init__()
...         self.data_input_shape = data_input_shape
...         self.conditional_input_shape = conditional_input_shape
...         # Extracting the actual data dimension
...         data_dim = data_input_shape[1:]
...         # Extracting the actual conditional dimension
...         conditional_dim = conditional_input_shape[1:]
...         input_dim = sum(data_dim) + sum(conditional_dim)
...         # Define the dense layers
...         self.dense1 = nn.Linear(input_dim, 10)
...         self.dense2 = nn.Linear(10, 1)
...     def forward(self, input):
...         data_input, conditional_input = input
...         # Concatenate data_input and conditional_input along the second dimension
...         discrim_in = torch.cat((data_input, conditional_input), dim=1)
...         # Pass the concatenated input through the dense layers
...         x = F.relu(self.dense1(discrim_in))
...         output = torch.sigmoid(self.dense2(x))
...         return output

Defining an Example GAN class

>>> class ExampleGANModel(dc.models.torch_models.GANModel):
...    def get_noise_input_shape(self):
...        return (100,2,)
...    def get_data_input_shapes(self):
...        return [(100,1,)]
...    def get_conditional_input_shapes(self):
...        return [(100,1,)]
...    def create_generator(self):
...        noise_dim = self.get_noise_input_shape()
...        conditional_dim = self.get_conditional_input_shapes()[0]
...        return nn.Sequential(Generator(noise_dim, conditional_dim))
...    def create_discriminator(self):
...        data_input_shape = self.get_data_input_shapes()[0]
...        conditional_input_shape = self.get_conditional_input_shapes()[0]
...        return nn.Sequential(
...            Discriminator(data_input_shape, conditional_input_shape))

Defining a function to generate data

>>> def generate_batch(batch_size):
...     means = 10 * np.random.random([batch_size, 1])
...     values = np.random.normal(means, scale=2.0)
...     return means, values
>>> def generate_data(gan, batches, batch_size):
...     for _ in range(batches):
...         means, values = generate_batch(batch_size)
...         batch = {
...             gan.data_inputs[0]: values,
...             gan.conditional_inputs[0]: means
...         }
...         yield batch

Defining the GANModel

>>> batch_size = 100
>>> noise_shape = (batch_size, 2,)
>>> data_shape = [(batch_size, 1,)]
>>> conditional_shape = [(batch_size, 1,)]
>>> gan = ExampleGANModel(learning_rate=0.01)
>>> data = generate_data(gan, 500, 100)
>>> gan.fit_gan(data, generator_steps=0.5, checkpoint_interval=0)
>>> means = 10 * np.random.random([1000, 1])
>>> values = gan.predict_gan_generator(conditional_inputs=[means])

Notes

This class is a subclass of TorchModel. It accepts all the keyword arguments from TorchModel.

__init__(n_generators: int = 1, n_discriminators: int = 1, create_discriminator_loss: Callable | None = None, create_generator_loss: Callable | None = None, _call_discriminator: Callable | None = None, device: device | None = None, **kwargs)[source]
Parameters:
  • n_generators (int) – the number of generators to include

  • n_discriminators (int) – the number of discriminators to include

  • create_discriminator_loss (Callable) – a function that returns the loss function for the discriminator. It will be called with two arguments: the output from the discriminator on a batch of training data, and the output from the discriminator on a batch of generated data. The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

  • create_generator_loss (Callable) – a function that returns the loss function for the generator. It will be called with one argument: the output from the discriminator on a batch of generated data. The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

  • _call_discriminator (Callable) – a function that invokes the discriminator on a set of inputs. It will be called with three arguments: the discriminator to invoke, the list of data inputs, and the list of conditional inputs. The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

get_noise_input_shape()[source]

Get the shape of the generator’s noise input layer.

Subclasses must override this to return a tuple giving the shape of the noise input. The actual Input layer will be created automatically. The dimension corresponding to the batch size should be omitted.

get_data_input_shapes()[source]

Get the shapes of the inputs for training data.

Subclasses must override this to return a list of tuples, each giving the shape of one of the inputs. The actual Input layers will be created automatically. This list of shapes must also match the shapes of the generator’s outputs. The dimension corresponding to the batch size should be omitted.

get_conditional_input_shapes()[source]

Get the shapes of any conditional inputs.

Subclasses may override this to return a list of tuples, each giving the shape of one of the conditional inputs. The actual Input layers will be created automatically. The dimension corresponding to the batch size should be omitted.

The default implementation returns an empty list, meaning there are no conditional inputs.

create_generator()[source]

Create and return a generator.

Subclasses must override this to construct the generator. The returned value should be a nn.Module whose inputs are a batch of noise, followed by any conditional inputs. The number and shapes of its outputs must match the return value from get_data_input_shapes(), since generated data must have the same form as training data.

create_discriminator()[source]

Create and return a discriminator.

Subclasses must override this to construct the discriminator. The returned value should be a nn.Module whose inputs are all data inputs, followed by any conditional inputs. Its output should be a one dimensional tensor containing the probability of each sample being a training sample.

fit_gan(batches, generator_steps=1, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False) → None[source]

Train this model on data.

Parameters:
  • batches (iterable) – batches of data to train the discriminator on, each represented as a dict that maps Inputs to values. It should specify values for all members of data_inputs and conditional_inputs.

  • generator_steps (float) – the number of training steps to perform for the generator for each batch. This can be used to adjust the ratio of training steps for the generator and discriminator. For example, 2.0 will perform two training steps for every batch, while 0.5 will only perform one training step for every two batches.

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in batches. Set this to 0 to disable automatic checkpointing.

  • restore (bool) – if True, restore the model from the most recent checkpoint before training it.
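The effect of a fractional generator_steps can be illustrated by accumulating the value once per batch and running a generator step whenever a whole step is due. The helper below is a hypothetical sketch of that bookkeeping, not the code used by fit_gan.

```python
from typing import List


def generator_steps_per_batch(n_batches: int,
                              generator_steps: float) -> List[int]:
    """Number of generator training steps to run after each batch."""
    schedule = []
    pending = 0.0  # fraction of a generator step carried over between batches
    for _ in range(n_batches):
        pending += generator_steps
        steps = int(pending)  # whole steps that are now due
        pending -= steps
        schedule.append(steps)
    return schedule


half = generator_steps_per_batch(4, 0.5)    # one step every two batches
double = generator_steps_per_batch(3, 2.0)  # two steps for every batch
```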

predict_gan_generator(batch_size=1, noise_input=None, conditional_inputs=[], generator_index=0)[source]

Use the GAN to generate a batch of samples.

Parameters:
  • batch_size (int) – the number of samples to generate. If either noise_input or conditional_inputs is specified, this argument is ignored since the batch size is then determined by the size of that argument.

  • noise_input (array) – the value to use for the generator’s noise input. If None (the default), get_noise_batch() is called to generate a random input, so each call will produce a new set of samples.

  • conditional_inputs (list of arrays) – the values to use for all conditional inputs. This must be specified if the GAN has any conditional inputs.

  • generator_index (int) – the index of the generator (between 0 and n_generators-1) to use for generating the samples.

Returns:

An array (if the generator has only one output) or list of arrays (if it has multiple outputs) containing the generated samples.

WGANModel

class WGANModel(gradient_penalty: float = 10.0, **kwargs)[source]

Implements Wasserstein Generative Adversarial Networks.

This class implements Wasserstein Generative Adversarial Networks (WGANs) as described in Arjovsky et al., “Wasserstein GAN” [wgan1]. A WGAN is conceptually rather different from a conventional GAN, but in practical terms very similar. It reinterprets the discriminator (often called the “critic” in this context) as learning an approximation to the Earth Mover distance between the training and generated distributions. The generator is then trained to minimize that distance. In practice, this just means using slightly different loss functions for training the generator and discriminator.

WGANs have theoretical advantages over conventional GANs, and they often work better in practice. In addition, the discriminator’s loss function can be directly interpreted as a measure of the quality of the model. That is an advantage over conventional GANs, where the loss does not directly convey information about the quality of the model.

The theory WGANs are based on requires the discriminator’s gradient to be bounded. The original paper achieved this by clipping its weights. This class instead does it by adding a penalty term to the discriminator’s loss, as described in [wgan2]. This is sometimes found to produce better results.

There are a few other practical differences between GANs and WGANs. In a conventional GAN, the discriminator’s output must be between 0 and 1 so it can be interpreted as a probability. In a WGAN, it should produce an unbounded output that can be interpreted as a distance.

When training a WGAN, you also should usually use a smaller value for generator_steps. Conventional GANs rely on keeping the generator and discriminator “in balance” with each other. If the discriminator ever gets too good, it becomes impossible for the generator to fool it and training stalls. WGANs do not have this problem, and in fact the better the discriminator is, the easier it is for the generator to improve. It therefore usually works best to perform several training steps on the discriminator for each training step on the generator.
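One way to read a fractional generator_steps is as a running budget: every discriminator step adds generator_steps to an accumulator, and a generator step runs each time the accumulator reaches 1. The loop below is a sketch of that interpretation, not the library's fit_gan; the function name and the accumulator logic are assumptions.

```python
def training_schedule(n_batches, generator_steps):
    """Return the sequence of 'D' and 'G' steps for n_batches batches."""
    schedule = []
    budget = 0.0
    for _ in range(n_batches):
        schedule.append("D")  # one discriminator step per batch
        budget += generator_steps
        while budget >= 1.0:
            schedule.append("G")  # spend one unit of budget on the generator
            budget -= 1.0
    return schedule


# generator_steps=0.25: four discriminator steps per generator step.
print("".join(training_schedule(8, 0.25)))
# generator_steps=2.0: two generator steps after every discriminator step.
print("".join(training_schedule(2, 2.0)))
```

The values 0.25 and 2.0 are exactly representable in binary floating point; with values such as 0.1 the accumulator can drift slightly, so a real implementation may need a tolerance.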

Examples

Importing necessary modules

>>> import numpy as np
>>> import deepchem as dc
>>> from deepchem.models.torch_models.gan import WGANModel
>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F

Creating a Generator

>>> class Generator(nn.Module):
...     def __init__(self, noise_input_shape, conditional_input_shape):
...         super(Generator, self).__init__()
...         self.noise_input_shape = noise_input_shape
...         self.conditional_input_shape = conditional_input_shape
...         self.noise_dim = noise_input_shape[1:]
...         self.conditional_dim = conditional_input_shape[1:]
...         input_dim = sum(self.noise_dim) + sum(self.conditional_dim)
...         self.output = nn.Linear(input_dim, 1)
...     def forward(self, input):
...         noise_input, conditional_input = input
...         inputs = torch.cat((noise_input, conditional_input), dim=1)
...         output = self.output(inputs)
...         return output

Creating a Discriminator

>>> class Discriminator(nn.Module):
...     def __init__(self, data_input_shape, conditional_input_shape):
...         super(Discriminator, self).__init__()
...         self.data_input_shape = data_input_shape
...         self.conditional_input_shape = conditional_input_shape
...         # Extracting the actual data dimension
...         data_dim = data_input_shape[1:]
...         # Extracting the actual conditional dimension
...         conditional_dim = conditional_input_shape[1:]
...         input_dim = sum(data_dim) + sum(conditional_dim)
...         # Define the dense layers
...         self.dense1 = nn.Linear(input_dim, 10)
...         self.dense2 = nn.Linear(10, 1)
...     def forward(self, input):
...         data_input, conditional_input = input
...         # Concatenate data_input and conditional_input along the second dimension
...         discrim_in = torch.cat((data_input, conditional_input), dim=1)
...         # Pass the concatenated input through the dense layers
...         x = F.relu(self.dense1(discrim_in))
...         output = self.dense2(x)
...         return output

Creating an Example WGANModel class

>>> class ExampleWGAN(WGANModel):
...     def get_noise_input_shape(self):
...         return (100,2,)
...     def get_data_input_shapes(self):
...         return [(100,1,)]
...     def get_conditional_input_shapes(self):
...         return [(100,1,)]
...     def create_generator(self):
...         noise_dim = self.get_noise_input_shape()
...         conditional_dim = self.get_conditional_input_shapes()[0]
...         return nn.Sequential(Generator(noise_dim, conditional_dim))
...     def create_discriminator(self):
...         data_input_shape = self.get_data_input_shapes()[0]
...         conditional_input_shape = self.get_conditional_input_shapes()[0]
...         return nn.Sequential(
...             Discriminator(data_input_shape, conditional_input_shape))

Defining a function to generate data

>>> def generate_batch(batch_size):
...     means = 10 * np.random.random([batch_size, 1])
...     values = np.random.normal(means, scale=2.0)
...     return means, values
>>> def generate_data(gan, batches, batch_size):
...     for _ in range(batches):
...         means, values = generate_batch(batch_size)
...         batch = {
...             gan.data_inputs[0]: values,
...             gan.conditional_inputs[0]: means
...         }
...         yield batch

Defining the WGANModel

>>> wgan = ExampleWGAN(learning_rate=0.01,
...               gradient_penalty=0.1)
>>> data = generate_data(wgan, 500, 100)
>>> wgan.fit_gan(data, generator_steps=0.1, checkpoint_interval=0)
>>> means = 10 * np.random.random([1000, 1])
>>> values = wgan.predict_gan_generator(conditional_inputs=[means])

References

[wgan1]

Arjovsky, Martin, Soumith Chintala, and Léon Bottou. “Wasserstein generative adversarial networks.” International conference on machine learning. PMLR, 2017. (https://arxiv.org/abs/1701.07875)

[wgan2]

Gulrajani, Ishaan, et al. “Improved training of wasserstein gans.” Advances in neural information processing systems 30 (2017). (https://arxiv.org/abs/1704.00028)

__init__(gradient_penalty: float = 10.0, **kwargs)[source]

Construct a WGAN.

In addition to the following, this class accepts all the keyword arguments from GAN and TorchModel.

Parameters:

gradient_penalty (float, default 10.0) – the magnitude of the gradient penalty loss

create_generator_loss(discrim_output: Tensor) Tensor[source]

Create the loss function for the generator.

Parameters:

discrim_output (torch.Tensor) – the output from the discriminator on a batch of generated data. In a WGAN this is an unbounded score rather than a probability.

Returns:

A Tensor equal to the mean of the inputs

Return type:

torch.Tensor

create_discriminator_loss(discrim_output_train: List[Tensor], discrim_output_gen: Tensor) Tensor[source]

Create the loss function for the discriminator.

Parameters:
  • discrim_output_train (List[Tensor]) – the output from the discriminator on a batch of training data. In a WGAN this is an unbounded score rather than a probability.

  • discrim_output_gen (Tensor) – the output from the discriminator on a batch of generated data.

Returns:

A Tensor equal to the loss function to use for optimizing the discriminator.

Return type:

torch.Tensor

BasicMolGANModel

class BasicMolGANModel(edges: int = 5, vertices: int = 9, nodes: int = 5, embedding_dim: int = 10, dropout_rate: float = 0.0, device: device | None = None, **kwargs)[source]

Model for de novo generation of small molecules, based on the work of Nicola De Cao et al. [molgan1]. It applies a GAN directly to graph data and uses a reinforcement learning objective to steer the network toward molecules with certain chemical properties. It builds on the WGAN infrastructure and takes an adjacency matrix and node features as inputs. Inputs must be one-hot encoded.

Examples

Import necessary libraries and modules

>>> import deepchem as dc
>>> from deepchem.models.torch_models import BasicMolGANModel as MolGAN
>>> from deepchem.models.optimizers import ExponentialDecay
>>> import torch
>>> import torch.nn.functional as F

Load dataset and featurize molecules. We will use a small dataset for this example and featurize the molecules with MolGanFeaturizer.

>>> smiles = ['CCC', 'C1=CC=CC=C1', 'CNC' ]
>>> # create featurizer
>>> feat = dc.feat.MolGanFeaturizer()
>>> # featurize molecules
>>> features = feat.featurize(smiles)
>>> # Remove empty objects
>>> features = list(filter(lambda x: x is not None, features))

Create and train the model

>>> # create model
>>> gan = MolGAN(learning_rate=ExponentialDecay(0.001, 0.9, 5000))
>>> dataset = dc.data.NumpyDataset([x.adjacency_matrix for x in features],[x.node_features for x in features])
>>> def iterbatches(epochs):
...     for i in range(epochs):
...         for batch in dataset.iterbatches(batch_size=gan.batch_size, pad_batches=True):
...             adjacency_tensor = F.one_hot(
...                     torch.Tensor(batch[0]).to(torch.int64),
...                     gan.edges).to(torch.float32)
...             node_tensor = F.one_hot(
...                     torch.Tensor(batch[1]).to(torch.int64),
...                     gan.nodes).to(torch.float32)
...             yield {gan.data_inputs[0]: adjacency_tensor, gan.data_inputs[1]:node_tensor}
>>> # train model
>>> gan.fit_gan(iterbatches(8), generator_steps=0.2, checkpoint_interval=0)

The example above only shows how to use the model. You can change the parameters to get better results, for example by trying iterbatches(1000).

Now, let’s generate some molecules using the trained model. We will generate 10 molecules and then convert them to RDKit molecules.

>>> generated_data = gan.predict_gan_generator(10)
Generating 10 samples
>>> # convert graphs to RDKit molecules
>>> nmols = feat.defeaturize(generated_data)
>>> print("{} molecules generated".format(len(nmols)))
10 molecules generated

You can increase the number of generated molecules by changing the argument passed to predict_gan_generator. The generated molecules are returned as GraphMatrix objects, which you can convert to RDKit molecules using the defeaturize function of MolGanFeaturizer.

Now, let’s remove invalid molecules from the generated molecules.

>>> # remove invalid molecules
>>> nmols = list(filter(lambda x: x is not None, nmols))
>>> print("{} valid molecules".format(len(nmols)))
0 valid molecules

Training is currently unstable, and 0 valid molecules is a common outcome. You can try training the model with different parameters to get better results.

References

[molgan1]

Nicola De Cao et al. “MolGAN: An implicit generative model for small molecular graphs”, https://arxiv.org/abs/1805.11973

__init__(edges: int = 5, vertices: int = 9, nodes: int = 5, embedding_dim: int = 10, dropout_rate: float = 0.0, device: device | None = None, **kwargs)[source]

Initialize the model

Parameters:
  • edges (int, default 5) – Number of bond types, including BondType.Zero

  • vertices (int, default 9) – Max number of atoms in adjacency and node features matrices

  • nodes (int, default 5) – Number of atom types in node features matrix

  • embedding_dim (int, default 10) – Size of noise input array

  • dropout_rate (float, default = 0.) – Rate of dropout used across whole model

  • name (str, default '') – Name of the model

get_noise_input_shape() Tuple[int, int][source]

Return shape of the noise input used in generator

Returns:

Shape of the noise input

Return type:

Tuple

get_data_input_shapes() List[source]

Return input shape of the discriminator

Returns:

List of shapes used as an input for the discriminator.

Return type:

List

create_generator()[source]

Create the generator model. It takes noise data as input and processes it through a number of dense and dropout layers. The data is then converted into two forms, one used for training and the other for generation of compounds. The model has two outputs:

  1. edges

  2. nodes

The format differs depending on the intended use (training or sample generation). For sample generation, pass the flag sample_generation=True when calling the generator, i.e. gan.generators[0](noise_input, training=False, sample_generation=True). For training the model, set sample_generation=False.

create_discriminator(units: List[Tuple[int, int] | int] = [(128, 64), 64])[source]

Create discriminator model based on MolGAN layers. Takes two inputs:

  1. adjacency tensor, containing bond information

  2. nodes tensor, containing atom information

The input vectors need to be in one-hot encoding format. Use MolGanFeaturizer for that purpose. This is expected to be simplified in a future release.
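One-hot encoding replaces each integer label with a vector that is 1 at the label's index and 0 elsewhere. The sketch below uses plain Python lists and a hypothetical one_hot helper to show the resulting shapes for a three-atom molecule. The label values are made up for illustration and do not reflect the featurizer's actual encoding tables.

```python
def one_hot(label, n_classes):
    """Return a list of length n_classes with a single 1.0 at index label."""
    return [1.0 if i == label else 0.0 for i in range(n_classes)]


edges = 5  # number of bond types (the default in the class signature)
nodes = 5  # number of atom types (the default in the class signature)

# Integer-coded adjacency matrix: entry [i][j] is the bond type between
# atoms i and j, with 0 meaning "no bond" (illustrative values).
adjacency = [
    [0, 1, 0],
    [1, 0, 2],
    [0, 2, 0],
]
# Integer-coded atom types, one per atom (illustrative values).
atom_types = [1, 1, 3]

adjacency_one_hot = [[one_hot(b, edges) for b in row] for row in adjacency]
nodes_one_hot = [one_hot(a, nodes) for a in atom_types]

# Shapes: (atoms, atoms, edges) and (atoms, nodes).
print(len(adjacency_one_hot), len(adjacency_one_hot[0]),
      len(adjacency_one_hot[0][0]))
print(len(nodes_one_hot), len(nodes_one_hot[0]))
```

In a real batch the matrices are padded to the vertices limit so that every molecule has the same shape.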

predict_gan_generator(batch_size: int = 1, noise_input: List | Tensor | None = None, conditional_inputs: List = [], generator_index: int = 0) List[GraphMatrix][source]

Use the GAN to generate a batch of samples.

Parameters:
  • batch_size (int) – the number of samples to generate. If either noise_input or conditional_inputs is specified, this argument is ignored since the batch size is then determined by the size of that argument.

  • noise_input (array) – the value to use for the generator’s noise input. If None (the default), get_noise_batch() is called to generate a random input, so each call will produce a new set of samples.

  • conditional_inputs (list of arrays) – NOT USED. the values to use for all conditional inputs. This must be specified if the GAN has any conditional inputs.

  • generator_index (int) – NOT USED. the index of the generator (between 0 and n_generators-1) to use for generating the samples.

Returns:

A list of GraphMatrix objects that can be converted into RDKit molecules using the defeaturize function of MolGanFeaturizer.

Return type:

List[GraphMatrix]

Weave

class Weave(n_tasks: int, n_atom_feat: int | ~typing.Sequence[int] = 75, n_pair_feat: int | ~typing.Sequence[int] = 14, n_hidden: int = 50, n_graph_feat: int = 128, n_weave: int = 2, fully_connected_layer_sizes: ~typing.List[int] = [2000, 100], conv_weight_init_stddevs: float | ~typing.Sequence[float] = 0.03, weight_init_stddevs: float | ~typing.Sequence[float] = 0.01, bias_init_consts: float | ~typing.Sequence[float] = 0.0, dropouts: float | ~typing.Sequence[float] = 0.25, final_conv_activation_fn=<function tanh>, activation_fns: ~typing.Callable | str | ~typing.Sequence[~typing.Callable | str] = 'relu', batch_normalize: bool = True, gaussian_expand: bool = True, compress_post_gaussian_expansion: bool = False, mode: str = 'classification', n_classes: int = 2, batch_size: int = 100)[source]

A graph convolutional network (GCN) for either classification or regression. The network consists of the following sequence of layers:

  • Weave feature modules

  • Final convolution

  • Weave Gather Layer

  • A fully connected layer

  • A Softmax layer

Example

>>> import numpy as np
>>> import deepchem as dc
>>> featurizer = dc.feat.WeaveFeaturizer()
>>> X = featurizer(["C", "CC"])
>>> y = np.array([1, 0])
>>> batch_size = 2
>>> weavemodel = dc.models.torch_models.WeaveModel(n_tasks=1,n_weave=2, fully_connected_layer_sizes=[2000, 1000],mode="classification",batch_size=batch_size)
>>> atom_feat, pair_feat, pair_split, atom_split, atom_to_pair = weavemodel.compute_features_on_batch(X)
>>> model = dc.models.torch_models.Weave(n_tasks=1,n_weave=2,fully_connected_layer_sizes=[2000, 1000],mode="classification")
>>> input_data = [atom_feat, pair_feat, pair_split, atom_split, atom_to_pair]
>>> output = model(input_data)

__init__(n_tasks: int, n_atom_feat: int | ~typing.Sequence[int] = 75, n_pair_feat: int | ~typing.Sequence[int] = 14, n_hidden: int = 50, n_graph_feat: int = 128, n_weave: int = 2, fully_connected_layer_sizes: ~typing.List[int] = [2000, 100], conv_weight_init_stddevs: float | ~typing.Sequence[float] = 0.03, weight_init_stddevs: float | ~typing.Sequence[float] = 0.01, bias_init_consts: float | ~typing.Sequence[float] = 0.0, dropouts: float | ~typing.Sequence[float] = 0.25, final_conv_activation_fn=<function tanh>, activation_fns: ~typing.Callable | str | ~typing.Sequence[~typing.Callable | str] = 'relu', batch_normalize: bool = True, gaussian_expand: bool = True, compress_post_gaussian_expansion: bool = False, mode: str = 'classification', n_classes: int = 2, batch_size: int = 100)[source]
Parameters:
  • n_tasks (int) – Number of tasks

  • n_atom_feat (int, optional (default 75)) – Number of features per atom. Note this is 75 by default and should be 78 if chirality is used by WeaveFeaturizer.

  • n_pair_feat (int, optional (default 14)) – Number of features per pair of atoms.

  • n_hidden (int, optional (default 50)) – Number of units (convolution depths) in the corresponding hidden layer

  • n_graph_feat (int, optional (default 128)) – Number of output features for each molecule (graph)

  • n_weave (int, optional (default 2)) – The number of weave layers in this model.

  • fully_connected_layer_sizes (list (default [2000, 100])) – The size of each dense layer in the network. The length of this list determines the number of layers.

  • conv_weight_init_stddevs (list or float (default 0.03)) – The standard deviation of the distribution to use for weight initialization of each convolutional layer. The length of this list should equal n_weave. Alternatively, this may be a single value instead of a list, in which case the same value is used for each layer.

  • weight_init_stddevs (list or float (default 0.01)) – The standard deviation of the distribution to use for weight initialization of each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float (default 0.0)) – The value to initialize the biases in each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • dropouts (list or float (default 0.25)) – The dropout probability to use for each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • final_conv_activation_fn (Optional[ActivationFn] (default F.tanh)) – The activation function to apply to the final convolution at the end of the weave convolutions. If None, then no activation is applied (hence linear).

  • activation_fns (str (default relu)) – The activation function to apply to each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • batch_normalize (bool, optional (default True)) – If this is turned on, apply batch normalization before applying activation functions on convolutional and fully connected layers.

  • gaussian_expand (boolean, optional (default True)) – Whether to expand each dimension of atomic features by gaussian histogram

  • compress_post_gaussian_expansion (bool, optional (default False)) – If True, compress the results of the Gaussian expansion back to the original dimensions of the input.

  • mode (str (default "classification")) – Either “classification” or “regression” for type of model.

  • n_classes (int (default 2)) – Number of classes to predict (only used in classification mode)

  • batch_size (int (default 100)) – Batch size used by this model for training.

forward(inputs: Tensor | Sequence[Tensor]) List[Tensor][source]
Parameters:

inputs (OneOrMany[torch.Tensor]) – Should contain 5 tensors [atom_features, pair_features, pair_split, atom_split, atom_to_pair]

Returns:

Output as per use case : regression/classification

Return type:

List[torch.Tensor]

WeaveModel

class WeaveModel(n_tasks: int, n_atom_feat: int | ~typing.Sequence[int] = 75, n_pair_feat: int | ~typing.Sequence[int] = 14, n_hidden: int = 50, n_graph_feat: int = 128, n_weave: int = 2, fully_connected_layer_sizes: ~typing.List[int] = [2000, 100], conv_weight_init_stddevs: float | ~typing.Sequence[float] = 0.03, weight_init_stddevs: float | ~typing.Sequence[float] = 0.01, bias_init_consts: float | ~typing.Sequence[float] = 0.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | ~typing.Sequence[float] = 0.25, final_conv_activation_fn: ~typing.Callable | str | None = <function tanh>, activation_fns: ~typing.Callable | str | ~typing.Sequence[~typing.Callable | str] = 'relu', batch_normalize: bool = True, gaussian_expand: bool = True, compress_post_gaussian_expansion: bool = False, mode: str = 'classification', n_classes: int = 2, batch_size: int = 100, **kwargs)[source]

Implements Google-style Weave Graph Convolutions

This model implements the Weave style graph convolutions from [1].

The biggest difference between WeaveModel style convolutions and GraphConvModel style convolutions is that Weave convolutions model bond features explicitly. This has the side effect that it needs to construct an NxN matrix explicitly to model bond interactions. This may cause scaling issues, but may possibly allow for better modeling of subtle bond effects.

Note that [1] introduces a whole variety of different architectures for Weave models. The default settings in this class correspond to the W2N2 variant from [1], which is the most commonly used variant.

Examples

Here’s an example of how to fit a WeaveModel on a tiny sample dataset.

>>> import numpy as np
>>> import deepchem as dc
>>> featurizer = dc.feat.WeaveFeaturizer()
>>> X = featurizer(["C", "CC"])
>>> y = np.array([1, 0])
>>> dataset = dc.data.NumpyDataset(X, y)
>>> model = dc.models.torch_models.WeaveModel(n_tasks=1, n_weave=2, fully_connected_layer_sizes=[2000, 1000], mode="classification")
>>> loss = model.fit(dataset)

References

__init__(n_tasks: int, n_atom_feat: int | ~typing.Sequence[int] = 75, n_pair_feat: int | ~typing.Sequence[int] = 14, n_hidden: int = 50, n_graph_feat: int = 128, n_weave: int = 2, fully_connected_layer_sizes: ~typing.List[int] = [2000, 100], conv_weight_init_stddevs: float | ~typing.Sequence[float] = 0.03, weight_init_stddevs: float | ~typing.Sequence[float] = 0.01, bias_init_consts: float | ~typing.Sequence[float] = 0.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | ~typing.Sequence[float] = 0.25, final_conv_activation_fn: ~typing.Callable | str | None = <function tanh>, activation_fns: ~typing.Callable | str | ~typing.Sequence[~typing.Callable | str] = 'relu', batch_normalize: bool = True, gaussian_expand: bool = True, compress_post_gaussian_expansion: bool = False, mode: str = 'classification', n_classes: int = 2, batch_size: int = 100, **kwargs)[source]
Parameters:
  • n_tasks (int) – Number of tasks

  • n_atom_feat (int, optional (default 75)) – Number of features per atom. Note this is 75 by default and should be 78 if chirality is used by WeaveFeaturizer.

  • n_pair_feat (int, optional (default 14)) – Number of features per pair of atoms.

  • n_hidden (int, optional (default 50)) – Number of units (convolution depths) in the corresponding hidden layer

  • n_graph_feat (int, optional (default 128)) – Number of output features for each molecule (graph)

  • n_weave (int, optional (default 2)) – The number of weave layers in this model.

  • fully_connected_layer_sizes (list (default [2000, 100])) – The size of each dense layer in the network. The length of this list determines the number of layers.

  • conv_weight_init_stddevs (list or float (default 0.03)) – The standard deviation of the distribution to use for weight initialization of each convolutional layer. The length of this list should equal n_weave. Alternatively, this may be a single value instead of a list, in which case the same value is used for each layer.

  • weight_init_stddevs (list or float (default 0.01)) – The standard deviation of the distribution to use for weight initialization of each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float (default 0.0)) – The value to initialize the biases in each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • weight_decay_penalty (float (default 0.0)) – The magnitude of the weight decay penalty to use

  • weight_decay_penalty_type (str (default "l2")) – The type of penalty to use for weight decay, either ‘l1’ or ‘l2’

  • dropouts (list or float (default 0.25)) – The dropout probability to use for each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • final_conv_activation_fn (Optional[ActivationFn] (default F.tanh)) – The activation function to apply to the final convolution at the end of the weave convolutions. If None, then no activation is applied (hence linear).

  • activation_fns (str (default relu)) – The activation function to apply to each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • batch_normalize (bool, optional (default True)) – If this is turned on, apply batch normalization before applying activation functions on convolutional and fully connected layers.

  • gaussian_expand (boolean, optional (default True)) – Whether to expand each dimension of atomic features by gaussian histogram

  • compress_post_gaussian_expansion (bool, optional (default False)) – If True, compress the results of the Gaussian expansion back to the original dimensions of the input.

  • mode (str (default "classification")) – Either “classification” or “regression” for type of model.

  • n_classes (int (default 2)) – Number of classes to predict (only used in classification mode)

  • batch_size (int (default 100)) – Batch size used by this model for training.
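Several parameters above accept either a single value or one value per layer. A common way to handle that, shown here as a hypothetical helper rather than the library's own code, is to expand a scalar into a list of the required length and to validate lists against that length.

```python
from collections.abc import Sequence


def per_layer(value, n_layers, name="value"):
    """Expand a scalar to n_layers copies, or check a sequence's length."""
    if isinstance(value, Sequence) and not isinstance(value, str):
        if len(value) != n_layers:
            raise ValueError(
                f"{name} has {len(value)} entries, expected {n_layers}")
        return list(value)
    # Scalars (and strings such as 'relu') are repeated for every layer.
    return [value] * n_layers


fully_connected_layer_sizes = [2000, 100]
n_layers = len(fully_connected_layer_sizes)

print(per_layer(0.25, n_layers, "dropouts"))
print(per_layer([0.5, 0.1], n_layers, "dropouts"))
print(per_layer("relu", n_layers, "activation_fns"))
```

Strings are treated as scalars on purpose, since an activation name such as 'relu' is itself a sequence of characters.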

compute_features_on_batch(X_b)[source]

Compute tensors that will be input into the model from featurized representation.

The featurized input to WeaveModel is instances of WeaveMol created by WeaveFeaturizer. This method converts input WeaveMol objects into tensors used by the underlying implementation to compute WeaveModel outputs.

Parameters:

X_b (np.ndarray) – A numpy array with dtype=object where elements are WeaveMol objects.

Returns:

  • atom_feat (np.ndarray) – Of shape (N_atoms, N_atom_feat).

  • pair_feat (np.ndarray) – Of shape (N_pairs, N_pair_feat). Note that N_pairs will depend on the number of pairs being considered. If max_pair_distance is None, then this will be N_atoms**2. Else it will be the number of pairs within the specified graph distance.

  • pair_split (np.ndarray) – Of shape (N_pairs,). The i-th entry in this array will tell you the originating atom for this pair (the “source”). Note that pairs are symmetric so for a pair (a, b), both a and b will separately be sources at different points in this array.

  • atom_split (np.ndarray) – Of shape (N_atoms,). The i-th entry in this array will be the molecule which the i-th atom belongs to.

  • atom_to_pair (np.ndarray) – Of shape (N_pairs, 2). The i-th row in this array will be the array [a, b] if (a, b) is a pair to be considered. (Note that by symmetry, this implies some other row will contain [b, a].)
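The index arrays are easier to follow on a concrete batch. The sketch below builds atom_split, pair_split and atom_to_pair for methane (one atom) followed by ethane (two atoms), using plain Python lists. It assumes every ordered pair of atoms within a molecule is included, self-pairs too, as in the case where max_pair_distance is None. The helper name build_index_arrays is hypothetical and this is not the library's implementation.

```python
def build_index_arrays(atoms_per_molecule):
    """Build batch-level index arrays from per-molecule atom counts."""
    atom_split = []    # molecule index for every atom in the batch
    pair_split = []    # source atom (batch-level index) for every pair
    atom_to_pair = []  # [a, b] atom indices for every pair
    start = 0          # batch-level index of the molecule's first atom
    for mol_index, n_atoms in enumerate(atoms_per_molecule):
        atom_split.extend([mol_index] * n_atoms)
        for a in range(start, start + n_atoms):
            for b in range(start, start + n_atoms):
                pair_split.append(a)
                atom_to_pair.append([a, b])
        start += n_atoms
    return atom_split, pair_split, atom_to_pair


# Methane "C" has 1 heavy atom, ethane "CC" has 2.
atom_split, pair_split, atom_to_pair = build_index_arrays([1, 2])
print(atom_split)    # [0, 1, 1]
print(pair_split)    # [0, 1, 1, 2, 2]
print(atom_to_pair)  # [[0, 0], [1, 1], [1, 2], [2, 1], [2, 2]]
```

Pairs never cross molecule boundaries, and the rows [1, 2] and [2, 1] show the symmetry mentioned in the description of atom_to_pair.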

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) Iterable[Tuple[List, List, List]][source]

Convert a dataset into the tensors needed for learning.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to convert

  • epochs (int, optional (Default 1)) – Number of times to walk over dataset

  • mode (str, optional (Default 'fit')) – Ignored in this implementation.

  • deterministic (bool, optional (Default True)) – Whether the dataset should be walked in a deterministic fashion

  • pad_batches (bool, optional (Default True)) – If true, each returned batch will have size self.batch_size.

Return type:

Iterator which walks over the batches
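The interaction between epochs and pad_batches can be sketched with a small generator. This is an illustration under stated assumptions, not the library's default_generator: the name iter_batches is hypothetical, and padding by repeating real samples with a weight of zero is one possible strategy chosen for the example.

```python
def iter_batches(samples, batch_size, epochs=1, pad_batches=True):
    """Yield (batch, weights) pairs, walking over samples epochs times."""
    for _ in range(epochs):
        for start in range(0, len(samples), batch_size):
            batch = list(samples[start:start + batch_size])
            n_real = len(batch)
            weights = [1.0] * n_real
            if pad_batches:
                # Assumed strategy: repeat real samples, but give the copies
                # zero weight so they do not contribute to the loss.
                for i in range(batch_size - n_real):
                    batch.append(batch[i % n_real])
                    weights.append(0.0)
            yield batch, weights


for batch, weights in iter_batches([10, 11, 12, 13, 14], batch_size=2):
    print(batch, weights)
```

With five samples and a batch size of two, the last batch contains one real sample and one zero-weight copy, so every batch has the same size.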

ProgressiveMultitaskClassifier

class ProgressiveMultitaskClassifier(n_tasks, n_features, layer_sizes=[1000], alpha_init_stddevs=0.02, weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', activation_fns='relu', dropouts=0.5, n_outputs=2, n_classes=2, **kwargs)[source]

Implements a progressive multitask neural network for classification.

Progressive Networks: https://arxiv.org/pdf/1606.04671v3.pdf

Progressive networks allow for multitask learning where each task gets a new column of weights. As a result, there is no catastrophic forgetting where previous tasks are ignored.
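The column structure can be illustrated with a deliberately tiny sketch in plain Python. It is not the library's implementation: the weights and helper names are made up, each column has a single hidden unit, and the adapter is reduced to one scalar alpha followed by one lateral weight. What it shows is that the second task's output layer reads the first column's hidden activation through the adapter, while the first column's output never depends on the second column's weights.

```python
def relu(v):
    return max(0.0, v)


def dense(inputs, weights, bias):
    """Single-unit dense layer without activation."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


x = [1.0, 2.0]

# Column for task 0: hidden unit and output, no lateral inputs.
h0 = relu(dense(x, [0.5, 0.25], 0.0))
out0 = dense([h0], [2.0], 0.0)

# Column for task 1: its own hidden unit ...
h1 = relu(dense(x, [1.0, -1.0], 0.5))
# ... plus an adapter that scales column 0's hidden activation by alpha
# and feeds it into column 1's output layer through a lateral weight.
alpha = 0.5
lateral_weight = 3.0
out1 = dense([h1], [1.0], 0.25) + lateral_weight * (alpha * h0)

print(h0, out0)  # 1.0 2.0
print(h1, out1)  # 0.0 1.75
```

Because information only flows from earlier columns to later ones, fitting the weights of column 1 leaves out0 unchanged, which is why earlier tasks are not forgotten.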

References

See the Progressive Networks paper linked above for a full description of the progressive architecture.

__init__(n_tasks, n_features, layer_sizes=[1000], alpha_init_stddevs=0.02, weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', activation_fns='relu', dropouts=0.5, n_outputs=2, n_classes=2, **kwargs)[source]

Creates a progressive network.

Only listing parameters specific to progressive networks here.

Parameters:
  • n_tasks (int) – Number of tasks

  • n_features (int) – Number of input features

  • alpha_init_stddevs (list) – List of standard-deviations for alpha in adapter layers.

  • layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.

  • weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • weight_decay_penalty (float) – the magnitude of the weight decay penalty to use

  • weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’

  • dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

ProgressiveMultitaskRegressor

class ProgressiveMultitaskRegressor(n_tasks, n_features, layer_sizes=[1000], alpha_init_stddevs=0.02, weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', activation_fns='relu', dropouts=0.5, n_outputs=1, n_classes=1, **kwargs)[source]

Implements a progressive multitask neural network for regression.

Progressive Networks: https://arxiv.org/pdf/1606.04671v3.pdf

Progressive networks allow for multitask learning where each task gets a new column of weights. As a result, there is no catastrophic forgetting where previous tasks are ignored.

References

See the Progressive Networks paper linked above for a full description of the progressive architecture.

__init__(n_tasks, n_features, layer_sizes=[1000], alpha_init_stddevs=0.02, weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', activation_fns='relu', dropouts=0.5, n_outputs=1, n_classes=1, **kwargs)[source]

Creates a progressive network.

Only listing parameters specific to progressive networks here.

Parameters:
  • n_tasks (int) – Number of tasks

  • n_features (int) – Number of input features

  • alpha_init_stddevs (list) – List of standard-deviations for alpha in adapter layers.

  • layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.

  • weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • weight_decay_penalty (float) – the magnitude of the weight decay penalty to use

  • weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’

  • dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • activation_fns (list or object) – the TensorFlow activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
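Several parameters above accept either a single value or a per-layer list. The sketch below shows one way such a value could be broadcast to the expected length. `expand_per_layer` is a hypothetical helper written for illustration and is not part of DeepChem.

```python
from typing import List, Sequence, Union


def expand_per_layer(value: Union[float, Sequence[float]], n: int,
                     name: str) -> List[float]:
    """Return a list of length n from a scalar or an already-sized sequence.

    Hypothetical helper, for illustration only.
    """
    if isinstance(value, (int, float)):
        # A single value is reused for every layer.
        return [float(value)] * n
    values = [float(v) for v in value]
    if len(values) != n:
        raise ValueError(f"{name} must have length {n}, got {len(values)}")
    return values


layer_sizes = [1000, 500]
# weight_init_stddevs and bias_init_consts also cover the output layer,
# so they need len(layer_sizes) + 1 entries.
weight_init_stddevs = expand_per_layer(0.02, len(layer_sizes) + 1,
                                       "weight_init_stddevs")
# dropouts only covers the hidden layers.
dropouts = expand_per_layer([0.5, 0.25], len(layer_sizes), "dropouts")
print(weight_init_stddevs)
print(dropouts)
```

A list of the wrong length raises a ValueError in this sketch, so a mismatch with layer_sizes is caught before any layers are built.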

RobustMultitaskClassifier

class RobustMultitaskClassifier(n_tasks: int, n_features: int, layer_sizes: Sequence[int] = [1000], weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: Literal['l1', 'l2'] = 'l2', dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = ReLU(), n_classes: int = 2, bypass_layer_sizes: Sequence[int] = [100], bypass_weight_init_stddevs: float | Sequence[float] = [0.02], bypass_bias_init_consts: float | Sequence[float] = [1.0], bypass_dropouts: float | Sequence[float] = [0.5], **kwargs)[source]

Implements a neural network for robust multitasking.

The key idea of this model is to have bypass layers that feed directly from features to task output. This might provide some flexibility to route around challenges in multitasking with destructive interference.

References

This technique was introduced in [1].

[1] Ramsundar, Bharath, et al. “Is multitask deep learning practical for pharma?.” Journal of chemical information and modeling 57.8 (2017): 2068-2076.

__init__(n_tasks: int, n_features: int, layer_sizes: Sequence[int] = [1000], weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: Literal['l1', 'l2'] = 'l2', dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = ReLU(), n_classes: int = 2, bypass_layer_sizes: Sequence[int] = [100], bypass_weight_init_stddevs: float | Sequence[float] = [0.02], bypass_bias_init_consts: float | Sequence[float] = [1.0], bypass_dropouts: float | Sequence[float] = [0.5], **kwargs)[source]
Parameters:
  • n_tasks (int) – number of tasks

  • n_features (int) – number of features

  • layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.

  • weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • weight_decay_penalty (float) – the magnitude of the weight decay penalty to use

  • weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’

  • dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • activation_fns (list or object) – the activation function to apply to each layer, given as a callable (the default is ReLU()) or as a string. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • n_classes (int) – the number of classes

  • bypass_layer_sizes (list) – the size of each dense layer in the bypass network. The length of this list determines the number of bypass layers.

  • bypass_weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of bypass layers. Same requirements as weight_init_stddevs.

  • bypass_bias_init_consts (list or float) – the value to initialize the biases in bypass layers to. Same requirements as bias_init_consts.

  • bypass_dropouts (list or float) – the dropout probability to use for bypass layers. Same requirements as dropouts.
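To make the bypass idea concrete, here is a minimal pure-Python sketch of a forward pass with one shared layer and one bypass layer per task. It assumes that each task output is computed from the concatenation of the shared features and that task's bypass features. The function names and the hand-picked weights are illustrative only; the actual model is built from PyTorch layers.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense_relu(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """One dense layer followed by ReLU; weights has one row per output unit."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def robust_forward(x: Vector,
                   shared: Tuple[Matrix, Vector],
                   bypass_per_task: List[Tuple[Matrix, Vector]],
                   heads: List[Tuple[Vector, float]]) -> List[float]:
    """Toy forward pass: a shared trunk plus one bypass branch per task."""
    shared_out = dense_relu(x, *shared)  # computed once, used by all tasks
    outputs = []
    for (bypass_w, bypass_b), (head_w, head_b) in zip(bypass_per_task, heads):
        # The bypass branch reads the raw input features directly.
        bypass_out = dense_relu(x, bypass_w, bypass_b)
        combined = shared_out + bypass_out  # list concatenation
        outputs.append(sum(w * v for w, v in zip(head_w, combined)) + head_b)
    return outputs


x = [1.0, 2.0]
shared = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
bypass_per_task = [([[1.0, 1.0]], [0.0]), ([[-1.0, 0.0]], [0.0])]
heads = [([1.0, 1.0, 1.0], 0.0), ([0.5, 0.5, 2.0], 1.0)]
outputs = robust_forward(x, shared, bypass_per_task, heads)
print(outputs)
```

Because every task owns its bypass weights, a task that is hurt by the shared representation can still lean on features learned only for itself.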

default_generator(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]

Create a generator that iterates batches for a dataset.

Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

a generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights])
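The batch format described above can be illustrated with a small stand-alone generator. This is a sketch over plain Python lists, not the DeepChem implementation: `toy_generator` is a hypothetical name, only the deterministic (in-order) case is shown, and padding with zero-weight copies of the first sample is an assumption made for the example.

```python
from typing import Iterator, List, Sequence, Tuple

Batch = Tuple[List[list], List[list], List[list]]


def toy_generator(X: Sequence, y: Sequence, w: Sequence, batch_size: int,
                  epochs: int = 1, pad_batches: bool = True) -> Iterator[Batch]:
    """Yield ([inputs], [outputs], [weights]) tuples in dataset order."""
    n = len(X)
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            xb = list(X[start:start + batch_size])
            yb = list(y[start:start + batch_size])
            wb = list(w[start:start + batch_size])
            if pad_batches and len(xb) < batch_size:
                missing = batch_size - len(xb)
                # Assumed padding scheme: repeat the first sample with zero
                # weight so padded entries do not contribute to the loss.
                xb += [xb[0]] * missing
                yb += [yb[0]] * missing
                wb += [0.0] * missing
            yield ([xb], [yb], [wb])


batches = list(toy_generator(X=[[0.1], [0.2], [0.3]], y=[0, 1, 0],
                             w=[1.0, 1.0, 1.0], batch_size=2, epochs=2))
print(len(batches))  # two batches per epoch, two epochs
print(batches[1])    # the padded final batch of the first epoch
```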

RobustMultitaskRegressor

class RobustMultitaskRegressor(n_tasks, n_features, layer_sizes=[1000], weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = ReLU(), bypass_layer_sizes=[100], bypass_weight_init_stddevs=[0.02], bypass_bias_init_consts=[1.0], bypass_dropouts=[0.5], **kwargs)[source]

Implements a neural network for robust multitasking.

The key idea of this model is to have bypass layers that feed directly from features to task output. This might provide some flexibility to route around challenges in multitasking with destructive interference.

References

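This technique was introduced in Ramsundar, Bharath, et al. “Is multitask deep learning practical for pharma?.” Journal of chemical information and modeling 57.8 (2017): 2068-2076.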

__init__(n_tasks, n_features, layer_sizes=[1000], weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = ReLU(), bypass_layer_sizes=[100], bypass_weight_init_stddevs=[0.02], bypass_bias_init_consts=[1.0], bypass_dropouts=[0.5], **kwargs)[source]

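Create a new TorchModel. The parameters listed below are those of the underlying TorchModel; the model-specific arguments in the signature above share their names with those documented for RobustMultitaskClassifier.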

Parameters:
  • model (torch.nn.Module) – the PyTorch model implementing the calculation

  • loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above

  • output_types (list of strings, optional (default None)) – the type of each output from the model, as described above

  • batch_size (int, optional (default 100)) – default batch size for training and evaluating

  • model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None, a temporary directory is created.

  • learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is ignored.

  • optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is ignored.

  • tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training

  • wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training

  • log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using logging by default. If tensorboard is set, data is also logged to TensorBoard. If wandb is set, data is also logged to Weights & Biases. Logging happens at global steps. Roughly, a global step corresponds to one batch of training. If you’d like a printout every 10 batch steps, you’d set log_frequency=10 for example.

  • device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is chosen automatically.

  • regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add to the loss function

  • wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics

Density Functional Theory Model - XCModel

class XCModel(xcstr: str, nnmodel: Module | None = None, input_size: int = 2, hidden_size: int = 10, n_layers: int = 1, modeltype: int = 1, n_tasks: int = 0, log_frequency: int = 0, mode: str = 'classification', device: device | None = None, aweight0: float = 0.0, **kwargs)[source]

This class is used to initialize and run Differentiable Quantum Chemistry (i.e., DFT) calculations, using an exchange correlation functional that has been replaced by a neural network. This model is based on the paper “Learning the exchange-correlation functional from nature with fully differentiable density functional theory”, which is listed in the references below.

To read more about Density Functional Theory and the exchange correlation functional please check the references below.

Examples

>>> from deepchem.models.dft.dftxc import XCModel
>>> from deepchem.data.data_loader import DFTYamlLoader
>>> inputs = 'deepchem/models/tests/assets/test_dftxcdata.yaml'
>>> data = DFTYamlLoader()
>>> dataset = data.create_dataset(inputs)
>>> dataset.get_shape()
((2,), (2,), (2,), (2,))
>>> model = XCModel("lda_x", batch_size=1)
>>> loss = model.fit(dataset, nb_epoch=1, checkpoint_interval=1)
The 6-311++G(3df,3pd) basis for atomz 3 does not exist, but we will download it
Downloaded to /home/runner/miniconda3/envs/deepchem/lib/python3.8/site-packages/dqc/api/.database/6-311ppg_3df_3pd_/03.gaussian94
The 6-311++G(3df,3pd) basis for atomz 1 does not exist, but we will download it
Downloaded to /home/runner/miniconda3/envs/deepchem/lib/python3.8/site-packages/dqc/api/.database/6-311ppg_3df_3pd_/01.gaussian94

Notes

There are 4 types of DFT data object implementations that are used to determine the type of calculation to be carried out on the entry object. These types are: “ae”, “ie”, “dm”, “dens”, that stand for atomization energy, ionization energy, density matrix and density profile respectively. The entry type “Density Matrix” cannot be used on model.evaluate as of now. To run predictions on this data type, a dataset containing only “dm” entries must be used.
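Since “dm” entries need a dataset of their own for prediction, it can help to separate them before building datasets. The sketch below uses plain dictionaries with a hypothetical "type" key to stand in for entry objects; it is not the DeepChem entry class or loader API.

```python
from typing import Dict, List, Tuple

# The four entry types named in the note above.
ENTRY_TYPES: Dict[str, str] = {
    "ae": "atomization energy",
    "ie": "ionization energy",
    "dm": "density matrix",
    "dens": "density profile",
}


def split_entries(entries: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate density-matrix entries from all other entries."""
    for entry in entries:
        if entry["type"] not in ENTRY_TYPES:
            raise ValueError(f"unknown entry type: {entry['type']}")
    dm_only = [e for e in entries if e["type"] == "dm"]
    others = [e for e in entries if e["type"] != "dm"]
    return others, dm_only


# Hypothetical entries, for illustration only.
entries = [
    {"name": "entry_a", "type": "ae"},
    {"name": "entry_b", "type": "ie"},
    {"name": "entry_c", "type": "dm"},
]
others, dm_only = split_entries(entries)
print([e["name"] for e in others])
print([e["name"] for e in dm_only])
```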

References

https://github.com/deepchem/deepchem/blob/3f06168a6c9c16fd90cde7f5246b94f484ea3890/deepchem/models/dft/nnxc.py

Encyclopedia of Condensed Matter Physics, 2005.

Kasim, Muhammad F., and Sam M. Vinko. “Learning the exchange-correlation functional from nature with fully differentiable density functional theory.” Physical Review Letters 127.12 (2021): 126403.

__init__(xcstr: str, nnmodel: Module | None = None, input_size: int = 2, hidden_size: int = 10, n_layers: int = 1, modeltype: int = 1, n_tasks: int = 0, log_frequency: int = 0, mode: str = 'classification', device: device | None = None, aweight0: float = 0.0, **kwargs) None[source]
Parameters:
  • xcstr (str) – The choice of xc to use.

  • nnmodel (torch.nn.Module) – the PyTorch model implementing the calculation

  • input_size (int) – size of neural network input

  • hidden_size (int) – size of the hidden layers; the number of hidden layers is fixed in the default method.

  • n_layers (int) – number of layers in the neural network

  • modeltype (int) – model type 2 includes an activation layer whereas type 1 does not.

  • aweight0 (float (default 0.0)) – Weight of the neural network model in the final result.

TextCNNModel

class TextCNNModel(n_tasks: int, char_dict: Dict[str, int], seq_length: int, n_embedding: int = 75, kernel_sizes: List[int] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20], num_filters: List[int] = [100, 200, 200, 200, 200, 100, 100, 100, 100, 100, 160, 160], dropout: float = 0.25, mode: str = 'classification', **kwargs)[source]

A 1D convolutional neural network that operates on SMILES strings for both classification and regression tasks.

Reimplementation of the discriminator module in ORGAN [1]. Originated from [2].

The model converts the input SMILES strings to embedding vectors. These are convolved and pooled through a series of convolutional filters, whose outputs are concatenated and passed through a simple dense layer. The resulting vector goes through a Highway layer [3] and finally, depending on the task, through a dense output layer.

References

Examples

>>> import os
>>> from deepchem.models.torch_models import TextCNNModel
>>> from deepchem.models.torch_models.text_cnn import default_dict
>>> n_tasks = 1
>>> seq_len = 250
>>> model = TextCNNModel(n_tasks, default_dict, seq_len)
__init__(n_tasks: int, char_dict: Dict[str, int], seq_length: int, n_embedding: int = 75, kernel_sizes: List[int] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20], num_filters: List[int] = [100, 200, 200, 200, 200, 100, 100, 100, 100, 100, 160, 160], dropout: float = 0.25, mode: str = 'classification', **kwargs) None[source]
Parameters:
  • n_tasks (int) – Number of tasks

  • char_dict (dict) – Mapping from characters in smiles to integers

  • seq_length (int) – Length of sequences (after padding)

  • n_embedding (int, optional) – Length of embedding vector

  • kernel_sizes (list of int, optional) – Kernel sizes of the filters used in the conv net

  • num_filters (list of int, optional) – Number of filters used for each kernel size in the conv net

  • dropout (float, optional) – Dropout rate

  • mode (str) – Either “classification” or “regression” for type of model.
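The default kernel_sizes and num_filters are paired lists, one entry per convolutional branch. The following sketch only does the shape bookkeeping for those branches. It assumes unpadded convolutions and max-over-time pooling before concatenation, which may differ from the actual layers, and `conv_branch_shapes` is a hypothetical helper.

```python
from typing import List, Tuple


def conv_branch_shapes(seq_length: int, kernel_sizes: List[int],
                       num_filters: List[int]) -> List[Tuple[int, int]]:
    """Return (positions, channels) for each convolutional branch."""
    if len(kernel_sizes) != len(num_filters):
        raise ValueError("kernel_sizes and num_filters must have equal length")
    shapes = []
    for k, f in zip(kernel_sizes, num_filters):
        # Assumption: an unpadded 1D convolution with kernel size k
        # produces seq_length - k + 1 output positions.
        shapes.append((seq_length - k + 1, f))
    return shapes


kernel_sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]
num_filters = [100, 200, 200, 200, 200, 100, 100, 100, 100, 100, 160, 160]
shapes = conv_branch_shapes(250, kernel_sizes, num_filters)

# Max-over-time pooling keeps one value per filter, so the concatenated
# feature vector has sum(num_filters) entries.
pooled_width = sum(channels for _, channels in shapes)
print(shapes[0], shapes[-1])  # (250, 100) (231, 160)
print(pooled_width)           # 1720
```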

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) Iterable[Tuple[List, List, List]][source]

Convert SMILES strings to fixed-length integer vectors

Parameters:
  • dataset (dc.data.Dataset) – Dataset to convert

  • epochs (int, optional (Default 1)) – Number of times to walk over dataset

  • mode (str, optional (Default 'fit')) – Ignored in this implementation.

  • deterministic (bool, optional (Default True)) – Whether the dataset should be walked in a deterministic fashion

  • pad_batches (bool, optional (Default True)) – If true, each returned batch will have size self.batch_size.

Return type:

Iterator which walks over the batches

static build_char_dict(dataset: Dataset, default_dict: Dict[str, int] = {'#': 1, '(': 2, ')': 3, '+': 4, '-': 5, '/': 6, '1': 7, '2': 8, '3': 9, '4': 10, '5': 11, '6': 12, '7': 13, '8': 14, '=': 15, 'Br': 30, 'C': 16, 'Cl': 29, 'F': 17, 'H': 18, 'I': 19, 'N': 20, 'O': 21, 'P': 22, 'S': 23, '[': 24, '\\': 25, ']': 26, '_': 27, 'c': 28, 'n': 31, 'o': 32, 's': 33})[source]

Collect all unique characters (in SMILES) from the dataset. This method should be called before defining the model, so that an appropriate char_dict can be built.

Parameters:
  • dataset (Dataset) – Dataset for which char_dict is built

  • default_dict (dict, optional) – Mapping from characters in SMILES to integers

Returns:

  • out_dict (dict) – A dictionary mapping the unique characters in the dataset to integers

  • seq_length (int) – The maximum length of the SMILES strings found in the dataset, multiplied by 1.2
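The two return values can be illustrated with a stand-alone sketch that works on a plain list of SMILES strings. `build_char_dict_sketch` is a hypothetical function: its tokenizer, its index assignment, and measuring length in tokens are simplifying assumptions and need not match the library's behaviour.

```python
import re
from typing import Dict, List, Tuple

# Two-letter halogens must be matched before single characters.
TOKEN_PATTERN = re.compile(r"Cl|Br|.")


def build_char_dict_sketch(smiles_list: List[str],
                           base_dict: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Extend base_dict with unseen tokens and compute a padded length."""
    out_dict = dict(base_dict)  # do not mutate the caller's dictionary
    next_index = max(out_dict.values(), default=0) + 1
    max_len = 0
    for smiles in smiles_list:
        tokens = TOKEN_PATTERN.findall(smiles)
        max_len = max(max_len, len(tokens))
        for token in tokens:
            if token not in out_dict:
                out_dict[token] = next_index
                next_index += 1
    # Leave 20% headroom over the longest string seen.
    seq_length = int(max_len * 1.2)
    return out_dict, seq_length


char_dict, seq_length = build_char_dict_sketch(
    ["CCO", "ClCCBr", "c1ccccc1"], {"C": 1, "O": 2})
print(char_dict)
print(seq_length)
```

The headroom means that slightly longer molecules encountered later can still be encoded without rebuilding the model.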

smiles_to_seq(smiles: str)[source]

Tokenize the characters of a SMILES string into integers

Parameters:

smiles (str) – A SMILES string

Returns:

array – An array of integers representing the tokenized sequence of characters.

Return type:

np.ndarray
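As a rough illustration of this tokenization, the sketch below maps a SMILES string to a fixed-length list of integers. The function name, the right-padding, and the padding value 0 are assumptions made for the example, and it returns a Python list rather than an np.ndarray to stay dependency-free.

```python
import re
from typing import Dict, List

# Two-letter halogens must be matched before single characters.
TOKEN_PATTERN = re.compile(r"Cl|Br|.")


def smiles_to_seq_sketch(smiles: str, char_dict: Dict[str, int],
                         seq_length: int, pad_index: int = 0) -> List[int]:
    """Encode a SMILES string as seq_length integers (hypothetical helper)."""
    tokens = TOKEN_PATTERN.findall(smiles)
    if len(tokens) > seq_length:
        raise ValueError("SMILES string is longer than seq_length")
    # Unknown tokens raise KeyError, signalling an incomplete char_dict.
    encoded = [char_dict[token] for token in tokens]
    # Assumed padding scheme: fill the tail with pad_index.
    return encoded + [pad_index] * (seq_length - len(encoded))


# Indices taken from the default dictionary shown above.
seq = smiles_to_seq_sketch("CCl", {"C": 16, "Cl": 29}, seq_length=5)
print(seq)  # [16, 29, 0, 0, 0]
```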

static convert_bytes_to_char(s: bytes) str[source]

Convert bytes to string.

Parameters:

s (bytes) – Bytes to be converted to string.

Returns:

String representation of the bytes.

Return type:

str

smiles_to_seq_batch(ids_b: List[bytes | str] | ndarray) ndarray[source]

Converts SMILES strings to np.array sequence.

Parameters:

ids_b (Union[List[Union[bytes, str]], np.ndarray]) – A list of SMILES strings, either as bytes or strings.

Returns:

A numpy array containing the tokenized sequences of SMILES strings.

Return type:

np.ndarray

PINNModel

class PINNModel(model: ~torch.nn.modules.module.Module | None = None, in_features: int | None = None, loss_fn: ~typing.Callable | None = None, pde_fn: ~typing.List | ~typing.Callable = [<function PINNModel.<lambda>>], pde_weights: ~typing.List | float = [1.0], boundary_data: ~typing.Dict = {}, data_weight: float = 1.0, physics_weight: float = 1.0, eval_fn: ~typing.Callable | None = None, mode: str = 'regression', **kwargs)[source]

This model is designed for solving linear partial differential equations (PDEs) using Physics-Informed Neural Networks (PINNs). It extends the TorchModel class, and its methods are similar to those in TorchModel, with additional functionality for handling physics-based constraints.

Parameters:
  • pde_fn (callable) –

    A function that defines the physics PDE residuals. Each PINN may have a unique strategy for calculating the physics losses, and this function specifies how the PINNModel computes the PDE residuals. The function should follow this format:

    >>> def heat_equation_residual(model, x):
    ...     x.requires_grad_(True)
    ...     u = model(x)
    ...     du_dx = torch.autograd.grad(u.sum(), x, create_graph=True, retain_graph=True)[0]
    ...     d2u_dx2 = torch.autograd.grad(du_dx.sum(), x, create_graph=True, retain_graph=True)[0]
    ...     return d2u_dx2
    

    Here, model is the neural network being trained, and x is the input.

  • boundary_data (dict) –

    A dictionary containing the boundary condition data. The PINNModel supports three boundary data types: Dirichlet, Neumann and Robin. The following format must be followed:

    >>> boundary_data = {
    ...     'dirichlet': {
    ...         'points': torch.tensor([[0.0], [1.0]], dtype=torch.float32),
    ...         'values': torch.tensor([[0.0], [1.0]], dtype=torch.float32)
    ...     }
    ... }
    
    • points: Tensor of input points where boundary conditions are defined.

    • values: Tensor of target values at the boundary points.

  • loss_fn (callable) –

    A custom loss function that combines data, physics, and boundary losses. An example is shown below:

    >>> def custom_loss(outputs, labels, weights=None):
    ...     outputs = outputs[0]
    ...     labels = labels[0]
    ...     data_loss = torch.mean(torch.square(outputs - labels))
    ...     pde_residuals = heat_equation_residual(model, labels)
    ...     pde_loss = torch.mean(torch.abs(pde_residuals))
    ...     boundary_loss = 0.0
    ...     for _, value in boundary_data.items():
    ...         if isinstance(value, dict):
    ...             points = value.get('points')
    ...             values = value.get('values')
    ...             if points is not None and values is not None:
    ...                 pred = model(points)
    ...                 boundary_loss += torch.mean(torch.square(pred - values))
    ...     return data_loss + pde_loss + 10 * boundary_loss
    

Usage Example

Here's an example of using PINNModel to solve the 1D steady-state heat equation:

>>> import torch
>>> import deepchem as dc
>>> class HeatNet(torch.nn.Module):
...     def __init__(self):
...         super(HeatNet, self).__init__()
...         self.net = torch.nn.Sequential(
...             torch.nn.Linear(1, 64),
...             torch.nn.Tanh(),
...             torch.nn.Linear(64, 64),
...             torch.nn.Tanh(),
...             torch.nn.Linear(64, 1)
...         )
...     def forward(self, x):
...         if not isinstance(x, torch.Tensor):
...             x = torch.tensor(x, dtype=torch.float32)
...         return self.net(x)
>>> def heat_equation_residual(u, x):
...     x.requires_grad_(True)
...     du_dx = torch.autograd.grad(u.sum(), x, create_graph=True, retain_graph=True)[0]
...     d2u_dx2 = torch.autograd.grad(du_dx.sum(), x, create_graph=True, retain_graph=True)[0]
...     return du_dx - d2u_dx2 # Let alpha be 1.0
>>> x_interior = torch.linspace(0, 1, 2000)[1:-1].reshape(-1, 1)
>>> x_boundary = torch.tensor([[0.0], [1.0]])
>>> x = torch.cat([x_interior, x_boundary], dim=0)
>>> y = x.clone()
>>> dataset = dc.data.NumpyDataset(X=x.numpy(), y=y.numpy())
>>> boundary_data = {
...     'dirichlet': {
...         'points': torch.tensor([[0.0], [1.0]], dtype=torch.float32),
...         'values': torch.tensor([[0.0], [1.0]], dtype=torch.float32)
...     }
... }
>>> model = HeatNet()
>>> pinn = PINNModel(
...     model=model,
...     pde_fn=heat_equation_residual,
...     boundary_data=boundary_data,
... )
>>> loss = pinn.fit(dataset, nb_epoch=100)

References

[1] Raissi et al. “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.” Journal of Computational Physics, https://doi.org/10.1016/j.jcp.2018.10.045

[2] Raissi et al. “Physics-Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations.” arXiv preprint arXiv:1711.10561

Notes

  • This class requires PyTorch to be installed.

  • Users can use the default neural network provided by the class or pass a custom model.

__init__(model: ~torch.nn.modules.module.Module | None = None, in_features: int | None = None, loss_fn: ~typing.Callable | None = None, pde_fn: ~typing.List | ~typing.Callable = [<function PINNModel.<lambda>>], pde_weights: ~typing.List | float = [1.0], boundary_data: ~typing.Dict = {}, data_weight: float = 1.0, physics_weight: float = 1.0, eval_fn: ~typing.Callable | None = None, mode: str = 'regression', **kwargs) None[source]

Initialize PINNModel.

Parameters:
  • model (nn.Module, optional) – PyTorch neural network model for training. If not provided, a default neural network is used for regression.

  • in_features (int, optional) – Number of input features for the default model. Ignored if a custom model is provided.

  • loss_fn (Callable) – Loss function for the data-driven part of the training.

  • pde_fn (Callable or List[Callable]) – Function(s) that compute the PDE residuals. Should take model predictions and input coordinates as arguments and return the PDE residuals.

  • pde_weights (float or List[float]) – Weights for each PDE when there are multiple. If a single value is provided, it is applied to all PDE terms.

  • boundary_data (Dict) – Dictionary containing boundary condition data.

  • data_weight (float, optional, default=1.0) – Weight for the data-driven loss term in the total loss computation.

  • physics_weight (float, optional, default=1.0) – Weight for the physics-informed loss term in the total loss computation.

  • eval_fn (Callable, optional) – Custom function for model evaluation during inference. If not provided, a default evaluation function is used.

  • **kwargs – Additional arguments passed to the parent TorchModel class.
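To build intuition for what the PDE residual and the boundary term measure, the sketch below evaluates both for two candidate solutions of the steady-state equation u''(x) = 0 with u(0) = 0 and u(1) = 1. It deliberately swaps in central finite differences and plain Python functions for autograd and a neural network, so it illustrates the loss terms only and is not how PINNModel computes them.

```python
from typing import Callable, List


def second_derivative(u: Callable[[float], float], x: float,
                      h: float = 1e-3) -> float:
    """Central finite-difference estimate of u''(x)."""
    return (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)


def pde_loss(u: Callable[[float], float], points: List[float]) -> float:
    """Mean absolute residual of u'' = 0 over the interior points."""
    return sum(abs(second_derivative(u, x)) for x in points) / len(points)


def boundary_loss(u: Callable[[float], float], points: List[float],
                  values: List[float]) -> float:
    """Mean squared mismatch with the Dirichlet boundary values."""
    return sum((u(p) - v) ** 2 for p, v in zip(points, values)) / len(points)


interior = [i / 10 for i in range(1, 10)]
boundary_points, boundary_values = [0.0, 1.0], [0.0, 1.0]


def exact(x: float) -> float:
    return x        # satisfies both the equation and the boundary data


def wrong(x: float) -> float:
    return x * x    # matches the boundary data but has u'' = 2


print(pde_loss(exact, interior), boundary_loss(exact, boundary_points, boundary_values))
print(pde_loss(wrong, interior), boundary_loss(wrong, boundary_points, boundary_values))
```

The second candidate shows why the boundary term alone is not enough: it fits both boundary points exactly, and only the physics term reveals that it does not solve the equation.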

predict(dataset)[source]

Makes predictions on dataset using the evaluation function.

Parameters:

dataset (Dataset) – Dataset to make predictions on

Return type:

Predicted values

UNetModel

class UNetModel(in_channels: int = 3, out_channels: int = 1, **kwargs)[source]

UNet model for image segmentation.

UNet is a convolutional neural network architecture for fast and precise segmentation of images based on the works of Ronneberger et al. [1]. The architecture consists of an encoder, a bottleneck, and a decoder. The encoder downsamples the input image to capture the context of the image. The bottleneck captures the most important features of the image. The decoder upsamples the image to generate the segmentation mask. The encoder and decoder are connected by skip connections to preserve spatial information.

Examples

Importing necessary modules

>>> import numpy as np
>>> import deepchem as dc
>>> from deepchem.models.torch_models import UNetModel

Creating a random dataset of 5 32x32 pixel RGB input images and 5 32x32 pixel grey scale output images

>>> x = np.random.randn(5, 3, 32, 32).astype(np.float32)
>>> y = np.random.rand(5, 1, 32, 32).astype(np.float32)
>>> dataset = dc.data.NumpyDataset(x, y)

We will create a UNet model with 3 input channels and 1 output channel. We will then fit the model on the dataset for 5 epochs and predict the output images.

>>> model = UNetModel(in_channels=3, out_channels=1)
>>> loss = model.fit(dataset, nb_epoch=5)
>>> predictions = model.predict(dataset)

Notes

1. This implementation of the UNet model makes some changes to the padding of the inputs to the convolutional layers. The padding is set to ‘same’ to ensure that the output size of the convolutional layers is the same as the input size. This is done to preserve the spatial information of the input image and to keep the output size of the encoder and decoder the same.

2. The input image size must be divisible by 2^4 = 16 to ensure that the output size of the encoder and decoder is the same.
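The divisibility requirement can be checked before training. The sketch below assumes four successive 2x downsampling steps, as implied by the 2^4 in the note; both helper names are hypothetical.

```python
from typing import List


def encoder_sizes(size: int, n_downsamples: int = 4) -> List[int]:
    """Spatial size after each 2x downsampling step, starting with the input."""
    sizes = [size]
    for _ in range(n_downsamples):
        if sizes[-1] % 2 != 0:
            raise ValueError(f"size {sizes[-1]} cannot be halved evenly")
        sizes.append(sizes[-1] // 2)
    return sizes


def next_valid_size(size: int, factor: int = 16) -> int:
    """Smallest multiple of factor that is greater than or equal to size."""
    return ((size + factor - 1) // factor) * factor


print(encoder_sizes(32))     # [32, 16, 8, 4, 2]
print(next_valid_size(100))  # 112
```

An image of size 100 would fail at the third halving (25 is odd), so padding or resizing it to 112 is one way to satisfy the constraint.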

References

__init__(in_channels: int = 3, out_channels: int = 1, **kwargs)[source]
Parameters:
  • in_channels (int (default 3)) – Number of input channels.

  • out_channels (int (default 1)) – Number of output channels.

_GraphConvTorchModel

class _GraphConvTorchModel(n_tasks: int, number_input_features: List[int], graph_conv_layers: List[int] = [64, 64], dense_layer_size: int = 128, dropout=0.0, mode: str = 'classification', number_atom_features: int = 75, n_classes: int = 2, batch_normalize: bool = True, uncertainty: bool = False, batch_size: int = 100)[source]

Graph Convolutional Models.

This class implements the graph convolutional model from the following paper [1]. These graph convolutions start with a per-atom set of descriptors for each atom in a molecule, then combine and recombine these descriptors over convolutional layers.

All arguments have the same meaning as in GraphConvModel.

Example

>>> import deepchem as dc
>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models import _GraphConvTorchModel
>>> batch_size = 10
>>> out_channels = 2
>>> raw_smiles = ['CCC', 'C']
>>> from rdkit import Chem
>>> mols = [Chem.MolFromSmiles(s) for s in raw_smiles]
>>> featurizer = dc.feat.graph_features.ConvMolFeaturizer()
>>> mols = featurizer.featurize(mols)
>>> multi_mol = dc.feat.mol_graphs.ConvMol.agglomerate_mols(mols)
>>> atom_features = torch.from_numpy(multi_mol.get_atom_features().astype(np.float32))
>>> degree_slice = torch.from_numpy(multi_mol.deg_slice)
>>> membership = torch.from_numpy(multi_mol.membership)
>>> deg_adjs = [torch.from_numpy(i) for i in multi_mol.get_deg_adjacency_lists()[1:]]
>>> args = [atom_features, degree_slice, membership, torch.tensor(2)] + deg_adjs
>>> model = _GraphConvTorchModel(out_channels, graph_conv_layers=[64, 64], number_input_features=[atom_features.shape[-1], 64],dense_layer_size=128,dropout=0.0,mode="classification",number_atom_features=75,n_classes=2,batch_normalize=False,uncertainty=False,batch_size=batch_size)
>>> result = model(args)
>>> len(result)
3

References

__init__(n_tasks: int, number_input_features: List[int], graph_conv_layers: List[int] = [64, 64], dense_layer_size: int = 128, dropout=0.0, mode: str = 'classification', number_atom_features: int = 75, n_classes: int = 2, batch_normalize: bool = True, uncertainty: bool = False, batch_size: int = 100)[source]
Parameters:
  • n_tasks (int) – Number of tasks

  • number_input_features (list of int) – Number of input features to each of the Graph Conv Layers

  • graph_conv_layers (list of int) – Width of channels for the Graph Convolution Layers

  • dense_layer_size (int) – Width of channels for Atom Level Dense Layer after GraphPool

  • dropout (list or float) – the dropout probability to use for each layer. The length of this list should equal len(graph_conv_layers)+1 (one value for each convolution layer, and one for the dense layer). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • mode (str) – Either “classification” or “regression”

  • number_atom_features (int) – 75 is the default number of atom features created, but this can vary if various options are passed to the function atom_features in graph_features

  • n_classes (int) – the number of classes to predict (only used in classification mode)

  • batch_normalize (bool) – if True, apply batch normalization to model

  • uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted

forward(inputs: Tensor | Sequence[Tensor], training=False) List[Tensor][source]
Parameters:
  • inputs (OneOrMany[torch.Tensor]) – Should contain tensors [atom_features, degree_slice, membership, n_samples] and deg_adjs

Returns:

Output as per use case : regression/classification

Return type:

List[torch.Tensor]

GraphConvModel

class GraphConvModel(n_tasks: int, number_input_features: List[int], graph_conv_layers: List[int] = [64, 64], dense_layer_size: int = 128, dropout: float = 0.0, mode: str = 'classification', number_atom_features: int = 75, n_classes: int = 2, batch_size: int = 100, batch_normalize: bool = True, uncertainty: bool = False, **kwargs)[source]

Graph Convolutional Models.

This class implements the graph convolutional model from the following paper [1]. These graph convolutions start with a per-atom set of descriptors for each atom in a molecule, then combine and recombine these descriptors over convolutional layers.

Example

>>> import deepchem as dc
>>> import numpy as np
>>> from deepchem.models.torch_models import GraphConvModel
>>> featurizer = dc.feat.ConvMolFeaturizer()
>>> tasks = ["outcome"]
>>> mols = ["C", "CO", "CC"]
>>> X = featurizer(mols)
>>> y = np.array([0, 1, 0])
>>> dataset = dc.data.NumpyDataset(X, y)
>>> classification_metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean, mode="classification")
>>> batch_size = 10
>>> model = GraphConvModel(len(tasks), number_input_features=[75, 64], batch_size=batch_size, batch_normalize=False, mode='classification')
>>> loss = model.fit(dataset, nb_epoch=20)

References

__init__(n_tasks: int, number_input_features: List[int], graph_conv_layers: List[int] = [64, 64], dense_layer_size: int = 128, dropout: float = 0.0, mode: str = 'classification', number_atom_features: int = 75, n_classes: int = 2, batch_size: int = 100, batch_normalize: bool = True, uncertainty: bool = False, **kwargs)[source]

The wrapper class for graph convolutions.

Note that since the underlying _GraphConvTorchModel class is specified using imperative subclassing style, this model cannot make predictions for arbitrary outputs.

Parameters:
  • n_tasks (int) – Number of tasks

  • number_input_features (list of int) – Number of input features to each of the Graph Conv Layer

  • graph_conv_layers (list of int) – Width of channels for the Graph Convolution Layers

  • dense_layer_size (int) – Width of channels for Atom Level Dense Layer after GraphPool

  • dropout (list or float) – the dropout probability to use for each layer. The length of this list should equal len(graph_conv_layers)+1 (one value for each convolution layer, and one for the dense layer). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

  • mode (str) – Either “classification” or “regression”

  • number_atom_features (int) – 75 is the default number of atom features created, but this can vary if various options are passed to the function atom_features in graph_features

  • n_classes (int) – the number of classes to predict (only used in classification mode)

  • batch_normalize (bool) – if True, apply batch normalization to model

  • uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted
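
The list-or-scalar rule for dropout described above can be sketched as follows. This is a minimal illustration, not DeepChem's own code: expand_dropout is a hypothetical helper name, and the length check simply mirrors the len(graph_conv_layers)+1 rule stated in the parameter description.

```python
from typing import List, Sequence, Union


def expand_dropout(dropout: Union[float, Sequence[float]],
                   graph_conv_layers: Sequence[int]) -> List[float]:
    """Return one dropout probability per layer (hypothetical helper)."""
    # One value per convolution layer, plus one for the dense layer.
    n_layers = len(graph_conv_layers) + 1
    if isinstance(dropout, (int, float)):
        # A single value is reused for every layer.
        return [float(dropout)] * n_layers
    values = [float(d) for d in dropout]
    if len(values) != n_layers:
        raise ValueError(
            f"expected {n_layers} dropout values, got {len(values)}")
    return values


per_layer = expand_dropout(0.1, [64, 64])
print(per_layer)  # [0.1, 0.1, 0.1]
```

With graph_conv_layers=[64, 64], a list argument therefore needs exactly three entries.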

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True)[source]

Convert a dataset into the tensors needed for learning.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to convert

  • epochs (int, optional (default 1)) – Number of times to walk over dataset

  • mode (str, optional (default 'fit')) – Ignored in this implementation.

  • deterministic (bool, optional (default True)) – Whether the dataset should be walked in a deterministic fashion

  • pad_batches (bool, optional (default True)) – If true, each returned batch will have size self.batch_size.

Return type:

Iterator which walks over the batches
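
To make the roles of epochs and pad_batches concrete, here is a rough sketch of a batch iterator over a plain list. It is an illustration only: iter_batches and pad_value are hypothetical names, and filling short batches with a constant is an assumption, not necessarily how DeepChem pads.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[int],
                 batch_size: int,
                 epochs: int = 1,
                 pad_batches: bool = True,
                 pad_value: int = 0) -> Iterator[List[int]]:
    """Walk over `items` `epochs` times, yielding fixed-size batches."""
    for _ in range(epochs):
        for start in range(0, len(items), batch_size):
            batch = list(items[start:start + batch_size])
            if pad_batches and len(batch) < batch_size:
                # Fill the last, short batch up to batch_size (assumed policy).
                batch += [pad_value] * (batch_size - len(batch))
            yield batch


batches = list(iter_batches([1, 2, 3, 4, 5], batch_size=2, epochs=2))
print(batches)  # [[1, 2], [3, 4], [5, 0], [1, 2], [3, 4], [5, 0]]
```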

Smiles2Vec

class Smiles2Vec(char_to_idx: Dict, n_tasks: int = 10, max_seq_len: int = 270, embedding_dim: int = 50, n_classes: int = 2, use_bidir: bool = True, use_conv: bool = True, filters: int = 192, kernel_size: int = 3, strides: int = 1, rnn_sizes: List[int] = [224, 384], rnn_types: List[str] = ['GRU', 'GRU'], mode: str = 'regression')[source]

Implements the Smiles2Vec model, that learns neural representations of SMILES strings which can be used for downstream tasks.

The goal here is to take SMILES strings as inputs, turn them into vector representations which can then be used in predicting molecular properties.

The model consists of an Embedding layer that retrieves embeddings for each character in the SMILES string. These embeddings are learnt jointly with the rest of the model. The output from the embedding layer is a tensor of shape (batch_size, seq_len, embedding_dim). This tensor can optionally be fed through a 1D convolutional layer, before being passed to a series of RNN cells (optionally bidirectional). The final output from the RNN cells aims to have learnt the temporal dependencies in the SMILES string, and in turn information about the structure of the molecule, which is then used for molecular property prediction.
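
The shape of the embedding output can be illustrated without any deep learning framework. The sketch below is a simplification under stated assumptions: it tokenizes one character at a time, reserves index 0 for padding, and uses a random lookup table in place of learned embeddings; encode and the variable names are hypothetical.

```python
import random
from typing import Dict, List


def encode(smiles: str, char_to_idx: Dict[str, int], seq_len: int,
           pad_idx: int = 0) -> List[int]:
    """Map each character to its index, then pad or truncate to seq_len."""
    idx = [char_to_idx[ch] for ch in smiles[:seq_len]]
    return idx + [pad_idx] * (seq_len - len(idx))


smiles_batch = ["C1CCC1", "CCO"]
chars = sorted({ch for s in smiles_batch for ch in s})
char_to_idx = {ch: i + 1 for i, ch in enumerate(chars)}  # 0 is padding

seq_len, embedding_dim = 8, 4
rng = random.Random(0)
# One embedding vector per index; random stand-in for learned weights.
embedding_table = [[rng.uniform(-1, 1) for _ in range(embedding_dim)]
                   for _ in range(len(char_to_idx) + 1)]

encoded = [encode(s, char_to_idx, seq_len) for s in smiles_batch]
embedded = [[embedding_table[i] for i in row] for row in encoded]

shape = (len(embedded), len(embedded[0]), len(embedded[0][0]))
print(shape)  # (2, 8, 4), i.e. (batch_size, seq_len, embedding_dim)
```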

In the paper, the authors also train an explanation mask to endow the model with interpretability and gain insights into its decision making. This segment is currently not a part of this implementation as this was developed for the purpose of investigating a transfer learning protocol, ChemNet.

References

Predicting Chemical Properties” (https://arxiv.org/pdf/1712.02034.pdf)

[2] ChemNet (https://arxiv.org/abs/1712.02734)

__init__(char_to_idx: Dict, n_tasks: int = 10, max_seq_len: int = 270, embedding_dim: int = 50, n_classes: int = 2, use_bidir: bool = True, use_conv: bool = True, filters: int = 192, kernel_size: int = 3, strides: int = 1, rnn_sizes: List[int] = [224, 384], rnn_types: List[str] = ['GRU', 'GRU'], mode: str = 'regression')[source]
Parameters:
  • char_to_idx (dict,) – char_to_idx contains character to index mapping for SMILES characters

  • embedding_dim (int, default 50) – Size of character embeddings used.

  • use_bidir (bool, default True) – Whether to use BiDirectional RNN Cells

  • use_conv (bool, default True) – Whether to use a conv-layer

  • kernel_size (int, default 3) – Kernel size for convolutions

  • filters (int, default 192) – Number of filters

  • strides (int, default 1) – Strides used in convolution

  • rnn_sizes (list[int], default [224, 384]) – Number of hidden units in the RNN cells

  • mode (str, default regression) – Whether to use model for regression or classification

forward(smiles_seqs: List)[source]

Build the model.

Smiles2VecModel

class Smiles2VecModel(char_to_idx: Dict, n_tasks: int = 10, max_seq_len: int = 270, embedding_dim: int = 50, n_classes: int = 2, use_bidir: bool = True, use_conv: bool = True, filters: int = 192, kernel_size: int = 3, strides: int = 1, rnn_sizes: List[int] = [224, 384], rnn_types: List[str] = ['GRU', 'GRU'], mode: str = 'regression', device: device | None = None, **kwargs)[source]

Implements the Smiles2Vec model, that learns neural representations of SMILES strings which can be used for downstream tasks.

The goal here is to take SMILES strings as inputs, turn them into vector representations which can then be used in predicting molecular properties.

The model consists of an Embedding layer that retrieves embeddings for each character in the SMILES string. These embeddings are learnt jointly with the rest of the model. The output from the embedding layer is a tensor of shape (batch_size, seq_len, embedding_dim). This tensor can optionally be fed through a 1D convolutional layer, before being passed to a series of RNN cells (optionally bidirectional). The final output from the RNN cells aims to have learnt the temporal dependencies in the SMILES string, and in turn information about the structure of the molecule, which is then used for molecular property prediction.

In the paper, the authors also train an explanation mask to endow the model with interpretability and gain insights into its decision making. This segment is currently not a part of this implementation as this was developed for the purpose of investigating a transfer learning protocol, ChemNet.

Examples

>>> import deepchem as dc
>>> import os
>>> import numpy as np
>>> from deepchem.models.torch_models import Smiles2VecModel
>>> from deepchem.feat import create_char_to_idx, SmilesToSeq
>>> smiles = ["C1CCC1", "C1=CC=CN=C1"]
>>> data_points = len(smiles)
>>> max_seq_len = 20
>>> max_len = 250
>>> pad_len = 10
>>> n_tasks = 5
>>> dataset_file = os.path.join(os.path.dirname(__file__), "tests", "assets", "chembl_25_small.csv")
>>> char_to_idx = create_char_to_idx(dataset_file, max_len=max_len, smiles_field="smiles")
>>> featurizer = SmilesToSeq(char_to_idx=char_to_idx, max_len=max_len, pad_len=pad_len)
>>> X = featurizer.featurize(smiles)
>>> y = np.random.normal(size=(data_points, n_tasks))
>>> dataset = dc.data.NumpyDataset(X=X, y=y)
>>> w = np.ones(shape=(data_points, n_tasks))
>>> dataset = dc.data.NumpyDataset(X[:data_points, :max_seq_len], y, w, dataset.ids[:data_points])
>>> metric = dc.metrics.Metric(dc.metrics.mean_absolute_error, mode="regression")
>>> model = Smiles2VecModel(char_to_idx=char_to_idx, max_seq_len=max_seq_len, use_conv=True, n_tasks=n_tasks, model_dir=None, mode="regression")
>>> loss = model.fit(dataset, nb_epoch=10)

References

Predicting Chemical Properties” (https://arxiv.org/pdf/1712.02034.pdf)

Chemical Property Prediction (https://arxiv.org/abs/1712.02734)

__init__(char_to_idx: Dict, n_tasks: int = 10, max_seq_len: int = 270, embedding_dim: int = 50, n_classes: int = 2, use_bidir: bool = True, use_conv: bool = True, filters: int = 192, kernel_size: int = 3, strides: int = 1, rnn_sizes: List[int] = [224, 384], rnn_types: List[str] = ['GRU', 'GRU'], mode: str = 'regression', device: device | None = None, **kwargs)[source]
Parameters:
  • char_to_idx (dict,) – char_to_idx contains character to index mapping for SMILES characters

  • embedding_dim (int, default 50) – Size of character embeddings used.

  • use_bidir (bool, default True) – Whether to use BiDirectional RNN Cells

  • use_conv (bool, default True) – Whether to use a conv-layer

  • kernel_size (int, default 3) – Kernel size for convolutions

  • filters (int, default 192) – Number of filters

  • strides (int, default 1) – Strides used in convolution

  • rnn_sizes (list[int], default [224, 384]) – Number of hidden units in the RNN cells

  • mode (str, default regression) – Whether to use model for regression or classification

  • device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is chosen automatically

MXMNet

class MXMNet(dim: int, n_layer: int, cutoff: float, num_spherical: int = 7, num_radial: int = 6, envelope_exponent: int = 5, activation_fn: Callable | str = 'silu', n_tasks: int = 1)[source]

Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures. In this class, we define the global and local message passing layers and the MXMNet Model[1]_. We also define the forward call of this model in the forward function.

Note: This model currently supports only Single Task Regression. Multi Task Regression, Single-Task Classification and Multi-Task Classification are not yet implemented.

Example

>>> import deepchem as dc
>>> import os
>>> import tempfile
>>> import torch
>>> from torch_geometric.data import Data, Batch
>>> from deepchem.feat.molecule_featurizers import MXMNetFeaturizer
>>> QM9_TASKS = ["mu", "alpha", "homo", "lumo", "gap", "r2", "zpve", "cv", "u0", "u298",
...              "h298", "g298"]
>>> dim = 10
>>> n_layer = 6
>>> cutoff = 5
>>> feat = MXMNetFeaturizer()
>>> # Get data
>>> loader = dc.data.SDFLoader(tasks=[QM9_TASKS[0]],
...                            featurizer=feat,
...                            sanitize=True)
>>> current_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
>>> dataset_path = os.path.join(current_dir, "tests/assets/qm9_mini.sdf")
>>> dataset = loader.create_dataset(inputs=dataset_path,
...                            shard_size=1)
>>> model = MXMNet(dim=dim, n_layer=n_layer, cutoff=cutoff)
>>> train_dir = None
>>> if train_dir is None:
...    train_dir = tempfile.mkdtemp()
>>> data = dataset.select([i for i in range(0, 1)], train_dir)
>>> data = data.X
>>> data = [data[i].to_pyg_graph() for i in range(1)]
>>> pyg_batch = Batch()
>>> pyg_batch = pyg_batch.from_data_list(data)
>>> output = model(pyg_batch)
>>> output[0].shape
torch.Size([1])
__init__(dim: int, n_layer: int, cutoff: float, num_spherical: int = 7, num_radial: int = 6, envelope_exponent: int = 5, activation_fn: Callable | str = 'silu', n_tasks: int = 1)[source]

Initialize the MXMNet class.

Parameters:
  • dim (int) – The dimensionality of node embeddings.

  • n_layer (int) – The number of message passing layers.

  • cutoff (float) – The distance cutoff for edge connections.

  • num_spherical (int, default 7) – The number of spherical harmonics.

  • num_radial (int, default 6) – The number of radial basis functions.

  • envelope_exponent (int, default 5) – The exponent for the envelope function.

  • activation_fn (Union[Callable, str], default ‘silu’) – The activation function name.

  • n_tasks (int, default 1) – The number of prediction tasks. Only single Task regression is supported currently.
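
As a rough picture of what a distance cutoff for edge connections means, the sketch below connects every pair of atoms whose separation does not exceed the cutoff. It is a plain-Python illustration, not MXMNet's implementation: edges_within_cutoff is a hypothetical helper, and the inclusive comparison is an assumption.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def edges_within_cutoff(pos: List[Point],
                        cutoff: float) -> List[Tuple[int, int]]:
    """Return directed edges (i, j) for atoms closer than the cutoff."""
    edges = []
    for i, a in enumerate(pos):
        for j, b in enumerate(pos):
            # Skip self-loops; keep pairs within the cutoff distance.
            if i != j and math.dist(a, b) <= cutoff:
                edges.append((i, j))
    return edges


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
edges = edges_within_cutoff(positions, cutoff=5.0)
print(edges)  # [(0, 1), (1, 0)]
```

The third atom is 9 or more units away from the others, so it receives no edges at cutoff=5.0.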

init() None[source]

Initialize the node embeddings by setting them to random values within a predefined range. This method ensures that the node embeddings are initialized with random values, promoting diversity in the initial node representations.

indices(edge_index: Tensor, num_nodes: int) Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor][source]

Compute node indices for defining angles in the molecular graph.

Parameters:
  • edge_index (torch.Tensor) – The edge index tensor.

  • num_nodes (int) – The number of nodes in the graph.

Returns:

Node indices for various angle calculations.

Return type:

Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]

forward(pyg_batch: Batch) Tensor[source]

Forward pass of the MXMNet model.

Parameters:

pyg_batch (Batch) – A pytorch-geometric batch containing tensors for:
  • node features

  • edge_index

  • pos

  • batch information

Returns:

The model’s output.

Return type:

torch.Tensor

LSTMGenerator

class LSTMGenerator(loss: Loss | _Loss = CrossEntropyLoss(), batch_size: int = 8, embedding_dim: int = 128, hidden_dim: int = 256, num_layers: int = 2, tokenizer: PreTrainedTokenizer | None = None, learning_rate: float = 0.001, optimizer: Optimizer | None = None, model_dir: str = 'lstm_generator_model', device: device | None = None)[source]

LSTM Generator

This class implements an LSTM-based [1]_ generator for token generation tasks. The generator is trained on a list of sequences and can be used to generate new sequences of tokens. The model is implemented using PyTorch and can be useful to generate SMILES, PSMILES and Weighted Graph strings for generation tasks. The generator is used in our research paper “Open-source Polymer Generative Pipeline” [2]_ to generate hypothetical polymers using PSMILES and Weighted Directed Graph representations.

References

Examples

>>> from deepchem.models.torch_models import LSTMGenerator
>>> from deepchem.data import NumpyDataset
>>> generator = LSTMGenerator()
>>> dataset = NumpyDataset(["CCC"])
>>> loss = generator.fit(dataset, nb_epoch=1)
>>> new_sequences = generator.sample(num_gen=10, max_len=10, temperature=1.0)
__init__(loss: Loss | _Loss = CrossEntropyLoss(), batch_size: int = 8, embedding_dim: int = 128, hidden_dim: int = 256, num_layers: int = 2, tokenizer: PreTrainedTokenizer | None = None, learning_rate: float = 0.001, optimizer: Optimizer | None = None, model_dir: str = 'lstm_generator_model', device: device | None = None) None[source]

Initializes the LSTMGenerator model.

Parameters:
  • loss (Loss, default nn.CrossEntropyLoss) – Loss function to use for training the model.

  • batch_size (int, default 8) – Batch size to use during training.

  • embedding_dim (int, default 128) – Dimension of the embedding vector.

  • hidden_dim (int, default 256) – Number of hidden units in each LSTM layer.

  • num_layers (int, default 2) – Number of LSTM layers in the network.

  • tokenizer (PreTrainedTokenizer, default None) – Tokenizer to use for tokenizing input sequences.

  • learning_rate (float, default 0.001) – Learning rate to use during training.

  • optimizer (Optimizer, default None) – Optimizer to use for training the model. If None, Adam optimizer is used.

  • model_dir (str, default "lstm_generator_model") – Directory to save the model files.

  • device (torch.device, default None) – Device to use for training the model.

default_generator(dataset: Dataset, epochs: int = 1, *args, **kwargs) Iterable[Tuple[List, List, List]][source]

Generates a default generator for the input sequences.

Parameters:
  • dataset (Dataset) – Dataset of input sequences to tokenize.

  • epochs (int, default 1) – Number of epochs to train the model.

Returns:

Generator that yields input, target and zero weight as a tuple of tensors for training the model.

Return type:

Iterable
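
One common way to build input and target sequences for a token generator is to shift the tokenized sequence by one position, so that each input token is paired with the token that follows it. The sketch below assumes that convention; the text above does not state how LSTMGenerator constructs its targets, and next_token_pair and the token ids are hypothetical.

```python
from typing import List, Tuple


def next_token_pair(token_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Split one sequence into (input, target) shifted by one token."""
    # Input drops the last token; target drops the first.
    return token_ids[:-1], token_ids[1:]


inputs, targets = next_token_pair([101, 7, 8, 9, 102])
print(inputs)   # [101, 7, 8, 9]
print(targets)  # [7, 8, 9, 102]
```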

fit(dataset: Dataset, nb_epoch: int = 1, max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, *args, **kwargs) float[source]

Fits the model on the input sequences.

Parameters:
  • dataset (Dataset) – Dataset of input sequences to train the model.

  • nb_epoch (int, default 1) – Number of epochs to train the model.

  • max_checkpoints_to_keep (int, default 5) – Maximum number of checkpoints to keep.

  • checkpoint_interval (int, default 1000) – Interval at which to save model checkpoints.

Returns:

Average loss of most recent checkpoint

Return type:

float

fit_generator(generator: Iterable, max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, *args, **kwargs) float[source]

Fits the model on the input sequences using a generator.

Parameters:
  • generator (Iterable) – Generator that yields input, target and zero weight tensors for training the model.

  • max_checkpoints_to_keep (int, default 5) – Maximum number of checkpoints to keep.

  • checkpoint_interval (int, default 1000) – Interval at which to save model checkpoints.

Returns:

Average loss of most recent checkpoint

Return type:

float

restore(checkpoint: str | None = None, model_dir: str | None = None, strict: bool | None = True) None[source]

Restore the values of all variables from a checkpoint file.

Parameters:
  • checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.

  • model_dir (str, default None) – Directory to restore checkpoint from. If None, use self.model_dir. If checkpoint is not None, this is ignored.

  • strict (bool, default True) – Whether or not to strictly enforce that the keys in checkpoint match the keys returned by this model’s get_variable_scope() method.

load_from_pretrained(source_model: TorchModel, assignment_map: Dict[Any, Any] | None = None, value_map: Dict[Any, Any] | None = None, checkpoint: str | None = None, model_dir: str | None = None, include_top: bool = True, inputs: Sequence[Any] | None = None, **kwargs: Any) None[source]

Load model from a pretrained checkpoint.

Parameters:
  • source_model (TorchModel) – The source model to load from.

  • assignment_map (dict, optional) – Parameter assignment (unused here).

  • value_map (dict, optional) – Parameter assignment (unused here).

  • checkpoint (str, optional) – Checkpoint file path.

  • model_dir (str, optional) – Directory from which to restore the checkpoint.

  • include_top (bool) – Whether to load the top layers (unused here).

  • inputs (Sequence[Any], optional) – Input placeholders (unused here).

sample(num_gen: int = 100, max_len: int = 600, temperature: float = 1.0) List[str][source]

Generate sequences by repeated sampling.

Parameters:
  • num_gen (int, default 100) – Number of sequences to generate.

  • max_len (int, default 600) – Maximum length of the generated sequence.

  • temperature (float, default 1.0) – Temperature to use for sampling.

Returns:

List of generated sequences as strings.

Return type:

List[str]
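
The effect of the temperature argument can be sketched with a temperature-scaled softmax over next-token scores. This is the standard technique, shown here in plain Python as an illustration; it is assumed rather than confirmed that sample applies temperature in exactly this way, and the function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float],
                             temperature: float = 1.0) -> List[float]:
    """Turn raw scores into probabilities, sharpened or flattened."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: Sequence[float],
                 temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
token = sample_token(logits, temperature=1.0, rng=random.Random(0))
```

Temperatures below 1.0 concentrate probability on the highest-scoring token, while temperatures above 1.0 spread it more evenly and give more varied samples.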

InceptionV3Model

class InceptionV3Model(in_channels=6, warmup_steps=10000, learning_rate=0.064, dropout_rate=0.2, decay_rate=0.94, decay_steps=2, rho=0.9, momentum=0.9, epsilon=1.0, **kwargs)[source]

Implementation of the InceptionV3 model architecture for image classification, modified for use with the DeepVariant framework in DeepChem.

It builds on the original Inception design by utilizing a network-in-network approach, where convolutional filters of various sizes (e.g., 1x1, 3x3, 5x5) are applied in parallel within each module. This enables the model to capture features at multiple scales. InceptionV3 has factorized convolutions (breaking down larger convolutions into smaller ones, like 3x3 into 1x3 and 3x1) and the use of auxiliary classifiers that assist the model’s training by acting as regularizers. It uses dimensionality reduction to control the computational complexity. This model supports custom learning rate schedules with warmup and decay steps, utilizing the RMSProp optimizer.
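
The saving from factorizing a 3x3 convolution into a 1x3 followed by a 3x1 can be checked with simple weight counting. The sketch below ignores biases and assumes both factorized layers use the same output width; the function names and the channel count of 64 are illustrative choices.

```python
def conv_weights(c_in: int, c_out: int, kh: int, kw: int) -> int:
    """Number of weights in a kh x kw convolution, ignoring biases."""
    return c_in * c_out * kh * kw


def factorized_3x3_weights(c_in: int, c_out: int) -> int:
    """A 1x3 convolution followed by a 3x1 convolution."""
    return conv_weights(c_in, c_out, 1, 3) + conv_weights(c_out, c_out, 3, 1)


full = conv_weights(64, 64, 3, 3)
factorized = factorized_3x3_weights(64, 64)
print(full, factorized)  # 36864 24576
```

With equal input and output widths, the factorized pair uses two thirds of the weights of the full 3x3 filter.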

Examples

>>> from deepchem.models.torch_models import InceptionV3Model
>>> import deepchem as dc
>>> import numpy as np
>>> model = InceptionV3Model()
>>> input_shape = (5, 6, 299, 299)
>>> input_samples = np.random.randn(*input_shape).astype(np.float32)
>>> output_samples = np.random.randint(0, 3, (5,)).astype(np.int64)
>>> one_hot_output_samples = np.eye(3)[output_samples]
>>> dataset = dc.data.ImageDataset(input_samples, one_hot_output_samples)
>>> loss = model.fit(dataset, nb_epoch=1)
>>> predictions = model.predict(dataset)
>>> predictions.shape
(5, 3)
__init__(in_channels=6, warmup_steps=10000, learning_rate=0.064, dropout_rate=0.2, decay_rate=0.94, decay_steps=2, rho=0.9, momentum=0.9, epsilon=1.0, **kwargs)[source]

Create a new TorchModel.

Parameters:
  • model (torch.nn.Module) – the PyTorch model implementing the calculation

  • loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above

  • output_types (list of strings, optional (default None)) – the type of each output from the model, as described above

  • batch_size (int, optional (default 100)) – default batch size for training and evaluating

  • model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None, a temporary directory is created.

  • learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is ignored.

  • optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is ignored.

  • tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training

  • wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training

  • log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using logging by default. If tensorboard is set, data is also logged to TensorBoard. If wandb is set, data is also logged to Weights & Biases. Logging happens at global steps. Roughly, a global step corresponds to one batch of training. If you’d like a printout every 10 batch steps, you’d set log_frequency=10 for example.

  • device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is chosen automatically.

  • regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add to the loss function

  • wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics

adjust_learning_rate()[source]

Adjusts learning rate manually based on warmup and decay steps.
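
A schedule with warmup and decay could look roughly like the sketch below. It is one plausible reading of the constructor arguments, not the method's actual formula: it assumes a linear warmup over warmup_steps, then a staircase exponential decay in which decay_steps counts epochs. lr_at_step and steps_per_epoch are hypothetical.

```python
def lr_at_step(step: int,
               base_lr: float = 0.064,
               warmup_steps: int = 10000,
               decay_rate: float = 0.94,
               decay_steps: int = 2,
               steps_per_epoch: int = 1000) -> float:
    """Learning rate at a global step (assumed schedule, see lead-in)."""
    if step < warmup_steps:
        # Linear ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    epochs_done = (step - warmup_steps) // steps_per_epoch
    # Multiply by decay_rate once every `decay_steps` epochs.
    return base_lr * decay_rate ** (epochs_done // decay_steps)


for s in (0, 9999, 10000, 12000):
    print(s, lr_at_step(s))
```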

fit(dataset: Dataset, nb_epoch: int = 10, max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, deterministic: bool = False, restore: bool = False, variables: List[Parameter] | None = None, loss: Callable[[List, List, List], Any] | None = None, callbacks: Callable | List[Callable] = [], all_losses: List[float] | None = None) float[source]

Trains the model on the given dataset, adjusting learning rate with warmup and decay.

Parameters:
  • dataset (Dataset) – Dataset to be used for training.

  • nb_epoch (int, optional (default 10)) – Number of epochs to train the model.

  • max_checkpoints_to_keep (int, optional) – Number of checkpoints to keep.

  • checkpoint_interval (int, optional) – Interval for saving checkpoints.

  • deterministic (bool, optional) – If True, runs in deterministic mode.

  • restore (bool, optional) – If True, restores the model from the last checkpoint.

  • variables (list, optional) – List of parameters to train.

  • loss (callable, optional) – Custom loss function.

  • callbacks (callable or list of callables, optional) – Callbacks to run during training.

  • all_losses (list of floats, optional) – List to store all losses during training.

Returns:

The final loss value after training.

Return type:

float

save()[source]

Saves model to disk using joblib.

reload()[source]

Loads model from joblib file on disk.

MobileNetV2Model

class MobileNetV2Model(n_tasks: int, in_channels: int = 6, input_size: int = 224, mode: str = 'classification', n_classes: int = 2, width_mult: float = 1.0, **kwargs)[source]

MobileNetV2 with multi-channel support and classification/regression modes.

MobileNetV2 is a lightweight and efficient convolutional neural network architecture designed for mobile and edge devices. It builds on the success of depthwise separable convolutions introduced in MobileNetV1, but introduces two key innovations: inverted residual blocks and linear bottlenecks. Unlike traditional residual blocks that compress then expand features, MobileNetV2 first expands the input channels, applies a depthwise convolution (which processes each channel independently), and then projects it back down to a lower-dimensional space using a pointwise (1x1) convolution. This “inverted” structure preserves rich information in the high-dimensional space while maintaining efficiency. A linear layer (without non-linearity) at the bottleneck helps retain feature information during compression. Residual connections are selectively added when the input and output shapes match, which stabilizes training and improves accuracy.
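
The expand, depthwise, project structure described above can be made concrete by counting weights in a single block. This is a back-of-the-envelope sketch that ignores biases and normalization layers; the expansion ratio of 6 and the helper names are assumptions for illustration and do not appear in the class signature.

```python
def inverted_residual_weights(c_in: int, c_out: int,
                              expand_ratio: int = 6,
                              kernel: int = 3) -> int:
    """Weights in one inverted residual block, ignoring biases."""
    hidden = c_in * expand_ratio
    expand = c_in * hidden                # 1x1 pointwise expansion
    depthwise = hidden * kernel * kernel  # one k x k filter per channel
    project = hidden * c_out              # 1x1 linear projection
    return expand + depthwise + project


def uses_residual(c_in: int, c_out: int, stride: int) -> bool:
    """The skip connection needs matching input and output shapes."""
    return stride == 1 and c_in == c_out


block = inverted_residual_weights(32, 32)
dense_at_hidden_width = (32 * 6) * (32 * 6) * 3 * 3
print(block, dense_at_hidden_width)  # 14016 331776
print(uses_residual(32, 32, stride=1))  # True
print(uses_residual(32, 64, stride=2))  # False
```

The comparison shows why spatial filtering in the expanded space stays cheap: the depthwise step costs far less than a dense 3x3 convolution at the same width.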

Parameters:
  • n_tasks (int) – Number of tasks (output dimensions)

  • in_channels (int, default 6) – Number of input channels

  • input_size (int, default 224) – Input image size (must be divisible by 32)

  • mode (str, default "classification") – Either “regression” or “classification”

  • n_classes (int, default 2) – Number of classes to predict (only used in classification mode)

  • width_mult (float, default 1.0) – Width multiplier for the network

References

__init__(n_tasks: int, in_channels: int = 6, input_size: int = 224, mode: str = 'classification', n_classes: int = 2, width_mult: float = 1.0, **kwargs)[source]

Create a new TorchModel.

Parameters:
  • model (torch.nn.Module) – the PyTorch model implementing the calculation

  • loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above

  • output_types (list of strings, optional (default None)) – the type of each output from the model, as described above

  • batch_size (int, optional (default 100)) – default batch size for training and evaluating

  • model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None, a temporary directory is created.

  • learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is ignored.

  • optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is ignored.

  • tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training

  • wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training

  • log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using logging by default. If tensorboard is set, data is also logged to TensorBoard. If wandb is set, data is also logged to Weights & Biases. Logging happens at global steps. Roughly, a global step corresponds to one batch of training. If you’d like a printout every 10 batch steps, you’d set log_frequency=10 for example.

  • device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is chosen automatically.

  • regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add to the loss function

  • wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics

save()[source]

Saves model to disk using joblib.

reload()[source]

Loads model from joblib file on disk.

MultitaskIRVClassifier

class MultitaskIRVClassifier(n_tasks: int, K=10, penalty=0.0, mode='classification', device: device | None = None, **kwargs)[source]

Implements Influence Relevance Voter (IRV), a novel machine learning model designed for virtual high-throughput screening (vHTS). vHTS predicts the biological activity of chemical compounds using computational methods, reducing the need for expensive experimental screening.

The IRV model extends the k-Nearest Neighbors (kNN) algorithm by improving how neighbors influence predictions. Instead of treating all neighbors equally, IRV assigns each neighbor a relevance score based on its similarity to the query compound. This similarity is calculated using molecular fingerprint comparisons between the query compound and its neighbors, allowing more relevant neighbors to have a greater impact on the prediction.
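
The idea of letting more similar neighbors count for more can be sketched with a fixed similarity-weighted vote over binary fingerprints. This is a deliberately simplified stand-in: the actual IRV model learns how relevance is assigned, whereas the sketch below uses raw Tanimoto similarity as the weight. All names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity between two binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def weighted_vote(query: Sequence[int],
                  references: List[Tuple[Sequence[int], int]],
                  k: int) -> float:
    """Similarity-weighted average label of the k nearest references."""
    scored = sorted(((tanimoto(query, fp), label)
                     for fp, label in references), reverse=True)[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        return 0.0
    return sum(sim * label for sim, label in scored) / total


references = [([1, 1, 0, 0], 1), ([1, 0, 0, 0], 1), ([0, 0, 1, 1], 0)]
print(weighted_vote([1, 1, 0, 0], references, k=2))  # 1.0
print(weighted_vote([0, 0, 1, 0], references, k=2))  # 0.0
```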

This model has been benchmarked on HIV dataset from IJCNN-07 Competition organised in 2007 and DHFR dataset from McMaster University Data-Mining and Docking Competition organised in 2005 in [1].

Example

>>> import deepchem as dc
>>> import numpy as np
>>> n_tasks = 5
>>> n_samples = 10
>>> n_features = 128
>>> K=5
>>> # Generate dummy dataset.
>>> ids = np.arange(n_samples)
>>> # Features in ECFP Fingerprints representation
>>> X = np.random.randint(2, size=(n_samples, n_features))
>>> # Labels for tasks.
>>> # Either 1 or 0 depending on whether the samples are active or inactive in that application(task)
>>> y = np.ones((n_samples, n_tasks))
>>> # Weights for each task in each column.
>>> w = np.ones((n_samples, n_tasks))
>>> dataset = dc.data.NumpyDataset(X, y, w, ids)
>>> # Transforms ECFP Fingerprints to IRV features(Similarity values of top K nearest neighbors).
>>> # Initialize the IRVTransformer with the reference dataset.
>>> IRV_transformer = dc.trans.IRVTransformer(K, n_tasks, dataset)
>>> # Apply the IRVTransformer.transform() to the target dataset for which the prediction is needed.
>>> # Calculates the similarity values of the samples in target dataset with the reference dataset
>>> # and returns the values of top K similar samples in reference dataset for each sample in target dataset.
>>> dataset_trans = IRV_transformer.transform(dataset)
>>> # Instantiate the model
>>> model = dc.models.torch_models.MultitaskIRVClassifier(n_tasks, K = 5, learning_rate = 0.01, batch_size = n_samples)
>>> # Train the model
>>> loss = model.fit(dataset_trans)
>>> # Prediction
>>> output = model.predict(dataset_trans)
>>> # Evaluation
>>> classification_metric = dc.metrics.Metric(dc.metrics.accuracy_score,task_averager=np.mean)
>>> score = model.evaluate(dataset_trans, [classification_metric])

References:

[1] S. J. Swamidass et al., “The Influence Relevance Voter: An Accurate and Interpretable Virtual High Throughput Screening Method.” https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2750043/

__init__(n_tasks: int, K=10, penalty=0.0, mode='classification', device: device | None = None, **kwargs)[source]

Initialize MultitaskIRVClassifier

Parameters:
  • n_tasks (int) – Number of tasks

  • K (int) – Number of nearest neighbours used in classification

  • penalty (float) – Amount of penalty (l2 or l1 applied)

HNN

class HNN(d_input: int = 2, d_hidden: Tuple[int, ...] = (32, 32), activation_fn: str = 'tanh')[source]
Model for learning Hamiltonian dynamics using a Hamiltonian Neural Network.

Hamiltonian Neural Networks (HNNs) are a class of physics-informed models that learn the underlying Hamiltonian function of a dynamical system directly from data. Instead of predicting derivatives directly, an HNN learns a scalar-valued Hamiltonian function H(q, p) and computes time evolution using Hamilton’s equations.

Parameters:
  • d_input (int, default 2) – Dimensionality of the input phase space, made up of (q, p) pairs.

  • d_hidden (Tuple[int, ...], default (32, 32)) – Hidden layer dimensions for the multilayer perceptron that approximates the Hamiltonian function.

  • activation_fn (str, default 'tanh') – Activation function to use in the hidden layers.

Examples

>>> import deepchem as dc
>>> from deepchem.models.torch_models.hnn import HNN
>>> import torch
>>> hnn = HNN(d_input=2, d_hidden=(64, 64), activation_fn='tanh')
>>> # Phase space coordinates [q, p]
>>> z = torch.randn(10, 2, requires_grad=True)
>>> # Get Hamiltonian value
>>> _ = hnn.eval()
>>> H = hnn(z)  # Shape: (10,)
>>> # Get time derivatives for training
>>> _ = hnn.train()
>>> dz_dt = hnn(z)  # Shape: (10, 2)

References

__init__(d_input: int = 2, d_hidden: Tuple[int, ...] = (32, 32), activation_fn: str = 'tanh') None[source]

Initialize the Hamiltonian Neural Network.

Parameters:
  • d_input (int, default 2) – Dimensionality of the input phase space. Should be even.

  • d_hidden (Tuple[int, ...], default (32, 32)) – Hidden layer dimensions for the MLP.

  • activation_fn (str, default 'tanh') – Activation function for hidden layers.

forward(z: Tensor) Tensor[source]

Forward pass through the HNN.

The behavior depends on the training mode:

  • Training mode: Returns the symplectic gradient (time derivatives).

  • Evaluation mode: Returns the Hamiltonian value.

Parameters:

z (torch.Tensor) – Phase space coordinates of shape (…, d_input) where the first d_input//2 dimensions are coordinates q and the last d_input//2 dimensions are momentum p.

Returns:

  • If training: Time derivatives dz/dt of shape (…, d_input).

  • If evaluation: Hamiltonian values H(z) of shape (…,).

Return type:

torch.Tensor

Examples

>>> hnn = HNN(d_input=2)
>>> z = torch.randn(5, 2, requires_grad=True)
>>> _ = hnn.train()
>>> dz_dt = hnn(z)  # Shape: (5, 2)
>>> _ = hnn.eval()
>>> H = hnn(z)  # Shape: (5,)
hamiltonian(z: Tensor) Tensor[source]

Compute the Hamiltonian function H(q, p).

This method directly evaluates the learned Hamiltonian function regardless of the training mode; it always returns the Hamiltonian value. The Hamiltonian represents the total energy: potential energy + kinetic energy.

Parameters:

z (torch.Tensor) – Phase space coordinates of shape (…, d_input).

Returns:

Hamiltonian values H(z) of shape (…,).

Return type:

torch.Tensor

Examples

>>> hnn = HNN(d_input=2)
>>> z = torch.randn(3, 2)
>>> H = hnn.hamiltonian(z)  # Shape: (3,)
symplectic_gradient(z: Tensor) Tensor[source]

Compute the symplectic gradient using Hamilton’s equations.

This method computes the time derivatives of the phase space coordinates using Hamilton’s equations:

dq/dt = ∂H/∂p
dp/dt = -∂H/∂q

The gradients are computed using automatic differentiation to ensure exact computation of the partial derivatives.

Parameters:

z (torch.Tensor) – Phase space coordinates of shape (…, d_input) where the first d_input//2 dimensions are coordinates q and the last d_input//2 dimensions are momenta p. Must have requires_grad=True or will be set automatically.

Returns:

Time derivatives dz/dt of shape (…, d_input) where the first d_input//2 dimensions are dq/dt and the last d_input//2 dimensions are dp/dt.

Return type:

torch.Tensor

Examples

>>> hnn = HNN(d_input=2)
>>> z = torch.randn(4, 2, requires_grad=True)
>>> dz_dt = hnn.symplectic_gradient(z)  # Shape: (4, 2)
>>> dq_dt = dz_dt[..., :1]
>>> dq_dt.shape
torch.Size([4, 1])
>>> dp_dt = dz_dt[..., 1:]
>>> dp_dt.shape
torch.Size([4, 1])

Notes

The symplectic structure is preserved by construction through Hamilton’s equations. This ensures that the learned dynamics conserve the learned energy function and maintain the geometric properties of Hamiltonian systems.
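As a minimal sketch of the computation described above, the following NumPy example applies Hamilton’s equations to a known Hamiltonian rather than a learned one. The harmonic-oscillator Hamiltonian, the helper names, and the central-difference gradient (standing in for automatic differentiation) are assumptions made for illustration; none of them are part of the DeepChem API.

```python
import numpy as np


def hamiltonian(z):
    """Illustrative Hamiltonian: unit-mass, unit-stiffness harmonic oscillator."""
    q, p = z[..., 0], z[..., 1]
    return 0.5 * (q**2 + p**2)


def numerical_gradient(fn, z, eps=1e-5):
    """Central-difference gradient of a scalar function, one coordinate at a time."""
    grad = np.zeros_like(z)
    for i in range(z.shape[-1]):
        step = np.zeros(z.shape[-1])
        step[i] = eps
        grad[..., i] = (fn(z + step) - fn(z - step)) / (2 * eps)
    return grad


def symplectic_gradient(fn, z):
    """Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq."""
    half = z.shape[-1] // 2
    grad = numerical_gradient(fn, z)
    dH_dq, dH_dp = grad[..., :half], grad[..., half:]
    return np.concatenate([dH_dp, -dH_dq], axis=-1)


rng = np.random.default_rng(0)
z = rng.normal(size=(4, 2))            # rows are [q, p]
dz_dt = symplectic_gradient(hamiltonian, z)

# For this Hamiltonian the exact answer is dq/dt = p, dp/dt = -q.
expected = np.stack([z[:, 1], -z[:, 0]], axis=-1)
max_error = np.abs(dz_dt - expected).max()

# Energy is conserved along the flow because dH/dt = grad(H) . dz/dt = 0.
dH_dt = np.sum(numerical_gradient(hamiltonian, z) * dz_dt, axis=-1)
```

The last line illustrates why the construction conserves energy: the symplectic gradient is always orthogonal to the gradient of H, whatever function H happens to be.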

HNNModel

class HNNModel(d_input: int = 2, d_hidden: Tuple[int, ...] = (32, 32), activation_fn: str = 'tanh', **kwargs)[source]

Hamiltonian Neural Network wrapper model which inherits TorchModel.

This class wraps the HNN base model and provides a DeepChem-compatible interface for training and evaluation using conservative dynamics. The HNNModel computes the time evolution of a dynamical system by learning the Hamiltonian and using its gradients to derive time derivatives of the phase space variables.

Parameters:
  • d_input (int, default 2) – Dimension of phase space. Must be even for [q, p] coordinates.

  • d_hidden (Tuple[int, ...], default (32, 32)) – Hidden layer dimensions.

  • activation_fn (str, default 'tanh') – Activation function name.

Examples

>>> import numpy as np
>>> import deepchem as dc
>>> from deepchem.models.torch_models import HNNModel
>>> x = np.random.randn(100, 2).astype(np.float32)
>>> dx = np.random.randn(100, 2).astype(np.float32)
>>> dataset = dc.data.NumpyDataset(x, dx)
>>> model = HNNModel(batch_size=32)
>>> _ = model.fit(dataset, nb_epoch=100)
>>> preds = model.predict_on_batch(x)
>>> hamiltonians = model.predict_hamiltonian(x)

References

__init__(d_input: int = 2, d_hidden: Tuple[int, ...] = (32, 32), activation_fn: str = 'tanh', **kwargs) None[source]

Initialize HNNModel.

predict_hamiltonian(X: ndarray) ndarray[source]

Compute Hamiltonian energy values H(q, p).

Parameters:

X (np.ndarray) – A NumPy array of phase space coordinates with shape (n_samples, d_input), where each row corresponds to a point in the phase space [q, p].

Returns:

Hamiltonian values of shape (n_samples,).

Return type:

np.ndarray

symplectic_gradient(z: Tensor) Tensor[source]

Compute symplectic gradient using Hamilton’s equations.

Parameters:

z (torch.Tensor) – Phase space coordinates (q, p)

Returns:

Time derivatives of shape (batch_size, d_input).

Return type:

torch.Tensor
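The class example above trains on random arrays. As a hedged sketch of what physically meaningful training data could look like, the snippet below samples an analytic trajectory of a unit mass on a unit-stiffness spring and pairs each phase space point [q, p] with its time derivative [dq/dt, dp/dt]. The choice of system and all variable names are assumptions for illustration.

```python
import numpy as np

# Sample an analytic trajectory of a unit mass on a unit-stiffness spring.
t = np.linspace(0.0, 2.0 * np.pi, 100)
q = np.cos(t)            # position
p = -np.sin(t)           # momentum (equals velocity for unit mass)

# Inputs are phase-space points [q, p]; targets are their time derivatives.
x = np.stack([q, p], axis=-1).astype(np.float32)
dx = np.stack([p, -q], axis=-1).astype(np.float32)   # [dq/dt, dp/dt]

# The total energy H = (q^2 + p^2) / 2 is constant along the trajectory.
energy = 0.5 * (x[:, 0]**2 + x[:, 1]**2)
```

Arrays shaped like `x` and `dx` have the same layout as the inputs and labels passed to `dc.data.NumpyDataset` in the class example.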

FNO

class FNO(in_channels: int, out_channels: int, modes: int | Tuple[int, ...], width: int, dims: int, depth: int = 4, positional_encoding: bool = False)[source]

Base implementation of Fourier Neural Operator, inheriting from the Torch nn.Module class.

Fourier Neural Operator (FNO) is a neural network architecture for learning mappings between function spaces. It uses spectral convolutions in Fourier space to capture global dependencies efficiently, making it particularly effective for solving partial differential equations (PDEs).

The architecture consists of:

  1. Lifting layer: Maps input to higher-dimensional representation

  2. Multiple FNO blocks: Perform spectral and local convolutions

  3. Projection layers: Map back to output space
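To make the spectral convolution inside an FNO block more concrete, here is a minimal one-dimensional NumPy sketch: transform to Fourier space, mix channels on the lowest `modes` frequencies, zero out the rest, and transform back. The function name, the channel-last layout, and the weight shape are assumptions for illustration and may differ from DeepChem’s actual layers.

```python
import numpy as np


def spectral_conv_1d(x, weights, modes):
    """Apply a per-mode complex linear map to the lowest `modes` frequencies.

    x:       real array of shape (batch, n_points, in_channels)
    weights: complex array of shape (modes, in_channels, out_channels)
    """
    n_points = x.shape[1]
    x_hat = np.fft.rfft(x, axis=1)                  # (batch, n_points//2 + 1, in)
    out_hat = np.zeros(
        (x.shape[0], x_hat.shape[1], weights.shape[-1]), dtype=complex)
    # Mix channels mode by mode; higher frequencies are truncated (left at zero).
    out_hat[:, :modes] = np.einsum("bmi,mio->bmo", x_hat[:, :modes], weights)
    return np.fft.irfft(out_hat, n=n_points, axis=1)


rng = np.random.default_rng(0)
batch, n_points, width, modes = 2, 16, 4, 8
x = rng.normal(size=(batch, n_points, width))
weights = (rng.normal(size=(modes, width, width))
           + 1j * rng.normal(size=(modes, width, width)))

y = spectral_conv_1d(x, weights, modes)   # same shape as x
```

Because the learned weights act on a fixed number of Fourier modes, the parameter count in this sketch does not depend on the grid resolution `n_points`.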

References

This technique was introduced in Li, Zongyi, et al. “Fourier neural operator for parametric partial differential equations.” arXiv preprint arXiv:2010.08895 (2020).

Example

>>> import torch
>>> from deepchem.models.torch_models.fno import FNO
>>> model = FNO(in_channels=1, out_channels=1, modes=8, width=32, dims=2)
>>> x = torch.randn(1, 16, 16, 1)
>>> output = model(x)
__init__(in_channels: int, out_channels: int, modes: int | Tuple[int, ...], width: int, dims: int, depth: int = 4, positional_encoding: bool = False) None[source]

Initialize the FNO base model.

Parameters:
  • in_channels (int) – Dimension of input features

  • out_channels (int) – Dimension of output features

  • modes (int or tuple) – Number of Fourier modes to keep in spectral convolution

  • width (int) – Width of the hidden layers

  • dims (int) – Spatial dimensionality (1, 2, or 3)

  • depth (int, default 4) – Number of FNO blocks to stack

  • positional_encoding (bool, default False) – When enabled, uses meshgrids as positional encodings. If you supply custom positional encodings in the input, this must be set to False.
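A meshgrid positional encoding can be pictured as extra input channels holding each point’s coordinates. The sketch below appends normalized row and column coordinates to a channel-last 2D batch. The helper name, the [0, 1] coordinate range, and the channel ordering are assumptions for illustration, not a description of DeepChem’s internal implementation.

```python
import numpy as np


def append_grid_2d(x):
    """Concatenate normalized (row, col) coordinates to a channel-last batch.

    x has shape (batch, height, width, channels); the result has two extra channels.
    """
    batch, height, width, _ = x.shape
    rows = np.linspace(0.0, 1.0, height)
    cols = np.linspace(0.0, 1.0, width)
    grid_r, grid_c = np.meshgrid(rows, cols, indexing="ij")   # each (height, width)
    grid = np.stack([grid_r, grid_c], axis=-1)                # (height, width, 2)
    grid = np.broadcast_to(grid, (batch, height, width, 2))
    return np.concatenate([x, grid], axis=-1)


x = np.zeros((1, 16, 16, 1))
x_with_grid = append_grid_2d(x)   # shape (1, 16, 16, 3)
```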

forward(x: Tensor) Tensor[source]

Forward pass through the FNO model.

Parameters:

x (torch.Tensor) – Input tensor

Returns:

Output tensor

Return type:

torch.Tensor

FNOModel

class FNOModel(in_channels: int, out_channels: int, modes: int | Tuple[int, ...], width: int, dims: int, depth: int = 4, positional_encoding: bool = False, loss: Module = MSELoss(), **kwargs)[source]

Fourier Neural Operator for learning mappings between function spaces.

This is a TorchModel wrapper around the nn.Module FNO class that provides the DeepChem interface for training and prediction. FNO is particularly effective for solving partial differential equations (PDEs) and learning operators between infinite-dimensional function spaces.

The model uses spectral convolutions in Fourier space to capture global dependencies efficiently, making it much more parameter-efficient than traditional convolutional neural networks for PDE solving tasks.

References

This technique was introduced in Li, Zongyi, et al. “Fourier neural operator for parametric partial differential equations.” arXiv preprint arXiv:2010.08895 (2020).

Example

>>> import torch
>>> import deepchem as dc
>>> from deepchem.models.torch_models.fno import FNOModel
>>> x = torch.randn(1, 16, 16, 1)
>>> dataset = dc.data.NumpyDataset(X=x, y=x)
>>> model = FNOModel(in_channels=1, out_channels=1, modes=8, width=32, dims=2)
>>> loss = model.fit(dataset)
>>> predictions = model.predict(dataset)
__init__(in_channels: int, out_channels: int, modes: int | Tuple[int, ...], width: int, dims: int, depth: int = 4, positional_encoding: bool = False, loss: Module = MSELoss(), **kwargs) None[source]

Initialize the FNO model.

Parameters:
  • in_channels (int) – Dimension of input features at each spatial location

  • out_channels (int) – Dimension of output features at each spatial location

  • modes (int or tuple) – Number of Fourier modes to keep in spectral convolution. Higher values capture more high-frequency information but increase computational cost

  • width (int) – Width of the hidden layers in the FNO blocks. Controls model capacity

  • dims (int) – Spatial dimensionality of the input data (1, 2, or 3)

  • depth (int, default 4) – Number of FNO blocks to stack. More blocks can learn more complex mappings

  • positional_encoding (bool, default False) – When enabled, uses meshgrids as positional encodings

  • loss (Union[Loss, LossFn], default nn.MSELoss()) – Loss function to use for training

  • **kwargs (dict) – Additional arguments passed to TorchModel constructor

LNN

class LNN(n_dof: int, d_hidden: Tuple[int, ...] = (32, 32), activation_fn: str = 'softplus')[source]
Model for learning Lagrangian dynamics using a Lagrangian Neural Network.

Lagrangian Neural Networks (LNNs) are a class of physics-informed models that learn the underlying Lagrangian function of a dynamical system directly from data. Instead of predicting derivatives directly, an LNN learns a scalar-valued Lagrangian function L(q, q_dot) and computes time evolution using Euler-Lagrange equations.

Parameters:
  • n_dof (int) – Number of degrees of freedom in the system. The input dimension will be 2*n_dof (positions + velocities).

  • d_hidden (Tuple[int, ...], default (32, 32)) – Hidden layer dimensions for the multilayer perceptron that approximates the Lagrangian function.

  • activation_fn (str, default 'softplus') – Activation function to use in the hidden layers. Softplus is preferred for Lagrangian learning as it ensures smooth derivatives.

Examples

>>> import deepchem as dc
>>> from deepchem.models.torch_models.lnn import LNN
>>> import torch
>>> lnn = LNN(n_dof=2, d_hidden=(64, 64), activation_fn='softplus')
>>> z = torch.randn(10, 4, requires_grad=True)
>>> # Get Lagrangian value
>>> _ = lnn.eval()
>>> L = lnn.lagrangian(z)  # Shape: (10,)
>>> # Get accelerations for training
>>> _ = lnn.train()
>>> q_ddot = lnn(z)  # Shape: (10, 2)

References

__init__(n_dof: int, d_hidden: Tuple[int, ...] = (32, 32), activation_fn: str = 'softplus') None[source]

Initialize the Lagrangian Neural Network.

Parameters:
  • n_dof (int) – Number of degrees of freedom in the system. The input dimension will be 2*n_dof (positions + velocities).

  • d_hidden (Tuple[int, ...], default (32, 32)) – Hidden layer dimensions for the multilayer perceptron that approximates the Lagrangian function.

  • activation_fn (str, default 'softplus') – Activation function to use in the hidden layers. Softplus is preferred for Lagrangian learning as it ensures smooth derivatives.

forward(z: Tensor) Tensor[source]

Forward pass through the LNN.

In both training and evaluation mode, the forward pass returns accelerations computed via the Euler-Lagrange equations.

Parameters:

z (torch.Tensor) – State space coordinates of shape (…, 2*n_dof) where the first n_dof dimensions are positions q and the last n_dof dimensions are velocities q_dot.

Returns:

Accelerations q̈ of shape (…, n_dof) computed by solving the Euler-Lagrange equations: d/dt(∂L/∂q̇) - ∂L/∂q = 0

Return type:

torch.Tensor

Examples

>>> lnn = LNN(n_dof=2)
>>> z = torch.randn(5, 4, requires_grad=True)  # Shape: (5, 4)
>>> q_ddot = lnn(z) # Shape: (5, 2)
calculate_dynamics(z: Tensor) Tensor[source]

Compute accelerations using Euler-Lagrange equations from learned Lagrangian.

This method implements the core physics computation by:

  1. Computing first and second derivatives of the learned Lagrangian L(q, q_dot)

  2. Extracting the required partial derivatives for the Euler-Lagrange equations

  3. Applying the Euler-Lagrange equations to calculate accelerations
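The three steps above can be sketched for a single state with NumPy. Expanding d/dt(∂L/∂q̇) = ∂L/∂q with the chain rule gives a linear system for the accelerations, (∂²L/∂q̇∂q̇) q̈ + (∂²L/∂q̇∂q) q̇ = ∂L/∂q. The Lagrangian below (two uncoupled oscillators), the helper names, and the finite-difference derivatives (standing in for automatic differentiation) are assumptions for illustration only.

```python
import numpy as np

N_DOF = 2


def lagrangian(z):
    """Illustrative Lagrangian L = T - V for two uncoupled oscillators."""
    q, q_dot = z[:N_DOF], z[N_DOF:]
    kinetic = 0.5 * np.sum(q_dot**2)
    potential = 0.5 * (q[0]**2 + 4.0 * q[1]**2)
    return kinetic - potential


def gradient(fn, z, eps=1e-4):
    """Central-difference gradient of a scalar function of a 1-D state."""
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        grad[i] = (fn(z + step) - fn(z - step)) / (2 * eps)
    return grad


def hessian(fn, z, eps=1e-4):
    """Central-difference Hessian built by differentiating the gradient."""
    hess = np.zeros((z.size, z.size))
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        hess[i] = (gradient(fn, z + step) - gradient(fn, z - step)) / (2 * eps)
    return hess


def accelerations(fn, z):
    """Solve the Euler-Lagrange equations for q_ddot at a single state z."""
    q_dot = z[N_DOF:]
    grad = gradient(fn, z)                 # step 1: first derivatives
    hess = hessian(fn, z)                  # step 1: second derivatives
    dL_dq = grad[:N_DOF]                   # step 2: dL/dq
    H_vv = hess[N_DOF:, N_DOF:]            # step 2: d2L / (dq_dot dq_dot)
    H_vq = hess[N_DOF:, :N_DOF]            # step 2: d2L / (dq_dot dq)
    # step 3: H_vv @ q_ddot + H_vq @ q_dot = dL/dq
    return np.linalg.solve(H_vv, dL_dq - H_vq @ q_dot)


z = np.array([0.3, -0.2, 1.0, 0.5])        # [q1, q2, q1_dot, q2_dot]
q_ddot = accelerations(lagrangian, z)      # analytically [-q1, -4 * q2]
```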

Parameters:

z (torch.Tensor) – State space coordinates of shape (…, 2*n_dof) where the first n_dof dimensions are positions q and the last n_dof dimensions are velocities q_dot.

Returns:

Accelerations of shape (…, n_dof) computed by solving the Euler-Lagrange equations.

Return type:

torch.Tensor

Examples

>>> lnn = LNN(n_dof=2)
>>> x = torch.randn(10, 4, requires_grad=True)  # Shape: (10, 4)
>>> accelerations = lnn.calculate_dynamics(x)  # Shape: (10, 2)
lagrangian(z: Tensor) Tensor[source]

Compute the learned Lagrangian function L(q, q_dot) for given state.

The Lagrangian is a scalar function that encodes the dynamics of the system. In classical mechanics, L = T - V (kinetic energy - potential energy). This method returns the neural network’s approximation of this function.

Parameters:

z (torch.Tensor) – State space coordinates of shape (…, 2*n_dof) where the first n_dof dimensions are positions q and the last n_dof dimensions are velocities q_dot.

Returns:

Lagrangian values L(q, q_dot) of shape (…,) - one scalar value per input state configuration.

Return type:

torch.Tensor

Notes

The Lagrangian function is the fundamental quantity from which all dynamics are derived via the Euler-Lagrange equations. The neural network learns to approximate this function such that the resulting dynamics match the training data.

Examples

>>> lnn = LNN(n_dof=2)
>>> z = torch.randn(8, 4)  # Shape: (8, 4)
>>> L_values = lnn.lagrangian(z)  # Shape: (8,)

LNNModel

class LNNModel(n_dof: int, d_hidden: Tuple[int, ...] = (32, 32), activation_fn: str = 'softplus', **kwargs)[source]

Lagrangian Neural Network wrapper model which inherits TorchModel.

This class wraps the LNN base model and provides a DeepChem-compatible interface for training and evaluation using conservative dynamics. The LNNModel computes the time evolution of a dynamical system by learning the Lagrangian and using its gradients to derive time derivatives of the state space variables.

Parameters:
  • n_dof (int) – Number of degrees of freedom in the system. The input dimension will be 2*n_dof (positions + velocities).

  • d_hidden (Tuple[int, ...], default (32, 32)) – Hidden layer dimensions for the multilayer perceptron that approximates the Lagrangian function.

  • activation_fn (str, default 'softplus') – Activation function to use in the hidden layers. Softplus is preferred for Lagrangian learning as it ensures smooth derivatives.

Examples

>>> import deepchem as dc
>>> from deepchem.models.torch_models.lnn import LNNModel
>>> import torch
>>> model = LNNModel(n_dof=2, d_hidden=(64, 64), activation_fn='softplus')
>>> # input values for spring-pendulum
>>> x = torch.randn(10, 4) # Shape : (10, 4)
>>> dx = torch.randn(10, 2) # Shape : (10, 2)
>>> dataset = dc.data.NumpyDataset(x, dx)
>>> _ = model.fit(dataset, nb_epoch=10)
>>> # predicting values with batches
>>> _ = model.predict_on_batch(x) # shape : (10, 2)

References

__init__(n_dof: int, d_hidden: Tuple[int, ...] = (32, 32), activation_fn: str = 'softplus', **kwargs) None[source]

Initialize LNNModel.

predict_lagrangian(z: Tensor) Tensor[source]

Compute the Lagrangian forward pass with input z as (q, q_dot).

Parameters:

z (torch.Tensor) – State space coordinates of shape (…, 2*n_dof) where the first n_dof dimensions are positions q and the last n_dof dimensions are velocities q_dot.

Returns:

Lagrangian values L(q, q_dot) of shape (…,) - one scalar value per input state configuration.

Return type:

torch.Tensor

calculate_dynamics(z: Tensor) Tensor[source]

Compute accelerations using Euler-Lagrange equations from learned Lagrangian.

Parameters:

z (torch.Tensor) – State space coordinates of shape (…, 2*n_dof) where the first n_dof dimensions are positions q and the last n_dof dimensions are velocities q_dot.

Returns:

Accelerations of shape (…, n_dof) computed by solving the Euler-Lagrange equations.

Return type:

torch.Tensor

save()[source]

Saves model to disk using joblib.

reload()[source]

Loads model from joblib file on disk.

SE3TransformerModel

class SE3TransformerModel(num_layers: int, atom_feature_size: int, num_workers: int, num_channels: int, num_nlayers: int = 1, num_degrees: int = 4, edge_dim: int = 4, pooling: str = 'avg', n_heads: int = 1, device: device | None = device(type='cpu'), **kwargs)[source]

SE3TransformerModel DeepChem wrapper.

This class wraps the SE3Transformer model for compatibility with DeepChem’s model interface.

Parameters:
  • num_layers (int) – Number of SE(3) Attention layers.

  • atom_feature_size (int) – Dimensionality of atom features.

  • num_workers (int) – Number of DGL / DataLoader worker processes used for batching and featurization.

  • num_channels (int) – Number of channels for the hidden layers.

  • num_nlayers (int, optional (default=1)) – Number of layers for the residual attention blocks.

  • num_degrees (int, optional (default=4)) – Degree of SE(3)-equivariant features.

  • edge_dim (int, optional (default=4)) – Dimensionality of edge features.

  • pooling (str, optional (default='avg')) – Pooling type: ‘avg’ for average pooling, ‘max’ for max pooling.

  • n_heads (int, optional (default=1)) – Number of attention heads.

  • device (torch.device, optional) – The device (CPU or GPU) on which the model will run.

  • **kwargs – Additional arguments for TorchModel.
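The two `pooling` options differ only in how per-atom features are reduced to one vector per molecule. The NumPy sketch below shows that difference on a toy feature matrix. The array values are made up, and the assumption that pooling reduces over the atom axis is an illustration, not a description of the model’s exact readout.

```python
import numpy as np

# Node features for one molecular graph: 3 atoms, 4 feature channels.
node_features = np.array([
    [1.0, 0.0, 2.0, -1.0],
    [3.0, 1.0, 0.0, -2.0],
    [2.0, 5.0, 1.0, -3.0],
])

avg_pooled = node_features.mean(axis=0)   # 'avg': average over atoms
max_pooled = node_features.max(axis=0)    # 'max': per-channel maximum over atoms
```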

Example

>>> import torch
>>> from deepchem.models.torch_models import SE3TransformerModel
>>> import dgl
>>> import rdkit
>>> from rdkit import Chem
>>> import numpy as np
>>> import deepchem as dc
>>> import shutil
>>> import os
>>> smiles = ["CCO", "CC(=O)O", "C1=CC=CC=C1",]
>>> featurizer = dc.feat.EquivariantGraphFeaturizer(fully_connected=False, embeded=True)
>>> mols_g = [featurizer.featurize(Chem.MolFromSmiles(mol))[0] for mol in smiles]
>>> # Extract SE(3)-equivariant features
>>> labels = np.random.rand(len(mols_g), 1)
>>> weights = np.ones_like(labels)
>>> dataset = dc.data.NumpyDataset(X=mols_g, y=labels, w=weights)
>>> model = SE3TransformerModel(
...    num_layers=7,
...    atom_feature_size=6,
...    num_workers=4,
...    num_channels=32,
...    num_nlayers=1,
...    num_degrees=4,
...    edge_dim=4,
...    pooling='max',
...    n_heads=8,
...    batch_size=12,)
>>> loss = model.fit(dataset, nb_epoch=1)
>>> dir_path = "cache"
>>> if os.path.exists(dir_path) and os.path.isdir(dir_path):
...     shutil.rmtree(dir_path)
__init__(num_layers: int, atom_feature_size: int, num_workers: int, num_channels: int, num_nlayers: int = 1, num_degrees: int = 4, edge_dim: int = 4, pooling: str = 'avg', n_heads: int = 1, device: device | None = device(type='cpu'), **kwargs) None[source]

Create a new TorchModel.

Parameters:
  • model (torch.nn.Module) – the PyTorch model implementing the calculation

  • loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above

  • output_types (list of strings, optional (default None)) – the type of each output from the model, as described above

  • batch_size (int, optional (default 100)) – default batch size for training and evaluating

  • model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None, a temporary directory is created.

  • learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is ignored.

  • optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is ignored.

  • tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training

  • wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training

  • log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using logging by default. If tensorboard is set, data is also logged to TensorBoard. If wandb is set, data is also logged to Weights & Biases. Logging happens at global steps. Roughly, a global step corresponds to one batch of training. If you’d like a printout every 10 batch steps, you’d set log_frequency=10 for example.

  • device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is chosen automatically.

  • regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add to the loss function

  • wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics

save()[source]

Saves model to disk using joblib.

reload()[source]

Loads model from joblib file on disk.

TFNModel

class TFNModel(num_layers: int, atom_feature_size: int, num_channels: int, num_nlayers: int = 1, num_degrees: int = 4, edge_dim: int = 4, device: device | None = device(type='cpu'), **kwargs)[source]

TFNModel DeepChem wrapper.

This class wraps the TFN model for compatibility with DeepChem’s model interface.

Parameters:
  • num_layers (int) – Number of SE(3) graph convolution layers.

  • atom_feature_size (int) – Dimensionality of atom features.

  • num_channels (int) – Number of channels for the hidden layers.

  • num_nlayers (int, optional (default=1)) – Number of normalization layers for graph convolution block.

  • num_degrees (int, optional (default=4)) – Degree of SE(3)-equivariant features.

  • edge_dim (int, optional (default=4)) – Dimensionality of edge features.

  • device (torch.device, optional) – The device (CPU or GPU) on which the model will run.

  • **kwargs – Additional arguments for TorchModel.

Example

>>> import torch
>>> from deepchem.models.torch_models import TFNModel
>>> import dgl
>>> import rdkit
>>> from rdkit import Chem
>>> import numpy as np
>>> import deepchem as dc
>>> import shutil
>>> import os
>>> smiles = ["CCO", "CC(=O)O", "C1=CC=CC=C1",]
>>> featurizer = dc.feat.EquivariantGraphFeaturizer(fully_connected=False, embeded=True)
>>> mols_g = [featurizer.featurize(Chem.MolFromSmiles(mol))[0] for mol in smiles]
>>> # Extract SE(3)-equivariant features
>>> labels = np.random.rand(len(mols_g), 12)
>>> weights = np.ones_like(labels)
>>> dataset = dc.data.NumpyDataset(X=mols_g, y=labels, w=weights)
>>> model = TFNModel(
...    num_layers=7,
...    atom_feature_size=6,
...    num_channels=32,
...    num_nlayers=1,
...    num_degrees=4,
...    edge_dim=4,
...    batch_size=12,)
>>> loss = model.fit(dataset, nb_epoch=1)
>>> dir_path = "cache"
>>> if os.path.exists(dir_path) and os.path.isdir(dir_path):
...     shutil.rmtree(dir_path)
__init__(num_layers: int, atom_feature_size: int, num_channels: int, num_nlayers: int = 1, num_degrees: int = 4, edge_dim: int = 4, device: device | None = device(type='cpu'), **kwargs) None[source]

Create a new TorchModel.

Parameters:
  • model (torch.nn.Module) – the PyTorch model implementing the calculation

  • loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above

  • output_types (list of strings, optional (default None)) – the type of each output from the model, as described above

  • batch_size (int, optional (default 100)) – default batch size for training and evaluating

  • model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None, a temporary directory is created.

  • learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is ignored.

  • optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is ignored.

  • tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training

  • wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training

  • log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using logging by default. If tensorboard is set, data is also logged to TensorBoard. If wandb is set, data is also logged to Weights & Biases. Logging happens at global steps. Roughly, a global step corresponds to one batch of training. If you’d like a printout every 10 batch steps, you’d set log_frequency=10 for example.

  • device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is chosen automatically.

  • regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add to the loss function

  • wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics

save()[source]

Saves model to disk using joblib.

reload()[source]

Loads model from joblib file on disk.

ChemCeptionLayer

class ChemCeptionLayer(stem: Module, inception_resnet_A: Module, reduction_A: Module, inception_resnet_B: Module, reduction_B: Module, inception_resnet_C: Module, global_avg_pool: Module, output_layer: Module, mode: str = 'classification', n_tasks: int = 10, n_classes: int = 2, **kwargs)[source]

Note: This is an internal class intended for use exclusively by the ‘ChemCeption’ class and is not designed to be used directly by end users. It assumes that all required ChemCeption components have already been correctly constructed and passed to it. Using this class with missing or incorrectly configured components may result in errors during the forward pass. It is strongly recommended to use ‘ChemCeption’, which provides safer defaults and finer control over model behavior, such as loading, freezing, and unfreezing specific layers.

This class implements the ChemCeption model that leverages the representational capacities of convolutional neural networks (CNNs) to predict molecular properties.

The model is based on the description in Goh et al., “Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models” (https://arxiv.org/pdf/1706.06689.pdf). The authors use an image based representation of the molecule, where pixels encode different atomic and bond properties. More details on the image representations can be found at https://arxiv.org/abs/1710.02238

The model consists of a Stem Layer that reduces the image resolution for the layers to follow. The output of the Stem Layer is followed by a series of Inception-Resnet blocks & a Reduction layer. Layers in the Inception-Resnet blocks process image tensors at multiple resolutions and use a ResNet style skip-connection, combining features from different resolutions. The Reduction layers reduce the spatial extent of the image by max-pooling and 2-strided convolutions. More details on these layers can be found in the ChemCeption paper referenced above. The output of the final Reduction layer is subject to a Global Average Pooling, and a fully-connected layer maps the features to downstream outputs.

In the ChemCeption paper, the authors perform real-time image augmentation by rotating images between 0 and 180 degrees. This can be done during model training by setting the augment argument to True.

Example

>>> import torch
>>> import torch.nn as nn
>>> import deepchem as dc
>>> from deepchem.models.torch_models.chemnet_layers import Stem, InceptionResnetA, InceptionResnetB, InceptionResnetC, ReductionA, ReductionB
>>> from deepchem.models.torch_models import ChemCeptionLayer
>>> DEFAULT_INCEPTION_BLOCKS = {"A": 3, "B": 3, "C": 3}
>>> base_filters = 16
>>> img_spec = 'std'
>>> img_size = 80
>>> n_tasks = 10
>>> n_classes = 2
>>> in_channels = 1 if img_spec == "std" else 4
>>> mode = 'classification'
>>> components = {}
>>> components['stem'] = Stem(in_channels=in_channels, out_channels=base_filters)
>>> components['inception_resnet_A'] = nn.Sequential(*[InceptionResnetA(base_filters, base_filters) for _ in range(DEFAULT_INCEPTION_BLOCKS['A'])])
>>> components['reduction_A'] = ReductionA(base_filters, base_filters)
>>> components['inception_resnet_B'] = nn.Sequential(*[InceptionResnetB(4*base_filters, base_filters) for _ in range(DEFAULT_INCEPTION_BLOCKS['B'])])
>>> components['reduction_B'] = ReductionB(4 * base_filters, base_filters)
>>> current_channels = int(torch.floor(torch.tensor(7.875 * base_filters)).item())
>>> components['inception_resnet_C'] = nn.Sequential(*[InceptionResnetC(current_channels, base_filters) for _ in range(DEFAULT_INCEPTION_BLOCKS['C'])])
>>> components['global_avg_pool'] = nn.AdaptiveAvgPool2d(1)
>>> if mode == "classification":
...     components['fc_classification'] = nn.Linear(current_channels, n_tasks * n_classes)
... else:
...     components['fc_regression'] = nn.Linear(current_channels,n_tasks)
>>> smiles = ['CC(=O)OC1=CC=CC=C1C(=O)O']
>>> featurizer = dc.feat.SmilesToImage(img_size=img_size, img_spec='std')
>>> images = featurizer.featurize(smiles)
>>> image = torch.tensor(images, dtype=torch.float32)
>>> if mode == 'classification':
...        output_layer = components['fc_classification']
... else:
...        output_layer = components['fc_regression']
>>> input = image.permute(0, 3, 1, 2) # to convert from channel last  (N,H,W,C) to pytorch default channel first (N,C,H,W) representation
>>> model = ChemCeptionLayer(stem=components['stem'],
...                       inception_resnet_A=components['inception_resnet_A'],
...                       reduction_A=components['reduction_A'],
...                       inception_resnet_B=components['inception_resnet_B'],
...                       reduction_B=components['reduction_B'],
...                       inception_resnet_C=components['inception_resnet_C'],
...                       global_avg_pool=components['global_avg_pool'],
...                       output_layer=output_layer,
...                       mode=mode,
...                       n_tasks=n_tasks,
...                       n_classes=n_classes)
>>> output = model(input)

References

__init__(stem: Module, inception_resnet_A: Module, reduction_A: Module, inception_resnet_B: Module, reduction_B: Module, inception_resnet_C: Module, global_avg_pool: Module, output_layer: Module, mode: str = 'classification', n_tasks: int = 10, n_classes: int = 2, **kwargs) None[source]
Parameters:
  • stem (nn.Module) – Stem layer that serves as the initial processing block in ChemCeption.

  • inception_resnet_A (nn.Module) – Inception-ResNet-A block from the Inception-ResNet architecture.

  • reduction_A (nn.Module) – Reduction-A block from the Inception-ResNet architecture.

  • inception_resnet_B (nn.Module) – Inception-ResNet-B block from the Inception-ResNet architecture.

  • reduction_B (nn.Module) – Reduction-B block from the Inception-ResNet architecture.

  • inception_resnet_C (nn.Module) – Inception-ResNet-C block from the Inception-ResNet architecture.

  • global_avg_pool (nn.Module) – 2D Average Pooling layer

  • output_layer (nn.Module) – A fully connected layer for regression/classification task

  • mode (str, default 'classification') – The model type, ‘classification’ or ‘regression’.

  • n_tasks (int, default 10) – Number of classification or regression tasks

  • n_classes (int, default 2) – Number of classes (used only for classification)

forward(x: Tensor) Tensor | Sequence[Tensor][source]

Execute a forward pass through the complete Chemception model

Parameters:

x (torch.Tensor) – Input images of shape (n_images, channels, height, width)

Returns:

Output predictions corresponding to the provided images. Shape of regression output is (n_images, n_tasks, 1) and classification output is (n_images, n_tasks, 2).

Return type:

OneOrMany[torch.Tensor]
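The classification output shape can be illustrated with a short NumPy sketch: a flat layer output with n_tasks * n_classes units is grouped per task and normalized over the class axis. The random logits are stand-ins, and the reshape-then-softmax step is an assumption about how such a shape is typically produced, not a confirmed description of this layer’s internals.

```python
import numpy as np

n_images, n_tasks, n_classes = 3, 10, 2

# Stand-in for the flat output of a linear layer with n_tasks * n_classes units.
rng = np.random.default_rng(0)
flat_logits = rng.normal(size=(n_images, n_tasks * n_classes))

# Group the outputs per task, then normalize over the class axis.
logits = flat_logits.reshape(n_images, n_tasks, n_classes)
shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
```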

ChemCeption

class ChemCeption(img_spec: str = 'std', img_size: int = 80, base_filters: int = 16, inception_blocks: Dict[str, int] | None = {'A': 3, 'B': 3, 'C': 3}, n_tasks: int = 10, n_classes: int = 2, augment: bool = False, mode: Literal['regression', 'classification'] = 'classification', **kwargs)[source]

Modular wrapper around ChemCeption for flexible pretraining and finetuning. This class provides a ModularTorchModel interface for ChemCeption. It allows building and training ChemCeption with configurable inception blocks and modes (regression or classification).

Parameters:
  • img_spec (str, default='std') – Image specification, determines input channels. ‘std’ → 1 channel, otherwise 4 channels.

  • img_size (int, default=80) – Size of the input image (height and width).

  • base_filters (int, default=16) – Number of filters

  • inception_blocks (dict, optional) – Dictionary controlling the number of Inception-ResNet blocks per stage. Example: {"A": 3, "B": 3, "C": 3}.

  • n_tasks (int, default=10) – Number of prediction tasks.

  • n_classes (int, optional, default=2) – Number of output classes per task (classification only).

  • augment (bool, default=False) – If True, enable real-time image augmentation during training.

  • mode ({'regression', 'classification'}, default='classification') – Determines whether the model outputs regression values or class probabilities.

  • **kwargs (dict) – Additional keyword arguments passed to ModularTorchModel.

Examples

Pretraining and finetuning workflow:

>>> import numpy as np
>>> import deepchem as dc
>>> from deepchem.feat import SmilesToImage
>>> from deepchem.models.torch_models.chemception import ChemCeption
>>> n_samples = 6
>>> img_size = 80
>>> n_tasks = 10
>>> n_classes = 2
>>> smiles_list = ["CCO", "CC(=O)O", "c1ccccc1", "CCN", "C1CCCCC1", "O=C=O"]
>>> y_pretrain = np.random.randint(0, n_classes, (n_samples, n_tasks)).astype(np.float32)
>>> y_finetune = np.random.randint(0, n_classes, (n_samples, n_tasks)).astype(np.float32)
>>> featurizer = SmilesToImage(img_size=img_size, img_spec='std')
>>> X_images = featurizer.featurize(smiles_list)
>>> X_images = np.array([img.squeeze() for img in X_images])[:, np.newaxis, :, :]
>>> dataset_pt = dc.data.NumpyDataset(X_images, y_pretrain)
>>> dataset_ft = dc.data.NumpyDataset(X_images, y_finetune)
>>> pretrain_model = ChemCeption(
...     img_size=img_size,
...     n_tasks=n_tasks,
...     n_classes=n_classes,
...     mode='classification',
...     learning_rate=1e-4,
... )
>>> pretrain_loss = pretrain_model.fit(dataset_pt, nb_epoch=2)
>>> pretrain_model.save_checkpoint()
>>> finetune_model = ChemCeption(
...     img_size=img_size,
...     n_tasks=n_tasks,
...     n_classes=n_classes,
...     mode='regression',
...     learning_rate=1e-4,
... )
>>> finetune_model.load_from_pretrained(source_model=pretrain_model,
...                                 components=[
...                                  'stem', 'inception_resnet_A', 'inception_resnet_B',
...                                  'inception_resnet_C', 'reduction_A', 'reduction_B'
...                              ])
>>> finetuning_loss = finetune_model.fit(dataset_ft,nb_epoch=1)
>>> predictions = finetune_model.predict(dataset_ft)
__init__(img_spec: str = 'std', img_size: int = 80, base_filters: int = 16, inception_blocks: Dict[str, int] | None = {'A': 3, 'B': 3, 'C': 3}, n_tasks: int = 10, n_classes: int = 2, augment: bool = False, mode: Literal['regression', 'classification'] = 'classification', **kwargs)[source]
Parameters:
  • img_spec (str, default std) – Image specification used

  • img_size (int, default 80) – Image size used

  • base_filters (int, default 16) – Base filters used for the different inception and reduction layers

  • inception_blocks (dict) – Dictionary containing the number of blocks for each Inception-ResNet stage

  • n_tasks (int, default 10) – Number of classification or regression tasks

  • n_classes (int, default 2) – Number of classes (used only for classification)

  • augment (bool, default False) – Whether to augment images

  • mode (str, default classification) – Whether the model is used for regression or classification

build_components() Dict[str, Module][source]

Build and return the modular components of ChemCeption.

Returns:

components – Dictionary containing the model components:

  • ‘stem’: initial convolutional stem

  • ‘inception_resnet_A’: Inception-ResNet-A block stack

  • ‘reduction_A’: Reduction-A layer

  • ‘inception_resnet_B’: Inception-ResNet-B block stack

  • ‘reduction_B’: Reduction-B layer

  • ‘inception_resnet_C’: Inception-ResNet-C block stack

  • ‘global_avg_pool’: Global average pooling

  • ‘output_layer’: Final linear projection

Return type:

dict of str -> nn.Module
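The example above reuses selected components of a pretrained model by name. As a rough illustration of that idea only, the sketch below copies named entries between two plain dictionaries. The helper transfer_components and the toy weight lists are hypothetical and are not part of DeepChem; the real load_from_pretrained operates on nn.Module components.

```python
from copy import deepcopy


def transfer_components(source, target, names):
    """Copy the named entries of `source` into `target`; other entries stay untouched."""
    missing = [name for name in names if name not in source]
    if missing:
        raise KeyError(f"unknown components: {missing}")
    for name in names:
        # Deep copy so later changes to the source do not leak into the target
        target[name] = deepcopy(source[name])
    return target


# Toy stand-ins for component weights (hypothetical values)
pretrained = {"stem": [0.1, 0.2], "inception_resnet_A": [0.3], "output_layer": [0.9]}
fresh = {"stem": [0.0, 0.0], "inception_resnet_A": [0.0], "output_layer": [0.0]}

transfer_components(pretrained, fresh, ["stem", "inception_resnet_A"])
```

Leaving ‘output_layer’ out of the list mirrors the workflow above, where the task-specific head is trained from scratch during finetuning.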

build_inception_module(block_cls: Type[Module], block_key: str, *args, **kwargs) Sequential[source]

Build a sequential stack of Inception blocks.

Parameters:
  • block_cls (Type[nn.Module]) – Class of the Inception-ResNet block (A, B, or C).

  • block_key (str) – Key indicating which block type (‘A’, ‘B’, ‘C’).

  • *args – Arguments passed to the block constructor.

  • **kwargs – Arguments passed to the block constructor.

Returns:

Sequential module containing stacked Inception blocks.

Return type:

nn.Sequential

build_model() Module[source]

Assemble ChemCeption model from components.

loss_func(inputs: Tensor | Sequence[Tensor], labels: Sequence, weights: Sequence) Tensor[source]

Compute the weighted mean loss for a batch.

This method forwards inputs through self.model and computes a loss according to self.mode.

Per-sample weights are broadcast (if needed) to match the per-sample loss tensor, multiplied elementwise, and then averaged. If a regularization term is configured, it is added to the final scalar loss.

Parameters:
  • inputs (OneOrMany[torch.Tensor]) – SMILES strings converted to image tensors.

  • labels (Sequence) – Ground-truth targets.

  • weights (Sequence) – Per-sample weights.

Returns:

A scalar tensor: weighted mean loss (+ optional regularization term).

Return type:

torch.Tensor
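The weighting described above can be sketched with NumPy. This is a minimal illustration, not the library's implementation: the function name weighted_mean_loss, the use of squared error, and averaging over every element are assumptions made for the example.

```python
import numpy as np


def weighted_mean_loss(per_sample_loss, weights, regularization=0.0):
    """Broadcast weights to the loss shape, multiply elementwise, then average."""
    loss = np.asarray(per_sample_loss, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    # Append trailing singleton axes so that w broadcasts against loss
    while w.ndim < loss.ndim:
        w = w[..., np.newaxis]
    return float(np.mean(loss * w) + regularization)


# Squared-error loss for 2 samples and 3 tasks
pred = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
target = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])
per_sample = (pred - target) ** 2   # shape (2, 3)
weights = np.array([1.0, 0.0])      # the second sample is masked out

loss = weighted_mean_loss(per_sample, weights)
loss_with_reg = weighted_mean_loss(per_sample, weights, regularization=0.5)
```

A weight of zero removes a sample's contribution to the numerator while the mean is still taken over all elements, which is why zero weights are a common way to mask padded or missing labels.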

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) Iterable[Tuple[List, List, List]][source]

Create a generator that iterates batches for a dataset.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

A generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights]).
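To make the batch format concrete, here is a small NumPy sketch of a generator that yields ([inputs], [outputs], [weights]) tuples. The function iter_batches and its seed argument are hypothetical, and padding with zero-filled rows that carry zero weight is an assumption made for the sketch rather than a description of DeepChem's padding.

```python
import numpy as np


def iter_batches(X, y, w, batch_size, epochs=1, deterministic=True,
                 pad_batches=True, seed=0):
    """Yield ([inputs], [outputs], [weights]) tuples for each batch."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        # Keep dataset order, or draw a fresh shuffle for every epoch
        order = np.arange(n) if deterministic else rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb, wb = X[idx], y[idx], w[idx]
            n_pad = batch_size - len(idx)
            if pad_batches and n_pad > 0:
                # Assumption: padded rows are zero-filled and get zero weight
                Xb = np.concatenate([Xb, np.zeros((n_pad,) + X.shape[1:], dtype=X.dtype)])
                yb = np.concatenate([yb, np.zeros((n_pad,) + y.shape[1:], dtype=y.dtype)])
                wb = np.concatenate([wb, np.zeros((n_pad,) + w.shape[1:], dtype=w.dtype)])
            yield ([Xb], [yb], [wb])


X = np.arange(10, dtype=np.float32).reshape(5, 2)
y = np.ones((5, 1), dtype=np.float32)
w = np.ones((5, 1), dtype=np.float32)
batches = list(iter_batches(X, y, w, batch_size=2, epochs=2))
```

With five samples and a batch size of two, each epoch produces three batches, and the third one is padded up to the full batch size.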

PyTorch Lightning Models

DeepChem supports the use of PyTorch-Lightning to build PyTorch models.

DCLightningModule

You can wrap an arbitrary TorchModel in a DCLightningModule object.

class DCLightningModule(dc_model)[source]

DeepChem Lightning Module to be used with Lightning trainer.

The Lightning module is a wrapper around DeepChem’s TorchModel. It works directly with the PyTorch Lightning Trainer, which runs training for multiple epochs and is also responsible for setting up and training models on multiple GPUs. See https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.core.LightningModule.html?highlight=LightningModule

Examples

Training and prediction workflow with a GCN model:

>>> import deepchem as dc
>>> import lightning as L
>>> from deepchem.models.lightning.dc_lightning_dataset_module import DCLightningDatasetModule
>>> from deepchem.models.lightning.dc_lightning_module import DCLightningModule
>>> from deepchem.feat import MolGraphConvFeaturizer
>>>
>>> # Load and prepare dataset
>>> tasks, dataset, transformers = dc.molnet.load_bace_classification(
...     featurizer=MolGraphConvFeaturizer(), reload=False)
>>>
>>> # Create a GCN model
>>> model = dc.models.GCNModel(
...     mode='classification',
...     n_tasks=len(tasks),
...     number_atom_features=30,
...     batch_size=10,
...     learning_rate=0.0003
... )
>>>
>>> # Setup Lightning modules
>>> data_module = DCLightningDatasetModule(
...     dataset=dataset[0],
...     batch_size=10,
...     model=model
... )
>>> lightning_model = DCLightningModule(dc_model=model)
>>>
>>> # Setup trainer and fit
>>> trainer = L.Trainer(
...     fast_dev_run=True,
...     accelerator="auto",
...     devices="auto",
...     logger=False,
...     enable_checkpointing=True
... )
>>> # trainer.fit(model=lightning_model, datamodule=data_module)
>>>
>>> # Make predictions
>>> # prediction_batches = trainer.predict(model=lightning_model, datamodule=data_module)

Notes

This class requires PyTorch to be installed.

__init__(dc_model)[source]

Create a new DCLightningModule.

Parameters:

dc_model (TorchModel or ModularTorchModel) – TorchModel to be wrapped inside the lightning module.

configure_optimizers()[source]

Configure optimizers and learning rate schedulers.

Returns:

PyTorch optimizer or tuple containing lists of optimizers and schedulers.

Return type:

Union[torch.optim.Optimizer, Tuple[List[torch.optim.Optimizer], List[torch.optim.lr_scheduler.LRScheduler]]]

training_step(batch: Tuple[Any, Any, Any] | Any, batch_idx: int)[source]

Perform a training step.

Parameters:
  • batch (Union[Tuple[Any, Any, Any], Any]) – A tensor, tuple or list containing inputs, labels, and weights.

  • batch_idx (int) – Index of the current batch.

Returns:

The computed loss tensor for this training step.

Return type:

torch.Tensor

predict_step(batch: Tuple[Any, Any, Any], batch_idx: int)[source]

Perform a prediction step with optional support for uncertainty estimates and data transformations.

This method was copied from TorchModel._predict and adapted for Lightning’s predict_step interface.

Changes include:

  • Removed the self.dc_model._prepare_batch call, since the batch is already prepared.

Parameters:
  • batch (Tuple[Any, Any, Any]) – A tuple containing:

    • inputs: the input data for prediction

    • labels: unused in prediction, but kept for consistency

    • weights: unused in prediction

  • batch_idx (int) – Index of the current batch.

Returns:

Model predictions for this batch. Can be:

  • a NumPy array for single-output models

  • a list of NumPy arrays for multi-output models

  • a zip of (predictions, variances) if uncertainty is enabled

Return type:

Union[np.ndarray, List[np.ndarray], zip]

on_save_checkpoint(checkpoint: dict) None[source]

Called by Lightning when saving a checkpoint.

This method saves only TorchModel-compatible format by removing Lightning-specific state_dict and keeping only the clean model_state_dict.

Parameters:

checkpoint (dict) – The full checkpoint dictionary before it gets dumped to a file. This method modifies this dictionary to use TorchModel-compatible format only.

on_load_checkpoint(checkpoint: dict) None[source]

Called by Lightning to restore the model.

This method creates Lightning-compatible state_dict from TorchModel format by adding the necessary pt_model prefixes.

Parameters:

checkpoint (dict) – Loaded checkpoint dictionary in TorchModel format.
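The two checkpoint hooks are inverse key transformations. The sketch below shows the idea on plain dictionaries; the function names are hypothetical, and the exact prefix string "pt_model." (with a trailing dot) as well as the handling of other checkpoint entries are assumptions, not the library's code.

```python
PREFIX = "pt_model."  # assumed prefix under which Lightning stores the wrapped network


def to_torchmodel_format(checkpoint):
    """Drop the Lightning state_dict and keep a prefix-free model_state_dict."""
    lightning_state = checkpoint.pop("state_dict", {})
    checkpoint["model_state_dict"] = {
        key[len(PREFIX):]: value
        for key, value in lightning_state.items()
        if key.startswith(PREFIX)
    }
    return checkpoint


def to_lightning_format(checkpoint):
    """Rebuild a Lightning-style state_dict by adding the prefix back."""
    checkpoint["state_dict"] = {
        PREFIX + key: value
        for key, value in checkpoint["model_state_dict"].items()
    }
    return checkpoint


ckpt = {
    "state_dict": {"pt_model.layer.weight": [1.0], "pt_model.layer.bias": [0.0]},
    "epoch": 3,
}
saved = to_torchmodel_format(dict(ckpt))      # what would be written to disk
restored = to_lightning_format(dict(saved))   # what Lightning would load back
```

Storing only the prefix-free keys is what lets a checkpoint written through Lightning be read by code that expects the TorchModel layout.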

LightningTorchModel

This is the Lightning wrapper for DeepChem that supports training with Fully Sharded Data Parallel (FSDP) and Distributed Data Parallel (DDP). It also performs prediction, evaluation, and checkpoint management for enhanced model training and deployment capabilities. You can wrap an arbitrary TorchModel in a LightningTorchModel object.

class LightningTorchModel(model: TorchModel, batch_size: int = 32, model_dir: str | None = 'default_model_dir', **trainer_kwargs: Any)[source]

A wrapper class that handles the training and inference of DeepChem models using Lightning.

This class provides a high-level interface for training and running inference on DeepChem models using PyTorch Lightning’s training infrastructure. It wraps DeepChem models in Lightning modules and handles data loading, training loops, and checkpoint management. Currently, it supports strategies like DDP (Distributed Data Parallel) and FSDP (Fully Sharded Data Parallel) for distributed training, as well as single-device training.

Important: For multi-GPU strategies (DDP, FSDP), this class must be used in a script and cannot be run in Jupyter notebooks or interactive environments due to Lightning’s multiprocessing requirements.

__init__(model: TorchModel, batch_size: int = 32, model_dir: str | None = 'default_model_dir', **trainer_kwargs: Any) None[source]

Initialize the LightningTorchModel.

Parameters:
  • model (TorchModel) – Initialized DeepChem model to be trained or used for inference.

  • batch_size (int, default 32) – Batch size for training and prediction data loaders.

  • model_dir (str, default "default_model_dir") – Path to directory where model and checkpoints will be stored. If not specified, model will be stored in a “default_model_dir” directory. This is compatible with DeepChem’s model directory structure. If None, checkpointing will be disabled.

  • **trainer_kwargs

    Additional keyword arguments passed to the Lightning Trainer. Common options include:

    • accelerator: str, default “auto”

      Hardware accelerator to use (“cpu”, “gpu”, “tpu”, “auto”).

    • devices: int or str or list, default “auto”

      Number of devices/GPUs to use.

    • strategy: str, default “auto”

      Distributed training strategy (“ddp”, “fsdp”, “auto”).

    • precision: str or int, default “32-true”

      Numerical precision (“16-mixed”, “bf16-mixed”, “32-true”).

    • log_every_n_steps: int, default 50

      How often to log within training steps.

    • enable_checkpointing: bool, default True

      Whether to enable automatic checkpointing.

    • fast_dev_run: bool or int, default False

      Run a fast development run with limited batches, epochs and no checkpointing for debugging.

    For all available options, see: https://lightning.ai/docs/pytorch/stable/common/trainer.html#init

Examples

>>> import deepchem as dc
>>> from deepchem.models.lightning.trainer import LightningTorchModel
>>> tasks, datasets, _ = dc.molnet.load_clintox()
>>> _, valid_dataset, _ = datasets
>>> model = dc.models.MultitaskClassifier(
...     n_tasks=len(tasks),
...     n_features=1024,
...     layer_sizes=[1000],
...     dropouts=0.2,
...     learning_rate=0.0001,
...     device="cpu",
...     batch_size=16
... )
>>> trainer = LightningTorchModel(
...     model=model,
...     batch_size=16,
...     accelerator="cpu",
...     log_every_n_steps=1,
...     fast_dev_run=True
... )
>>> # Train with custom checkpoint settings
>>> # trainer.fit(valid_dataset, nb_epoch=3)
>>> # predictions = trainer.predict(valid_dataset)
>>> # To restore from checkpoint:
>>> # trainer.restore()
fit(train_dataset: Dataset, nb_epoch: int = 1, restore: bool = False, max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, num_workers: int = 4, ckpt_path: str | None = None)[source]

Train the model on the provided dataset.

Parameters:
  • train_dataset (dc.data.Dataset) – DeepChem dataset for training.

  • nb_epoch (int, default 1) – Maximum number of epochs to train the model for. Note, nb_epoch is mapped to max_epochs in Lightning Trainer.

  • restore (bool, default False) – Whether to restore from a previous checkpoint. If True, will load the model weights from the specified ckpt_path if provided. If restore is True and ckpt_path is None, it will look for the last checkpoint in the model_dir under “checkpoints/last.ckpt”.

  • max_checkpoints_to_keep (int, default 5) – The maximum number of checkpoints to keep.

  • checkpoint_interval (int, default 1000) – The frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.

  • num_workers (int, default 4) – Number of workers for DataLoader.

  • ckpt_path (Optional[str], default None) – Path to a checkpoint file to resume training from. If None, starts fresh.

Notes

If max_checkpoints_to_keep is set to n, the trainer will keep the last n checkpoints plus the last checkpoint created when the fit ends successfully, named last.ckpt.
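The retention rule in the note can be written out as a short sketch. The helper checkpoints_to_keep and the step-based file names are hypothetical; only last.ckpt is named in the text above.

```python
def checkpoints_to_keep(step_checkpoints, max_checkpoints_to_keep, fit_completed=True):
    """Return the checkpoint files that remain after training."""
    if max_checkpoints_to_keep > 0:
        kept = list(step_checkpoints[-max_checkpoints_to_keep:])
    else:
        # Guard needed because a slice starting at -0 would return the whole list
        kept = []
    if fit_completed:
        # last.ckpt is written in addition when fit ends successfully
        kept.append("last.ckpt")
    return kept


# Hypothetical file names for checkpoints written during training
written = [f"step-{step}.ckpt" for step in (1000, 2000, 3000, 4000)]
kept = checkpoints_to_keep(written, max_checkpoints_to_keep=2)
```

So with max_checkpoints_to_keep=2 a successful run leaves three files on disk, not two.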

predict(dataset: Dataset, transformers: List[Transformer] = [], other_output_types: str | Sequence[str] | None = None, num_workers: int = 0, uncertainty: bool | None = None, ckpt_path: str | None = None)[source]

Run inference on the provided dataset.

Parameters:
  • dataset (dc.data.Dataset) – DeepChem dataset for prediction.

  • transformers (List[Transformer], default []) – List of transformers to apply to predictions.

  • other_output_types (Optional[OneOrMany[str]], default None) – List of other output types to compute.

  • num_workers (int, default 0) – Number of workers for DataLoader.

  • uncertainty (Optional[bool], default None) – Whether to compute uncertainty estimates.

  • ckpt_path (Optional[str], default None) – Path to a checkpoint file to load model weights from.

Returns:

Predictions from the model.

Return type:

List

save_checkpoint(max_checkpoints_to_keep: int = 1, model_dir: str | None = None) None[source]

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

This method maintains compatibility with TorchModel’s save_checkpoint interface while using Lightning’s native checkpointing mechanism.

Parameters:
  • max_checkpoints_to_keep (int, default 1) – The maximum number of checkpoints to keep. Older checkpoints are discarded.

  • model_dir (str, default None) – Model directory to save checkpoint to. If None, reverts to self.model_dir. Checkpoints will be saved in a ‘checkpoints’ subdirectory within this path.

Notes

Setting max_checkpoints_to_keep to a value greater than 1 has no significant effect here, because checkpoints are saved dynamically through Lightning’s ModelCheckpoint callbacks. The parameter keeps the same name and type only to follow DeepChem’s convention.

restore(checkpoint: str | None = None, model_dir: str | None = None, strict: bool | None = True) None[source]

Reload the values of all variables from a checkpoint file.

This method maintains compatibility with TorchModel’s restore interface while using Lightning’s native checkpointing mechanism.

Parameters:
  • checkpoint (str, optional) – the path to the checkpoint file to load. If this is None, will look for ‘last.ckpt’ in the model_dir/checkpoints/ directory.

  • model_dir (str, default None) – Directory to restore checkpoint from. If None, use self.model_dir. If checkpoint is not None, this is ignored.

  • strict (bool, default True) – Whether to strictly enforce that the keys in the checkpoint match the keys returned by this module’s state dict.
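The precedence between checkpoint and model_dir described in the parameters above can be summarized in a few lines. The function resolve_checkpoint and its default_model_dir argument (standing in for self.model_dir) are hypothetical names used only for this sketch.

```python
import os


def resolve_checkpoint(checkpoint=None, model_dir=None, default_model_dir="default_model_dir"):
    """Pick the checkpoint file to load, following the documented precedence."""
    if checkpoint is not None:
        # An explicit checkpoint path wins; model_dir is ignored in this case
        return checkpoint
    base = model_dir if model_dir is not None else default_model_dir
    return os.path.join(base, "checkpoints", "last.ckpt")


explicit = resolve_checkpoint("runs/a/epoch3.ckpt", model_dir="runs/b")
from_dir = resolve_checkpoint(model_dir="runs/b")
fallback = resolve_checkpoint()
```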

Notes

Important Note for FSDP Users: When using FSDP (Fully Sharded Data Parallel) training strategy, restoring weights on the same trainer instance after fitting, for prediction, can cause shape-mismatch errors due to how FSDP handles model sharding. It is strongly recommended to create a new LightningTorchModel instance instead of calling restore() on an existing trained instance when using FSDP.

Jax Models

DeepChem supports the use of Jax to build deep learning models.

JaxModel

class JaxModel(forward_fn: ~collections.abc.Mapping[str, ~collections.abc.Mapping[str, ~jax.Array]], params: ~collections.abc.Mapping[str, ~collections.abc.Mapping[str, ~jax.Array]], loss: ~deepchem.models.losses.Loss | ~typing.Callable[[~typing.List, ~typing.List, ~typing.List], ~typing.Any] | None, output_types: ~typing.List[str] | None = None, batch_size: int = 100, learning_rate: float = 0.001, optimizer: ~optax._src.base.GradientTransformation | ~deepchem.models.optimizers.Optimizer | None = None, grad_fn: ~typing.Callable = <function create_default_gradient_fn>, update_fn: ~typing.Callable = <function create_default_update_fn>, eval_fn: ~typing.Callable = <function create_default_eval_fn>, rng=Array([0, 1], dtype=uint32), log_frequency: int = 100, **kwargs)[source]

This is a DeepChem model implemented by a Jax model. Here is a simple example that uses JaxModel to train a Haiku (JAX neural network library) based model on a DeepChem dataset.

>>>
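>> import haiku as hk
>> import jax
>> import jax.numpy as jnp
>> import optax
>> from deepchem.models import JaxModel
>> # train_dataset is assumed to be an existing dc.data.Dataset
>> def forward_model(x):
>>   net = hk.nets.MLP([512, 256, 128, 1])
>>   return net(x)
>> def rms_loss(pred, tar, w):
>>   return jnp.mean(optax.l2_loss(pred, tar))
>> params_init, forward_fn = hk.transform(forward_model)
>> rng = jax.random.PRNGKey(500)
>> inputs, _, _, _ = next(iter(train_dataset.iterbatches(batch_size=256)))
>> params = params_init(rng, inputs)
>> # Pass the settings by keyword: the fourth positional argument is output_types
>> j_m = JaxModel(forward_fn, params, rms_loss, batch_size=256, learning_rate=0.001, log_frequency=100)
>> j_m.fit(train_dataset)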

All optimizations will be done using the optax library.

__init__(forward_fn: ~collections.abc.Mapping[str, ~collections.abc.Mapping[str, ~jax.Array]], params: ~collections.abc.Mapping[str, ~collections.abc.Mapping[str, ~jax.Array]], loss: ~deepchem.models.losses.Loss | ~typing.Callable[[~typing.List, ~typing.List, ~typing.List], ~typing.Any] | None, output_types: ~typing.List[str] | None = None, batch_size: int = 100, learning_rate: float = 0.001, optimizer: ~optax._src.base.GradientTransformation | ~deepchem.models.optimizers.Optimizer | None = None, grad_fn: ~typing.Callable = <function create_default_gradient_fn>, update_fn: ~typing.Callable = <function create_default_update_fn>, eval_fn: ~typing.Callable = <function create_default_eval_fn>, rng=Array([0, 1], dtype=uint32), log_frequency: int = 100, **kwargs)[source]

Create a new JaxModel

Parameters:
  • forward_fn (hk.State or Function) – Any Jax-based model that has an apply method for computing the network. Currently only Haiku models are supported.

  • params (hk.Params) – The parameters of the Jax-based network

  • loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above

  • output_types (list of strings, optional (default None)) – the type of each output from the model, as described above

  • batch_size (int, optional (default 100)) – default batch size for training and evaluating

  • learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is ignored.

  • optimizer (optax object) – The optimizer to use. For the time being, this must be an optax object.

  • rng (jax.random.PRNGKey, optional (default 1)) – A default global PRNG key to use for drawing random numbers.

  • log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using logging by default.

Miscellaneous Parameters Yet To Add

model_dir: str, optional (default None)

Will be added along with the save & load method

tensorboard: bool, optional (default False)

whether to log progress to TensorBoard during training

wandb: bool, optional (default False)

whether to log progress to Weights & Biases during training

Work in Progress

[1] Integrate the optax losses, optimizers, schedulers with DeepChem

[2] Support for saving & loading the model.

fit(dataset: Dataset, nb_epochs: int = 10, deterministic: bool = False, loss: Loss | Callable[[List, List, List], Any] | None = None, callbacks: Callable | List[Callable] = [], all_losses: List[float] | None = None) float[source]

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on

  • nb_epochs (int) – the number of epochs to train for

  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.

  • loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.

  • callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.

  • all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.

Returns:

The average loss over the most recent checkpoint interval

Miscellaneous Parameters Yet To Add

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.

  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.

  • variables (list of hk.Variable) – the variables to train. If None (the default), all trainable variables in the model are used.

Work in Progress

[1] Integrate the optax losses, optimizers, schedulers with DeepChem

[2] Support for saving & loading the model.

[3] Adding support for output types (choosing only self._loss_outputs)
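The callbacks and all_losses contract of fit can be illustrated without Jax. The toy loop below is a sketch under stated assumptions: toy_fit is a hypothetical stand-in that consumes precomputed batch losses, and averaging every log_frequency steps is an assumption about when losses are logged.

```python
def toy_fit(batch_losses, log_frequency=2, callbacks=(), all_losses=None):
    """Toy training loop showing how callbacks and all_losses are used."""
    if callable(callbacks):
        # A single function is accepted as well as a list of functions
        callbacks = [callbacks]
    model = object()  # placeholder for the model handed to each callback
    running, count, last_avg = 0.0, 0, 0.0
    for step, loss in enumerate(batch_losses, start=1):
        running += loss
        count += 1
        if step % log_frequency == 0:
            last_avg = running / count
            if all_losses is not None:
                all_losses.append(last_avg)
            running, count = 0.0, 0
        for callback in callbacks:
            callback(model, step)  # invoked after every step as f(model, step)
    return last_avg


seen_steps = []
history = []
first = toy_fit([4.0, 2.0, 1.0, 1.0],
                callbacks=lambda model, step: seen_steps.append(step),
                all_losses=history)
second = toy_fit([0.5, 0.5], all_losses=history)
```

Passing the same list to both calls accumulates the logged losses across calls, which is convenient for plotting a single training curve over several fit() invocations.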

predict_on_generator(generator: Iterable[Tuple[Any, Any, Any]], transformers: List[Transformer] = [], output_types: str | Sequence[str] | None = None) ndarray | Sequence[ndarray][source]
Parameters:
  • generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).

  • transformers (List[dc.trans.Transformers]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved from the model. If output_types is specified, outputs must be None.

Returns:

A NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs.
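Undoing transformations on predictions amounts to applying each transformer's inverse in reverse order. The sketch below shows this with a hypothetical ShiftScale class on plain lists; it follows the untransform naming used by DeepChem transformers but is not the library's implementation.

```python
class ShiftScale:
    """Toy transformer: y -> (y - mean) / std, with an exact inverse."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]

    def untransform(self, values):
        return [v * self.std + self.mean for v in values]


def undo_transformers(outputs, transformers):
    """Apply the inverse transformations, last transformer first."""
    for transformer in reversed(transformers):
        outputs = transformer.untransform(outputs)
    return outputs


transformers = [ShiftScale(mean=10.0, std=2.0), ShiftScale(mean=1.0, std=4.0)]

# Labels as the model sees them after both transformations
y_original = [12.0, 8.0]
y_model_space = y_original
for t in transformers:
    y_model_space = t.transform(y_model_space)

# Predictions in model space are mapped back to the original scale
recovered = undo_transformers(y_model_space, transformers)
```

Applying the inverses in the same order as the forward transformations would give a different result, which is why the list is reversed.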

predict_on_batch(X: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], transformers: List[Transformer] = []) ndarray | Sequence[ndarray][source]

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a NumPy array.

  • transformers (List[dc.trans.Transformers]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

Returns:

A NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs.

predict(dataset: Dataset, transformers: List[Transformer] = [], output_types: List[str] | None = None) ndarray | Sequence[ndarray][source]

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on

  • transformers (List[dc.trans.Transformers]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved from the model. If output_types is specified, outputs must be None.

Returns:

A NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs.

get_global_step() int[source]

Get the number of steps of fitting that have been performed.

evaluate_generator(generator: Iterable[Tuple[Any, Any, Any]], metrics: List[Metric], transformers: List[Transformer] = [], per_task_metrics: bool = False)[source]

Evaluate the performance of this model on the data produced by a generator.

Parameters:
  • generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).

  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics

  • transformers (List[dc.trans.Transformers]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

  • per_task_metrics (bool) – If True, return per-task scores.

Returns:

Maps tasks to scores under metric.

Return type:

dict

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) Iterable[Tuple[List, List, List]][source]

Create a generator that iterates batches for a dataset. Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

A generator that iterates batches, each represented as a tuple of lists ([inputs], [outputs], [weights]).

PINNModel

class PINNModel(forward_fn: ~collections.abc.Mapping[str, ~collections.abc.Mapping[str, ~jax.Array]], params: ~collections.abc.Mapping[str, ~collections.abc.Mapping[str, ~jax.Array]], initial_data: dict = {}, output_types: ~typing.List[str] | None = None, batch_size: int = 100, learning_rate: float = 0.001, optimizer: ~optax._src.base.GradientTransformation | ~deepchem.models.optimizers.Optimizer | None = None, grad_fn: ~typing.Callable = <function create_default_gradient_fn>, update_fn: ~typing.Callable = <function create_default_update_fn>, eval_fn: ~typing.Callable = <function create_default_eval_fn>, rng=Array([0, 1], dtype=uint32), log_frequency: int = 100, **kwargs)[source]

This class is derived from the JaxModel class, and its methods are very similar to those of JaxModel, but it can pass multiple arguments (using *args) to the model, which suits PINN models. Example: approximating f(x, y, z, t) satisfying a linear differential equation.

This model is recommended for linear partial differential equations but if you can accurately write the gradient function in Jax depending on your use case, then it will work as well.

This class requires two functions apart from the usual function definition and weights:

[1] grad_fn: Each PINN has a different strategy for calculating its final loss. This function tells the PINNModel how to compute the derivatives for backpropagation. It should follow this format:

>>>
>> def gradient_fn(forward_fn, loss_outputs, initial_data):
>>
>>  def model_loss(params, target, weights, rng, ...):
>>
>>    # write code using the arguments.
>>    # ... indicates the variable number of positional arguments.
>>    return
>>
>>  return model_loss

“…” can be replaced with various arguments like (x, y, z, t), but they must match the arguments of eval_fn.

[2] eval_fn: Function defining how the model computes its outputs during inference. It should follow this format:

>>>
>> def create_eval_fn(forward_fn, params):
>>  def eval_model(..., rng=None):
>>    # write code here using arguments
>>
>>    return
>>  return eval_model

“…” can be replaced with various arguments like (x, y, z, t), but they must match the arguments of grad_fn.

[3] boundary_data: For a detailed example, see deepchem/models/jax_models/tests/test_pinn.py, which solves f’(x) = -sin(x).
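To show how the two factory functions fit together, here is a dependency-free sketch for f’(x) = -sin(x) with f(0) = 1, whose solution is cos(x). It is not the test file's code: it runs in plain Python, a central finite difference stands in for Jax automatic differentiation, the one-parameter forward_fn is a toy, and the initial_data keys "X0" and "u0" are assumptions chosen for the example.

```python
import math


def forward_fn(params, rng, x):
    # Toy "network": one trainable amplitude multiplying cos(x)
    return params["a"] * math.cos(x)


def gradient_fn(forward_fn, loss_outputs, initial_data):
    x0, u0 = initial_data["X0"], initial_data["u0"]  # assumed key names

    def model_loss(params, target, weights, rng, x_train):
        h = 1e-4
        residuals = []
        for x in x_train:
            # Central finite difference replaces jax.grad in this sketch
            du_dx = (forward_fn(params, rng, x + h)
                     - forward_fn(params, rng, x - h)) / (2 * h)
            # Residual of the equation f'(x) + sin(x) = 0
            residuals.append((du_dx + math.sin(x)) ** 2)
        boundary = (forward_fn(params, rng, x0) - u0) ** 2
        return sum(residuals) / len(residuals) + boundary

    return model_loss


def create_eval_fn(forward_fn, params):
    def eval_model(x, rng=None):
        # Takes the same positional argument (x) as model_loss's trailing argument
        return forward_fn(params, rng, x)

    return eval_model


xs = [0.0, 0.5, 1.0, 1.5]
loss_fn = gradient_fn(forward_fn, None, {"X0": 0.0, "u0": 1.0})
exact_loss = loss_fn({"a": 1.0}, None, None, None, xs)   # a = 1 solves the equation
wrong_loss = loss_fn({"a": 2.0}, None, None, None, xs)   # a = 2 violates it
predict = create_eval_fn(forward_fn, {"a": 1.0})
```

The trailing argument of model_loss (x_train) and the leading argument of eval_model (x) play the role of the “…” placeholders, so the two signatures stay in step.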

Notes

This class requires Jax, Haiku and Optax to be installed.

__init__(forward_fn: ~collections.abc.Mapping[str, ~collections.abc.Mapping[str, ~jax.Array]], params: ~collections.abc.Mapping[str, ~collections.abc.Mapping[str, ~jax.Array]], initial_data: dict = {}, output_types: ~typing.List[str] | None = None, batch_size: int = 100, learning_rate: float = 0.001, optimizer: ~optax._src.base.GradientTransformation | ~deepchem.models.optimizers.Optimizer | None = None, grad_fn: ~typing.Callable = <function create_default_gradient_fn>, update_fn: ~typing.Callable = <function create_default_update_fn>, eval_fn: ~typing.Callable = <function create_default_eval_fn>, rng=Array([0, 1], dtype=uint32), log_frequency: int = 100, **kwargs)[source]
Parameters:
  • forward_fn (hk.State or Function) – Any Jax-based model that has an apply method for computing the network. Currently only Haiku models are supported.

  • params (hk.Params) – The parameters of the Jax-based network

  • initial_data (dict) – This acts as a session variable which will be passed as a dictionary in grad_fn

  • output_types (list of strings, optional (default None)) – the type of each output from the model, as described above

  • batch_size (int, optional (default 100)) – default batch size for training and evaluating

  • learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is ignored.

  • optimizer (optax object) – The optimizer to use for fitting; for the time being, an optax object.

  • grad_fn (Callable (default create_default_gradient_fn)) – Defines how the loss function and gradients are calculated for the PINN model

  • update_fn (Callable (default create_default_update_fn)) – Defines how the weights are updated using backpropagation. The optax library is used for optimisation operations. It is recommended to leave this as the default.

  • eval_fn (Callable (default create_default_eval_fn)) – Function defining how the model computes its outputs during inference.

  • rng (jax.random.PRNGKey, optional (default 1)) – A default global PRNG key to use for drawing random numbers.

  • log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using logging by default.

default_generator(dataset: Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) Iterable[Tuple[List, List, List]][source]

Create a generator that iterates batches for a dataset. Subclasses may override this method to customize how model inputs are generated from the data.

Parameters:
  • dataset (Dataset) – the data to iterate

  • epochs (int) – the number of times to iterate over the full dataset

  • mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)

  • deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch

  • pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns:

a generator that iterates batches, each represented as a tuple of lists: ([inputs], [outputs], [weights])
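
The batching behaviour described by these parameters can be sketched in plain Python. This is only an illustration of the ([inputs], [outputs], [weights]) shape and of the epochs, deterministic and pad_batches options; the function name, the seed argument and the padding rule (repeat the first sample of the batch with zero weight) are assumptions made for this sketch, not the library's implementation.

```python
import random


def toy_batch_generator(X, y, w, batch_size, epochs=1,
                        deterministic=True, pad_batches=True, seed=0):
    """Yield ([inputs], [outputs], [weights]) tuples, one per batch."""
    shuffler = random.Random(seed)
    for _ in range(epochs):
        order = list(range(len(X)))
        if not deterministic:
            shuffler.shuffle(order)  # new sample order for each epoch
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            xb = [X[i] for i in idx]
            yb = [y[i] for i in idx]
            wb = [w[i] for i in idx]
            missing = batch_size - len(idx)
            if pad_batches and missing > 0:
                # Assumption: pad by repeating the first sample with zero weight.
                xb += [xb[0]] * missing
                yb += [yb[0]] * missing
                wb += [0.0] * missing
            yield ([xb], [yb], [wb])


batches = list(toy_batch_generator(X=[1, 2, 3, 4, 5], y=[10, 20, 30, 40, 50],
                                   w=[1.0] * 5, batch_size=2, epochs=2))
```

Five samples with a batch size of 2 give three batches per epoch; the last one is padded so that every batch has the same size, and the zero weight keeps the padded entry from contributing to a weighted loss.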

Hugging Face Models

HuggingFace models from the transformers library can be wrapped using the HuggingFaceModel wrapper.

class HuggingFaceModel(model: PreTrainedModel, tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, task: str | None = None, config: Dict | None = None, **kwargs)[source]

Wrapper class that wraps HuggingFace models as DeepChem models

The class wraps models from the HuggingFace ecosystem in DeepChem so that they can be trained via DeepChem’s API. One reason to do this is to make an apples-to-apples comparison between models from the HuggingFace transformers library and models from the DeepChem library.

The HuggingFaceModel has a Has-A relationship with the models it wraps from the transformers library. Once a model is wrapped, DeepChem’s API is used for training, prediction, evaluation and other downstream tasks.

A HuggingFaceModel wrapper also has a tokenizer, which tokenizes raw SMILES strings into tokens to be used by downstream models. The SMILES strings are generally stored in the X attribute of a deepchem.data.Dataset object. This differs from the standard DeepChem workflow because tokenization is done on the fly here. The approach allows us to leverage the transformers library’s fast tokenization algorithms and other utilities such as data collation and random masking of tokens for masked language model training.
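
To make "random masking of tokens for masked language model training" concrete, here is a small plain-Python sketch. It is not the transformers data collator: the character-level tokenizer, the function names, the mask probability and the seed are all assumptions chosen for illustration.

```python
import random

MASK_TOKEN = "<mask>"


def toy_tokenize(smiles):
    """Character-level split; a real tokenizer would use a learned vocabulary."""
    return list(smiles)


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels); labels are None at unmasked positions."""
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)  # the model must recover this token
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)  # position is ignored by the training loss
    return masked, labels


tokens = toy_tokenize("CCN(CCSC)C(=O)N")
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
```

Because masking happens each time a batch is prepared, the same SMILES string can be masked differently on every pass, which is one reason to tokenize on the fly.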

Parameters:
  • model (transformers.modeling_utils.PreTrainedModel) – The HuggingFace model to wrap.

  • task (str, (optional, default None)) –

    The task defines the type of learning task in the model. The supported tasks are
    • mlm - masked language modeling commonly used in pretraining

    • mtr - multitask regression - a task used for both pretraining base models and finetuning

    • regression - use it for regression tasks, like property prediction

    • classification - use it for classification tasks

    When the task is not specified or None, the wrapper returns raw output of the HuggingFaceModel. In cases where the HuggingFaceModel is a model without a task specific head, this output will be the last hidden states.

  • tokenizer (transformers.tokenization_utils.PreTrainedTokenizer) – The tokenizer used to tokenize the raw input strings.

  • config (dict, (optional, default None)) – A dictionary of model configuration parameters that will be passed to the Hugging Face AutoModel classes via **kwargs when loading from the hf_checkpoint. These parameters are typically used to customize the behavior and architecture of the underlying transformer model (e.g., number of layers, hidden size, dropout rates, etc.). When loading a pretrained model from hf_checkpoint, any keys in config that match configuration attributes supported by the specific Hugging Face AutoModel being used will override the default settings for that model.

Example

>>> import os
>>> import tempfile
>>> import shutil
>>> tempdir = tempfile.mkdtemp()
>>> # preparing dataset
>>> smiles = ['CN(c1ccccc1)c1ccccc1C(=O)NCC1(O)CCOCC1', 'CC[NH+](CC)C1CCC([NH2+]C2CC2)(C(=O)[O-])C1', \
...     'COCC(CNC(=O)c1ccc2c(c1)NC(=O)C2)OC', 'OCCn1cc(CNc2cccc3c2CCCC3)nn1', \
...     'CCCCCCc1ccc(C#Cc2ccc(C#CC3=CC=C(CCC)CC3)c(C3CCCCC3)c2)c(F)c1', 'nO=C(NCc1ccc(F)cc1)N1CC=C(c2c[nH]c3ccccc23)CC1']
>>> filepath = os.path.join(tempdir, 'smiles.txt')
>>> f = open(filepath, 'w')
>>> f.write('\n'.join(smiles))
253
>>> f.close()
>>> # preparing tokenizer
>>> from tokenizers import ByteLevelBPETokenizer
>>> from transformers.models.roberta import RobertaTokenizerFast
>>> tokenizer = ByteLevelBPETokenizer()
>>> tokenizer.train(files=filepath, vocab_size=1_000, min_frequency=2, special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"])
>>> tokenizer_path = os.path.join(tempdir, 'tokenizer')
>>> os.makedirs(tokenizer_path)
>>> result = tokenizer.save_model(tokenizer_path)
>>> tokenizer = RobertaTokenizerFast.from_pretrained(tokenizer_path)
>>> # preparing dataset
>>> import pandas as pd
>>> import deepchem as dc
>>> smiles = ["CCN(CCSC)C(=O)N[C@@](C)(CC)C(F)(F)F","CC1(C)CN(C(=O)Nc2cc3ccccc3nn2)C[C@@]2(CCOC2)O1"]
>>> labels = [3.112,2.432]
>>> df = pd.DataFrame(list(zip(smiles, labels)), columns=["smiles", "task1"])
>>> with dc.utils.UniversalNamedTemporaryFile(mode='w') as tmpfile:
...     df.to_csv(tmpfile.name)
...     loader = dc.data.CSVLoader(["task1"], feature_field="smiles", featurizer=dc.feat.DummyFeaturizer())
...     dataset = loader.create_dataset(tmpfile.name)
>>> # pretraining
>>> from deepchem.models.torch_models.hf_models import HuggingFaceModel
>>> from transformers.models.roberta import RobertaForMaskedLM, RobertaModel, RobertaConfig
>>> config = RobertaConfig(vocab_size=tokenizer.vocab_size)
>>> model = RobertaForMaskedLM(config)
>>> hf_model = HuggingFaceModel(model=model, tokenizer=tokenizer, task='mlm', model_dir='model-dir')
>>> training_loss = hf_model.fit(dataset, nb_epoch=1)
>>> # finetuning a regression model
>>> from transformers.models.roberta import RobertaForSequenceClassification
>>> config = RobertaConfig(vocab_size=tokenizer.vocab_size, problem_type='regression', num_labels=1)
>>> model = RobertaForSequenceClassification(config)
>>> hf_model = HuggingFaceModel(model=model, tokenizer=tokenizer, task='regression', model_dir='model-dir')
>>> hf_model.load_from_pretrained()
>>> training_loss = hf_model.fit(dataset, nb_epoch=1)
>>> prediction = hf_model.predict(dataset)  # prediction
>>> eval_results = hf_model.evaluate(dataset, metrics=dc.metrics.Metric(dc.metrics.mae_score))
>>> # finetune a classification model
>>> # making dataset suitable for classification
>>> import numpy as np
>>> y = np.random.choice([0, 1], size=dataset.y.shape)
>>> dataset = dc.data.NumpyDataset(X=dataset.X, y=y, w=dataset.w, ids=dataset.ids)
>>> from transformers import RobertaForSequenceClassification
>>> config = RobertaConfig(vocab_size=tokenizer.vocab_size)
>>> model = RobertaForSequenceClassification(config)
>>> hf_model = HuggingFaceModel(model=model, task='classification', tokenizer=tokenizer)
>>> training_loss = hf_model.fit(dataset, nb_epoch=1)
>>> predictions = hf_model.predict(dataset)
>>> eval_result = hf_model.evaluate(dataset, metrics=dc.metrics.Metric(dc.metrics.f1_score))
>>> # removing temporary directory
>>> if os.path.exists(tempdir):
...     shutil.rmtree(tempdir)
__init__(model: PreTrainedModel, tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, task: str | None = None, config: Dict | None = None, **kwargs)[source]

Create a new TorchModel.

Parameters:
  • model (torch.nn.Module) – the PyTorch model implementing the calculation

  • loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above

  • output_types (list of strings, optional (default None)) – the type of each output from the model, as described above

  • batch_size (int, optional (default 100)) – default batch size for training and evaluating

  • model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None, a temporary directory is created.

  • learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is ignored.

  • optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is ignored.

  • tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training

  • wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training

  • log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using logging by default. If tensorboard is set, data is also logged to TensorBoard. If wandb is set, data is also logged to Weights & Biases. Logging happens at global steps. Roughly, a global step corresponds to one batch of training. If you’d like a printout every 10 batch steps, you’d set log_frequency=10 for example.

  • device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is chosen automatically.

  • regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add to the loss function

  • wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics

load_from_pretrained(model_dir: str | None = None, from_hf_checkpoint: bool = False)[source]

Load HuggingFace model from a pretrained checkpoint.

The utility can be used for loading a model from a checkpoint. Given model_dir, it checks for an existing checkpoint in the directory. If a checkpoint exists, the model’s state is loaded from the checkpoint.

If the option from_hf_checkpoint is set to True, then it loads a pretrained model using the HuggingFace model’s from_pretrained method. This option interprets model_dir as the model id of a pretrained model hosted inside a model repo on huggingface.co or as the path to a directory containing model weights saved using the save_pretrained method of a HuggingFace model.

Parameters:
  • model_dir (str) – Directory containing model checkpoint

  • from_hf_checkpoint (bool, default False) – Loads a pretrained model from HuggingFace checkpoint.

Example

>>> from transformers import RobertaTokenizerFast
>>> tokenizer = RobertaTokenizerFast.from_pretrained("seyonec/PubChem10M_SMILES_BPE_60k")
>>> from deepchem.models.torch_models.hf_models import HuggingFaceModel
>>> from transformers.models.roberta import RobertaForMaskedLM, RobertaModel, RobertaConfig
>>> config = RobertaConfig(vocab_size=tokenizer.vocab_size)
>>> model = RobertaForMaskedLM(config)
>>> pretrain_model = HuggingFaceModel(model=model, tokenizer=tokenizer, task='mlm', model_dir='model-dir')
>>> pretrain_model.save_checkpoint()
>>> from transformers import RobertaForSequenceClassification
>>> config = RobertaConfig(vocab_size=tokenizer.vocab_size)
>>> model = RobertaForSequenceClassification(config)
>>> finetune_model = HuggingFaceModel(model=model, task='classification', tokenizer=tokenizer, model_dir='model-dir')
>>> finetune_model.load_from_pretrained()

Note

  1. Use the load_from_pretrained method only to load a pretrained model, that is, a model trained on a different task like Masked Language Modeling or Multitask Regression. To restore a model, use the restore method.

  2. A pretrain model and a finetune model generally have different numbers of target tasks, so they have different numbers of projection outputs in the last layer. To avoid a mismatch in the weights of the output projection layer (last layer) between the pretrain model and the current model, the projection layer’s weights are deleted.
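
The idea in the second note, transferring the shared weights while leaving out the task-specific output layer, can be sketched with plain dictionaries standing in for model state. The function name, the parameter names and the head prefixes below are illustrative assumptions; the actual method decides which weights to drop internally.

```python
def transfer_backbone(pretrained_state, target_state,
                      head_prefixes=("lm_head.", "classifier.")):
    """Copy pretrained weights into target_state, skipping output-head entries."""
    merged = dict(target_state)
    for name, value in pretrained_state.items():
        if name.startswith(head_prefixes):
            continue  # head shapes differ between tasks, so keep the target's own
        if name in merged:
            merged[name] = value
    return merged


# Toy state dictionaries with made-up parameter names and values.
pretrained = {"encoder.layer0.weight": [0.1, 0.2], "lm_head.weight": [9.0] * 4}
finetune = {"encoder.layer0.weight": [0.0, 0.0], "classifier.weight": [0.5]}
merged = transfer_backbone(pretrained, finetune)
```

The encoder weights come from the pretrained model, while the classification head keeps its freshly initialised values and is learned during finetuning.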

fit_generator(generator: Iterable[Tuple[Any, Any, Any]], max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, restore: bool = False, variables: List[Parameter] | ParameterList | None = None, loss: Callable[[List, List, List], Any] | None = None, callbacks: Callable | List[Callable] = [], all_losses: List[float] | None = None) float[source]

Train this model on data from a generator.

Parameters:
  • generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.

  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.

  • variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in the model are used.

  • loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.

  • callbacks (function or list of functions) – one or more functions of the form f(model, step, **kwargs) that will be invoked after every step. This can be used to perform validation, logging, etc.

  • all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.

Returns:

The average loss over the most recent checkpoint interval

Return type:

float
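
The callbacks and all_losses arguments can be illustrated with a toy loop. This is a sketch under stated assumptions rather than the real training loop: toy_fit, its log_frequency argument and the list of per-step losses are hypothetical stand-ins, and only the callback form f(model, step, **kwargs) and the appending of logged losses follow the descriptions above.

```python
def toy_fit(batch_losses, callbacks=(), log_frequency=2, all_losses=None):
    """Stand-in for a training loop; batch_losses plays the role of per-step losses."""
    running, count, average = 0.0, 0, 0.0
    for step, loss in enumerate(batch_losses, start=1):
        running += loss
        count += 1
        if step % log_frequency == 0:
            average = running / count
            if all_losses is not None:
                all_losses.append(average)  # caller-owned list keeps growing
            running, count = 0.0, 0
        for callback in callbacks:
            callback(None, step)  # a real loop would pass the model, not None
    return average


steps_seen = []
history = []


def record_step(model, step, **kwargs):
    # Callback of the documented form f(model, step, **kwargs).
    steps_seen.append(step)


last_average = toy_fit([4.0, 2.0, 3.0, 1.0], callbacks=[record_step], all_losses=history)
```

Passing the same history list to repeated calls would keep extending it, which matches the note that losses continue to be appended across calls.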

Note

A HuggingFace model can also return embeddings (last hidden state) and attentions. Support for returning the embeddings to the user, so that they can be used in other downstream applications, still needs to be added.

fill_mask(inputs: str | List[str], top_k: int = 5) List[Dict] | List[List[Dict]][source]

Implements the HuggingFace ‘fill_mask’ pipeline. https://huggingface.co/docs/transformers/main_classes/pipelines

Takes as input a sequence or list of sequences, where each sequence contains a single masked position, and returns a list of dictionaries per sequence containing the filled sequence, the token, and the score for that token.

Parameters:
  • inputs (Union[str, List[str]]) – One or several texts (or one list of texts) with masked tokens.

  • top_k (int, optional) – The number of predictions to return for each mask. Default is 5.

Returns:

A list or a list of lists of dictionaries with the following keys:
  • sequence (str): The corresponding input with the mask token prediction.

  • score (float): The corresponding probability.

  • token (int): The predicted token id (to replace the masked one).

  • token_str (str): The predicted token (to replace the masked one)

Return type:

Union[List[Dict], List[List[Dict]]]
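
Because the return value is either a flat list (one input) or a list of lists (several inputs), downstream code usually has to handle both shapes. The helper below is a hypothetical sketch of one way to do that; best_fill is not part of the API, and the candidate dictionaries contain invented values used only to show the structure.

```python
def best_fill(results):
    """Return the top-scoring entry per input sequence."""
    if results and isinstance(results[0], dict):
        results = [results]  # single input: wrap so both shapes are handled alike
    return [max(candidates, key=lambda c: c["score"]) for candidates in results]


# Invented values, for illustration only; not real model output.
single = [
    {"sequence": "CCO", "score": 0.7, "token": 11, "token_str": "O"},
    {"sequence": "CCN", "score": 0.2, "token": 12, "token_str": "N"},
]
top = best_fill(single)
top_many = best_fill([single, single[::-1]])
```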


class Chemberta(task: str, tokenizer_path: str = 'seyonec/PubChem10M_SMILES_BPE_60k', n_tasks: int = 1, config: Dict[Any, Any] = {}, **kwargs)[source]

Chemberta Model

Chemberta is a transformer style model for learning on SMILES strings. The model architecture is based on the RoBERTa architecture. The model can be used for both pretraining an embedding and finetuning for downstream applications.

The model supports two types of pretraining tasks: pretraining via masked language modeling and pretraining via multi-task regression. To pretrain via the masked language modeling task, use task = mlm, and to pretrain via the multitask regression task, use task = mtr. The model also supports the regression, classification and multitask regression finetuning tasks, which can be specified by passing regression, classification or mtr to the task keyword during model initialisation.

The model uses a tokenizer to create input tokens for the models from the SMILES strings. The default tokenizer model is a byte-pair encoding tokenizer trained on the PubChem10M dataset and loaded from the HuggingFace model hub (https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_60k).

Parameters:
  • task (str) –

    The task defines the type of learning task in the model. The supported tasks are
    • mlm - masked language modeling commonly used in pretraining

    • mtr - multitask regression - a task used for both pretraining base models and finetuning

    • regression - use it for regression tasks, like property prediction

    • classification - use it for classification tasks

  • tokenizer_path (str) – Path containing pretrained tokenizer used to tokenize SMILES string for model inputs. The tokenizer path can either be a huggingFace tokenizer model or a path in the local machine containing the tokenizer.

  • n_tasks (int, default 1) – Number of prediction targets for a multitask learning model

Example

>>> import os
>>> import tempfile
>>> import shutil
>>> tempdir = tempfile.mkdtemp()
>>> # preparing dataset
>>> import pandas as pd
>>> import deepchem as dc
>>> smiles = ["CCN(CCSC)C(=O)N[C@@](C)(CC)C(F)(F)F","CC1(C)CN(C(=O)Nc2cc3ccccc3nn2)C[C@@]2(CCOC2)O1"]
>>> labels = [3.112,2.432]
>>> df = pd.DataFrame(list(zip(smiles, labels)), columns=["smiles", "task1"])
>>> with dc.utils.UniversalNamedTemporaryFile(mode='w') as tmpfile:
...     df.to_csv(tmpfile.name)
...     loader = dc.data.CSVLoader(["task1"], feature_field="smiles", featurizer=dc.feat.DummyFeaturizer())
...     dataset = loader.create_dataset(tmpfile.name)
>>> # pretraining
>>> from deepchem.models.torch_models.chemberta import Chemberta
>>> pretrain_model_dir = os.path.join(tempdir, 'pretrain-model')
>>> tokenizer_path = "seyonec/PubChem10M_SMILES_BPE_60k"
>>> pretrain_model = Chemberta(task='mlm', model_dir=pretrain_model_dir, tokenizer_path=tokenizer_path)  # mlm pretraining
>>> pretraining_loss = pretrain_model.fit(dataset, nb_epoch=1)
>>> # finetuning in regression mode
>>> finetune_model_dir = os.path.join(tempdir, 'finetune-model')
>>> finetune_model = Chemberta(task='regression', model_dir=finetune_model_dir, tokenizer_path=tokenizer_path)
>>> finetune_model.load_from_pretrained(pretrain_model_dir)
>>> finetuning_loss = finetune_model.fit(dataset, nb_epoch=1)
>>> # prediction and evaluation
>>> result = finetune_model.predict(dataset)
>>> eval_results = finetune_model.evaluate(dataset, metrics=dc.metrics.Metric(dc.metrics.mae_score))
>>> # removing temporary directory
>>> if os.path.exists(tempdir):
...     shutil.rmtree(tempdir)

__init__(task: str, tokenizer_path: str = 'seyonec/PubChem10M_SMILES_BPE_60k', n_tasks: int = 1, config: Dict[Any, Any] = {}, **kwargs)[source]

Create a new TorchModel.

Parameters:
  • model (torch.nn.Module) – the PyTorch model implementing the calculation

  • loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above

  • output_types (list of strings, optional (default None)) – the type of each output from the model, as described above

  • batch_size (int, optional (default 100)) – default batch size for training and evaluating

  • model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None, a temporary directory is created.

  • learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is ignored.

  • optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is ignored.

  • tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training

  • wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training

  • log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using logging by default. If tensorboard is set, data is also logged to TensorBoard. If wandb is set, data is also logged to Weights & Biases. Logging happens at global steps. Roughly, a global step corresponds to one batch of training. If you’d like a printout every 10 batch steps, you’d set log_frequency=10 for example.

  • device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is chosen automatically.

  • regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add to the loss function

  • wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics

MoLFormer

class MoLFormer(task: str, tokenizer_path: str = 'ibm/MoLFormer-XL-both-10pct', n_tasks: int = 1, **kwargs)[source]

MoLFormer is a large-scale chemical language model trained on small molecules represented as SMILES strings. MoLFormer leverages masked language modeling and employs a linear attention Transformer combined with rotary embeddings.

MoLFormer-XL-both-10pct is the version trained on 10% ZINC + 10% PubChem.

Parameters:
  • task (str) –

    The task defines the type of learning task in the model. The supported tasks are
    • mlm - masked language modeling commonly used in pretraining

    • mtr - multitask regression - a task used for both pretraining base models and finetuning

    • regression - use it for regression tasks, like property prediction

    • classification - use it for classification tasks

  • tokenizer_path (str) – Path containing pretrained tokenizer used to tokenize SMILES string for model inputs. The tokenizer path can either be a huggingFace tokenizer model or a path in the local machine containing the tokenizer.

  • n_tasks (int, default 1) – Number of prediction targets for a multitask learning model

Example

>>> import os
>>> import tempfile
>>> import shutil
>>> tempdir = tempfile.mkdtemp()
>>> # preparing dataset
>>> import pandas as pd
>>> import deepchem as dc
>>> smiles = ["CCN(CCSC)C(=O)N[C@@](C)(CC)C(F)(F)F","CC1(C)CN(C(=O)Nc2cc3ccccc3nn2)C[C@@]2(CCOC2)O1"]
>>> labels = [3.112,2.432]
>>> df = pd.DataFrame(list(zip(smiles, labels)), columns=["smiles", "task1"])
>>> with dc.utils.UniversalNamedTemporaryFile(mode='w') as tmpfile:
...     df.to_csv(tmpfile.name)
...     loader = dc.data.CSVLoader(["task1"], feature_field="smiles", featurizer=dc.feat.DummyFeaturizer())
...     dataset = loader.create_dataset(tmpfile.name)
>>> # pretraining
>>> from deepchem.models.torch_models.molformer import MoLFormer
>>> pretrain_model_dir = os.path.join(tempdir, 'pretrain-molformer-model')
>>> tokenizer_path = "ibm/MoLFormer-XL-both-10pct"
>>> pretrain_model = MoLFormer(task='mlm', model_dir=pretrain_model_dir, tokenizer_path=tokenizer_path)  # mlm pretraining
>>> pretraining_loss = pretrain_model.fit(dataset, nb_epoch=1)
>>> # finetuning in regression mode
>>> finetune_model_dir = os.path.join(tempdir, 'finetune-model')
>>> finetune_model = MoLFormer(task='regression', model_dir=finetune_model_dir, tokenizer_path=tokenizer_path)
>>> finetune_model.load_from_pretrained(pretrain_model_dir)
>>> finetuning_loss = finetune_model.fit(dataset, nb_epoch=1)
>>> # prediction and evaluation
>>> result = finetune_model.predict(dataset)
>>> eval_results = finetune_model.evaluate(dataset, metrics=dc.metrics.Metric(dc.metrics.mean_absolute_error))
>>> # removing temporary directory
>>> if os.path.exists(tempdir):
...     shutil.rmtree(tempdir)

__init__(task: str, tokenizer_path: str = 'ibm/MoLFormer-XL-both-10pct', n_tasks: int = 1, **kwargs)[source]

Create a new TorchModel.

Parameters:
  • model (torch.nn.Module) – the PyTorch model implementing the calculation

  • loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above

  • output_types (list of strings, optional (default None)) – the type of each output from the model, as described above

  • batch_size (int, optional (default 100)) – default batch size for training and evaluating

  • model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None, a temporary directory is created.

  • learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is ignored.

  • optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is ignored.

  • tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training

  • wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training

  • log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using logging by default. If tensorboard is set, data is also logged to TensorBoard. If wandb is set, data is also logged to Weights & Biases. Logging happens at global steps. Roughly, a global step corresponds to one batch of training. If you’d like a printout every 10 batch steps, you’d set log_frequency=10 for example.

  • device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is chosen automatically.

  • regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add to the loss function

  • wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics

ProtBERT

class ProtBERT(task: str, model_path: str = 'Rostlab/prot_bert', n_tasks: int = 1, cls_name: str = 'LogReg', classifier_net: Module | None = None, n_classes: int = 2, **kwargs)[source]

ProtBERT model[1].

The ProtBERT model is based on the BERT architecture, and the current implementation supports only MLM pretraining and classification mode, as described by the authors in HuggingFace[2]. For classification, we currently support only logistic regression and a simple feed-forward neural network.

The model converts the input protein sequence into a vector through a trained BERT tokenizer, which is then processed by the corresponding model based on the task. BertForMaskedLM is used to facilitate the MLM pretraining task. For the sequence classification task, we follow BertForSequenceClassification but change the classifier to either a logistic regression (LogReg) or a feed-forward neural network (FFN), depending on the specified cls_name. The FFN is a simple 2-layer network with 512 as the hidden dimension.

Examples

>>> import os
>>> import tempfile
>>> import shutil
>>> tempdir = tempfile.mkdtemp()
>>> # preparing dataset
>>> import pandas as pd
>>> import deepchem as dc
>>> protein = ["MPCTTYLPLLLLLFLLPPPSVQSKV","SSGLFWMELLTQFVLTWPLVVIAFL"]
>>> labels = [0,1]
>>> df = pd.DataFrame(list(zip(protein, labels)), columns=["protein", "task1"])
>>> with dc.utils.UniversalNamedTemporaryFile(mode='w') as tmpfile:
...    df.to_csv(tmpfile.name)
...    loader = dc.data.CSVLoader(["task1"], feature_field="protein", featurizer=dc.feat.DummyFeaturizer())
...    dataset = loader.create_dataset(tmpfile.name)
>>> # pretraining
>>> from deepchem.models.torch_models.prot_bert import ProtBERT
>>> pretrain_model_dir = os.path.join(tempdir, 'pretrain-model')
>>> model_path = 'Rostlab/prot_bert'
>>> pretrain_model = ProtBERT(task='mlm', model_path=model_path, n_tasks=1, model_dir=pretrain_model_dir)  # mlm pretraining
>>> pretraining_loss = pretrain_model.fit(dataset, nb_epoch=1)
>>> del pretrain_model
>>> finetune_model_dir = os.path.join(tempdir, 'finetune-model')
>>> finetune_model = ProtBERT(task='classification', model_path=model_path, n_tasks=1, model_dir=finetune_model_dir)
>>> finetune_model.load_from_pretrained(pretrain_model_dir)
>>> finetuning_loss = finetune_model.fit(dataset, nb_epoch=1)
>>> # prediction and evaluation
>>> result = finetune_model.predict(dataset)
>>> eval_results = finetune_model.evaluate(dataset, metrics=dc.metrics.Metric(dc.metrics.accuracy_score))
>>> feat_extractor_model = ProtBERT(task='feature_extractor', model_path=model_path, n_tasks=1, device='cpu')
>>> protein = "M G L P V S W A P P A L W V L G C C A L L L S L W A"
>>> tokenized_data = feat_extractor_model.tokenizer(protein,return_tensors='pt')
>>> protbert_feats = feat_extractor_model.get_last_hidden_state(tokenized_data['input_ids'],tokenized_data['attention_mask'])
>>> # removing temporary directory
>>> if os.path.exists(tempdir):
...      shutil.rmtree(tempdir)

References

__init__(task: str, model_path: str = 'Rostlab/prot_bert', n_tasks: int = 1, cls_name: str = 'LogReg', classifier_net: Module | None = None, n_classes: int = 2, **kwargs) None[source]
Parameters:
  • task (str) –

    The task defines the type of learning task in the model. The supported tasks are
    • mlm - masked language modeling commonly used in pretraining

    • classification - use it for classification tasks

    • feature_extractor - use it alongside the predict_embedding() method to extract features from a protein sequence

  • model_path (str) –

    Path to the HuggingFace model
    • Rostlab/prot_bert - Pretrained on Uniref100 dataset

    • Rostlab/prot_bert_bfd - Pretrained on BFD dataset

  • n_tasks (int) – Number of prediction targets for a multitask learning model

  • cls_name (str) – The classifier head to use for classification mode. Currently supports “FFN”, “LogReg”, and a custom classifier head.

  • classifier_net (nn.Module, optional) – A custom classifier head to use for classification mode. The network must have input size of 1024.

  • n_classes (int) – Number of classes for classification.

get_last_hidden_state(input_ids: Tensor, attention_mask: Tensor) Tensor[source]

Extracts the last hidden state from the model output.

Parameters:
  • input_ids (torch.Tensor) – Tensor containing tokenized input sequences.

  • attention_mask (torch.Tensor) – Tensor indicating which positions should be attended to.

Returns:

The last hidden state of the model.

Return type:

torch.Tensor

DeepAbLLM

class DeepAbLLM(task: str = 'mlm', model_path: str = 'Rostlab/prot_bert', n_tasks: int = 1, is_esm_variant: bool = False, config: Dict[Any, Any] = {}, **kwargs)[source]

Flexible Antibody Language Model for Re-Design of Ab Residues.

DeepAbLLM is a wrapper class that leverages the sequence co-dependencies learned by large language models (LLMs) to aid in the (re)design of antibody sequences. It supports the instantiation of an arbitrary HuggingFace transformer-style model trained on antibody sequences for antibody sequence redesign, extending an approach introduced in Hie et al.’s 2023 Nature Biotech paper [1].

This means the functionality of DeepAbLLM is model architecture-agnostic.

Currently supports a variety of HuggingFace Protein and Antibody Specific Language Models, including:

  • ProtTrans Models (ProtBERT, ProtT5, etc.) [2]

  • AbLang [3]

  • IgBERT/IgT5 [4]

  • ESM1b [5]

  • ESM1v [6]

  • ESM-2 [7]

The model uses single amino acid tokenization to create input tokens for the models from the antibody sequences. While most protein models expect spaces in the protein sequences:

“T H I S I S A P R O T E I N S E Q U E N C E”

the ESM class of models does not and expects strings of the following format:

“THISISAPROTEINSEQUENCE”

Both tokenization schemes are supported by DeepAbLLM by setting the is_esm_variant flag to the appropriate value.
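The two input formats described above can be illustrated with a small helper. This is a minimal sketch rather than DeepChem's internal code: format_for_tokenizer is a hypothetical name, and the sketch assumes only that is_esm_variant selects between the contiguous and the space-separated form.

```python
def format_for_tokenizer(sequence: str, is_esm_variant: bool = False) -> str:
    """Return the sequence in the string format the tokenizer expects.

    Hypothetical helper for illustration; not part of DeepChem.
    """
    # Normalize first: drop any whitespace already present in the input.
    residues = "".join(sequence.split())
    if is_esm_variant:
        # ESM-style models take the residues as one contiguous string.
        return residues
    # Most other protein models expect one space between residues.
    return " ".join(residues)


print(format_for_tokenizer("THISISAPROTEINSEQUENCE"))
print(format_for_tokenizer("T H I S", is_esm_variant=True))
```

Normalizing the whitespace first means the helper accepts either format as input, so a caller does not need to know how the sequence was stored.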

The model supports general pretraining via masked language modeling, domain-adaptive pretraining [8] (an additional pretraining step that adapts a general-purpose protein language model to antibody sequences), and finetuning of pretrained models for regression or classification. Select the mode at initialization with the task argument: use task=‘mlm’ to pretrain via masked language modeling, or task=‘regression’ or task=‘classification’ for finetuning.

task[source]

The task the HuggingFaceModel is performing. Default: ‘mlm’.

Type:

str

model_path[source]

The HuggingFace model path of the protein language model (pLM).

Type:

str

n_tasks[source]

Number of tasks for a given model. Default: 1

Type:

int

is_esm_variant[source]

Boolean flag to indicate the tokenization scheme.

Type:

bool

__init__(task, model_path, n_tasks)[source]

Initialize a DeepAbLLM with the specified information.

_mask_seq_pos(position, idx)[source]

Mask a sequence at a particular index.

redesign_residue(sequence, residue_index: int, top_k, verbose)[source]

Mask and unmask a single residue of a sequence at the given index, returning the top-k plausible amino acid substitutions.

_optimize_residue_pos(sequence, residue_index, verbose, threshold)[source]

“Optimizes” a residue position by redesigning it and returning only the tokens that rank higher than the original token.

redesign_sequence(sequence)[source]

Returns optimized sequences over each residue position with better scores than the original sequence.

Notes

(Currently Implements):

1. Light or Heavy Chain Re-Design at Arbitrary Point

2. Agnostic to Light or Heavy Chain (depending on whether the specified model correctly accounts for this)

(WIP)

3. Model consumes both epitope and receptor information to influence logits

(Planned)

4. Conditional Generation (Auto-Regressive/Iterative Unmasking)

References

Example Usage:

>>> # Optimize Sequence
>>> from deepchem.models.torch_models.antibody_modeling import DeepAbLLM
>>> model_path = 'Rostlab/prot_bert'
>>> anti_llm = DeepAbLLM(task='mlm', model_path=model_path, n_tasks=1, is_esm_variant=False, device='cpu')
>>> optimized_sequences = anti_llm.redesign_sequence('GSELTQDPAVSVALGQTVRITCQGDSLRNYYASWYQQKPRQAPVLVFYGKNNRPSGIPDRFSGSSSGNTASLTISGAQAEDEADYYCNSRDSSSNHLVFGGGTKLTVLSQ')
>>> # Expected Output
>>> # optimized_sequences[0]
>>> # (0,'Q','QSETQDPAVSVALGQTVRITCQGDSLRNYYASWYQQKPRQAPVLVFYGKNNRPSGIPDRFSGSSSGNTASLTISGAQAEDEADYYCNSRDSSSNHLVFGGGTKLTVLSQ',0.7314766049385071)
__init__(task: str = 'mlm', model_path: str = 'Rostlab/prot_bert', n_tasks: int = 1, is_esm_variant: bool = False, config: Dict[Any, Any] = {}, **kwargs) None[source]
Parameters:
  • task (str) – The type of learning task for the model. The supported tasks are:

      - mlm: masked language modeling, commonly used in pretraining
      - classification: use it for classification tasks

  • model_path (str) – Path to the HuggingFace model, either a HF Model Hub identifier or a local path, e.g. ‘Rostlab/prot_bert’

  • n_tasks (int) – Number of prediction targets for a multitask learning model

  • is_esm_variant (bool) – Flag for proper tokenization (S E Q U E N C E vs SEQUENCE).

  • config (dict) – Dictionary of HuggingFace AutoConfig hyper-parameters to update the default pretrained model.

redesign_residue(sequence: str, residue_index: int, top_k: int = 10, verbose: bool = False)[source]

Given a sequence and a residue index, mask and subsequently unmask that position, returning the proposed residues and their respective scores.

Parameters:
  • sequence (str) – The antibody sequence to redesign.

  • residue_index (int) – The residue index to mask and unmask.

  • top_k (int) – The top_k logits to return. Defaults to 10.

  • verbose (bool) – If verbose, prints the original sequence and the residue at residue_index before designing. Useful for running scripts on clusters.

Returns:

sequence_tuples – Returns a list of tuples containing the (replacement token, full sequence, score) for each unmasked token.

Return type:

List[tuple]
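The shape of the return value can be sketched without loading a model. The sketch below is illustrative only: top_k_substitutions is a hypothetical function, the logits dictionary stands in for the model's output at the masked position, and treating the score as a softmax probability is an assumption, not something the documentation states.

```python
import math
from typing import Dict, List, Tuple


def top_k_substitutions(sequence: str, residue_index: int,
                        logits: Dict[str, float],
                        top_k: int = 10) -> List[Tuple[str, str, float]]:
    """Build (replacement token, full sequence, score) tuples.

    Hypothetical stand-in: `logits` maps each candidate residue to the
    logit a masked language model would assign at `residue_index`.
    """
    # Numerically stable softmax over the candidate residues (assumed scoring).
    max_logit = max(logits.values())
    exps = {tok: math.exp(value - max_logit) for tok, value in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Keep the top_k highest-scoring residues, best first.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Substitute each candidate into the original sequence.
    prefix, suffix = sequence[:residue_index], sequence[residue_index + 1:]
    return [(tok, prefix + tok + suffix, score) for tok, score in ranked]


for item in top_k_substitutions("GSEL", 0, {"Q": 2.0, "G": 1.0, "A": 0.0}, top_k=2):
    print(item)
```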

redesign_sequence(sequence: str, **kwargs)[source]

Applies the _optimize_residue_pos function to all sequence positions.

Parameters:
  • sequence (str) – Antibody sequence to be optimized

  • top_k (int, optional) – Top K logits to be returned by the redesign_residue method.

  • threshold (float, optional) – Threshold for the probability score.

  • verbose (bool, optional) – Flag to print the original and redesigned tokens to stdout.

Returns:

redesigned_sequences – Returns list of tuples (index, token, sequence, score) that have higher scores than the original and are higher than the sequence threshold specified.

Return type:

List[tuple]
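The filtering described above (keep only substitutions that outscore the original residue and exceed the threshold, at every position) can be sketched as follows. This is a hedged illustration, not the library's implementation: keep_improvements, scan_sequence, and the propose callable are hypothetical names, and propose stands in for a call such as redesign_residue.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, str, float]  # (replacement token, full sequence, score)


def keep_improvements(candidates: List[Candidate], original_token: str,
                      threshold: float = 0.0) -> List[Candidate]:
    """Keep candidates scoring above both the original token and the threshold."""
    # Score of the original residue, if it appears among the candidates.
    original_score = next(
        (score for tok, _, score in candidates if tok == original_token), None)
    kept = []
    for tok, seq, score in candidates:
        if tok == original_token:
            continue  # the original residue is not a redesign
        if original_score is not None and score <= original_score:
            continue  # does not outscore the original
        if score > threshold:
            kept.append((tok, seq, score))
    return kept


def scan_sequence(sequence: str,
                  propose: Callable[[str, int], List[Candidate]],
                  threshold: float = 0.0) -> List[Tuple[int, str, str, float]]:
    """Apply keep_improvements at every position; return (index, token, sequence, score)."""
    results = []
    for idx, original in enumerate(sequence):
        for tok, seq, score in keep_improvements(propose(sequence, idx), original, threshold):
            results.append((idx, tok, seq, score))
    return results


candidates = [("Q", "QSEL", 0.7), ("G", "GSEL", 0.2), ("A", "ASEL", 0.1)]
print(keep_improvements(candidates, original_token="G"))
```

If the original residue is absent from the candidate list (for example because it fell outside the top k), the sketch treats every candidate as outscoring it; whether DeepChem handles that case the same way is not stated here.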

OneFormer

class OneFormer(segmentation_task: str = 'semantic', model_path: str = 'shi-labs/oneformer_ade20k_swin_tiny', model_config: OneFormerConfig | None = None, model_processor: AutoProcessor | None = None, id2label: Dict = {0: 'unlabelled', 1: 'labelled'}, torch_dtype: dtype = torch.float32, **kwargs)[source]

Wrapper class that wraps the OneFormer model as a DeepChem model.

The OneFormer model was proposed in OneFormer: One Transformer to Rule Universal Image Segmentation [1] by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi. OneFormer is a universal image segmentation framework that can be trained on a single panoptic dataset to perform semantic, instance, and panoptic segmentation tasks.

Whilst all official HuggingFace model weights by shi-labs [2] are supported, the current implementation only supports the OneFormer model for semantic segmentation.

Instance and panoptic segmentation tasks are not supported yet.

Usage Example:

>> from PIL import Image
>> import numpy as np
>> import torch
>> from deepchem.data import NumpyDataset, ImageDataset
>> from deepchem.models.torch_models import OneFormer
>> from deepchem.metrics import Metric, mean_absolute_error, jaccard_index

>> # Prepare the dataset
>> X = np.random.randint(0, 255, (3, 256, 256, 3))
>> # The upper bound of randint is exclusive, so use 2 to sample both labels
>> y = np.random.randint(0, 2, (3, 256, 256))
>> dataset = ImageDataset(X, y)
>> id2label = {0: "label-A", 1: "label-B"}

>> # Create the model
>> model = OneFormer(segmentation_task="semantic", model_path='shi-labs/oneformer_ade20k_swin_tiny', id2label=id2label, torch_dtype=torch.float16, batch_size=1)

>> # Train the model
>> avg_loss = model.fit(dataset, nb_epoch=1)

>> # Predict with the model
>> preds = model.predict(dataset)

>> # Evaluate the model
>> mae_metric = Metric(mean_absolute_error)
>> iou_metric = Metric(jaccard_index)
>> iou = iou_metric.compute_metric(dataset.y, preds)
>> mae = mae_metric.compute_metric(np.array(dataset.y).flatten(), np.array(preds).flatten())

References

__init__(segmentation_task: str = 'semantic', model_path: str = 'shi-labs/oneformer_ade20k_swin_tiny', model_config: OneFormerConfig | None = None, model_processor: AutoProcessor | None = None, id2label: Dict = {0: 'unlabelled', 1: 'labelled'}, torch_dtype: dtype = torch.float32, **kwargs) None[source]
Parameters:
  • segmentation_task (str) – The segmentation task to perform. The available tasks are:

      - semantic: semantic segmentation (default)
      - instance: instance segmentation (not supported yet)
      - panoptic: panoptic segmentation (not supported yet)

  • model_path (str) – Path to the OneFormer HuggingFace model, either a HF Model Hub identifier or a local path, e.g. ‘shi-labs/oneformer_ade20k_swin_tiny’

  • model_config (OneFormerConfig) – Optional configuration for the OneFormer model. If not provided, the configuration will be loaded from the model_path. If provided, the configuration will be used to initialize the model instead of the configuration from the model_path.

  • model_processor (AutoProcessor) – Optional processor for the OneFormer model. If not provided, the processor will be loaded from the model_path. If provided, the processor will be used to initialize the model instead of the processor from the model_path.

  • id2label (dict) – A dictionary mapping class indices to class labels.

  • torch_dtype (torch.dtype) – The torch data type to use for the model. The supported data types are:

      - torch.float32 (default)
      - torch.float16

Note

If a custom model configuration and processor are provided, ensure that model_processor.image_processor.num_text == model_config.num_queries - model_config.text_encoder_n_ctx for contrastive learning to work correctly during training.
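The constraint in this note can be checked before training. The following is a minimal sketch: check_text_query_config is a hypothetical helper (not part of DeepChem or HuggingFace), and the numbers in the call are illustrative values, not values read from any particular checkpoint.

```python
def check_text_query_config(num_text: int, num_queries: int,
                            text_encoder_n_ctx: int) -> None:
    """Raise ValueError if the processor and model config disagree.

    Hypothetical helper for illustration only.
    """
    expected = num_queries - text_encoder_n_ctx
    if num_text != expected:
        raise ValueError(
            f"num_text={num_text}, but num_queries - text_encoder_n_ctx = "
            f"{num_queries} - {text_encoder_n_ctx} = {expected}")


# Illustrative numbers only. With real objects the call would look like:
# check_text_query_config(model_processor.image_processor.num_text,
#                         model_config.num_queries,
#                         model_config.text_encoder_n_ctx)
check_text_query_config(num_text=134, num_queries=150, text_encoder_n_ctx=16)
```

Failing early with an explicit message is usually easier to diagnose than a shape mismatch raised inside the contrastive loss during training.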

fit_generator(generator: Iterable[Tuple[Any, Any, Any]], max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, restore: bool = False, variables: List[Parameter] | ParameterList | None = None, loss: Callable[[List, List, List], Any] | None = None, callbacks: Callable | List[Callable] = [], all_losses: List[float] | None = None) float[source]

Train this model on data from a generator.

Parameters:
  • generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).

  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.

  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.

  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.

  • variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in the model are used.

  • loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.

  • callbacks (function or list of functions) – one or more functions of the form f(model, step, **kwargs) that will be invoked after every step. This can be used to perform validation, logging, etc.

  • all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.

Return type:

The average loss over the most recent checkpoint interval
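The expected shapes of the generator and callback arguments can be sketched in plain Python. This is an assumption-laden illustration: batch_generator and record_step are hypothetical names, and plain lists are used to keep the example dependency-free, whereas a real model would normally expect array-like batch elements in whatever layout that model requires.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

Batch = Tuple[list, list, list]  # (inputs, labels, weights)


def batch_generator(inputs: Sequence, labels: Sequence,
                    weights: Optional[Sequence[float]] = None,
                    batch_size: int = 2, nb_epoch: int = 1) -> Iterator[Batch]:
    """Yield (inputs, labels, weights) tuples, one per batch."""
    if len(inputs) != len(labels):
        raise ValueError("inputs and labels must have the same length")
    if weights is None:
        # Default to equal weighting of every sample.
        weights = [1.0] * len(inputs)
    for _ in range(nb_epoch):
        for start in range(0, len(inputs), batch_size):
            stop = start + batch_size
            yield (list(inputs[start:stop]), list(labels[start:stop]),
                   list(weights[start:stop]))


steps_seen: List[int] = []


def record_step(model, step, **kwargs) -> None:
    """Callback of the documented form f(model, step, **kwargs)."""
    steps_seen.append(step)


# With a constructed model (not shown), the pieces would be passed as:
# model.fit_generator(batch_generator(X, y), callbacks=[record_step])
for batch in batch_generator([10, 20, 30], [0, 1, 0]):
    print(batch)
```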

Note

A HuggingFace model can return embeddings (the last hidden state) and attentions. Support for returning the embeddings to the user still needs to be added, so that they can be used in other downstream applications.

Trainer

A Trainer object automates scaling the training of DeepChem models across multi-GPU and multi-node infrastructure.

DistributedTrainer