Model Classes¶
DeepChem maintains an extensive collection of models for scientific applications. Because DeepChem’s focus is on facilitating scientific applications, we support a broad range of machine learning frameworks (currently scikit-learn, xgboost, TensorFlow, and PyTorch), since different frameworks are better or worse suited to different scientific applications.
Model Cheatsheet¶
If you’re just getting started with DeepChem, you’re probably interested in the basics. The place to get started is this “model cheatsheet” that lists various types of custom DeepChem models. Note that some wrappers like SklearnModel and GBDTModel, which wrap external machine learning libraries, are excluded, but this table is otherwise complete.
As a note about how to read this table, each row describes what’s needed to invoke a given model. Some models must be applied with given Transformer or Featurizer objects. Some models also have custom training methods. You can read off what’s needed to train the model from the table below.
Model | Type | Input Type | Transformations | Acceptable Featurizers | Fit Method
 | Classifier/Regressor | Tuple | | |
 | Classifier/Regressor | Tensor of shape | | |
 | Classifier/Regressor | Tensor of shape | | |
 | Classifier/Regressor | Matrix of shape | | |
 | Classifier/Regressor | | | |
 | Classifier/Regressor | | | |
 | Classifier/Regressor | | | |
 | Classifier | Vector of shape | | |
 | Regressor | Vector of shape | | |
 | Regressor | Vector of shape | Any | |
 | Classifier | Vector of shape | | |
 | Classifier | Vector of shape | | |
 | Regressor | Vector of shape | | |
 | Classifier | Vector of shape | | |
 | Regressor | Vector of shape | | |
 | Classifier | Vector of shape | | |
 | Sequence | Sequence | | |
 | Classifier/Regressor | Sequence | | |
 | Classifier/Regressor | String | | |
 | Adversarial | Pair | | |
 | Classifier/Regressor | | | |
 | Classifier/Regressor | | | |
 | Classifier/Regressor | | | |
 | Classifier/Regressor | | | |
 | Classifier/Regressor | | | |
 | Regressor | | | |

Model¶

class
Model
(model=None, model_dir: Optional[str] = None, **kwargs)[source]¶ Abstract base class for DeepChem models.

__init__
(model=None, model_dir: Optional[str] = None, **kwargs) → None[source]¶ Abstract class for all models.
This is intended only for convenience of subclass implementations and should not be invoked directly.
 Parameters
model (object) – Wrapper around ScikitLearn/Keras/Tensorflow model object.
model_dir (str, optional (default None)) – Path to directory where model will be stored. If not specified, model will be stored in a temporary directory.

fit_on_batch
(X: Sequence, y: Sequence, w: Sequence)[source]¶ Perform a single step of training.
 Parameters
X (np.ndarray) – the inputs for the batch
y (np.ndarray) – the labels for the batch
w (np.ndarray) – the weights for the batch

predict_on_batch
(X: Union[numpy.ndarray, Sequence])[source]¶ Makes predictions on given batch of new data.
 Parameters
X (np.ndarray) – Features

static
get_model_filename
(model_dir: str) → str[source]¶ Given model directory, obtain filename for the model itself.

static
get_params_filename
(model_dir: str) → str[source]¶ Given model directory, obtain filename for the model’s parameters.

save
() → None[source]¶ Dispatcher function for saving.
Each subclass is responsible for overriding this method.

fit
(dataset: deepchem.data.datasets.Dataset)[source]¶ Fits a model on data in a Dataset object.
 Parameters
dataset (Dataset) – the Dataset to train on

predict
(dataset: deepchem.data.datasets.Dataset, transformers: List[transformers.Transformer] = []) → numpy.ndarray[source]¶ Uses self to make predictions on provided Dataset object.
 Parameters
dataset (Dataset) – Dataset to make prediction on
transformers (List[Transformer]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
 Returns
A numpy array of predictions the model produces.
 Return type
np.ndarray

evaluate
(dataset: deepchem.data.datasets.Dataset, metrics: List[deepchem.metrics.metric.Metric], transformers: List[transformers.Transformer] = [], per_task_metrics: bool = False, use_sample_weights: bool = False, n_classes: int = 2)[source]¶ Evaluates the performance of this model on specified dataset.
This function uses Evaluator under the hood to perform model evaluation. As a result, it inherits the limitations of Evaluator: only regression and classification models can be evaluated in this fashion. For generator models, you will need to override this method to perform a custom evaluation.
Keyword arguments specified here will be passed to Evaluator.compute_model_performance.
 Parameters
dataset (Dataset) – Dataset object.
metrics (Metric / List[Metric] / function) – The set of metrics provided. This class attempts to do some intelligent handling of input. If a single dc.metrics.Metric object is provided or a list is provided, it will evaluate self.model on these metrics. If a function is provided, it is assumed to be a metric function that this method will attempt to wrap in a dc.metrics.Metric object. A metric function must accept two arguments, y_true and y_pred, both of which are np.ndarray objects, and return a floating point score. The metric function may also accept a keyword argument sample_weight to account for per-sample weights.
transformers (List[Transformer]) – List of dc.trans.Transformer objects. These transformations must have been applied to dataset previously. The dataset will be untransformed for metric evaluation.
per_task_metrics (bool, optional (default False)) – If true, return computed metric for each task on multitask dataset.
use_sample_weights (bool, optional (default False)) – If set, use per-sample weights w.
n_classes (int, optional (default 2)) – If specified, will use n_classes as the number of unique classes in self.dataset. Note that this argument will be ignored for regression metrics.
 Returns
multitask_scores (dict) – Dictionary mapping names of metrics to metric scores.
all_task_scores (dict, optional) – If per_task_metrics == True is passed as a keyword argument, then returns a second dictionary of scores for each task separately.
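As a minimal sketch of the metric-function contract described for the metrics parameter, the hypothetical function below accepts y_true and y_pred, optionally a sample_weight, and returns a float. The name weighted_mae is illustrative and not part of DeepChem. Plain Python sequences are used to keep the example self-contained; per the description above, DeepChem would pass np.ndarray objects.

```python
from typing import Optional, Sequence


def weighted_mae(y_true: Sequence[float],
                 y_pred: Sequence[float],
                 sample_weight: Optional[Sequence[float]] = None) -> float:
    """Mean absolute error, optionally weighted per sample (hypothetical metric)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if sample_weight is None:
        # Without weights, every sample counts equally.
        sample_weight = [1.0] * len(y_true)
    total_weight = sum(sample_weight)
    if total_weight == 0:
        raise ValueError("sample weights must not sum to zero")
    weighted_error = sum(
        w * abs(t - p) for t, p, w in zip(y_true, y_pred, sample_weight))
    return weighted_error / total_weight


print(weighted_mae([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # 0.5
print(weighted_mae([1.0, 2.0, 3.0], [1.0, 2.5, 2.0],
                   sample_weight=[0.0, 1.0, 1.0]))  # 0.75
```

A function of this shape could be passed as metrics, in which case evaluate would attempt to wrap it in a dc.metrics.Metric object as described above.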

ScikitLearn Models¶
Scikit-learn’s models can be wrapped so that they can interact conveniently with DeepChem. Scikit-learn models are often more robust and easier to train, which makes them a good first model to try.
SklearnModel¶

class
SklearnModel
(model: sklearn.base.BaseEstimator, model_dir: Optional[str] = None, **kwargs)[source]¶ Wrapper class that wraps scikit-learn models as DeepChem models.
When you’re working with scikit-learn and DeepChem, at times it can be useful to wrap a scikit-learn model as a DeepChem model. The reason for this might be that you want to do an apples-to-apples comparison of a scikit-learn model to another DeepChem model, or perhaps you want to use the hyperparameter tuning capabilities in dc.hyper. The SklearnModel class provides a wrapper around scikit-learn models that allows scikit-learn models to be trained on Dataset objects and evaluated with the same metrics as other DeepChem models.
Notes
All SklearnModels perform learning solely in memory. This means that it may not be possible to train SklearnModel on large `Dataset`s.

__init__
(model: sklearn.base.BaseEstimator, model_dir: Optional[str] = None, **kwargs)[source]¶  Parameters
model (BaseEstimator) – The model instance which inherits a scikit-learn BaseEstimator Class.
model_dir (str, optional (default None)) – If specified the model will be stored in this directory. Else, a temporary directory will be used.
model_instance (BaseEstimator (DEPRECATED)) – The model instance which inherits a scikit-learn BaseEstimator Class.
kwargs (dict) – kwargs[‘use_weights’] is a bool which determines if we pass weights into self.model.fit().

fit
(dataset: deepchem.data.datasets.Dataset) → None[source]¶ Fits scikit-learn model to data.
 Parameters
dataset (Dataset) – The Dataset to train this model on.

predict_on_batch
(X: Union[numpy.ndarray, Sequence]) → numpy.ndarray[source]¶ Makes predictions on batch of data.
 Parameters
X (np.ndarray) – A numpy array of features.
 Returns
The return value of the scikit-learn model’s predict_proba or predict method. If the scikit-learn model has both methods, the return value of predict_proba is always used.
 Return type
np.ndarray
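The dispatch rule described under Returns (prefer predict_proba when the estimator has it, otherwise fall back to predict) can be sketched as follows. The stub estimator classes and the helper name predict_on_batch_sketch are hypothetical stand-ins used only for illustration; they are not DeepChem or scikit-learn classes.

```python
class ProbabilisticStub:
    """Stand-in estimator exposing both predict and predict_proba."""

    def predict(self, X):
        return [0 for _ in X]

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


class RegressorStub:
    """Stand-in estimator exposing only predict."""

    def predict(self, X):
        return [sum(row) for row in X]


def predict_on_batch_sketch(estimator, X):
    """Return predict_proba output when available, else predict output."""
    if hasattr(estimator, "predict_proba"):
        return estimator.predict_proba(X)
    return estimator.predict(X)


batch = [[1.0, 2.0], [3.0, 4.0]]
print(predict_on_batch_sketch(ProbabilisticStub(), batch))  # [[0.5, 0.5], [0.5, 0.5]]
print(predict_on_batch_sketch(RegressorStub(), batch))  # [3.0, 7.0]
```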

predict
(X: deepchem.data.datasets.Dataset, transformers: List[transformers.Transformer] = []) → numpy.ndarray[source]¶ Makes predictions on dataset.
 Parameters
dataset (Dataset) – Dataset to make prediction on.
transformers (List[Transformer]) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.

Gradient Boosting Models¶
Gradient Boosting Models (LightGBM and XGBoost) can be wrapped so they can interact with DeepChem.
GBDTModel¶

class
GBDTModel
(model: sklearn.base.BaseEstimator, model_dir: Optional[str] = None, early_stopping_rounds: int = 50, eval_metric: Optional[Union[Callable, str]] = None, **kwargs)[source]¶ Wrapper class that wraps GBDT models as DeepChem models.
This class supports LightGBM/XGBoost models.

__init__
(model: sklearn.base.BaseEstimator, model_dir: Optional[str] = None, early_stopping_rounds: int = 50, eval_metric: Optional[Union[Callable, str]] = None, **kwargs)[source]¶  Parameters
model (BaseEstimator) – The model instance of scikit-learn wrapper LightGBM/XGBoost models.
model_dir (str, optional (default None)) – Path to directory where model will be stored.
early_stopping_rounds (int, optional (default 50)) – Activates early stopping. Validation metric needs to improve at least once in every early_stopping_rounds round(s) to continue training.
eval_metric (Union[str, Callable]) – If string, it should be a built-in evaluation metric to use. If callable, it should be a custom evaluation metric, see official note for more details.

fit
(dataset: deepchem.data.datasets.Dataset)[source]¶ Fits GBDT model with all data.
First, this function splits the data into training and validation sets (8:2) and finds the best n_estimators. It then retrains on all data using the best n_estimators * 1.25.
 Parameters
dataset (Dataset) – The Dataset to train this model on.
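The arithmetic behind this two-stage fit can be sketched as below. This is only an illustration of the 8:2 split and the 1.25 scaling described above; the helper names are hypothetical, and rounding to the nearest integer is an assumption, since the text does not say how the scaled value is converted to an integer.

```python
from typing import Tuple


def split_sizes(n_samples: int) -> Tuple[int, int]:
    """Return (n_train, n_valid) for an 8:2 split using integer arithmetic."""
    n_train = n_samples * 8 // 10
    return n_train, n_samples - n_train


def refit_n_estimators(best_n_estimators: int, scale: float = 1.25) -> int:
    """Scale the best round count found on the 80% split for the full refit.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return int(round(best_n_estimators * scale))


print(split_sizes(1000))  # (800, 200)
print(refit_n_estimators(80))  # 100
```

The scaling compensates for the final model being trained on all of the data rather than on the 80% used to pick n_estimators.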

Deep Learning Infrastructure¶
DeepChem maintains a lightweight layer of common deep learning model infrastructure that can be used for models built with different underlying frameworks. The losses and optimizers can be used for both TensorFlow and PyTorch models.
Losses¶

class
HuberLoss
[source]¶ Modified version of L1 Loss, also known as Smooth L1 loss. Less sensitive to small errors, linear for larger errors. Huber loss is generally better than the L1 loss for cases where there are both large outliers and small errors. By default, Delta = 1.0 and reduction = ‘none’.
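For reference, a minimal per-element sketch of the standard Huber definition (quadratic within delta, linear beyond it) is shown below. It assumes the textbook formula with delta = 1.0 and no reduction; the function name is hypothetical and the sketch is not taken from the DeepChem source.

```python
def huber_loss(y_true: float, y_pred: float, delta: float = 1.0) -> float:
    """Per-element Huber loss using the textbook definition."""
    error = abs(y_true - y_pred)
    if error <= delta:
        # Quadratic region: behaves like a scaled squared error.
        return 0.5 * error ** 2
    # Linear region: grows like the L1 loss, so outliers are penalized less.
    return delta * (error - 0.5 * delta)


print(huber_loss(0.0, 0.5))  # 0.125
print(huber_loss(0.0, 3.0))  # 2.5
```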

class
HingeLoss
[source]¶ The hinge loss function.
The ‘output’ argument should contain logits, and all elements of ‘labels’ should equal 0 or 1.

class
SquaredHingeLoss
[source]¶ The Squared Hinge loss function.
Defined as the square of the hinge loss between y_true and y_pred. The Squared Hinge Loss is differentiable.
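The hinge and squared hinge losses can be sketched per element as follows. The sketch assumes the common convention in which 0/1 labels are first mapped to -1/+1 and the loss is max(0, 1 - y * logit); that mapping and the function names are assumptions of this example, not something stated above.

```python
def hinge_loss(label: int, logit: float) -> float:
    """Hinge loss for a 0/1 label and a raw logit."""
    signed_label = 2 * label - 1  # map {0, 1} to {-1, +1}
    return max(0.0, 1.0 - signed_label * logit)


def squared_hinge_loss(label: int, logit: float) -> float:
    """Square of the hinge loss; smooth at the margin boundary."""
    return hinge_loss(label, logit) ** 2


print(hinge_loss(1, 2.0))  # 0.0
print(hinge_loss(1, 0.5))  # 0.5
print(hinge_loss(0, 0.5))  # 1.5
print(squared_hinge_loss(0, 0.5))  # 2.25
```

A correct prediction with margin at least 1 costs nothing; anything inside the margin or on the wrong side is penalized linearly (hinge) or quadratically (squared hinge).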

class
PoissonLoss
[source]¶ The Poisson loss function is defined as the mean of the elements of y_pred - (y_true * log(y_pred)) for an input of (y_true, y_pred). Poisson loss is generally used for regression tasks where the data follows the Poisson distribution.
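A small worked sketch of the Poisson loss, computed as the mean of y_pred - y_true * log(y_pred), is given below. Plain Python lists are used to keep it self-contained, and the function name is hypothetical.

```python
import math
from typing import Sequence


def poisson_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of y_pred - y_true * log(y_pred); y_pred must be positive."""
    terms = [p - t * math.log(p) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


print(poisson_loss([1.0], [1.0]))  # 1.0
print(poisson_loss([2.0, 0.0], [2.0, 1.0]))
```

For a fixed y_true, each term is smallest when y_pred equals y_true, which is why the loss suits count-like regression targets.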

class
BinaryCrossEntropy
[source]¶ The cross entropy between pairs of probabilities.
The arguments should each have shape (batch_size) or (batch_size, tasks) and contain probabilities.

class
CategoricalCrossEntropy
[source]¶ The cross entropy between two probability distributions.
The arguments should each have shape (batch_size, classes) or (batch_size, tasks, classes), and represent a probability distribution over classes.

class
SigmoidCrossEntropy
[source]¶ The cross entropy between pairs of probabilities.
The arguments should each have shape (batch_size) or (batch_size, tasks). The labels should be probabilities, while the outputs should be logits that are converted to probabilities using a sigmoid function.

class
SoftmaxCrossEntropy
[source]¶ The cross entropy between two probability distributions.
The arguments should each have shape (batch_size, classes) or (batch_size, tasks, classes). The labels should be probabilities, while the outputs should be logits that are converted to probabilities using a softmax function.

class
SparseSoftmaxCrossEntropy
[source]¶ The cross entropy between two probability distributions.
The labels should have shape (batch_size) or (batch_size, tasks), and be integer class labels. The outputs should have shape (batch_size, classes) or (batch_size, tasks, classes) and be logits that are converted to probabilities using a softmax function.
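The relationship between the softmax cross entropy variants can be sketched for a single sample as below. The sketch computes log-softmax directly from logits with the log-sum-exp trick, which is the usual numerically stable formulation; the function names are hypothetical and this is not the DeepChem implementation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Stable log-softmax: subtract the max logit before exponentiating."""
    largest = max(logits)
    log_sum = largest + math.log(sum(math.exp(x - largest) for x in logits))
    return [x - log_sum for x in logits]


def softmax_cross_entropy(label_probs: Sequence[float],
                          logits: Sequence[float]) -> float:
    """Cross entropy between a label distribution and softmax(logits)."""
    return -sum(p * lp for p, lp in zip(label_probs, log_softmax(logits)))


def sparse_softmax_cross_entropy(label: int, logits: Sequence[float]) -> float:
    """Same loss when the label is an integer class index."""
    return -log_softmax(logits)[label]


logits = [2.0, 0.5, -1.0]
# A one-hot label distribution and its integer index give the same loss.
print(softmax_cross_entropy([0.0, 1.0, 0.0], logits))
print(sparse_softmax_cross_entropy(1, logits))
```

The sparse form avoids building one-hot label vectors, which matters when the number of classes is large.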

class
VAE_ELBO
[source]¶ The Variational AutoEncoder loss: KL divergence regularizer + marginal log-likelihood.
This loss is based on [1]. ELBO (evidence lower bound) is another name for the variational lower bound. BCE means marginal log-likelihood, and KLD means KL divergence with normal distribution. The hyperparameter ‘kl_scale’ is added for KLD.
The logvar and mu should have shape (batch_size, hidden_space). The x and reconstruction_x should have shape (batch_size, attribute). The kl_scale should be a float.
Examples
Examples for calculating loss using constant tensor.
batch_size = 2, hidden_space = 2, num of original attribute = 3
>>> import numpy as np
>>> import torch
>>> import tensorflow as tf
>>> from deepchem.models.losses import VAE_ELBO
>>> logvar = np.array([[1.0,1.3],[0.6,1.2]])
>>> mu = np.array([[0.2,0.7],[1.2,0.4]])
>>> x = np.array([[0.9,0.4,0.8],[0.3,0,1]])
>>> reconstruction_x = np.array([[0.8,0.3,0.7],[0.2,0,0.9]])
Case tensorflow
>>> VAE_ELBO()._compute_tf_loss(tf.constant(logvar), tf.constant(mu), tf.constant(x), tf.constant(reconstruction_x))
<tf.Tensor: shape=(2,), dtype=float64, numpy=array([0.70165154, 0.76238271])>
Case pytorch
>>> (VAE_ELBO()._create_pytorch_loss())(torch.tensor(logvar), torch.tensor(mu), torch.tensor(x), torch.tensor(reconstruction_x))
tensor([0.7017, 0.7624], dtype=torch.float64)
References
[1] Kingma, Diederik P., and Max Welling. “Auto-encoding variational Bayes.” arXiv preprint arXiv:1312.6114 (2013).

class
VAE_KLDivergence
[source]¶ The KL_divergence between hidden distribution and normal distribution.
This loss represents the KL divergence between the hidden distribution and the normal distribution (using the parameters of the distribution), based on [1].
The logvar should have shape (batch_size, hidden_space) and each term represents the standard deviation of the hidden distribution. The mean should have shape (batch_size, hidden_space) and each term represents the mean of the hidden distribution.
Examples
Examples for calculating loss using constant tensor.
batch_size = 2, hidden_space = 2
>>> import numpy as np
>>> import torch
>>> import tensorflow as tf
>>> from deepchem.models.losses import VAE_KLDivergence
>>> logvar = np.array([[1.0,1.3],[0.6,1.2]])
>>> mu = np.array([[0.2,0.7],[1.2,0.4]])
Case tensorflow
>>> VAE_KLDivergence()._compute_tf_loss(tf.constant(logvar), tf.constant(mu))
<tf.Tensor: shape=(2,), dtype=float64, numpy=array([0.17381787, 0.51425203])>
Case pytorch
>>> (VAE_KLDivergence()._create_pytorch_loss())(torch.tensor(logvar), torch.tensor(mu))
tensor([0.1738, 0.5143], dtype=torch.float64)
References
[1] Kingma, Diederik P., and Max Welling. “Auto-encoding variational Bayes.” arXiv preprint arXiv:1312.6114 (2013).
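The numbers in the VAE_KLDivergence example can be reproduced with a few lines of plain Python. The formula below treats each logvar entry as a standard deviation, as the description states, and averages over the hidden dimension. It is inferred from the documented outputs rather than taken from the DeepChem source, and the function name is hypothetical.

```python
import math
from typing import Sequence


def vae_kl_divergence(logvar: Sequence[float], mu: Sequence[float]) -> float:
    """0.5 * mean(mu^2 + s^2 - log(s^2) - 1) over the hidden dimension."""
    terms = [m * m + s * s - math.log(s * s) - 1.0 for s, m in zip(logvar, mu)]
    return 0.5 * sum(terms) / len(terms)


logvar = [[1.0, 1.3], [0.6, 1.2]]
mu = [[0.2, 0.7], [1.2, 0.4]]
losses = [vae_kl_divergence(s, m) for s, m in zip(logvar, mu)]
print([round(v, 4) for v in losses])  # [0.1738, 0.5143]
```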

class
ShannonEntropy
[source]¶ The Shannon entropy of a discrete distribution.
This loss represents Shannon entropy based on [1].
The inputs should have shape (batch size, num of variable) and represent a probability distribution.
Examples
Examples for calculating loss using constant tensor.
batch_size = 2, num of variable = 2
>>> import numpy as np
>>> import torch
>>> import tensorflow as tf
>>> from deepchem.models.losses import ShannonEntropy
>>> inputs = np.array([[0.7,0.3],[0.9,0.1]])
Case tensorflow
>>> ShannonEntropy()._compute_tf_loss(tf.constant(inputs))
<tf.Tensor: shape=(2,), dtype=float64, numpy=array([0.30543215, 0.16254149])>
Case pytorch
>>> (ShannonEntropy()._create_pytorch_loss())(torch.tensor(inputs))
tensor([0.3054, 0.1625], dtype=torch.float64)
References
[1] Chen, Ricky Xiaofeng. “A Brief Introduction to Shannon’s Information Theory.” arXiv preprint arXiv:1612.09316 (2016).
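The ShannonEntropy example values can likewise be checked by hand. The sketch below uses the natural logarithm and averages (rather than sums) -p * log(p) over the variables; both choices are inferred from the documented outputs, not from the DeepChem source, and the function name is hypothetical.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Mean of -p * ln(p) over the entries of one distribution."""
    # Zero-probability entries contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0) / len(probs)


inputs = [[0.7, 0.3], [0.9, 0.1]]
entropies = [shannon_entropy(row) for row in inputs]
print([round(v, 4) for v in entropies])  # [0.3054, 0.1625]
```

The more peaked distribution (0.9, 0.1) has the lower entropy, as expected.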
Optimizers¶

class
Optimizer
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule])[source]¶ An algorithm for optimizing a model.
This is an abstract class. Subclasses represent specific optimization algorithms.

__init__
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule])[source]¶ This constructor should only be called by subclasses.
 Parameters
learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization


class
LearningRateSchedule
[source]¶ A schedule for changing the learning rate over the course of optimization.
This is an abstract class. Subclasses represent specific schedules.

class
AdaGrad
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, initial_accumulator_value: float = 0.1, epsilon: float = 1e-07)[source]¶ The AdaGrad optimization algorithm.
AdaGrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the updates. See [1] for a full reference for the algorithm.
References
[1] Duchi, John, Elad Hazan, and Yoram Singer. “Adaptive subgradient methods for online learning and stochastic optimization.” Journal of Machine Learning Research 12.7 (2011).

__init__
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, initial_accumulator_value: float = 0.1, epsilon: float = 1e-07)[source]¶ Construct an AdaGrad optimizer.
 Parameters
learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization
initial_accumulator_value (float) – a parameter of the AdaGrad algorithm
epsilon (float) – a parameter of the AdaGrad algorithm
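To illustrate how the AdaGrad parameters interact, here is a minimal single-parameter sketch of the textbook update rule: squared gradients accumulate, and each step is divided by the square root of the accumulator. The placement of epsilon outside the square root is an assumption (frameworks differ on this detail), and the function name is hypothetical.

```python
import math
from typing import Tuple


def adagrad_step(param: float,
                 grad: float,
                 accumulator: float,
                 learning_rate: float = 0.001,
                 epsilon: float = 1e-07) -> Tuple[float, float]:
    """Apply one AdaGrad update and return (new_param, new_accumulator)."""
    accumulator = accumulator + grad * grad
    param = param - learning_rate * grad / (math.sqrt(accumulator) + epsilon)
    return param, accumulator


# Start from initial_accumulator_value = 0.1 and apply a constant gradient.
param, accumulator = 1.0, 0.1
step_sizes = []
for _ in range(3):
    new_param, accumulator = adagrad_step(param, 1.0, accumulator)
    step_sizes.append(param - new_param)
    param = new_param
print(step_sizes)  # each step is smaller than the previous one
```

Because the accumulator only grows, a parameter that keeps receiving gradients takes progressively smaller steps, matching the behaviour described above.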

class
Adam
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08)[source]¶ The Adam optimization algorithm.

__init__
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08)[source]¶ Construct an Adam optimizer.
 Parameters
learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization
beta1 (float) – a parameter of the Adam algorithm
beta2 (float) – a parameter of the Adam algorithm
epsilon (float) – a parameter of the Adam algorithm


class
AdamW
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, weight_decay: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.01, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08, amsgrad: bool = False)[source]¶ The AdamW optimization algorithm. AdamW is a variant of Adam, with improved weight decay.
In Adam, weight decay is implemented as: weight_decay (float, optional) – weight decay (L2 penalty) (default: 0).
In AdamW, weight decay is implemented as: weight_decay (float, optional) – weight decay coefficient (default: 1e-2).

__init__
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, weight_decay: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.01, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08, amsgrad: bool = False)[source]¶ Construct an AdamW optimizer.
 Parameters
learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization
weight_decay (float or LearningRateSchedule) – weight decay coefficient for AdamW
beta1 (float) – a parameter of the Adam algorithm
beta2 (float) – a parameter of the Adam algorithm
epsilon (float) – a parameter of the Adam algorithm
amsgrad (bool) – If True, will use the AMSGrad variant of AdamW (from “On the Convergence of Adam and Beyond”), else will use the original algorithm.


class
SparseAdam
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08)[source]¶ The Sparse Adam optimization algorithm, also known as Lazy Adam. Sparse Adam is suitable for sparse tensors and handles sparse updates more efficiently. It only updates moving-average accumulators for sparse variable indices that appear in the current batch, rather than updating the accumulators for all indices.

__init__
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08)[source]¶ Construct a SparseAdam optimizer.
 Parameters
learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization
beta1 (float) – a parameter of the SparseAdam algorithm
beta2 (float) – a parameter of the SparseAdam algorithm
epsilon (float) – a parameter of the SparseAdam algorithm


class
RMSProp
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, momentum: float = 0.0, decay: float = 0.9, epsilon: float = 1e-10)[source]¶ RMSProp Optimization algorithm.

__init__
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, momentum: float = 0.0, decay: float = 0.9, epsilon: float = 1e-10)[source]¶ Construct an RMSProp Optimizer.
 Parameters
learning_rate (float or LearningRateSchedule) – the learning_rate used for optimization
momentum (float, default 0.0) – a parameter of the RMSProp algorithm
decay (float, default 0.9) – a parameter of the RMSProp algorithm
epsilon (float, default 1e-10) – a parameter of the RMSProp algorithm


class
GradientDescent
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001)[source]¶ The gradient descent optimization algorithm.

__init__
(learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001)[source]¶ Construct a gradient descent optimizer.
 Parameters
learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization


class
ExponentialDecay
(initial_rate: float, decay_rate: float, decay_steps: int, staircase: bool = True)[source]¶ A learning rate that decreases exponentially with the number of training steps.

__init__
(initial_rate: float, decay_rate: float, decay_steps: int, staircase: bool = True)[source]¶ Create an exponentially decaying learning rate.
The learning rate starts as initial_rate. Every decay_steps training steps, it is multiplied by decay_rate.
 Parameters
initial_rate (float) – the initial learning rate
decay_rate (float) – the base of the exponential
decay_steps (int) – the number of training steps over which the rate decreases by decay_rate
staircase (bool) – if True, the learning rate decreases by discrete jumps every decay_steps. If False, the learning rate decreases smoothly every step.
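The schedule described above can be written out as a small function. This is a sketch of the stated rule (multiply by decay_rate every decay_steps steps, in jumps or smoothly) with a hypothetical function name, not the DeepChem implementation.

```python
import math


def exponential_decay(step: int,
                      initial_rate: float,
                      decay_rate: float,
                      decay_steps: int,
                      staircase: bool = True) -> float:
    """Learning rate after `step` training steps."""
    exponent = step / decay_steps
    if staircase:
        # Only whole multiples of decay_steps count, giving discrete jumps.
        exponent = math.floor(exponent)
    return initial_rate * decay_rate ** exponent


print(exponential_decay(250, 0.1, 0.5, 100))                  # two full decays
print(exponential_decay(50, 0.1, 0.5, 100, staircase=False))  # half a decay
```

With staircase=True the rate at step 250 equals the rate at step 200; with staircase=False it changes a little on every step.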


class
PolynomialDecay
(initial_rate: float, final_rate: float, decay_steps: int, power: float = 1.0)[source]¶ A learning rate that decreases from an initial value to a final value over a fixed number of training steps.

__init__
(initial_rate: float, final_rate: float, decay_steps: int, power: float = 1.0)[source]¶ Create a smoothly decaying learning rate.
The learning rate starts as initial_rate. It smoothly decreases to final_rate over decay_steps training steps. It decays as a function of (1 - step/decay_steps)**power. Once the final rate is reached, it remains there for the rest of optimization.
 Parameters
initial_rate (float) – the initial learning rate
final_rate (float) – the final learning rate
decay_steps (int) – the number of training steps over which the rate decreases from initial_rate to final_rate
power (float) – the exponent controlling the shape of the decay
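A minimal sketch of this schedule is shown below. It assumes the rate interpolates between initial_rate and final_rate with weight (1 - step/decay_steps)**power and is clamped once decay_steps is reached; the exact interpolation form is an assumption consistent with the description, and the function name is hypothetical.

```python
def polynomial_decay(step: int,
                     initial_rate: float,
                     final_rate: float,
                     decay_steps: int,
                     power: float = 1.0) -> float:
    """Learning rate after `step` training steps."""
    # Clamp so the rate stays at final_rate after decay_steps.
    fraction = min(step, decay_steps) / decay_steps
    return final_rate + (initial_rate - final_rate) * (1.0 - fraction) ** power


print(polynomial_decay(0, 0.1, 0.01, 100))    # starts at initial_rate
print(polynomial_decay(50, 0.1, 0.01, 100))   # halfway with power=1.0
print(polynomial_decay(500, 0.1, 0.01, 100))  # stays at final_rate
```

With power=1.0 the decay is linear; larger powers drop the rate faster early on and flatten out near final_rate.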


class
LinearCosineDecay
(initial_rate: float, decay_steps: int, alpha: float = 0.0, beta: float = 0.001, num_periods: float = 0.5)[source]¶ Applies linear cosine decay to the learning rate

__init__
(initial_rate: float, decay_steps: int, alpha: float = 0.0, beta: float = 0.001, num_periods: float = 0.5)[source]¶  Parameters
initial_rate (float) – initial learning rate
decay_steps (int) – number of steps to decay over
num_periods (float) – number of periods in the cosine part of the decay
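The text above does not spell out the decay formula. As a rough sketch, the function below follows the linear cosine decay formula as commonly defined in TensorFlow: a linear ramp multiplied by a cosine term, plus a small constant beta. Treat the formula, and the roles given to alpha and beta, as assumptions rather than a statement of what DeepChem computes; the function name is hypothetical.

```python
import math


def linear_cosine_decay(step: int,
                        initial_rate: float,
                        decay_steps: int,
                        alpha: float = 0.0,
                        beta: float = 0.001,
                        num_periods: float = 0.5) -> float:
    """Learning rate after `step` training steps (assumed formula)."""
    step = min(step, decay_steps)
    linear = (decay_steps - step) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_periods * step / decay_steps))
    return initial_rate * ((alpha + linear) * cosine + beta)


for step in (0, 250, 500, 1000):
    print(step, linear_cosine_decay(step, initial_rate=0.1, decay_steps=1000))
```

Under this formula, with the default num_periods of 0.5 the cosine term falls from 1 to 0 over decay_steps, so the rate ends at initial_rate * beta.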

Keras Models¶
DeepChem extensively uses Keras to build deep learning models.
KerasModel¶
Training loss and validation metrics can be automatically logged to Weights & Biases with the following commands:
# Install wandb in shell
pip install wandb
# Login in shell (required only once)
wandb login
# Start a W&B run in your script (refer to docs for optional parameters)
wandb.init(project="my project")
# Set `wandb` arg when creating `KerasModel`
model = KerasModel(…, wandb=True)

class
KerasModel
(model: tensorflow.python.keras.engine.training.Model, loss: Union[deepchem.models.losses.Loss, Callable[[List, List, List], Any]], output_types: Optional[List[str]] = None, batch_size: int = 100, model_dir: Optional[str] = None, learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, optimizer: Optional[deepchem.models.optimizers.Optimizer] = None, tensorboard: bool = False, wandb: bool = False, log_frequency: int = 100, **kwargs)[source]¶ This is a DeepChem model implemented by a Keras model.
This class provides several advantages over using the Keras model’s fitting and prediction methods directly.
It provides better integration with the rest of DeepChem, such as direct support for Datasets and Transformers.
It defines the loss in a more flexible way. In particular, Keras does not support multidimensional weight matrices, which makes it impossible to implement most multitask models with Keras.
It provides various additional features not found in the Keras Model class, such as uncertainty prediction and saliency mapping.
Here is a simple example of code that uses KerasModel to train a Keras model on a DeepChem dataset.
>> keras_model = tf.keras.Sequential([
>>     tf.keras.layers.Dense(1000, activation='tanh'),
>>     tf.keras.layers.Dense(1)
>> ])
>> model = KerasModel(keras_model, loss=dc.models.losses.L2Loss())
>> model.fit(dataset)
The loss function for a model can be defined in two different ways. For models that have only a single output and use a standard loss function, you can simply provide a dc.models.losses.Loss object. This defines the loss for each sample or sample/task pair. The result is automatically multiplied by the weights and averaged over the batch. Any additional losses computed by model layers, such as weight decay penalties, are also added.
For more complicated cases, you can instead provide a function that directly computes the total loss. It must be of the form f(outputs, labels, weights), taking the list of outputs from the model, the expected values, and any weight matrices. It should return a scalar equal to the value of the loss function for the batch. No additional processing is done to the result; it is up to you to do any weighting, averaging, adding of penalty terms, etc.
You can optionally provide an output_types argument, which describes how to interpret the model’s outputs. This should be a list of strings, one for each output. You can use an arbitrary output_type for an output, but some output_types are special and will undergo extra processing:
‘prediction’: This is a normal output, and will be returned by predict(). If output types are not specified, all outputs are assumed to be of this type.
‘loss’: This output will be used in place of the normal outputs for computing the loss function. For example, models that output probability distributions usually do it by computing unbounded numbers (the logits), then passing them through a softmax function to turn them into probabilities. When computing the cross entropy, it is more numerically stable to use the logits directly rather than the probabilities. You can do this by having the model produce both probabilities and logits as outputs, then specifying output_types=[‘prediction’, ‘loss’]. When predict() is called, only the first output (the probabilities) will be returned. But during training, it is the second output (the logits) that will be passed to the loss function.
‘variance’: This output is used for estimating the uncertainty in another output. To create a model that can estimate uncertainty, there must be the same number of ‘prediction’ and ‘variance’ outputs. Each variance output must have the same shape as the corresponding prediction output, and each element is an estimate of the variance in the corresponding prediction. Also be aware that if a model supports uncertainty, it MUST use dropout on every layer, and dropout must be enabled during uncertainty prediction. Otherwise, the uncertainties it computes will be inaccurate.
other: Arbitrary output_types can be used to extract outputs produced by the model, but will have no additional processing performed.
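For the custom loss case, the f(outputs, labels, weights) form described above can be sketched as follows. The example uses plain Python lists in place of tensors so that it runs on its own; in a real KerasModel the three arguments would be lists of tensors and the arithmetic would use TensorFlow operations. The function name is hypothetical.

```python
from typing import List, Sequence


def weighted_l2_loss(outputs: List[Sequence[float]],
                     labels: List[Sequence[float]],
                     weights: List[Sequence[float]]) -> float:
    """Scalar loss for one batch: weighted squared error, averaged."""
    predictions, targets, sample_weights = outputs[0], labels[0], weights[0]
    per_sample = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    # A custom loss must apply the weights and the averaging itself.
    weighted = [w * loss for w, loss in zip(sample_weights, per_sample)]
    return sum(weighted) / len(weighted)


print(weighted_l2_loss([[1.0, 2.0]], [[0.0, 2.0]], [[1.0, 1.0]]))  # 0.5
print(weighted_l2_loss([[1.0, 2.0]], [[0.0, 2.0]], [[0.0, 1.0]]))  # 0.0
```

The second call shows how a zero weight masks out a sample, which is how missing labels are typically handled in multitask datasets.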

__init__
(model: tensorflow.python.keras.engine.training.Model, loss: Union[deepchem.models.losses.Loss, Callable[[List, List, List], Any]], output_types: Optional[List[str]] = None, batch_size: int = 100, model_dir: Optional[str] = None, learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, optimizer: Optional[deepchem.models.optimizers.Optimizer] = None, tensorboard: bool = False, wandb: bool = False, log_frequency: int = 100, **kwargs) → None[source]¶ Create a new KerasModel.
 Parameters
model (tf.keras.Model) – the Keras model implementing the calculation
loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above
output_types (list of strings) – the type of each output from the model, as described above
batch_size (int) – default batch size for training and evaluating
model_dir (str) – the directory on disk where the model will be stored. If this is None, a temporary directory is created.
learning_rate (float or LearningRateSchedule) – the learning rate to use for fitting. If optimizer is specified, this is ignored.
optimizer (Optimizer) – the optimizer to use for fitting. If this is specified, learning_rate is ignored.
tensorboard (bool) – whether to log progress to TensorBoard during training
wandb (bool) – whether to log progress to Weights & Biases during training
log_frequency (int) – The frequency at which to log data. Data is logged using logging by default. If tensorboard is set, data is also logged to TensorBoard. If wandb is set, data is also logged to Weights & Biases. Logging happens at global steps. Roughly, a global step corresponds to one batch of training. If you’d like a printout every 10 batch steps, you’d set log_frequency=10 for example.

fit
(dataset: deepchem.data.datasets.Dataset, nb_epoch: int = 10, max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, deterministic: bool = False, restore: bool = False, variables: Optional[List[tensorflow.python.ops.variables.Variable]] = None, loss: Optional[Callable[[List, List, List], Any]] = None, callbacks: Union[Callable, List[Callable]] = [], all_losses: Optional[List[float]] = None) → float[source]¶ Train this model on a dataset.
 Parameters
dataset (Dataset) – the Dataset to train on
nb_epoch (int) – the number of epochs to train for
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.
 Returns
The average loss over the most recent checkpoint interval
 Return type
float
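The interplay of checkpoint_interval and max_checkpoints_to_keep can be sketched without any framework. The function below, simulate_checkpoints, is a hypothetical illustration and not DeepChem code. It assumes a checkpoint is written every checkpoint_interval steps and ignores any extra checkpoint the library might write when training finishes.

```python
from collections import deque


def simulate_checkpoints(total_steps, checkpoint_interval=1000,
                         max_checkpoints_to_keep=5):
    """Return the training steps whose checkpoints would still be on disk."""
    kept = deque(maxlen=max_checkpoints_to_keep)  # oldest entries drop out
    if checkpoint_interval > 0:  # 0 disables automatic checkpointing
        for step in range(1, total_steps + 1):
            if step % checkpoint_interval == 0:
                kept.append(step)
    return list(kept)


print(simulate_checkpoints(7500))
print(simulate_checkpoints(7500, checkpoint_interval=0))
```

With the default settings, only the five most recent checkpoints survive a long run.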

fit_generator
(generator: Iterable[Tuple[Any, Any, Any]], max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, restore: bool = False, variables: Optional[List[tensorflow.python.ops.variables.Variable]] = None, loss: Optional[Callable[[List, List, List], Any]] = None, callbacks: Union[Callable, List[Callable]] = [], all_losses: Optional[List[float]] = None) → float[source]¶ Train this model on data from a generator.
 Parameters
generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.
 Returns
The average loss over the most recent checkpoint interval
 Return type
float
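A generator of the expected form can be sketched as follows. This is a minimal, dependency-free illustration: batch_generator is a hypothetical name, and plain Python lists stand in for the arrays a real training run would yield.

```python
def batch_generator(X, y, w, batch_size):
    """Yield (inputs, labels, weights) tuples, one tuple per batch."""
    for start in range(0, len(X), batch_size):
        stop = start + batch_size
        yield X[start:stop], y[start:stop], w[start:stop]


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
y = [[1.0], [0.0], [1.0]]
w = [[1.0], [1.0], [1.0]]
batches = list(batch_generator(X, y, w, batch_size=2))
print(len(batches))
```

Because the generator controls how many batches are produced, the number of training steps is determined by the generator rather than by an epoch count.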

fit_on_batch
(X: Sequence, y: Sequence, w: Sequence, variables: Optional[List[tensorflow.python.ops.variables.Variable]] = None, loss: Optional[Callable[[List, List, List], Any]] = None, callbacks: Union[Callable, List[Callable]] = [], checkpoint: bool = True, max_checkpoints_to_keep: int = 5) → float[source]¶ Perform a single step of training.
 Parameters
X (ndarray) – the inputs for the batch
y (ndarray) – the labels for the batch
w (ndarray) – the weights for the batch
variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.
checkpoint (bool) – if true, save a checkpoint after performing the training step
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
 Returns
the loss on the batch
 Return type
float
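The shape of a custom loss of the form f(outputs, labels, weights) can be sketched as below. This is only an illustration of the calling convention: weighted_squared_error is a hypothetical name, and flat Python lists stand in for the framework tensors that a real loss function would receive and return.

```python
def weighted_squared_error(outputs, labels, weights):
    """Weighted mean squared error over one batch.

    Each argument is a list with one entry per model output; each entry is
    a flat list of floats standing in for a tensor.
    """
    total, norm = 0.0, 0.0
    for out, lab, wt in zip(outputs, labels, weights):
        for o, l, w_i in zip(out, lab, wt):
            total += w_i * (o - l) ** 2
            norm += w_i
    return total / norm if norm > 0 else 0.0


loss_value = weighted_squared_error(
    [[1.0, 2.0, 3.0]], [[1.0, 0.0, 0.0]], [[1.0, 1.0, 0.0]])
print(loss_value)
```

The third sample has weight zero, so it contributes nothing to the loss.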

predict_on_generator
(generator: Iterable[Tuple[Any, Any, Any]], transformers: List[transformers.Transformer] = [], outputs: Optional[Union[tensorflow.python.framework.ops.Tensor, Sequence[tensorflow.python.framework.ops.Tensor]]] = None, output_types: Optional[Union[str, Sequence[str]]] = None) → Union[numpy.ndarray, Sequence[numpy.ndarray]][source]¶
 Parameters
generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
outputs (Tensor or list of Tensors) – The outputs to return. If this is None, the model’s standard prediction outputs will be returned. Alternatively one or more Tensors within the model may be specified, in which case the output of those Tensors will be returned. If outputs is specified, output_types must be None.
output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved from the model. If output_types is specified, outputs must be None.
Returns – a NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs
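The role of the transformers argument, undoing the transformations that were applied to the data, can be sketched with a toy class. ShiftScale and undo_all are hypothetical names used only for illustration. The sketch assumes each transformer exposes an untransform method and that the inverses are applied in the reverse of the order in which the transformations were applied.

```python
class ShiftScale:
    """Hypothetical stand-in for a transformer with an inverse operation."""

    def __init__(self, shift, scale):
        self.shift, self.scale = shift, scale

    def transform(self, values):
        return [(v - self.shift) / self.scale for v in values]

    def untransform(self, values):
        return [v * self.scale + self.shift for v in values]


def undo_all(predictions, transformers):
    """Undo transformations, last applied first."""
    for transformer in reversed(transformers):
        predictions = transformer.untransform(predictions)
    return predictions


transformers = [ShiftScale(10.0, 2.0), ShiftScale(1.0, 4.0)]
raw = [12.0, 20.0]
scaled = raw
for t in transformers:
    scaled = t.transform(scaled)  # what the model would be trained on
restored = undo_all(scaled, transformers)
print(scaled, restored)
```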

predict_on_batch
(X: Union[numpy.ndarray, Sequence], transformers: List[transformers.Transformer] = [], outputs: Optional[Union[tensorflow.python.framework.ops.Tensor, Sequence[tensorflow.python.framework.ops.Tensor]]] = None) → Union[numpy.ndarray, Sequence[numpy.ndarray]][source]¶ Generates predictions for input samples, processing samples in a batch.
 Parameters
X (ndarray) – the input data, as a Numpy array.
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
outputs (Tensor or list of Tensors) – The outputs to return. If this is None, the model’s standard prediction outputs will be returned. Alternatively one or more Tensors within the model may be specified, in which case the output of those Tensors will be returned.
 Returns
a NumPy array if the model produces a single output, or a list of arrays
if it produces multiple outputs

predict_uncertainty_on_batch
(X: Sequence, masks: int = 50) → Union[Tuple[numpy.ndarray, numpy.ndarray], Sequence[Tuple[numpy.ndarray, numpy.ndarray]]][source]¶ Predict the model’s outputs, along with the uncertainty in each one.
The uncertainty is computed as described in https://arxiv.org/abs/1703.04977. It involves repeating the prediction many times with different dropout masks. The prediction is computed as the average over all the predictions. The uncertainty includes both the variation among the predicted values (epistemic uncertainty) and the model’s own estimates for how well it fits the data (aleatoric uncertainty). Not all models support uncertainty prediction.
 Parameters
X (ndarray) – the input data, as a Numpy array.
masks (int) – the number of dropout masks to average over
 Returns
for each output, a tuple (y_pred, y_std) where y_pred is the predicted
value of the output, and each element of y_std estimates the standard
deviation of the corresponding element of y_pred
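One way to combine the two kinds of uncertainty described above is sketched below for a single output element. This is a simplified reading of the cited approach, not necessarily the exact computation DeepChem performs: combine_uncertainty is a hypothetical name, and the sketch assumes the total variance is the variance among the dropout predictions plus the mean of the model's own variance estimates.

```python
import math
from statistics import mean, pvariance


def combine_uncertainty(predictions, predicted_variances):
    """Combine repeated dropout predictions for one output element.

    predictions: one predicted value per dropout mask.
    predicted_variances: the model's own variance estimate for each mask.
    """
    y_pred = mean(predictions)
    epistemic = pvariance(predictions)      # spread among the predictions
    aleatoric = mean(predicted_variances)   # model's own noise estimate
    y_std = math.sqrt(epistemic + aleatoric)
    return y_pred, y_std


y_pred, y_std = combine_uncertainty([1.0, 3.0], [0.5, 1.5])
print(y_pred, y_std)
```

Raising masks increases the number of entries in predictions, which generally makes both estimates less noisy at the cost of more forward passes.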

predict
(dataset: deepchem.data.datasets.Dataset, transformers: List[transformers.Transformer] = [], outputs: Optional[Union[tensorflow.python.framework.ops.Tensor, Sequence[tensorflow.python.framework.ops.Tensor]]] = None, output_types: Optional[List[str]] = None) → Union[numpy.ndarray, Sequence[numpy.ndarray]][source]¶ Uses self to make predictions on provided Dataset object.
 Parameters
dataset (dc.data.Dataset) – Dataset to make prediction on
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
outputs (Tensor or list of Tensors) – The outputs to return. If this is None, the model’s standard prediction outputs will be returned. Alternatively one or more Tensors within the model may be specified, in which case the output of those Tensors will be returned.
output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved from the model. If output_types is specified, outputs must be None.
 Returns
a NumPy array if the model produces a single output, or a list of arrays
if it produces multiple outputs

predict_embedding
(dataset: deepchem.data.datasets.Dataset) → Union[numpy.ndarray, Sequence[numpy.ndarray]][source]¶ Predicts embeddings created by underlying model if any exist. An embedding must be specified to have output_type of ‘embedding’ in the model definition.
 Parameters
dataset (dc.data.Dataset) – Dataset to make prediction on
 Returns
a NumPy array of the embeddings model produces, or a list
of arrays if it produces multiple embeddings

predict_uncertainty
(dataset: deepchem.data.datasets.Dataset, masks: int = 50) → Union[Tuple[numpy.ndarray, numpy.ndarray], Sequence[Tuple[numpy.ndarray, numpy.ndarray]]][source]¶ Predict the model’s outputs, along with the uncertainty in each one.
The uncertainty is computed as described in https://arxiv.org/abs/1703.04977. It involves repeating the prediction many times with different dropout masks. The prediction is computed as the average over all the predictions. The uncertainty includes both the variation among the predicted values (epistemic uncertainty) and the model’s own estimates for how well it fits the data (aleatoric uncertainty). Not all models support uncertainty prediction.
 Parameters
dataset (dc.data.Dataset) – Dataset to make prediction on
masks (int) – the number of dropout masks to average over
 Returns
for each output, a tuple (y_pred, y_std) where y_pred is the predicted
value of the output, and each element of y_std estimates the standard
deviation of the corresponding element of y_pred

evaluate_generator
(generator: Iterable[Tuple[Any, Any, Any]], metrics: List[deepchem.metrics.metric.Metric], transformers: List[transformers.Transformer] = [], per_task_metrics: bool = False)[source]¶ Evaluate the performance of this model on the data produced by a generator.
 Parameters
generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).
metrics (list of deepchem.metrics.Metric) – Evaluation metrics
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
per_task_metrics (bool) – If True, return per-task scores.
 Returns
Maps tasks to scores under metric.
 Return type
dict

compute_saliency
(X: numpy.ndarray) → Union[numpy.ndarray, Sequence[numpy.ndarray]][source]¶ Compute the saliency map for an input sample.
This computes the Jacobian matrix with the derivative of each output element with respect to each input element. More precisely,
If this model has a single output, it returns a matrix of shape (output_shape, input_shape) with the derivatives.
If this model has multiple outputs, it returns a list of matrices, one for each output.
This method cannot be used on models that take multiple inputs.
 Parameters
X (ndarray) – the input data for a single sample
 Returns
the Jacobian matrix, or a list of matrices
 Return type
numpy.ndarray or sequence of numpy.ndarray
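To make the (output_shape, input_shape) layout concrete, the sketch below approximates a Jacobian with central finite differences for a toy function. It is an illustration only: numerical_jacobian and toy_model are hypothetical names, and a real implementation would presumably rely on the framework's automatic differentiation rather than finite differences.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Approximate d f_i / d x_j with central differences."""
    n_out = len(f(x))
    jacobian = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_up = list(x)
        x_down = list(x)
        x_up[j] += eps
        x_down[j] -= eps
        f_up, f_down = f(x_up), f(x_down)
        for i in range(n_out):
            jacobian[i][j] = (f_up[i] - f_down[i]) / (2 * eps)
    return jacobian


def toy_model(x):
    """Hypothetical model with 3 inputs and 2 outputs."""
    return [2.0 * x[0] + x[1], x[1] * x[2]]


# Row i holds the derivatives of output i with respect to every input.
J = numerical_jacobian(toy_model, [1.0, 2.0, 3.0])
print(len(J), len(J[0]))
```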

default_generator
(dataset: deepchem.data.datasets.Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) → Iterable[Tuple[List, List, List]][source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])
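The effect of pad_batches can be sketched as follows. The helper pad_batch here is a hypothetical, list-based illustration. It assumes a short final batch is filled by repeating existing samples and that the padded rows receive zero weight so they do not influence the loss.

```python
def pad_batch(batch_size, X_b, y_b, w_b):
    """Pad a short batch up to batch_size; padded rows get zero weight."""
    n_missing = batch_size - len(X_b)
    if n_missing <= 0 or not X_b:
        return X_b, y_b, w_b
    X_pad = X_b + [X_b[i % len(X_b)] for i in range(n_missing)]
    y_pad = y_b + [y_b[i % len(y_b)] for i in range(n_missing)]
    w_pad = w_b + [0.0] * n_missing
    return X_pad, y_pad, w_pad


X_p, y_p, w_p = pad_batch(4, [[1], [2], [3]], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0])
print(X_p, w_p)
```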

save_checkpoint
(max_checkpoints_to_keep: int = 5, model_dir: Optional[str] = None) → None[source]¶ Save a checkpoint to disk.
Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.
 Parameters
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
model_dir (str, default None) – Model directory to save checkpoint to. If None, revert to self.model_dir

get_checkpoints
(model_dir: Optional[str] = None)[source]¶ Get a list of all available checkpoint files.
 Parameters
model_dir (str, default None) – Directory to get list of checkpoints from. Reverts to self.model_dir if None

restore
(checkpoint: Optional[str] = None, model_dir: Optional[str] = None) → None[source]¶ Reload the values of all variables from a checkpoint file.
 Parameters
checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
model_dir (str, default None) – Directory to restore checkpoint from. If None, use self.model_dir.

load_from_pretrained
(source_model: deepchem.models.keras_model.KerasModel, assignment_map: Optional[Dict[Any, Any]] = None, value_map: Optional[Dict[Any, Any]] = None, checkpoint: Optional[str] = None, model_dir: Optional[str] = None, include_top: bool = True, inputs: Optional[Sequence[Any]] = None, **kwargs) → None[source]¶ Copies variable values from a pretrained model. source_model can either be a pretrained model or a model with the same architecture. value_map is a variable-value dictionary. If no value_map is provided, the variable values are restored to the source_model from a checkpoint and a default value_map is created. assignment_map is a dictionary mapping variables from the source_model to the current model. If no assignment_map is provided, one is made from scratch and assumes the model is composed of several different layers, with the final one being a dense layer. include_top is used to control whether or not the final dense layer is used. The default assignment map is useful in cases where the type of task (classification vs. regression) and/or the number of tasks differs between the two models.
 Parameters
source_model (dc.KerasModel, required) – source_model can either be the pretrained model or a dc.KerasModel with the same architecture as the pretrained model. It is used to restore from a checkpoint, if value_map is None and to create a default assignment map if assignment_map is None
assignment_map (Dict, default None) – Dictionary mapping the source_model variables and current model variables
value_map (Dict, default None) – Dictionary containing source_model trainable variables mapped to numpy arrays. If value_map is None, the values are restored and a default variable map is created using the restored values
checkpoint (str, default None) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints
model_dir (str, default None) – Restore model from custom model directory if needed
include_top (bool, default True) – if True, copies the weights and bias associated with the final dense layer. Used only when assignment map is None
inputs (List, input tensors for model) – if not None, then the weights are built for both the source and self. This option is useful only for models that are built by subclassing tf.keras.Model, and not using the functional API by tf.keras
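The relationship between assignment_map, value_map and include_top can be sketched with plain dictionaries. Everything in this block is hypothetical: copy_pretrained and the variable names are invented for illustration, variables are paired by position, and the final dense layer is assumed to own exactly one kernel and one bias.

```python
def copy_pretrained(source_vars, target_vars, value_map, include_top=True):
    """Return {target variable: value} copied from the source model.

    source_vars / target_vars: variable names ordered from input to output,
    with the last two belonging to the final dense layer (the "top").
    value_map: source variable name -> stored value.
    """
    pairs = list(zip(source_vars, target_vars))
    if not include_top:
        pairs = pairs[:-2]  # skip the top layer's kernel and bias
    assignment_map = dict(pairs)
    return {target: value_map[source]
            for source, target in assignment_map.items()}


source_vars = ["src/dense/kernel", "src/dense/bias",
               "src/top/kernel", "src/top/bias"]
target_vars = ["new/dense/kernel", "new/dense/bias",
               "new/top/kernel", "new/top/bias"]
value_map = {name: [float(i)] for i, name in enumerate(source_vars)}
copied = copy_pretrained(source_vars, target_vars, value_map, include_top=False)
print(sorted(copied))
```

Leaving out the top layer is what allows a model pretrained on one task type or task count to initialize a model with a different output layer.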
MultitaskRegressor¶

class
MultitaskRegressor
(n_tasks: int, n_features: int, layer_sizes: Sequence[int] = [1000], weight_init_stddevs: Union[float, Sequence[float]] = 0.02, bias_init_consts: Union[float, Sequence[float]] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: Union[float, Sequence[float]] = 0.5, activation_fns: Union[Callable, str, Sequence[Union[Callable, str]]] = 'relu', uncertainty: bool = False, residual: bool = False, **kwargs)[source]¶ A fully connected network for multitask regression.
This class provides lots of options for customizing aspects of the model: the number and widths of layers, the activation functions, regularization methods, etc.
It can optionally compose the model from pre-activation residual blocks, as described in https://arxiv.org/abs/1603.05027, rather than a simple stack of dense layers. This often leads to easier training, especially when using a large number of layers. Note that residual blocks can only be used when successive layers have the same width. Wherever the layer width changes, a simple dense layer will be used even if residual=True.
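The width rule for residual blocks can be illustrated with a small planning function. This is a sketch under stated assumptions, not the library's construction code: plan_layers is a hypothetical name, and the first hidden layer is assumed to be compared against the input feature width.

```python
def plan_layers(n_features, layer_sizes, residual=True):
    """Label each hidden layer 'residual' or 'dense'."""
    plan = []
    prev_size = n_features
    for size in layer_sizes:
        # A skip connection needs matching widths on both sides.
        use_residual = residual and size == prev_size
        plan.append("residual" if use_residual else "dense")
        prev_size = size
    return plan


plan = plan_layers(n_features=1024, layer_sizes=[512, 512, 512, 256])
print(plan)
```

In this example only the second and third layers keep the width of their predecessor, so only they can be residual.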

__init__
(n_tasks: int, n_features: int, layer_sizes: Sequence[int] = [1000], weight_init_stddevs: Union[float, Sequence[float]] = 0.02, bias_init_consts: Union[float, Sequence[float]] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: Union[float, Sequence[float]] = 0.5, activation_fns: Union[Callable, str, Sequence[Union[Callable, str]]] = 'relu', uncertainty: bool = False, residual: bool = False, **kwargs) → None[source]¶ Create a MultitaskRegressor.
In addition to the following arguments, this class also accepts all the keyword arguments from TensorGraph.
 Parameters
n_tasks (int) – number of tasks
n_features (int) – number of features
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the PyTorch activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer. Standard activation functions from torch.nn.functional can be specified by name.
uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted
residual (bool) – if True, the model will be composed of pre-activation residual blocks instead of a simple stack of dense layers.
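Several of the parameters above accept either a single value or one value per layer. The sketch below shows one way such arguments could be normalized. per_layer is a hypothetical helper, and raising an error on a length mismatch is an assumption rather than documented behavior.

```python
def per_layer(value, n_layers, name):
    """Expand a scalar to a list, or check that a list has the right length."""
    if isinstance(value, (list, tuple)):
        if len(value) != n_layers:
            raise ValueError(
                f"{name} needs {n_layers} entries, got {len(value)}")
        return list(value)
    return [value] * n_layers


layer_sizes = [1000, 500]
# dropouts has one entry per hidden layer.
dropouts = per_layer(0.5, len(layer_sizes), "dropouts")
# weight_init_stddevs has one extra entry for the output layer.
weight_init_stddevs = per_layer(0.02, len(layer_sizes) + 1,
                                "weight_init_stddevs")
print(dropouts, weight_init_stddevs)
```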

default_generator
(dataset: deepchem.data.datasets.Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) → Iterable[Tuple[List, List, List]][source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])

MultitaskFitTransformRegressor¶

class
MultitaskFitTransformRegressor
(n_tasks: int, n_features: int, fit_transformers: Sequence[transformers.Transformer] = [], batch_size: int = 50, **kwargs)[source]¶ Implements a MultitaskRegressor that performs on-the-fly transformation during fit/predict.
Examples
>>> import numpy as np
>>> import deepchem as dc
>>> n_samples = 10
>>> n_features = 3
>>> n_tasks = 1
>>> ids = np.arange(n_samples)
>>> X = np.random.rand(n_samples, n_features, n_features)
>>> y = np.zeros((n_samples, n_tasks))
>>> w = np.ones((n_samples, n_tasks))
>>> dataset = dc.data.NumpyDataset(X, y, w, ids)
>>> fit_transformers = [dc.trans.CoulombFitTransformer(dataset)]
>>> model = dc.models.MultitaskFitTransformRegressor(n_tasks, [n_features, n_features],
...     dropouts=[0.], learning_rate=0.003, weight_init_stddevs=[np.sqrt(6)/np.sqrt(1000)],
...     batch_size=n_samples, fit_transformers=fit_transformers)
>>> model.n_features
12

__init__
(n_tasks: int, n_features: int, fit_transformers: Sequence[transformers.Transformer] = [], batch_size: int = 50, **kwargs)[source]¶ Create a MultitaskFitTransformRegressor.
In addition to the following arguments, this class also accepts all the keyword arguments from MultitaskRegressor.
 Parameters
n_tasks (int) – number of tasks
n_features (list or int) – number of features
fit_transformers (list) – List of dc.trans.FitTransformer objects

default_generator
(dataset: deepchem.data.datasets.Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) → Iterable[Tuple[List, List, List]][source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])

predict_on_generator
(generator: Iterable[Tuple[Any, Any, Any]], transformers: List[transformers.Transformer] = [], output_types: Optional[Union[str, Sequence[str]]] = None) → Union[numpy.ndarray, Sequence[numpy.ndarray]][source]¶
 Parameters
generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved from the model. If output_types is specified, outputs must be None.
Returns – a NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs

MultitaskClassifier¶

class
MultitaskClassifier
(n_tasks: int, n_features: int, layer_sizes: Sequence[int] = [1000], weight_init_stddevs: Union[float, Sequence[float]] = 0.02, bias_init_consts: Union[float, Sequence[float]] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: Union[float, Sequence[float]] = 0.5, activation_fns: Union[Callable, str, Sequence[Union[Callable, str]]] = 'relu', n_classes: int = 2, residual: bool = False, **kwargs)[source]¶ A fully connected network for multitask classification.
This class provides lots of options for customizing aspects of the model: the number and widths of layers, the activation functions, regularization methods, etc.
It can optionally compose the model from pre-activation residual blocks, as described in https://arxiv.org/abs/1603.05027, rather than a simple stack of dense layers. This often leads to easier training, especially when using a large number of layers. Note that residual blocks can only be used when successive layers have the same width. Wherever the layer width changes, a simple dense layer will be used even if residual=True.

__init__
(n_tasks: int, n_features: int, layer_sizes: Sequence[int] = [1000], weight_init_stddevs: Union[float, Sequence[float]] = 0.02, bias_init_consts: Union[float, Sequence[float]] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: Union[float, Sequence[float]] = 0.5, activation_fns: Union[Callable, str, Sequence[Union[Callable, str]]] = 'relu', n_classes: int = 2, residual: bool = False, **kwargs) → None[source]¶ Create a MultitaskClassifier.
In addition to the following arguments, this class also accepts all the keyword arguments from TensorGraph.
 Parameters
n_tasks (int) – number of tasks
n_features (int) – number of features
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the PyTorch activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer. Standard activation functions from torch.nn.functional can be specified by name.
n_classes (int) – the number of classes
residual (bool) – if True, the model will be composed of pre-activation residual blocks instead of a simple stack of dense layers.
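To illustrate how n_tasks and n_classes shape a classifier's output, the sketch below turns per-task logits into per-task class probabilities for a single sample. It assumes a softmax over the class axis, which is a common choice but is not stated in this reference. softmax and class_probabilities are hypothetical helper names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(task_logits):
    """task_logits has shape (n_tasks, n_classes) for a single sample."""
    return [softmax(row) for row in task_logits]


# Two tasks, two classes each (the default n_classes=2).
probs = class_probabilities([[0.0, 0.0], [2.0, 0.0]])
print(probs)
```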

default_generator
(dataset: deepchem.data.datasets.Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) → Iterable[Tuple[List, List, List]][source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])

TensorflowMultitaskIRVClassifier¶
RobustMultitaskClassifier¶

class
RobustMultitaskClassifier
(n_tasks, n_features, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, n_classes=2, bypass_layer_sizes=[100], bypass_weight_init_stddevs=[0.02], bypass_bias_init_consts=[1.0], bypass_dropouts=[0.5], **kwargs)[source]¶ Implements a neural network for robust multitasking.
The key idea of this model is to have bypass layers that feed directly from features to task output. This might provide some flexibility to route around challenges in multitasking with destructive interference.
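The bypass idea can be sketched with tiny list-based layers. This is an illustration under assumptions, not the model's actual graph: dense and robust_task_input are hypothetical names, and the sketch assumes the shared activations and the task-specific bypass activations are concatenated before each task's output layer.

```python
def dense(inputs, weights, bias):
    """One linear layer followed by ReLU.

    weights holds one list of input coefficients per output unit.
    """
    return [max(0.0, sum(x * w for x, w in zip(inputs, column)) + b)
            for column, b in zip(weights, bias)]


def robust_task_input(features, shared_layer, bypass_layer):
    """Concatenate shared and task-specific bypass activations."""
    shared = dense(features, *shared_layer)   # trunk shared by all tasks
    bypass = dense(features, *bypass_layer)   # fed directly from features
    return shared + bypass


features = [1.0, 2.0]
shared_layer = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0])
bypass_layer = ([[0.5, 0.5]], [0.0])
task_input = robust_task_input(features, shared_layer, bypass_layer)
print(task_input)
```

Because each task has its own bypass path, a task that is hurt by the shared representation still has direct access to the raw features.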
References
This technique was introduced in [1].
 1
Ramsundar, Bharath, et al. “Is multitask deep learning practical for pharma?.” Journal of chemical information and modeling 57.8 (2017): 2068-2076.

__init__
(n_tasks, n_features, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, n_classes=2, bypass_layer_sizes=[100], bypass_weight_init_stddevs=[0.02], bypass_bias_init_consts=[1.0], bypass_dropouts=[0.5], **kwargs)[source]¶ Create a RobustMultitaskClassifier.
 Parameters
n_tasks (int) – number of tasks
n_features (int) – number of features
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the TensorFlow activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
n_classes (int) – the number of classes
bypass_layer_sizes (list) – the size of each dense layer in the bypass network. The length of this list determines the number of bypass layers.
bypass_weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of bypass layers. Same requirements as weight_init_stddevs.
bypass_bias_init_consts (list or float) – the value to initialize the biases in bypass layers to. Same requirements as bias_init_consts.
bypass_dropouts (list or float) – the dropout probability to use for bypass layers. Same requirements as dropouts.

default_generator
(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])
RobustMultitaskRegressor¶

class
RobustMultitaskRegressor
(n_tasks, n_features, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, bypass_layer_sizes=[100], bypass_weight_init_stddevs=[0.02], bypass_bias_init_consts=[1.0], bypass_dropouts=[0.5], **kwargs)[source]¶ Implements a neural network for robust multitasking.
The key idea of this model is to have bypass layers that feed directly from features to task output. This might provide some flexibility to route around challenges in multitasking with destructive interference.
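The bypass idea can be illustrated with a small NumPy forward pass. This is only a sketch under simplifying assumptions (one shared layer, one bypass layer per task, a linear head on their concatenation); the function and variable names are hypothetical and this is not DeepChem's implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def robust_multitask_forward(X, shared_W, bypass_Ws, head_ws):
    """Toy forward pass: a shared layer plus one bypass layer per task."""
    shared = relu(X @ shared_W)                      # (batch, n_shared)
    outputs = []
    for bypass_W, head_w in zip(bypass_Ws, head_ws):
        # The bypass layer reads the raw features directly.
        bypass = relu(X @ bypass_W)                  # (batch, n_bypass)
        combined = np.concatenate([shared, bypass], axis=1)
        outputs.append(combined @ head_w)            # (batch,)
    return np.stack(outputs, axis=1)                 # (batch, n_tasks)

rng = np.random.default_rng(0)
n_samples, n_features, n_tasks = 4, 6, 3
n_shared, n_bypass = 5, 2
X = rng.normal(size=(n_samples, n_features))
shared_W = rng.normal(scale=0.02, size=(n_features, n_shared))
bypass_Ws = [rng.normal(scale=0.02, size=(n_features, n_bypass))
             for _ in range(n_tasks)]
head_ws = [rng.normal(scale=0.02, size=n_shared + n_bypass)
           for _ in range(n_tasks)]

y_pred = robust_multitask_forward(X, shared_W, bypass_Ws, head_ws)
print(y_pred.shape)
```

Because each task owns its bypass weights, a task that conflicts with the shared representation still has a private path from the features to its output.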
References
 1
Ramsundar, Bharath, et al. “Is multitask deep learning practical for pharma?.” Journal of chemical information and modeling 57.8 (2017): 2068-2076.

__init__
(n_tasks, n_features, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, bypass_layer_sizes=[100], bypass_weight_init_stddevs=[0.02], bypass_bias_init_consts=[1.0], bypass_dropouts=[0.5], **kwargs)[source]¶ Create a RobustMultitaskRegressor.
 Parameters
n_tasks (int) – number of tasks
n_features (int) – number of features
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bypass_layer_sizes (list) – the size of each dense layer in the bypass network. The length of this list determines the number of bypass layers.
bypass_weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of bypass layers. Same requirements as weight_init_stddevs.
bypass_bias_init_consts (list or float) – the value to initialize the biases in bypass layers. Same requirements as bias_init_consts.
bypass_dropouts (list or float) – the dropout probability to use for bypass layers. Same requirements as dropouts.
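Several parameters above accept either a single value or a list with one entry per layer. The helper below is a hypothetical sketch of that convention (the name expand_per_layer is an assumption, and DeepChem's internal handling may differ in detail).

```python
from collections.abc import Sequence

def expand_per_layer(value, n_layers, name="value"):
    """Return a list with one entry per layer.

    A scalar is repeated n_layers times; a list must already have
    exactly n_layers entries.
    """
    if isinstance(value, Sequence) and not isinstance(value, str):
        if len(value) != n_layers:
            raise ValueError(
                f"{name} has {len(value)} entries, expected {n_layers}")
        return list(value)
    return [value] * n_layers

layer_sizes = [1000, 500, 100]
dropouts = expand_per_layer(0.5, len(layer_sizes), name="dropouts")
stddevs = expand_per_layer([0.02, 0.02, 0.01], len(layer_sizes),
                           name="weight_init_stddevs")
print(dropouts, stddevs)
```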

default_generator
(dataset: deepchem.data.datasets.Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) → Iterable[Tuple[List, List, List]][source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])
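As a rough illustration of the batch format described above, the following stand-alone generator yields ([inputs], [outputs], [weights]) tuples from NumPy arrays. It is a sketch, not DeepChem code: the function name is hypothetical, and padding short batches with zero rows and zero weights is an assumption made for simplicity.

```python
import numpy as np

def toy_default_generator(X, y, w, batch_size, epochs=1,
                          deterministic=True, pad_batches=True, seed=0):
    """Yield ([inputs], [outputs], [weights]) tuples, one per batch."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        # In-order iteration, or a fresh shuffle for every epoch.
        order = np.arange(n) if deterministic else rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            X_b, y_b, w_b = X[idx], y[idx], w[idx]
            n_missing = batch_size - len(idx)
            if pad_batches and n_missing > 0:
                # Assumed padding scheme: zero rows with zero weight,
                # so padded samples cannot contribute to a weighted loss.
                X_b = np.concatenate([X_b, np.zeros((n_missing,) + X.shape[1:])])
                y_b = np.concatenate([y_b, np.zeros((n_missing,) + y.shape[1:])])
                w_b = np.concatenate([w_b, np.zeros((n_missing,) + w.shape[1:])])
            yield ([X_b], [y_b], [w_b])

X = np.arange(10, dtype=float).reshape(5, 2)
y = np.ones((5, 1))
w = np.ones((5, 1))
batches = list(toy_default_generator(X, y, w, batch_size=2, epochs=2))
print(len(batches), batches[2][0][0].shape)
```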
ProgressiveMultitaskClassifier¶

class
ProgressiveMultitaskClassifier
(n_tasks, n_features, alpha_init_stddevs=0.02, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, **kwargs)[source]¶ Implements a progressive multitask neural network for classification.
Progressive Networks: https://arxiv.org/pdf/1606.04671v3.pdf
Progressive networks allow for multitask learning where each task gets a new column of weights. As a result, there is no exponential forgetting where previous tasks are ignored.
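A minimal NumPy sketch of the column idea follows. It assumes a single hidden layer per task column and one scaled linear adapter feeding earlier columns' hidden activations into each later column's output; all names are hypothetical and the real adapter layers are more elaborate.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def progressive_forward(X, hidden_Ws, out_ws, adapter_ws, alphas):
    """Toy progressive network with one hidden layer per task column."""
    hiddens, outputs = [], []
    for t in range(len(hidden_Ws)):
        h = relu(X @ hidden_Ws[t])          # this task's own column
        out = h @ out_ws[t]
        if t > 0:
            # Lateral connection: earlier columns feed later ones,
            # never the other way round.
            previous = np.concatenate(hiddens, axis=1)
            out = out + (alphas[t] * previous) @ adapter_ws[t]
        hiddens.append(h)
        outputs.append(out)
    return np.stack(outputs, axis=1)        # (batch, n_tasks)

rng = np.random.default_rng(0)
n_samples, n_features, n_hidden, n_tasks = 4, 6, 5, 3
X = rng.normal(size=(n_samples, n_features))
hidden_Ws = [rng.normal(scale=0.02, size=(n_features, n_hidden))
             for _ in range(n_tasks)]
out_ws = [rng.normal(scale=0.02, size=n_hidden) for _ in range(n_tasks)]
adapter_ws = [None] + [rng.normal(scale=0.02, size=t * n_hidden)
                       for t in range(1, n_tasks)]
alphas = [None] + [rng.normal(scale=0.02) for _ in range(1, n_tasks)]

y_pred = progressive_forward(X, hidden_Ws, out_ws, adapter_ws, alphas)
print(y_pred.shape)
```

Since information only flows from earlier columns to later ones, adding a new task column leaves the predictions of existing tasks unchanged.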

__init__
(n_tasks, n_features, alpha_init_stddevs=0.02, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, **kwargs)[source]¶ Creates a progressive network.
Only listing parameters specific to progressive networks here.
 Parameters
n_tasks (int) – Number of tasks
n_features (int) – Number of input features
alpha_init_stddevs (list) – List of standard deviations for alpha in adapter layers.
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

ProgressiveMultitaskRegressor¶

class
ProgressiveMultitaskRegressor
(n_tasks, n_features, alpha_init_stddevs=0.02, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, n_outputs=1, **kwargs)[source]¶ Implements a progressive multitask neural network for regression.
Progressive networks allow for multitask learning where each task gets a new column of weights. As a result, there is no exponential forgetting where previous tasks are ignored.
References
See [1]_ for a full description of the progressive architecture
 1
Rusu, Andrei A., et al. “Progressive neural networks.” arXiv preprint arXiv:1606.04671 (2016).

__init__
(n_tasks, n_features, alpha_init_stddevs=0.02, layer_sizes=[1000], weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, n_outputs=1, **kwargs)[source]¶ Creates a progressive network.
Only listing parameters specific to progressive networks here.
 Parameters
n_tasks (int) – Number of tasks
n_features (int) – Number of input features
alpha_init_stddevs (list) – List of standard deviations for alpha in adapter layers.
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.

add_adapter
(all_layers, task, layer_num)[source]¶ Add an adapter connection for given task/layer combo

fit
(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, **kwargs)[source]¶ Train this model on a dataset.
 Parameters
dataset (Dataset) – the Dataset to train on
nb_epoch (int) – the number of epochs to train for
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.
 Returns
The average loss over the most recent checkpoint interval
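The callbacks and all_losses parameters can be illustrated without a real model. In the sketch below, toy_fit is a stand-in loop written only for this example (it is not DeepChem's fit(), and its fake loss schedule is an assumption); what matters is the callback signature f(model, step) and that the same list keeps growing across repeated calls.

```python
def make_step_logger(every, log):
    """Build a callback of the form f(model, step)."""
    def callback(model, step):
        if step % every == 0:
            log.append(step)
    return callback

def toy_fit(model, n_steps, callbacks=(), all_losses=None):
    """Stand-in training loop used only to exercise the callbacks."""
    loss = 1.0
    for step in range(1, n_steps + 1):
        loss *= 0.9                      # pretend the loss decreases
        if all_losses is not None:
            all_losses.append(loss)
        for callback in callbacks:
            callback(model, step)        # invoked after every step
    return loss

seen_steps, losses = [], []
logger = make_step_logger(every=5, log=seen_steps)
toy_fit(model=None, n_steps=10, callbacks=[logger], all_losses=losses)
toy_fit(model=None, n_steps=10, callbacks=[logger], all_losses=losses)
print(seen_steps, len(losses))
```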
WeaveModel¶

class
WeaveModel
(n_tasks: int, n_atom_feat: Union[int, Sequence[int]] = 75, n_pair_feat: Union[int, Sequence[int]] = 14, n_hidden: int = 50, n_graph_feat: int = 128, n_weave: int = 2, fully_connected_layer_sizes: List[int] = [2000, 100], conv_weight_init_stddevs: Union[float, Sequence[float]] = 0.03, weight_init_stddevs: Union[float, Sequence[float]] = 0.01, bias_init_consts: Union[float, Sequence[float]] = 0.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: Union[float, Sequence[float]] = 0.25, final_conv_activation_fn: Optional[Union[Callable, str]] = <function tanh>, activation_fns: Union[Callable, str, Sequence[Union[Callable, str]]] = <function relu>, batch_normalize: bool = True, batch_normalize_kwargs: Dict = {'fused': False, 'renorm': True}, gaussian_expand: bool = True, compress_post_gaussian_expansion: bool = False, mode: str = 'classification', n_classes: int = 2, batch_size: int = 100, **kwargs)[source]¶ Implements Google-style Weave Graph Convolutions
This model implements the Weave style graph convolutions from [1]_.
The biggest difference between WeaveModel style convolutions and GraphConvModel style convolutions is that Weave convolutions model bond features explicitly. This has the side effect that it needs to construct a NxN matrix explicitly to model bond interactions. This may cause scaling issues, but may possibly allow for better modeling of subtle bond effects.
Note that [1]_ introduces a whole variety of different architectures for Weave models. The default settings in this class correspond to the W2N2 variant from [1]_, which is the most commonly used variant.
Examples
Here’s an example of how to fit a WeaveModel on a tiny sample dataset.
>>> import numpy as np
>>> import deepchem as dc
>>> featurizer = dc.feat.WeaveFeaturizer()
>>> X = featurizer(["C", "CC"])
>>> y = np.array([1, 0])
>>> dataset = dc.data.NumpyDataset(X, y)
>>> model = dc.models.WeaveModel(n_tasks=1, n_weave=2, fully_connected_layer_sizes=[2000, 1000], mode="classification")
>>> loss = model.fit(dataset)
Note
In general, the use of batch normalization can cause issues with NaNs. If you’re having trouble with NaNs while using this model, consider setting batch_normalize_kwargs={"trainable": False} or turning off batch normalization entirely with batch_normalize=False.
References
 1
Kearnes, Steven, et al. “Molecular graph convolutions: moving beyond
fingerprints.” Journal of computer-aided molecular design 30.8 (2016): 595-608.

__init__
(n_tasks: int, n_atom_feat: Union[int, Sequence[int]] = 75, n_pair_feat: Union[int, Sequence[int]] = 14, n_hidden: int = 50, n_graph_feat: int = 128, n_weave: int = 2, fully_connected_layer_sizes: List[int] = [2000, 100], conv_weight_init_stddevs: Union[float, Sequence[float]] = 0.03, weight_init_stddevs: Union[float, Sequence[float]] = 0.01, bias_init_consts: Union[float, Sequence[float]] = 0.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: Union[float, Sequence[float]] = 0.25, final_conv_activation_fn: Optional[Union[Callable, str]] = <function tanh>, activation_fns: Union[Callable, str, Sequence[Union[Callable, str]]] = <function relu>, batch_normalize: bool = True, batch_normalize_kwargs: Dict = {'fused': False, 'renorm': True}, gaussian_expand: bool = True, compress_post_gaussian_expansion: bool = False, mode: str = 'classification', n_classes: int = 2, batch_size: int = 100, **kwargs)[source]¶  Parameters
n_tasks (int) – Number of tasks
n_atom_feat (int, optional (default 75)) – Number of features per atom. Note this is 75 by default and should be 78 if chirality is used by WeaveFeaturizer.
n_pair_feat (int, optional (default 14)) – Number of features per pair of atoms.
n_hidden (int, optional (default 50)) – Number of units (convolution depths) in corresponding hidden layer
n_graph_feat (int, optional (default 128)) – Number of output features for each molecule (graph)
n_weave (int, optional (default 2)) – The number of weave layers in this model.
fully_connected_layer_sizes (list (default [2000, 100])) – The size of each dense layer in the network. The length of this list determines the number of layers.
conv_weight_init_stddevs (list or float (default 0.03)) – The standard deviation of the distribution to use for weight initialization of each convolutional layer. The length of this list should equal n_weave. Alternatively, this may be a single value instead of a list, in which case the same value is used for each layer.
weight_init_stddevs (list or float (default 0.01)) – The standard deviation of the distribution to use for weight initialization of each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float (default 0.0)) – The value to initialize the biases in each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
weight_decay_penalty (float (default 0.0)) – The magnitude of the weight decay penalty to use
weight_decay_penalty_type (str (default "l2")) – The type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float (default 0.25)) – The dropout probability to use for each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
final_conv_activation_fn (Optional[ActivationFn] (default tf.nn.tanh)) – The Tensorflow activation function to apply to the final convolution at the end of the weave convolutions. If None, then no activation is applied (hence linear).
activation_fns (list or object (default tf.nn.relu)) – The Tensorflow activation function to apply to each fully connected layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
batch_normalize (bool, optional (default True)) – If this is turned on, apply batch normalization before applying activation functions on convolutional and fully connected layers.
batch_normalize_kwargs (Dict, optional (default {"renorm": True, "fused": False})) – Batch normalization is a complex layer which has many potential arguments which change behavior. This layer accepts user-defined parameters which are passed to all BatchNormalization layers in WeaveModel, WeaveLayer, and WeaveGather.
gaussian_expand (bool, optional (default True)) – Whether to expand each dimension of atomic features by Gaussian histogram
compress_post_gaussian_expansion (bool, optional (default False)) – If True, compress the results of the Gaussian expansion back to the original dimensions of the input.
mode (str (default "classification")) – Either “classification” or “regression” for type of model.
n_classes (int (default 2)) – Number of classes to predict (only used in classification mode)
batch_size (int (default 100)) – Batch size used by this model for training.

compute_features_on_batch
(X_b)[source]¶ Compute tensors that will be input into the model from featurized representation.
The featurized input to WeaveModel is instances of WeaveMol created by WeaveFeaturizer. This method converts input WeaveMol objects into tensors used by the Keras implementation to compute WeaveModel outputs.
 Parameters
X_b (np.ndarray) – A numpy array with dtype=object where elements are WeaveMol objects.
 Returns
atom_feat (np.ndarray) – Of shape (N_atoms, N_atom_feat).
pair_feat (np.ndarray) – Of shape (N_pairs, N_pair_feat). Note that N_pairs will depend on the number of pairs being considered. If max_pair_distance is None, then this will be N_atoms**2. Else it will be the number of pairs within the specified graph distance.
pair_split (np.ndarray) – Of shape (N_pairs,). The ith entry in this array will tell you the originating atom for this pair (the “source”). Note that pairs are symmetric so for a pair (a, b), both a and b will separately be sources at different points in this array.
atom_split (np.ndarray) – Of shape (N_atoms,). The ith entry in this array will be the molecule which the ith atom belongs to.
atom_to_pair (np.ndarray) – Of shape (N_pairs, 2). The ith row in this array will be the array [a, b] if (a, b) is a pair to be considered. (Note that by symmetry, this implies some other row will contain [b, a].)
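To make the index arrays above concrete, the sketch below builds atom_split, pair_split, and atom_to_pair for a batch of molecules given only their atom counts. It assumes the all-pairs case (max_pair_distance is None) and that atom indices are offset so they are unique across the batch; the function name is hypothetical.

```python
import numpy as np

def weave_index_arrays(atoms_per_molecule):
    """Build batch-level index arrays for the all-pairs case."""
    atom_split, pair_split, atom_to_pair = [], [], []
    offset = 0  # index of the first atom of the current molecule
    for mol_index, n_atoms in enumerate(atoms_per_molecule):
        atom_split.extend([mol_index] * n_atoms)
        for a in range(n_atoms):
            for b in range(n_atoms):
                pair_split.append(offset + a)          # source atom
                atom_to_pair.append([offset + a, offset + b])
        offset += n_atoms
    return (np.array(atom_split), np.array(pair_split),
            np.array(atom_to_pair))

# A batch with a 1-atom molecule and a 2-atom molecule.
atom_split, pair_split, atom_to_pair = weave_index_arrays([1, 2])
print(atom_split)    # molecule index of each atom
print(pair_split)    # source atom of each pair
print(atom_to_pair)  # [a, b] rows, one per pair
```

Each molecule contributes n_atoms**2 pairs, which is the quadratic cost that the class description warns about.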

default_generator
(dataset: deepchem.data.datasets.Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) → Iterable[Tuple[List, List, List]][source]¶ Convert a dataset into the tensors needed for learning.
 Parameters
dataset (dc.data.Dataset) – Dataset to convert
epochs (int, optional (Default 1)) – Number of times to walk over dataset
mode (str, optional (Default 'fit')) – Ignored in this implementation.
deterministic (bool, optional (Default True)) – Whether the dataset should be walked in a deterministic fashion
pad_batches (bool, optional (Default True)) – If true, each returned batch will have size self.batch_size.
 Returns
Iterator which walks over the batches
DTNNModel¶

class
DTNNModel
(n_tasks, n_embedding=30, n_hidden=100, n_distance=100, distance_min=-1, distance_max=18, output_activation=True, mode='regression', dropout=0.0, **kwargs)[source]¶ Deep Tensor Neural Networks
This class implements deep tensor neural networks as first defined in [1]_
References
 1
Schütt, Kristof T., et al. “Quantum-chemical insights from deep
tensor neural networks.” Nature communications 8.1 (2017): 1-8.

__init__
(n_tasks, n_embedding=30, n_hidden=100, n_distance=100, distance_min=-1, distance_max=18, output_activation=True, mode='regression', dropout=0.0, **kwargs)[source]¶  Parameters
n_tasks (int) – Number of tasks
n_embedding (int, optional) – Number of features per atom.
n_hidden (int, optional) – Number of features for each molecule after DTNNStep
n_distance (int, optional) – granularity of the distance matrix; the step size will be (distance_max - distance_min)/n_distance
distance_min (float, optional) – minimum distance of atom pairs, default = -1 Angstrom
distance_max (float, optional) – maximum distance of atom pairs, default = 18 Angstrom
mode (str) – Only “regression” is currently supported.
dropout (float) – the dropout probability to use.
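The relationship between distance_min, distance_max, and n_distance can be checked with a few lines of Python. The grid below follows the step-size formula given for n_distance; the Gaussian expansion of a distance onto that grid (with width equal to the step size) is an assumption made for illustration, and the function names are hypothetical.

```python
import math

def distance_grid(distance_min=-1.0, distance_max=18.0, n_distance=100):
    """Return the step size and the grid of distance centers."""
    step = (distance_max - distance_min) / n_distance
    centers = [distance_min + i * step for i in range(n_distance)]
    return step, centers

def expand_distance(distance, centers, step):
    """Assumed Gaussian expansion of one distance onto the grid."""
    return [math.exp(-((distance - c) ** 2) / (2.0 * step ** 2))
            for c in centers]

step, centers = distance_grid()
features = expand_distance(1.0, centers, step)
print(step, len(features))
```

With the default values, each of the 100 grid points is 0.19 Angstrom apart, and a single interatomic distance becomes a length-100 feature vector.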

compute_features_on_batch
(X_b)[source]¶ Computes the values for different Feature Layers on given batch
A tf.py_func wrapper is written around this when creating the input_fn for tf.Estimator

default_generator
(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])
DAGModel¶

class
DAGModel
(n_tasks, max_atoms=50, n_atom_feat=75, n_graph_feat=30, n_outputs=30, layer_sizes=[100], layer_sizes_gather=[100], dropout=None, mode='classification', n_classes=2, uncertainty=False, batch_size=100, **kwargs)[source]¶ Directed Acyclic Graph models for molecular property prediction.
This model is based on the following paper:
Lusci, Alessandro, Gianluca Pollastri, and Pierre Baldi. “Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules.” Journal of chemical information and modeling 53.7 (2013): 1563-1575.
The basic idea for this paper is that a molecule is usually viewed as an undirected graph. However, you can convert it to a series of directed graphs. The idea is that for each atom, you make a DAG using that atom as the vertex of the DAG and edges pointing “inwards” to it. This transformation is implemented in dc.trans.transformers.DAGTransformer.UG_to_DAG.
This model accepts ConvMols as input, just as GraphConvModel does, but these ConvMol objects must be transformed by dc.trans.DAGTransformer.
As a note, performance of this model can be a little sensitive to initialization. It might be worth training a few different instantiations to get a stable set of parameters.
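The undirected-to-directed conversion described above can be sketched with a breadth-first search. This is a simplified illustration rather than the logic of DAGTransformer: the function name is hypothetical, the graph is assumed to be connected, and edges between atoms at equal depth (which occur in rings) are simply dropped.

```python
from collections import deque

def undirected_to_dag(adjacency, root):
    """Orient the edges of a connected undirected graph toward `root`.

    adjacency maps each atom index to a list of its neighbours.
    Returns sorted (source, target) edges pointing "inwards".
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    edges = []
    for node, neighbours in adjacency.items():
        for neighbour in neighbours:
            # Keep only edges that move one step closer to the root.
            if depth[neighbour] < depth[node]:
                edges.append((node, neighbour))
    return sorted(edges)

# Carbon skeleton of propane: 0 - 1 - 2
chain = {0: [1], 1: [0, 2], 2: [1]}
dags = {root: undirected_to_dag(chain, root) for root in chain}
print(dags)
```

One DAG is produced per atom, so a molecule with N atoms yields N directed graphs.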

__init__
(n_tasks, max_atoms=50, n_atom_feat=75, n_graph_feat=30, n_outputs=30, layer_sizes=[100], layer_sizes_gather=[100], dropout=None, mode='classification', n_classes=2, uncertainty=False, batch_size=100, **kwargs)[source]¶  Parameters
n_tasks (int) – Number of tasks.
max_atoms (int, optional) – Maximum number of atoms in a molecule, should be defined based on dataset.
n_atom_feat (int, optional) – Number of features per atom.
n_graph_feat (int, optional) – Number of features for atom in the graph.
n_outputs (int, optional) – Number of features for each molecule.
layer_sizes (list of int, optional) – List of hidden layer size(s) in the propagation step: length of this list represents the number of hidden layers, and each element is the width of corresponding hidden layer.
layer_sizes_gather (list of int, optional) – List of hidden layer size(s) in the gather step.
dropout (None or float, optional) – Dropout probability, applied after each propagation step and gather step.
mode (str, optional) – Either “classification” or “regression” for type of model.
n_classes (int) – the number of classes to predict (only used in classification mode)
uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted

GraphConvModel¶

class
GraphConvModel
(n_tasks: int, graph_conv_layers: List[int] = [64, 64], dense_layer_size: int = 128, dropout: float = 0.0, mode: str = 'classification', number_atom_features: int = 75, n_classes: int = 2, batch_size: int = 100, batch_normalize: bool = True, uncertainty: bool = False, **kwargs)[source]¶ Graph Convolutional Models.
This class implements the graph convolutional model from [1]_. These graph convolutions start with a per-atom set of descriptors for each atom in a molecule, then combine and recombine these descriptors over convolutional layers.
References
 1
Duvenaud, David K., et al. “Convolutional networks on graphs for
learning molecular fingerprints.” Advances in neural information processing systems. 2015.
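The "combine and recombine" step can be sketched in NumPy. The layer below mixes each atom's descriptors with the sum of its neighbours' descriptors and then sums over atoms to get a molecule-level vector. It is a deliberately simplified illustration with hypothetical names; it omits details of the actual layers such as degree-specific weights and pooling.

```python
import numpy as np

def graph_conv_layer(atom_features, adjacency, W_self, W_neighbor):
    """One simplified graph convolution over per-atom descriptors."""
    neighbor_sum = adjacency @ atom_features   # sum over bonded atoms
    return np.maximum(atom_features @ W_self + neighbor_sum @ W_neighbor, 0.0)

def molecule_fingerprint(atom_features, adjacency, layers):
    """Apply the layers in order, then sum over atoms."""
    h = atom_features
    for W_self, W_neighbor in layers:
        h = graph_conv_layer(h, adjacency, W_self, W_neighbor)
    return h.sum(axis=0)

rng = np.random.default_rng(0)
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)   # a 3-atom chain
atom_features = rng.normal(size=(3, 4))
widths = [4, 8, 8]                               # input width, then two layers
layers = [(rng.normal(size=(widths[i], widths[i + 1])),
           rng.normal(size=(widths[i], widths[i + 1])))
          for i in range(len(widths) - 1)]

fingerprint = molecule_fingerprint(atom_features, adjacency, layers)
print(fingerprint.shape)
```

Because neighbours are summed and atoms are summed, the resulting vector does not depend on the order in which atoms are listed.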

__init__
(n_tasks: int, graph_conv_layers: List[int] = [64, 64], dense_layer_size: int = 128, dropout: float = 0.0, mode: str = 'classification', number_atom_features: int = 75, n_classes: int = 2, batch_size: int = 100, batch_normalize: bool = True, uncertainty: bool = False, **kwargs)[source]¶ The wrapper class for graph convolutions.
Note that since the underlying _GraphConvKerasModel class is specified using imperative subclassing style, this model cannot make predictions for arbitrary outputs.
 Parameters
n_tasks (int) – Number of tasks
graph_conv_layers (list of int) – Width of channels for the Graph Convolution Layers
dense_layer_size (int) – Width of channels for Atom Level Dense Layer before GraphPool
dropout (list or float) – the dropout probability to use for each layer. The length of this list should equal len(graph_conv_layers)+1 (one value for each convolution layer, and one for the dense layer). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
mode (str) – Either “classification” or “regression”
number_atom_features (int) – 75 is the default number of atom features created, but this can vary if various options are passed to the function atom_features in graph_features
n_classes (int) – the number of classes to predict (only used in classification mode)
batch_normalize (bool) – if True, apply batch normalization to model
uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted

default_generator
(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])
MPNNModel¶

class
MPNNModel
(n_tasks, n_atom_feat=70, n_pair_feat=8, n_hidden=100, T=5, M=10, mode='regression', dropout=0.0, n_classes=2, uncertainty=False, batch_size=100, **kwargs)[source]¶ Message Passing Neural Network.
Message Passing Neural Networks treat graph convolutional operations as an instantiation of a more general message passing scheme. Recall that message passing in a graph is when nodes in a graph send each other “messages” and update their internal state as a consequence of these messages.
Ordering structures in this model are built according to [1]_
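The message passing idea can be sketched in a few lines of NumPy. In this toy version every node sends a linear function of its state to its neighbours, and each node then updates its state from its own value plus the incoming messages. The names and the tanh update are assumptions for illustration; the model's actual message and update functions are more elaborate.

```python
import numpy as np

def message_passing(h, adjacency, W_msg, W_upd, T=5):
    """Run T rounds of a toy message-passing update."""
    for _ in range(T):
        messages = adjacency @ (h @ W_msg)   # sum of neighbours' messages
        h = np.tanh(h @ W_upd + messages)    # update each node's state
    return h

adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)   # chain 0 - 1 - 2
rng = np.random.default_rng(0)
n_hidden = 4
W_msg = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
W_upd = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
h0 = rng.normal(size=(3, n_hidden))

# Change only node 2's initial state and watch when node 0 notices.
h0_changed = h0.copy()
h0_changed[2] += 1.0
one_a = message_passing(h0, adjacency, W_msg, W_upd, T=1)
one_b = message_passing(h0_changed, adjacency, W_msg, W_upd, T=1)
two_a = message_passing(h0, adjacency, W_msg, W_upd, T=2)
two_b = message_passing(h0_changed, adjacency, W_msg, W_upd, T=2)
print(np.allclose(one_a[0], one_b[0]), np.allclose(two_a[0], two_b[0]))
```

Information travels one bond per round, so node 0 only reacts to a change at node 2 after two rounds.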
References
 1
Vinyals, Oriol, Samy Bengio, and Manjunath Kudlur. “Order matters:
Sequence to sequence for sets.” arXiv preprint arXiv:1511.06391 (2015).

__init__
(n_tasks, n_atom_feat=70, n_pair_feat=8, n_hidden=100, T=5, M=10, mode='regression', dropout=0.0, n_classes=2, uncertainty=False, batch_size=100, **kwargs)[source]¶  Parameters
n_tasks (int) – Number of tasks
n_atom_feat (int, optional) – Number of features per atom.
n_pair_feat (int, optional) – Number of features per pair of atoms.
n_hidden (int, optional) – Number of units (convolution depths) in corresponding hidden layer
n_graph_feat (int, optional) – Number of output features for each molecule (graph)
dropout (float) – the dropout probability to use.
n_classes (int) – the number of classes to predict (only used in classification mode)
uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted

default_generator
(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])
BasicMolGANModel¶

class
BasicMolGANModel
(edges: int = 5, vertices: int = 9, nodes: int = 5, embedding_dim: int = 10, dropout_rate: float = 0.0, **kwargs)[source]¶ Model for de novo generation of small molecules based on the work of Nicola De Cao et al. [1]_. Utilizes WGAN infrastructure; uses adjacency matrix and node features as inputs. Inputs need to be in one-hot representation.
Examples
>>> import deepchem as dc
>>> from deepchem.models import BasicMolGANModel as MolGAN
>>> from deepchem.models.optimizers import ExponentialDecay
>>> from tensorflow import one_hot
>>> smiles = ['CCC', 'C1=CC=CC=C1', 'CNC']
>>> # create featurizer
>>> feat = dc.feat.MolGanFeaturizer()
>>> # featurize molecules
>>> features = feat.featurize(smiles)
>>> # remove empty objects
>>> features = list(filter(lambda x: x is not None, features))
>>> # create model
>>> gan = MolGAN(learning_rate=ExponentialDecay(0.001, 0.9, 5000))
>>> dataset = dc.data.NumpyDataset([x.adjacency_matrix for x in features], [x.node_features for x in features])
>>> def iterbatches(epochs):
...     for i in range(epochs):
...         for batch in dataset.iterbatches(batch_size=gan.batch_size, pad_batches=True):
...             adjacency_tensor = one_hot(batch[0], gan.edges)
...             node_tensor = one_hot(batch[1], gan.nodes)
...             yield {gan.data_inputs[0]: adjacency_tensor, gan.data_inputs[1]: node_tensor}
>>> gan.fit_gan(iterbatches(8), generator_steps=0.2, checkpoint_interval=5000)
>>> generated_data = gan.predict_gan_generator(1000)
>>> # convert graphs to RDKit molecules
>>> nmols = feat.defeaturize(generated_data)
>>> print("{} molecules generated".format(len(nmols)))
>>> # remove invalid molecules
>>> nmols = list(filter(lambda x: x is not None, nmols))
>>> # currently training is unstable so 0 is a common outcome
>>> print("{} valid molecules".format(len(nmols)))
References
 1
Nicola De Cao et al. “MolGAN: An implicit generative model
for small molecular graphs”, https://arxiv.org/abs/1805.11973

__init__
(edges: int = 5, vertices: int = 9, nodes: int = 5, embedding_dim: int = 10, dropout_rate: float = 0.0, **kwargs)[source]¶ Initialize the model
 Parameters
edges (int, default 5) – Number of bond types, including BondType.Zero
vertices (int, default 9) – Max number of atoms in adjacency and node features matrices
nodes (int, default 5) – Number of atom types in node features matrix
embedding_dim (int, default 10) – Size of noise input array
dropout_rate (float, default = 0.) – Rate of dropout used across whole model
name (str, default '') – Name of the model

get_noise_input_shape
() → Tuple[int][source]¶ Return shape of the noise input used in generator
 Returns
Shape of the noise input
 Return type
Tuple

get_data_input_shapes
() → List[source]¶ Return input shape of the discriminator
 Returns
List of shapes used as inputs for the discriminator.
 Return type
List

create_generator
() → tensorflow.python.keras.engine.training.Model[source]¶ Create the generator model. It takes noise data as input and processes it through a number of dense and dropout layers. The data is then converted into two forms, one used for training and the other for generating compounds. The model has two outputs:
edges
nodes
The format differs depending on the intended use (training or sample generation). For sample generation, pass the flag sample_generation=True when calling the generator, i.e. gan.generators[0](noise_input, training=False, sample_generation=True). For training, no flag is necessary.

create_discriminator
() → tensorflow.python.keras.engine.training.Model[source]¶ Create discriminator model based on MolGAN layers. Takes two inputs:
adjacency tensor, containing bond information
nodes tensor, containing atom information
The input vectors need to be in one-hot encoding format. Use the MolGAN featurizer for that purpose. This will be simplified in a future release.
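As a rough illustration of the one-hot format mentioned above, here is a minimal pure-Python sketch. The helper name one_hot_matrix and the toy bond-type matrix are hypothetical; the sketch only mirrors the shape convention of tensorflow.one_hot used in the class example and is not the featurizer's own code.

```python
def one_hot_matrix(matrix, depth):
    """One-hot encode a 2D integer matrix into a (rows, cols, depth) nested list.

    Hypothetical helper for illustration only. Unlike tensorflow.one_hot,
    it raises on out-of-range values instead of emitting an all-zero vector.
    """
    encoded = []
    for row in matrix:
        encoded_row = []
        for value in row:
            if not 0 <= value < depth:
                raise ValueError(f"value {value} is outside [0, {depth})")
            encoded_row.append([1.0 if k == value else 0.0 for k in range(depth)])
        encoded.append(encoded_row)
    return encoded


# Toy adjacency matrix for a three-atom chain: 0 = no bond, 1 = single bond.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

# depth plays the role of the `edges` constructor argument (default 5).
adjacency_one_hot = one_hot_matrix(adjacency, depth=5)
```

Each integer bond type becomes a vector of length depth with a single 1.0, so a (vertices, vertices) matrix becomes a (vertices, vertices, edges) tensor.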

predict_gan_generator
(batch_size: int = 1, noise_input: Optional[List] = None, conditional_inputs: List = [], generator_index: int = 0) → List[deepchem.feat.molecule_featurizers.molgan_featurizer.GraphMatrix][source]¶ Use the GAN to generate a batch of samples.
 Parameters
batch_size (int) – the number of samples to generate. If either noise_input or conditional_inputs is specified, this argument is ignored since the batch size is then determined by the size of that argument.
noise_input (array) – the value to use for the generator’s noise input. If None (the default), get_noise_batch() is called to generate a random input, so each call will produce a new set of samples.
conditional_inputs (list of arrays) – NOT USED. the values to use for all conditional inputs. This must be specified if the GAN has any conditional inputs.
generator_index (int) – NOT USED. the index of the generator (between 0 and n_generators - 1) to use for generating the samples.
 Returns
A list of GraphMatrix objects that can be converted into RDKit molecules using the MolGANFeaturizer defeaturize function.
 Return type
List[GraphMatrix]
ScScoreModel¶

class
ScScoreModel
(n_features, layer_sizes=[300, 300, 300], dropouts=0.0, **kwargs)[source]¶ https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00622 Several definitions of molecular complexity exist to facilitate prioritization of lead compounds, to identify diversity-inducing and complexifying reactions, and to guide retrosynthetic searches. In this work, we focus on synthetic complexity and reformalize its definition to correlate with the expected number of reaction steps required to produce a target molecule, with implicit knowledge about what compounds are reasonable starting materials. We train a neural network model on 12 million reactions from the Reaxys database to impose a pairwise inequality constraint enforcing the premise of this definition: that on average, the products of published chemical reactions should be more synthetically complex than their corresponding reactants. The learned metric (SCScore) exhibits highly desirable nonlinear behavior, particularly in recognizing increases in synthetic complexity throughout a number of linear synthetic routes.
Our model here uses a hinge loss instead of the shifted ReLU loss in https://github.com/connorcoley/scscore.
This could cause differentiation issues with compounds that are “close” to each other in “complexity”.
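To make the pairwise constraint concrete, here is a minimal sketch of a generic pairwise hinge loss over (reactant, product) scores. The function name and the margin value are assumptions chosen for illustration; this is not the loss code used by ScScoreModel.

```python
def pairwise_hinge_loss(reactant_scores, product_scores, margin=1.0):
    """Mean hinge penalty for products that fail to outscore their reactants.

    Hypothetical illustration: a pair contributes zero once the product
    score exceeds the reactant score by at least `margin` (assumed value).
    """
    if len(reactant_scores) != len(product_scores):
        raise ValueError("score lists must have equal length")
    if not reactant_scores:
        raise ValueError("at least one pair is required")
    penalties = [
        max(0.0, margin + reactant - product)
        for reactant, product in zip(reactant_scores, product_scores)
    ]
    return sum(penalties) / len(penalties)


# First pair is separated by more than the margin, second pair is not.
example_loss = pairwise_hinge_loss([1.0, 2.0], [3.0, 2.5])
```

In this sketch, pairs that are already separated by the margin contribute nothing to the loss, while pairs with similar scores are penalized in proportion to the shortfall.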

__init__
(n_features, layer_sizes=[300, 300, 300], dropouts=0.0, **kwargs)[source]¶  Parameters
n_features (int) – number of features per molecule
layer_sizes (list of int) – size of each hidden layer
dropouts (int) – dropout to apply to each hidden layer
kwargs – accepts the same keyword arguments as TensorGraph

default_generator
(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])
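The batch format described above can be sketched with plain Python lists. The function iterate_batches is hypothetical, and the padding rule (repeat the first sample of the batch with weight zero) is an assumption made for illustration, not necessarily what the library does.

```python
import random


def iterate_batches(inputs, outputs, weights, batch_size, epochs=1,
                    deterministic=True, pad_batches=True, seed=None):
    """Yield ([inputs], [outputs], [weights]) tuples, one per batch.

    Hypothetical sketch. Padded rows repeat the first sample of the batch
    and get weight 0.0 so they would not contribute to a weighted loss.
    """
    rng = random.Random(seed)
    n_samples = len(inputs)
    for _ in range(epochs):
        order = list(range(n_samples))
        if not deterministic:
            rng.shuffle(order)  # new random order for every epoch
        for start in range(0, n_samples, batch_size):
            indices = order[start:start + batch_size]
            x = [inputs[i] for i in indices]
            y = [outputs[i] for i in indices]
            w = [weights[i] for i in indices]
            if pad_batches and len(indices) < batch_size:
                n_pad = batch_size - len(indices)
                x += [x[0]] * n_pad
                y += [y[0]] * n_pad
                w += [0.0] * n_pad
            yield ([x], [y], [w])


batches = list(iterate_batches([1, 2, 3], [10, 20, 30], [1.0, 1.0, 1.0], batch_size=2))
```

With three samples and a batch size of two, the second batch holds one real sample plus one zero-weight padding row.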

SeqToSeq¶

class
SeqToSeq
(input_tokens, output_tokens, max_output_length, encoder_layers=4, decoder_layers=4, embedding_dimension=512, dropout=0.0, reverse_input=True, variational=False, annealing_start_step=5000, annealing_final_step=10000, **kwargs)[source]¶ Implements sequence to sequence translation models.
The model is based on the description in Sutskever et al., “Sequence to Sequence Learning with Neural Networks” (https://arxiv.org/abs/1409.3215), although this implementation uses GRUs instead of LSTMs. The goal is to take sequences of tokens as input, and translate each one into a different output sequence. The input and output sequences can both be of variable length, and an output sequence need not have the same length as the input sequence it was generated from. For example, these models were originally developed for use in natural language processing. In that context, the input might be a sequence of English words, and the output might be a sequence of French words. The goal would be to train the model to translate sentences from English to French.
The model consists of two parts called the “encoder” and “decoder”. Each one consists of a stack of recurrent layers. The job of the encoder is to transform the input sequence into a single, fixed length vector called the “embedding”. That vector contains all relevant information from the input sequence. The decoder then transforms the embedding vector into the output sequence.
These models can be used for various purposes. First and most obviously, they can be used for sequence to sequence translation. In any case where you have sequences of tokens, and you want to translate each one into a different sequence, a SeqToSeq model can be trained to perform the translation.
Another possible use case is transforming variable length sequences into fixed length vectors. Many types of models require their inputs to have a fixed shape, which makes it difficult to use them with variable sized inputs (for example, when the input is a molecule, and different molecules have different numbers of atoms). In that case, you can train a SeqToSeq model as an autoencoder, so that it tries to make the output sequence identical to the input one. That forces the embedding vector to contain all information from the original sequence. You can then use the encoder for transforming sequences into fixed length embedding vectors, suitable to use as inputs to other types of models.
Another use case is to train the decoder for use as a generative model. Here again you begin by training the SeqToSeq model as an autoencoder. Once training is complete, you can supply arbitrary embedding vectors, and transform each one into an output sequence. When used in this way, you typically train it as a variational autoencoder. This adds random noise to the encoder, and also adds a constraint term to the loss that forces the embedding vector to have a unit Gaussian distribution. You can then pick random vectors from a Gaussian distribution, and the output sequences should follow the same distribution as the training data.
When training as a variational autoencoder, it is best to use KL cost annealing, as described in https://arxiv.org/abs/1511.06349. The constraint term in the loss is initially set to 0, so the optimizer just tries to minimize the reconstruction loss. Once it has made reasonable progress toward that, the constraint term can be gradually turned back on. The range of steps over which this happens is configurable.
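The annealing schedule can be sketched as a weight on the constraint term that depends on the training step. The function name is hypothetical, and the linear ramp between the two steps is an assumption; the text above only says the term is turned on gradually.

```python
def kl_annealing_weight(step, annealing_start_step=5000, annealing_final_step=10000):
    """Weight applied to the KL constraint term at a given training step.

    Hypothetical sketch: 0 before the start step, 1 after the final step,
    and a linear ramp in between (the linear shape is an assumption).
    """
    if annealing_final_step <= annealing_start_step:
        raise ValueError("annealing_final_step must exceed annealing_start_step")
    if step <= annealing_start_step:
        return 0.0
    if step >= annealing_final_step:
        return 1.0
    return (step - annealing_start_step) / (annealing_final_step - annealing_start_step)


# total_loss = reconstruction_loss + kl_annealing_weight(step) * kl_term
halfway_weight = kl_annealing_weight(7500)
```

With the default steps, the optimizer sees only the reconstruction loss for the first 5000 batches and the full constraint from batch 10000 onward.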

__init__
(input_tokens, output_tokens, max_output_length, encoder_layers=4, decoder_layers=4, embedding_dimension=512, dropout=0.0, reverse_input=True, variational=False, annealing_start_step=5000, annealing_final_step=10000, **kwargs)[source]¶ Construct a SeqToSeq model.
In addition to the following arguments, this class also accepts all the keyword arguments from TensorGraph.
 Parameters
input_tokens (list) – a list of all tokens that may appear in input sequences
output_tokens (list) – a list of all tokens that may appear in output sequences
max_output_length (int) – the maximum length of output sequence that may be generated
encoder_layers (int) – the number of recurrent layers in the encoder
decoder_layers (int) – the number of recurrent layers in the decoder
embedding_dimension (int) – the width of the embedding vector. This also is the width of all recurrent layers.
dropout (float) – the dropout probability to use during training
reverse_input (bool) – if True, reverse the order of input sequences before sending them into the encoder. This can improve performance when working with long sequences.
variational (bool) – if True, train the model as a variational autoencoder. This adds random noise to the encoder, and also constrains the embedding to follow a unit Gaussian distribution.
annealing_start_step (int) – the step (that is, batch) at which to begin turning on the constraint term for KL cost annealing
annealing_final_step (int) – the step (that is, batch) at which to finish turning on the constraint term for KL cost annealing

fit_sequences
(sequences, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False)[source]¶ Train this model on a set of sequences
 Parameters
sequences (iterable) – the training samples to fit to. Each sample should be represented as a tuple of the form (input_sequence, output_sequence).
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.

predict_from_sequences
(sequences, beam_width=5)[source]¶ Given a set of input sequences, predict the output sequences.
The prediction is done using a beam search with length normalization.
 Parameters
sequences (iterable) – the input sequences to generate a prediction for
beam_width (int) – the beam width to use for searching. Set to 1 to use a simple greedy search.
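For readers unfamiliar with the search procedure, here is a small self-contained sketch of beam search with length normalization over a toy next-token table. TOY_MODEL, next_log_probs and beam_search are all hypothetical names; the real model scores tokens with its decoder, and its exact normalization may differ from the simple division by length used here.

```python
import math

# Hypothetical toy "decoder": next-token probabilities for a given prefix.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"<end>": 0.4, "a": 0.3, "b": 0.3},
    ("b",): {"<end>": 0.9, "a": 0.1},
}


def next_log_probs(prefix):
    """Log-probabilities of the next token; unknown prefixes always end."""
    probs = TOY_MODEL.get(prefix, {"<end>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}


def beam_search(beam_width, max_length, end_token="<end>"):
    """Return the best token sequence by length-normalized log-probability."""
    if beam_width < 1 or max_length < 1:
        raise ValueError("beam_width and max_length must be at least 1")
    beams = [((), 0.0)]  # (tokens, total log-probability)
    finished = []
    for _ in range(max_length):
        candidates = [
            (tokens + (token,), score + log_p)
            for tokens, score in beams
            for token, log_p in next_log_probs(tokens).items()
        ]
        # Keep only the beam_width most probable extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_length
    # Length normalization: compare average log-probability per token.
    best_tokens, _ = max(finished, key=lambda c: c[1] / len(c[0]))
    return best_tokens


greedy_result = beam_search(beam_width=1, max_length=5)
beam_result = beam_search(beam_width=2, max_length=5)
```

In this toy table, greedy search commits to the locally best first token and ends with a less probable sequence, while a beam width of 2 keeps the alternative alive and finds the more probable one.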

predict_from_embeddings
(embeddings, beam_width=5)[source]¶ Given a set of embedding vectors, predict the output sequences.
The prediction is done using a beam search with length normalization.
 Parameters
embeddings (iterable) – the embedding vectors to generate predictions for
beam_width (int) – the beam width to use for searching. Set to 1 to use a simple greedy search.

GAN¶

class
GAN
(n_generators=1, n_discriminators=1, **kwargs)[source]¶ Implements Generative Adversarial Networks.
A Generative Adversarial Network (GAN) is a type of generative model. It consists of two parts called the “generator” and the “discriminator”. The generator takes random noise as input and transforms it into an output that (hopefully) resembles the training data. The discriminator takes a set of samples as input and tries to distinguish the real training samples from the ones created by the generator. Both of them are trained together. The discriminator tries to get better and better at telling real from false data, while the generator tries to get better and better at fooling the discriminator.
In many cases there also are additional inputs to the generator and discriminator. In that case it is known as a Conditional GAN (CGAN), since it learns a distribution that is conditional on the values of those inputs. They are referred to as “conditional inputs”.
Many variations on this idea have been proposed, and new varieties of GANs are constantly being proposed. This class tries to make it very easy to implement straightforward GANs of the most conventional types. At the same time, it tries to be flexible enough that it can be used to implement many (but certainly not all) variations on the concept.
To define a GAN, you must create a subclass that provides implementations of the following methods:
get_noise_input_shape()
get_data_input_shapes()
create_generator()
create_discriminator()
If you want your GAN to have any conditional inputs you must also implement:
get_conditional_input_shapes()
The following methods have default implementations that are suitable for most conventional GANs. You can override them if you want to customize their behavior:
create_generator_loss()
create_discriminator_loss()
get_noise_batch()
This class allows a GAN to have multiple generators and discriminators, a model known as MIX+GAN. It is described in Arora et al., “Generalization and Equilibrium in Generative Adversarial Nets (GANs)” (https://arxiv.org/abs/1703.00573). This can lead to better models, and is especially useful for reducing mode collapse, since different generators can learn different parts of the distribution. To use this technique, simply specify the number of generators and discriminators when calling the constructor. You can then tell predict_gan_generator() which generator to use for predicting samples.

__init__
(n_generators=1, n_discriminators=1, **kwargs)[source]¶ Construct a GAN.
In addition to the parameters listed below, this class accepts all the keyword arguments from KerasModel.
 Parameters
n_generators (int) – the number of generators to include
n_discriminators (int) – the number of discriminators to include

get_noise_input_shape
()[source]¶ Get the shape of the generator’s noise input layer.
Subclasses must override this to return a tuple giving the shape of the noise input. The actual Input layer will be created automatically. The dimension corresponding to the batch size should be omitted.

get_data_input_shapes
()[source]¶ Get the shapes of the inputs for training data.
Subclasses must override this to return a list of tuples, each giving the shape of one of the inputs. The actual Input layers will be created automatically. This list of shapes must also match the shapes of the generator’s outputs. The dimension corresponding to the batch size should be omitted.

get_conditional_input_shapes
()[source]¶ Get the shapes of any conditional inputs.
Subclasses may override this to return a list of tuples, each giving the shape of one of the conditional inputs. The actual Input layers will be created automatically. The dimension corresponding to the batch size should be omitted.
The default implementation returns an empty list, meaning there are no conditional inputs.

get_noise_batch
(batch_size)[source]¶ Get a batch of random noise to pass to the generator.
This should return a NumPy array whose shape matches the one returned by get_noise_input_shape(). The default implementation returns normally distributed values. Subclasses can override this to implement a different distribution.

create_generator
()[source]¶ Create and return a generator.
Subclasses must override this to construct the generator. The returned value should be a tf.keras.Model whose inputs are a batch of noise, followed by any conditional inputs. The number and shapes of its outputs must match the return value from get_data_input_shapes(), since generated data must have the same form as training data.

create_discriminator
()[source]¶ Create and return a discriminator.
Subclasses must override this to construct the discriminator. The returned value should be a tf.keras.Model whose inputs are all data inputs, followed by any conditional inputs. Its output should be a one dimensional tensor containing the probability of each sample being a training sample.

create_generator_loss
(discrim_output)[source]¶ Create the loss function for the generator.
The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.
 Parameters
discrim_output (Tensor) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.
 Returns
A Tensor equal to the loss function to use for optimizing the generator.

create_discriminator_loss
(discrim_output_train, discrim_output_gen)[source]¶ Create the loss function for the discriminator.
The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.
 Parameters
discrim_output_train (Tensor) – the output from the discriminator on a batch of training data. This is its estimate of the probability that each sample is training data.
discrim_output_gen (Tensor) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.
 Returns
A Tensor equal to the loss function to use for optimizing the discriminator.
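As a point of reference, the conventional GAN losses on probability outputs can be written in a few lines of plain Python. This is a sketch of the standard formulation (with the common non-saturating generator loss); whether the default implementations in this class match it exactly is an assumption, and the function names and EPSILON constant are hypothetical.

```python
import math

EPSILON = 1e-10  # assumed small constant guarding against log(0)


def generator_loss(discrim_output_gen):
    """Non-saturating generator loss: -mean(log D(G(z)))."""
    return -sum(math.log(p + EPSILON) for p in discrim_output_gen) / len(discrim_output_gen)


def discriminator_loss(discrim_output_train, discrim_output_gen):
    """Standard discriminator loss: -(mean(log D(x)) + mean(log(1 - D(G(z)))))."""
    real_term = sum(math.log(p + EPSILON) for p in discrim_output_train) / len(discrim_output_train)
    fake_term = sum(math.log(1.0 - p + EPSILON) for p in discrim_output_gen) / len(discrim_output_gen)
    return -(real_term + fake_term)


# An undecided discriminator (0.5 everywhere) versus a confident, correct one.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
confident = discriminator_loss([0.9, 0.9], [0.1, 0.1])
```

Both functions take the discriminator's probability estimates as plain lists; a discriminator that separates real from generated samples well gets a lower loss than one that outputs 0.5 for everything.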

fit_gan
(batches, generator_steps=1.0, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False)[source]¶ Train this model on data.
 Parameters
batches (iterable) – batches of data to train the discriminator on, each represented as a dict that maps Inputs to values. It should specify values for all members of data_inputs and conditional_inputs.
generator_steps (float) – the number of training steps to perform for the generator for each batch. This can be used to adjust the ratio of training steps for the generator and discriminator. For example, 2.0 will perform two training steps for every batch, while 0.5 will only perform one training step for every two batches.
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in batches. Set this to 0 to disable automatic checkpointing.
restore (bool) – if True, restore the model from the most recent checkpoint before training it.
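The effect of a fractional generator_steps value can be illustrated with a simple accumulator. The function below is a hypothetical sketch of that bookkeeping, not the training loop itself.

```python
def generator_steps_per_batch(n_batches, generator_steps):
    """Number of generator training steps taken after each discriminator batch.

    Hypothetical sketch: a running total is increased by `generator_steps`
    per batch, and one generator step is taken for each whole unit in it.
    """
    accumulated = 0.0
    schedule = []
    for _ in range(n_batches):
        accumulated += generator_steps
        steps = 0
        while accumulated >= 1.0:
            steps += 1
            accumulated -= 1.0
        schedule.append(steps)
    return schedule


two_per_batch = generator_steps_per_batch(4, 2.0)
one_every_two = generator_steps_per_batch(4, 0.5)
```

With 2.0 the generator is trained twice after every batch, and with 0.5 it is trained once after every second batch, matching the description of the parameter above.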

predict_gan_generator
(batch_size=1, noise_input=None, conditional_inputs=[], generator_index=0)[source]¶ Use the GAN to generate a batch of samples.
 Parameters
batch_size (int) – the number of samples to generate. If either noise_input or conditional_inputs is specified, this argument is ignored since the batch size is then determined by the size of that argument.
noise_input (array) – the value to use for the generator’s noise input. If None (the default), get_noise_batch() is called to generate a random input, so each call will produce a new set of samples.
conditional_inputs (list of arrays) – the values to use for all conditional inputs. This must be specified if the GAN has any conditional inputs.
generator_index (int) – the index of the generator (between 0 and n_generators - 1) to use for generating the samples.
 Returns
An array (if the generator has only one output) or list of arrays (if it has
multiple outputs) containing the generated samples.

WGAN¶

class
WGAN
(gradient_penalty=10.0, **kwargs)[source]¶ Implements Wasserstein Generative Adversarial Networks.
This class implements Wasserstein Generative Adversarial Networks (WGANs) as described in Arjovsky et al., “Wasserstein GAN” (https://arxiv.org/abs/1701.07875). A WGAN is conceptually rather different from a conventional GAN, but in practical terms very similar. It reinterprets the discriminator (often called the “critic” in this context) as learning an approximation to the Earth Mover distance between the training and generated distributions. The generator is then trained to minimize that distance. In practice, this just means using slightly different loss functions for training the generator and discriminator.
WGANs have theoretical advantages over conventional GANs, and they often work better in practice. In addition, the discriminator’s loss function can be directly interpreted as a measure of the quality of the model. That is an advantage over conventional GANs, where the loss does not directly convey information about the quality of the model.
The theory WGANs are based on requires the discriminator’s gradient to be bounded. The original paper achieved this by clipping its weights. This class instead does it by adding a penalty term to the discriminator’s loss, as described in https://arxiv.org/abs/1704.00028. This is sometimes found to produce better results.
There are a few other practical differences between GANs and WGANs. In a conventional GAN, the discriminator’s output must be between 0 and 1 so it can be interpreted as a probability. In a WGAN, it should produce an unbounded output that can be interpreted as a distance.
When training a WGAN, you also should usually use a smaller value for generator_steps. Conventional GANs rely on keeping the generator and discriminator “in balance” with each other. If the discriminator ever gets too good, it becomes impossible for the generator to fool it and training stalls. WGANs do not have this problem, and in fact the better the discriminator is, the easier it is for the generator to improve. It therefore usually works best to perform several training steps on the discriminator for each training step on the generator.
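The "slightly different loss functions" can be sketched on plain lists of critic outputs. The code follows the formulation in the two cited papers; sign conventions differ between implementations, so this is not claimed to be this class's code. All function names are hypothetical, and the weight argument stands in for the gradient_penalty constructor parameter.

```python
import math


def critic_loss(critic_on_train, critic_on_generated):
    """Wasserstein critic loss: mean score on generated minus mean score on real data."""
    mean_generated = sum(critic_on_generated) / len(critic_on_generated)
    mean_train = sum(critic_on_train) / len(critic_on_train)
    return mean_generated - mean_train


def wasserstein_generator_loss(critic_on_generated):
    """The generator tries to raise the critic's score on generated samples."""
    return -sum(critic_on_generated) / len(critic_on_generated)


def gradient_penalty_term(gradient, weight=10.0):
    """Penalty weight * (||gradient|| - 1)^2 from the gradient-penalty paper.

    `gradient` is the critic's gradient with respect to one input sample,
    given here as a plain list of floats (an assumption for illustration).
    """
    norm = math.sqrt(sum(g * g for g in gradient))
    return weight * (norm - 1.0) ** 2


# Unbounded critic scores: higher means "looks more like training data".
example_critic_loss = critic_loss([2.0, 4.0], [1.0, 1.0])
example_penalty = gradient_penalty_term([3.0, 4.0])
```

Note that the critic outputs are unbounded scores rather than probabilities, and that the penalty vanishes when the gradient has unit norm.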

__init__
(gradient_penalty=10.0, **kwargs)[source]¶ Construct a WGAN.
In addition to the following, this class accepts all the keyword arguments from GAN and KerasModel.
 Parameters
gradient_penalty (float) – the magnitude of the gradient penalty loss

create_generator_loss
(discrim_output)[source]¶ Create the loss function for the generator.
The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.
 Parameters
discrim_output (Tensor) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.
 Returns
A Tensor equal to the loss function to use for optimizing the generator.

create_discriminator_loss
(discrim_output_train, discrim_output_gen)[source]¶ Create the loss function for the discriminator.
The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.
 Parameters
discrim_output_train (Tensor) – the output from the discriminator on a batch of training data. This is its estimate of the probability that each sample is training data.
discrim_output_gen (Tensor) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.
 Returns
A Tensor equal to the loss function to use for optimizing the discriminator.

CNN¶

class
CNN
(n_tasks, n_features, dims, layer_filters=[100], kernel_size=5, strides=1, weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, dense_layer_size=1000, pool_type='max', mode='classification', n_classes=2, uncertainty=False, residual=False, padding='valid', **kwargs)[source]¶ A 1, 2, or 3 dimensional convolutional network for either regression or classification.
The network consists of the following sequence of layers:
A configurable number of convolutional layers
A global pooling layer (either max pool or average pool)
A final dense layer to compute the output
It can optionally compose the model from pre-activation residual blocks, as described in https://arxiv.org/abs/1603.05027, rather than a simple stack of convolution layers. This often leads to easier training, especially when using a large number of layers. Note that residual blocks can only be used when successive layers have the same output shape. Wherever the output shape changes, a simple convolution layer will be used even if residual=True.

__init__
(n_tasks, n_features, dims, layer_filters=[100], kernel_size=5, strides=1, weight_init_stddevs=0.02, bias_init_consts=1.0, weight_decay_penalty=0.0, weight_decay_penalty_type='l2', dropouts=0.5, activation_fns=<function relu>, dense_layer_size=1000, pool_type='max', mode='classification', n_classes=2, uncertainty=False, residual=False, padding='valid', **kwargs)[source]¶ Create a CNN.
In addition to the following arguments, this class also accepts all the keyword arguments from TensorGraph.
 Parameters
n_tasks (int) – number of tasks
n_features (int) – number of features
dims (int) – the number of dimensions to apply convolutions over (1, 2, or 3)
layer_filters (list) – the number of output filters for each convolutional layer in the network. The length of this list determines the number of layers.
kernel_size (int, tuple, or list) – a list giving the shape of the convolutional kernel for each layer. Each element may be either an int (use the same kernel width for every dimension) or a tuple (the kernel width along each dimension). Alternatively this may be a single int or tuple instead of a list, in which case the same kernel shape is used for every layer.
strides (int, tuple, or list) – a list giving the stride between applications of the kernel for each layer. Each element may be either an int (use the same stride for every dimension) or a tuple (the stride along each dimension). Alternatively this may be a single int or tuple instead of a list, in which case the same stride is used for every layer.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_filters)+1, where the final element corresponds to the dense layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_filters)+1, where the final element corresponds to the dense layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_filters). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the TensorFlow activation function to apply to each layer. The length of this list should equal len(layer_filters). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
pool_type (str) – the type of pooling layer to use, either ‘max’ or ‘average’
mode (str) – Either ‘classification’ or ‘regression’
n_classes (int) – the number of classes to predict (only used in classification mode)
uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted
residual (bool) – if True, the model will be composed of pre-activation residual blocks instead of a simple stack of convolutional layers.
padding (str) – the type of padding to use for convolutional layers, either ‘valid’ or ‘same’
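Two conventions in the parameter list above can be illustrated in plain Python: expanding a single setting to one value per layer, and how 'valid' versus 'same' padding changes the output length along one dimension. Both helper names are hypothetical, and the length formulas are the usual convolution arithmetic rather than code taken from this class.

```python
import math


def per_layer(value, n_layers):
    """Expand a single setting to one entry per layer, or validate a list."""
    if isinstance(value, list):
        if len(value) != n_layers:
            raise ValueError(f"expected {n_layers} entries, got {len(value)}")
        return list(value)
    return [value] * n_layers


def conv_output_length(input_length, kernel_size, stride=1, padding="valid"):
    """Output length of a convolution along one dimension."""
    if padding == "valid":
        if input_length < kernel_size:
            return 0
        return (input_length - kernel_size) // stride + 1
    if padding == "same":
        return math.ceil(input_length / stride)
    raise ValueError("padding must be 'valid' or 'same'")


# Track the sequence length through three layers with kernel_size=5, strides=1.
lengths_valid = [100]
for kernel in per_layer(5, 3):
    lengths_valid.append(conv_output_length(lengths_valid[-1], kernel, padding="valid"))
```

With 'valid' padding each layer shortens the sequence, whereas 'same' padding with a stride of 1 leaves the length unchanged from layer to layer.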

default_generator
(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])
TextCNNModel¶

class
TextCNNModel
(n_tasks, char_dict, seq_length, n_embedding=75, kernel_sizes=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20], num_filters=[100, 200, 200, 200, 200, 100, 100, 100, 100, 100, 160, 160], dropout=0.25, mode='classification', **kwargs)[source]¶ A convolutional neural network on SMILES strings.
Reimplementation of the discriminator module in ORGAN [1]_, which originated from [2]_.
This model applies multiple 1D convolutional filters to the padded strings, then max-over-time pooling is applied on all filters, extracting one feature per filter. All features are concatenated and transformed through several hidden layers to form predictions.
This model was initially developed for sentence-level classification tasks, with words represented as vectors. In this implementation, SMILES strings are dissected into characters and transformed to one-hot vectors in a similar way. The model can be used for general molecular-level classification or regression tasks. It is also used in the ORGAN model as the discriminator.
Training the model requires only SMILES strings as input, so all featurized datasets that include SMILES in the ids attribute are accepted. PDBbind, QM7 and QM7b are not supported. To use the model, call build_char_dict before defining the model to build the character dict of the input dataset; an example can be found in examples/delaney/delaney_textcnn.py
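The convolution and max-over-time pooling steps described above can be sketched on a toy scalar sequence. The helper names are hypothetical, and real inputs would be embedded character vectors rather than single numbers; this is only meant to show why each filter contributes exactly one feature.

```python
def conv1d_valid(sequence, kernel):
    """Slide `kernel` over `sequence` without padding and return the responses."""
    width = len(kernel)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the kernel")
    return [
        sum(k * x for k, x in zip(kernel, sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    ]


def max_over_time_features(sequence, kernels):
    """One feature per filter: the maximum response over all positions."""
    return [max(conv1d_valid(sequence, kernel)) for kernel in kernels]


# Three toy filters of widths 1, 2 and 2 applied to a length-5 sequence.
features = max_over_time_features([1, 3, 2, 5, 4], [[1], [1, 1], [1, -1]])
```

Because the maximum is taken over positions, the number of features depends only on the number of filters and not on the sequence length.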
References
 1
Guimaraes, Gabriel Lima, et al. “Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models.” arXiv preprint arXiv:1705.10843 (2017).
 2
Kim, Yoon. “Convolutional neural networks for sentence classification.” arXiv preprint arXiv:1408.5882 (2014).

__init__
(n_tasks, char_dict, seq_length, n_embedding=75, kernel_sizes=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20], num_filters=[100, 200, 200, 200, 200, 100, 100, 100, 100, 100, 160, 160], dropout=0.25, mode='classification', **kwargs)[source]¶  Parameters
n_tasks (int) – Number of tasks
char_dict (dict) – Mapping from characters in SMILES to integers
seq_length (int) – Length of sequences (after padding)
n_embedding (int, optional) – Length of embedding vector
kernel_sizes (list of int, optional) – Widths of the convolutional filters, one entry per group of filters
num_filters (list of int, optional) – Number of filters for each entry in kernel_sizes
dropout (float, optional) – Dropout rate
mode (str) – Either “classification” or “regression” for type of model.

static
build_char_dict
(dataset, default_dict={'#': 1, '(': 2, ')': 3, '+': 4, '-': 5, '/': 6, '1': 7, '2': 8, '3': 9, '4': 10, '5': 11, '6': 12, '7': 13, '8': 14, '=': 15, 'Br': 30, 'C': 16, 'Cl': 29, 'F': 17, 'H': 18, 'I': 19, 'N': 20, 'O': 21, 'P': 22, 'S': 23, '[': 24, '\\': 25, ']': 26, '_': 27, 'c': 28, 'n': 31, 'o': 32, 's': 33})[source]¶ Collect all unique characters (in SMILES) from the dataset. This method should be called before defining the model to build an appropriate char_dict.

smiles_to_seq_batch
(ids_b)[source]¶ Converts SMILES strings to np.array sequence.
A tf.py_func wrapper is written around this when creating the input_fn for make_estimator
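The conversion from SMILES strings to integer sequences can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the library's code: the function names encode_smiles and encode_smiles_batch are hypothetical, index 0 is assumed to be the padding value, and two-character tokens are matched greedily against the dictionary.

```python
from typing import Dict, List


def encode_smiles(smiles: str, char_dict: Dict[str, int], seq_length: int) -> List[int]:
    """Encode one SMILES string as a fixed-length list of token indices."""
    seq = []
    i = 0
    while i < len(smiles):
        # Prefer a two-character token such as 'Cl' when the dictionary has one.
        two = smiles[i:i + 2]
        if len(two) == 2 and two in char_dict:
            seq.append(char_dict[two])
            i += 2
        else:
            seq.append(char_dict[smiles[i]])  # KeyError for unknown characters
            i += 1
    if len(seq) > seq_length:
        raise ValueError("SMILES is longer than seq_length")
    # Assumption: index 0 is reserved for padding.
    return seq + [0] * (seq_length - len(seq))


def encode_smiles_batch(batch: List[str], char_dict: Dict[str, int],
                        seq_length: int) -> List[List[int]]:
    """Encode a batch of SMILES strings as a rectangular list of lists."""
    return [encode_smiles(s, char_dict, seq_length) for s in batch]


# Hypothetical dictionary used only for this example.
char_dict = {"C": 1, "O": 2, "(": 3, ")": 4, "=": 5, "Cl": 6}
batch = encode_smiles_batch(["CC(=O)O", "CCl"], char_dict, seq_length=8)
```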
AtomicConvModel¶

class
AtomicConvModel
(n_tasks: int, frag1_num_atoms: int = 70, frag2_num_atoms: int = 634, complex_num_atoms: int = 701, max_num_neighbors: int = 12, batch_size: int = 24, atom_types: Sequence[float] = [6, 7.0, 8.0, 9.0, 11.0, 12.0, 15.0, 16.0, 17.0, 20.0, 25.0, 30.0, 35.0, 53.0, 1.0], radial: Sequence[Sequence[float]] = [[1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0], [0.0, 4.0, 8.0], [0.4]], layer_sizes=[100], weight_init_stddevs: Union[float, Sequence[float]] = 0.02, bias_init_consts: Union[float, Sequence[float]] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: Union[float, Sequence[float]] = 0.5, activation_fns: Union[Callable, str, Sequence[Union[Callable, str]]] = <function relu>, residual: bool = False, learning_rate=0.001, **kwargs)[source]¶ Implements an Atomic Convolution Model.
Implements the atomic convolutional networks as introduced in
Gomes, Joseph, et al. “Atomic convolutional networks for predicting protein-ligand binding affinity.” arXiv preprint arXiv:1703.10603 (2017).
The atomic convolutional networks function as a variant of graph convolutions. The difference is that the “graph” here is the nearest neighbors graph in 3D space. The AtomicConvModel leverages these connections in 3D space to train models that learn to predict energetic state starting from the spatial geometry of the model.
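To make the idea of a nearest-neighbors graph in 3D space concrete, here is a minimal brute-force sketch in plain Python. It is not how AtomicConvModel computes neighbors internally; the function name neighbor_list and the toy coordinates are assumptions, and only the max_num_neighbors parameter name is borrowed from the signature above.

```python
import math
from typing import List, Sequence


def neighbor_list(coords: Sequence[Sequence[float]],
                  max_num_neighbors: int) -> List[List[int]]:
    """For each atom, return the indices of its closest atoms, nearest first."""
    neighbors = []
    for i, ci in enumerate(coords):
        # Euclidean distance from atom i to every other atom.
        dists = [(math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:max_num_neighbors]])
    return neighbors


# Four hypothetical atom positions (arbitrary units).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (5.0, 5.0, 5.0)]
graph = neighbor_list(coords, max_num_neighbors=2)
```

This all-pairs search costs O(n²) distance evaluations, which is fine for a sketch but would likely be replaced by a spatial data structure for large complexes.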

__init__
(n_tasks: int, frag1_num_atoms: int = 70, frag2_num_atoms: int = 634, complex_num_atoms: int = 701, max_num_neighbors: int = 12, batch_size: int = 24, atom_types: Sequence[float] = [6, 7.0, 8.0, 9.0, 11.0, 12.0, 15.0, 16.0, 17.0, 20.0, 25.0, 30.0, 35.0, 53.0, 1.0], radial: Sequence[Sequence[float]] = [[1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0], [0.0, 4.0, 8.0], [0.4]], layer_sizes=[100], weight_init_stddevs: Union[float, Sequence[float]] = 0.02, bias_init_consts: Union[float, Sequence[float]] = 1.0, weight_decay_penalty: float = 0.0, weight_decay_penalty_type: str = 'l2', dropouts: Union[float, Sequence[float]] = 0.5, activation_fns: Union[Callable, str, Sequence[Union[Callable, str]]] = <function relu>, residual: bool = False, learning_rate=0.001, **kwargs) → None[source]¶  Parameters
n_tasks (int) – number of tasks
frag1_num_atoms (int) – Number of atoms in first fragment
frag2_num_atoms (int) – Number of atoms in second fragment
max_num_neighbors (int) – Maximum number of neighbors possible for an atom. Recall neighbors are spatial neighbors.
atom_types (list) – List of atoms recognized by model. Atoms are indicated by their atomic numbers.
radial (list) – Radial parameters used in the atomic convolution transformation.
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the TensorFlow activation function to apply to each layer. The length of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
residual (bool) – if True, the model will be composed of pre-activation residual blocks instead of a simple stack of dense layers.
learning_rate (float) – Learning rate for the model.

default_generator
(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])
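The batching behaviour described by these parameters can be sketched with a small standalone generator. This is an illustration only: batch_generator is a hypothetical function, and padding a short final batch by repeating its first sample with weight 0 is an assumption about one reasonable padding strategy, not a statement of what the library does.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Batch = Tuple[List[list], List[list], List[list]]


def batch_generator(X: Sequence, y: Sequence, w: Sequence, batch_size: int,
                    epochs: int = 1, deterministic: bool = True,
                    pad_batches: bool = True, seed: int = 0) -> Iterator[Batch]:
    """Yield ([inputs], [outputs], [weights]) tuples, one per batch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(len(X)))
        if not deterministic:
            rng.shuffle(order)  # a new random order for each epoch
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            xb = [X[i] for i in idx]
            yb = [y[i] for i in idx]
            wb = [w[i] for i in idx]
            if pad_batches and len(idx) < batch_size:
                # Assumption: pad by repeating the first sample with weight 0,
                # so padded rows do not contribute to a weighted loss.
                missing = batch_size - len(idx)
                xb += [xb[0]] * missing
                yb += [yb[0]] * missing
                wb += [0.0] * missing
            yield ([xb], [yb], [wb])


batches = list(batch_generator(X=[10, 11, 12, 13, 14], y=[0, 1, 0, 1, 0],
                               w=[1.0] * 5, batch_size=2, epochs=1))
```

Each element of the tuple is a list so that models with several inputs, outputs, or weight arrays can use the same interface.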

Smiles2Vec¶

class
Smiles2Vec
(char_to_idx, n_tasks=10, max_seq_len=270, embedding_dim=50, n_classes=2, use_bidir=True, use_conv=True, filters=192, kernel_size=3, strides=1, rnn_sizes=[224, 384], rnn_types=['GRU', 'GRU'], mode='regression', **kwargs)[source]¶ Implements the Smiles2Vec model, that learns neural representations of SMILES strings which can be used for downstream tasks.
The model is based on the description in Goh et al., “SMILES2vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties” (https://arxiv.org/pdf/1712.02034.pdf). The goal here is to take SMILES strings as inputs and turn them into vector representations, which can then be used in predicting molecular properties.
The model consists of an Embedding layer that retrieves embeddings for each character in the SMILES string. These embeddings are learnt jointly with the rest of the model. The output from the embedding layer is a tensor of shape (batch_size, seq_len, embedding_dim). This tensor can optionally be fed through a 1D convolutional layer, before being passed to a series of RNN cells (optionally bidirectional). The final output from the RNN cells aims to have learnt the temporal dependencies in the SMILES string, and in turn information about the structure of the molecule, which is then used for molecular property prediction.
In the paper, the authors also train an explanation mask to endow the model with interpretability and gain insights into its decision making. This segment is currently not a part of this implementation as this was developed for the purpose of investigating a transfer learning protocol, ChemNet (which can be found at https://arxiv.org/abs/1712.02734).

__init__
(char_to_idx, n_tasks=10, max_seq_len=270, embedding_dim=50, n_classes=2, use_bidir=True, use_conv=True, filters=192, kernel_size=3, strides=1, rnn_sizes=[224, 384], rnn_types=['GRU', 'GRU'], mode='regression', **kwargs)[source]¶  Parameters
char_to_idx (dict) – Mapping from SMILES characters to integer indices
embedding_dim (int, default 50) – Size of character embeddings used.
use_bidir (bool, default True) – Whether to use bidirectional RNN cells
use_conv (bool, default True) – Whether to use a convolutional layer
kernel_size (int, default 3) – Kernel size for convolutions
filters (int, default 192) – Number of filters
strides (int, default 1) – Strides used in convolution
rnn_sizes (list[int], default [224, 384]) – Number of hidden units in the RNN cells
mode (str, default regression) – Whether to use model for regression or classification

default_generator
(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])

ChemCeption¶

class
ChemCeption
(img_spec: str = 'std', img_size: int = 80, base_filters: int = 16, inception_blocks: Dict = {'A': 3, 'B': 3, 'C': 3}, n_tasks: int = 10, n_classes: int = 2, augment: bool = False, mode: str = 'regression', **kwargs)[source]¶ Implements the ChemCeption model that leverages the representational capacities of convolutional neural networks (CNNs) to predict molecular properties.
The model is based on the description in Goh et al., “Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models” (https://arxiv.org/pdf/1706.06689.pdf). The authors use an image-based representation of the molecule, where pixels encode different atomic and bond properties. More details on the image representations can be found at https://arxiv.org/abs/1710.02238
The model consists of a Stem Layer that reduces the image resolution for the layers to follow. The output of the Stem Layer is followed by a series of Inception-ResNet blocks and a Reduction layer. Layers in the Inception-ResNet blocks process image tensors at multiple resolutions and use a ResNet-style skip-connection, combining features from different resolutions. The Reduction layers reduce the spatial extent of the image by max-pooling and 2-strided convolutions. More details on these layers can be found in the ChemCeption paper referenced above. The output of the final Reduction layer is passed through Global Average Pooling, and a fully-connected layer maps the features to downstream outputs.
In the ChemCeption paper, the authors perform real-time image augmentation by rotating images between 0 and 180 degrees. This can be done during model training by setting the augment argument to True.

__init__
(img_spec: str = 'std', img_size: int = 80, base_filters: int = 16, inception_blocks: Dict = {'A': 3, 'B': 3, 'C': 3}, n_tasks: int = 10, n_classes: int = 2, augment: bool = False, mode: str = 'regression', **kwargs)[source]¶  Parameters
img_spec (str, default std) – Image specification used
img_size (int, default 80) – Image size used
base_filters (int, default 16) – Base filters used for the different inception and reduction layers
inception_blocks (dict) – Dictionary containing number of blocks for every inception layer
n_tasks (int, default 10) – Number of classification or regression tasks
n_classes (int, default 2) – Number of classes (used only for classification)
augment (bool, default False) – Whether to augment images
mode (str, default regression) – Whether the model is used for regression or classification

build_inception_module
(inputs, type='A')[source]¶ Inception module is a series of inception layers of similar type. This function builds that.

default_generator
(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)[source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])

NormalizingFlowModel¶
The purpose of a normalizing flow is to map a simple distribution (that is easy to sample from and evaluate probability densities for) to a more complex distribution that is learned from data. Normalizing flows combine the advantages of autoregressive models (which provide likelihood estimation but do not learn features) and variational autoencoders (which learn feature representations but do not provide marginal likelihoods). They are effective for any application requiring a probabilistic model with these capabilities, e.g. generative modeling, unsupervised learning, or probabilistic inference.

class
NormalizingFlowModel
(model: deepchem.models.normalizing_flows.NormalizingFlow, **kwargs)[source]¶ A base distribution and normalizing flow for applying transformations.
Normalizing flows are effective for any application requiring a probabilistic model that can both sample from a distribution and compute marginal likelihoods, e.g. generative modeling, unsupervised learning, or probabilistic inference. For a thorough review of normalizing flows, see [1].
 A distribution implements two main operations:
Sampling from the transformed distribution
Calculating log probabilities
 A normalizing flow implements three main operations:
Forward transformation
Inverse transformation
Calculating the Jacobian
Deep normalizing flow models require normalizing flow layers where input and output dimensions are the same, the transformation is invertible, and the determinant of the Jacobian is efficient to compute and differentiable. The determinant of the Jacobian of the transformation gives the factor that rescales the density so that the total probability remains 1 when transforming between probability densities of different random variables.
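The role of the Jacobian determinant can be seen in a one-dimensional sketch. The code below is plain Python and independent of NormalizingFlow; the AffineFlow class and the helper functions are hypothetical names introduced for illustration. It applies the change-of-variables formula to a standard normal base distribution and checks numerically that the transformed density still integrates to about 1.

```python
import math


def base_log_prob(x: float) -> float:
    """Log density of a standard normal base distribution."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


class AffineFlow:
    """Invertible 1-D map y = scale * x + shift (scale must be non-zero)."""

    def __init__(self, scale: float, shift: float) -> None:
        self.scale = scale
        self.shift = shift

    def forward(self, x: float) -> float:
        return self.scale * x + self.shift

    def inverse(self, y: float) -> float:
        return (y - self.shift) / self.scale

    def inverse_log_det_jacobian(self, y: float) -> float:
        # d(inverse)/dy = 1 / scale, so log|det J| = -log|scale|.
        return -math.log(abs(self.scale))


def flow_log_prob(flow: AffineFlow, y: float) -> float:
    """log p_Y(y) = log p_X(f^-1(y)) + log|det J_{f^-1}(y)|."""
    return base_log_prob(flow.inverse(y)) + flow.inverse_log_det_jacobian(y)


flow = AffineFlow(scale=2.0, shift=1.0)
# Riemann sum of the transformed density over [-19, 21); the Jacobian term
# is what keeps this total close to 1.
step = 0.01
total = sum(math.exp(flow_log_prob(flow, -19.0 + k * step)) * step
            for k in range(4000))
```

Dropping the Jacobian term in flow_log_prob would make the sum come out near 2 for this flow, since the map stretches the real line by a factor of 2.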
References
 1
Papamakarios, George et al. “Normalizing Flows for Probabilistic Modeling and Inference.” (2019). https://arxiv.org/abs/1912.02762.

__init__
(model: deepchem.models.normalizing_flows.NormalizingFlow, **kwargs) → None[source]¶ Creates a new NormalizingFlowModel.
In addition to the following arguments, this class also accepts all the keyword arguments from KerasModel.
 Parameters
model (NormalizingFlow) – An instance of NormalizingFlow.
Examples
>>> import numpy as np
>>> import tensorflow_probability as tfp
>>> from deepchem.data import NumpyDataset
>>> from deepchem.models.normalizing_flows import NormalizingFlow, NormalizingFlowModel
>>> tfd = tfp.distributions
>>> tfb = tfp.bijectors
>>> flow_layers = [
...     tfb.RealNVP(
...         num_masked=2,
...         shift_and_log_scale_fn=tfb.real_nvp_default_template(
...             hidden_layers=[8, 8]))
... ]
>>> base_distribution = tfd.MultivariateNormalDiag(loc=[0., 0., 0.])
>>> nf = NormalizingFlow(base_distribution, flow_layers)
>>> nfm = NormalizingFlowModel(nf)
>>> dataset = NumpyDataset(
...     X=np.random.rand(5, 3).astype(np.float32),
...     y=np.random.rand(5,),
...     ids=np.arange(5))
>>> nfm.fit(dataset)

create_nll
(input: Union[tensorflow.python.framework.ops.Tensor, Sequence[tensorflow.python.framework.ops.Tensor]]) → tensorflow.python.framework.ops.Tensor[source]¶ Create the negative log likelihood loss function.
The default implementation is appropriate for most cases. Subclasses can override this if there is a need to customize it.
 Parameters
input (OneOrMany[tf.Tensor]) – A batch of data.
 Returns
A Tensor equal to the loss function to use for optimization.
PyTorch Models¶
DeepChem supports the use of PyTorch to build deep learning models.
TorchModel¶
You can wrap an arbitrary torch.nn.Module
in a TorchModel
object.

class
TorchModel
(model: torch.nn.modules.module.Module, loss: Union[deepchem.models.losses.Loss, Callable[[List, List, List], Any]], output_types: Optional[List[str]] = None, batch_size: int = 100, model_dir: Optional[str] = None, learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, optimizer: Optional[deepchem.models.optimizers.Optimizer] = None, tensorboard: bool = False, wandb: bool = False, log_frequency: int = 100, device: Optional[torch.device] = None, regularization_loss: Optional[Callable] = None, **kwargs)[source]¶ This is a DeepChem model implemented by a PyTorch model.
Here is a simple example of code that uses TorchModel to train a PyTorch model on a DeepChem dataset.
>>> import torch
>>> import deepchem as dc
>>> from deepchem.models import TorchModel
>>> pytorch_model = torch.nn.Sequential(
...     torch.nn.Linear(100, 1000),
...     torch.nn.Tanh(),
...     torch.nn.Linear(1000, 1))
>>> model = TorchModel(pytorch_model, loss=dc.models.losses.L2Loss())
>>> model.fit(dataset)
The loss function for a model can be defined in two different ways. For models that have only a single output and use a standard loss function, you can simply provide a dc.models.losses.Loss object. This defines the loss for each sample or sample/task pair. The result is automatically multiplied by the weights and averaged over the batch.
For more complicated cases, you can instead provide a function that directly computes the total loss. It must be of the form f(outputs, labels, weights), taking the list of outputs from the model, the expected values, and any weight matrices. It should return a scalar equal to the value of the loss function for the batch. No additional processing is done to the result; it is up to you to do any weighting, averaging, adding of penalty terms, etc.
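A function of the form f(outputs, labels, weights) might look like the sketch below. To keep it dependency-free, plain Python lists of floats stand in for tensors; that substitution and the name weighted_l2_loss are assumptions. In real use the arguments would be framework tensors and the function would need to return a differentiable scalar.

```python
from typing import List, Sequence


def weighted_l2_loss(outputs: List[Sequence[float]],
                     labels: List[Sequence[float]],
                     weights: List[Sequence[float]]) -> float:
    """Total loss for one batch, in the f(outputs, labels, weights) form.

    Each argument is a list with one entry per model output; this sketch
    handles a single output and treats each entry as a flat sequence.
    """
    preds, targets, sample_weights = outputs[0], labels[0], weights[0]
    # Per-sample squared error, scaled by that sample's weight.
    per_sample = [w * (p - t) ** 2
                  for p, t, w in zip(preds, targets, sample_weights)]
    # The function itself is responsible for averaging over the batch.
    return sum(per_sample) / len(per_sample)


loss_value = weighted_l2_loss(outputs=[[1.0, 2.0, 4.0]],
                              labels=[[1.0, 0.0, 2.0]],
                              weights=[[1.0, 0.5, 0.0]])
```

The third sample has weight 0, so its error does not affect the result even though its prediction is off by 2.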
You can optionally provide an output_types argument, which describes how to interpret the model’s outputs. This should be a list of strings, one for each output. You can use an arbitrary output_type for an output, but some output_types are special and will undergo extra processing:
‘prediction’: This is a normal output, and will be returned by predict(). If output types are not specified, all outputs are assumed to be of this type.
‘loss’: This output will be used in place of the normal outputs for computing the loss function. For example, models that output probability distributions usually do it by computing unbounded numbers (the logits), then passing them through a softmax function to turn them into probabilities. When computing the cross entropy, it is more numerically stable to use the logits directly rather than the probabilities. You can do this by having the model produce both probabilities and logits as outputs, then specifying output_types=[‘prediction’, ‘loss’]. When predict() is called, only the first output (the probabilities) will be returned. But during training, it is the second output (the logits) that will be passed to the loss function.
‘variance’: This output is used for estimating the uncertainty in another output. To create a model that can estimate uncertainty, there must be the same number of ‘prediction’ and ‘variance’ outputs. Each variance output must have the same shape as the corresponding prediction output, and each element is an estimate of the variance in the corresponding prediction. Also be aware that if a model supports uncertainty, it MUST use dropout on every layer, and dropout must be enabled during uncertainty prediction. Otherwise, the uncertainties it computes will be inaccurate.
other: Arbitrary output_types can be used to extract outputs produced by the model, but will have no additional processing performed.
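The numerical-stability argument behind the ‘loss’ output type can be demonstrated with a small standalone sketch. The functions below are illustrative plain-Python helpers, not part of any library API. With an extreme logit, the softmax probability of the other class underflows to exactly 0.0, so taking its logarithm would fail, while the logits-based cross entropy stays finite.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits: List[float], target: int) -> float:
    """Cross entropy via log-sum-exp; never takes the log of a probability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


logits = [1000.0, 0.0]
probs = softmax(logits)  # the second probability underflows to 0.0
stable = cross_entropy_from_logits(logits, target=1)
```

Here -math.log(probs[1]) would raise a ValueError because probs[1] is 0.0, whereas the logits-based value is finite.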

__init__
(model: torch.nn.modules.module.Module, loss: Union[deepchem.models.losses.Loss, Callable[[List, List, List], Any]], output_types: Optional[List[str]] = None, batch_size: int = 100, model_dir: Optional[str] = None, learning_rate: Union[float, deepchem.models.optimizers.LearningRateSchedule] = 0.001, optimizer: Optional[deepchem.models.optimizers.Optimizer] = None, tensorboard: bool = False, wandb: bool = False, log_frequency: int = 100, device: Optional[torch.device] = None, regularization_loss: Optional[Callable] = None, **kwargs) → None[source]¶ Create a new TorchModel.
 Parameters
model (torch.nn.Module) – the PyTorch model implementing the calculation
loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each batch, as described above
output_types (list of strings, optional (default None)) – the type of each output from the model, as described above
batch_size (int, optional (default 100)) – default batch size for training and evaluating
model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None, a temporary directory is created.
learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is ignored.
optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is ignored.
tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training
wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training
log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using logging by default. If tensorboard is set, data is also logged to TensorBoard. If wandb is set, data is also logged to Weights & Biases. Logging happens at global steps. Roughly, a global step corresponds to one batch of training. If you’d like a printout every 10 batch steps, you’d set log_frequency=10 for example.
device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is chosen automatically.
regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add to the loss function

fit
(dataset: deepchem.data.datasets.Dataset, nb_epoch: int = 10, max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, deterministic: bool = False, restore: bool = False, variables: Optional[List[torch.nn.parameter.Parameter]] = None, loss: Optional[Callable[[List, List, List], Any]] = None, callbacks: Union[Callable, List[Callable]] = [], all_losses: Optional[List[float]] = None) → float[source]¶ Train this model on a dataset.
 Parameters
dataset (Dataset) – the Dataset to train on
nb_epoch (int) – the number of epochs to train for
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.
 Returns
The average loss over the most recent checkpoint interval
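The interaction between callbacks, checkpoint_interval, max_checkpoints_to_keep, and all_losses can be pictured with a toy loop. This sketch does not train anything and is not the library's implementation: ToyTrainer is a hypothetical class, the loss values are made up, and recording a loss only at checkpoint time is a simplification (the real method logs according to log_frequency).

```python
from collections import deque
from typing import Callable, List


class ToyTrainer:
    """Stand-in for a model; it records state instead of training anything."""

    def __init__(self, max_checkpoints_to_keep: int = 5) -> None:
        # A bounded deque drops the oldest checkpoint once it is full.
        self.checkpoints = deque(maxlen=max_checkpoints_to_keep)

    def fit(self, num_steps: int, checkpoint_interval: int,
            callbacks: List[Callable[["ToyTrainer", int], None]],
            all_losses: List[float]) -> float:
        interval_losses: List[float] = []
        avg_loss = 0.0
        for step in range(1, num_steps + 1):
            loss = 1.0 / step  # made-up loss value that shrinks over time
            interval_losses.append(loss)
            if checkpoint_interval > 0 and step % checkpoint_interval == 0:
                self.checkpoints.append(step)
                avg_loss = sum(interval_losses) / len(interval_losses)
                all_losses.append(avg_loss)
                interval_losses = []
            for callback in callbacks:
                callback(self, step)  # invoked after every step
        return avg_loss


seen_steps: List[int] = []
losses: List[float] = []
trainer = ToyTrainer(max_checkpoints_to_keep=2)
final = trainer.fit(num_steps=6, checkpoint_interval=2,
                    callbacks=[lambda model, step: seen_steps.append(step)],
                    all_losses=losses)
```

Because the caller owns the losses list, calling fit again with the same list keeps appending to it, which mirrors the behaviour described for all_losses.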

fit_generator
(generator: Iterable[Tuple[Any, Any, Any]], max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 1000, restore: bool = False, variables: Optional[List[torch.nn.parameter.Parameter]] = None, loss: Optional[Callable[[List, List, List], Any]] = None, callbacks: Union[Callable, List[Callable]] = [], all_losses: Optional[List[float]] = None) → float[source]¶ Train this model on data from a generator.
 Parameters
generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that you can call fit() repeatedly with the same list and losses will continue to be appended.
 Returns
The average loss over the most recent checkpoint interval

fit_on_batch
(X: Sequence, y: Sequence, w: Sequence, variables: Optional[List[torch.nn.parameter.Parameter]] = None, loss: Optional[Callable[[List, List, List], Any]] = None, callbacks: Union[Callable, List[Callable]] = [], checkpoint: bool = True, max_checkpoints_to_keep: int = 5) → float[source]¶ Perform a single step of training.
 Parameters
X (ndarray) – the inputs for the batch
y (ndarray) – the labels for the batch
w (ndarray) – the weights for the batch
variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss for each batch. If None (the default), the model’s standard loss function is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after every step. This can be used to perform validation, logging, etc.
checkpoint (bool) – if true, save a checkpoint after performing the training step
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
 Returns
the loss on the batch

predict_on_generator
(generator: Iterable[Tuple[Any, Any, Any]], transformers: List[transformers.Transformer] = [], output_types: Optional[Union[str, Sequence[str]]] = None) → Union[numpy.ndarray, Sequence[numpy.ndarray]][source]¶  Parameters
generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved from the model. If output_types is specified, outputs must be None.
 Returns
a NumPy array if the model produces a single output, or a list of arrays if it produces multiple outputs

predict_on_batch
(X: Union[numpy.ndarray, Sequence], transformers: List[transformers.Transformer] = []) → Union[numpy.ndarray, Sequence[numpy.ndarray]][source]¶ Generates predictions for input samples, processing samples in a batch.
 Parameters
X (ndarray) – the input data, as a Numpy array.
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
 Returns
a NumPy array if the model produces a single output, or a list of arrays
if it produces multiple outputs

predict_uncertainty_on_batch
(X: Sequence, masks: int = 50) → Union[Tuple[numpy.ndarray, numpy.ndarray], Sequence[Tuple[numpy.ndarray, numpy.ndarray]]][source]¶ Predict the model’s outputs, along with the uncertainty in each one.
The uncertainty is computed as described in https://arxiv.org/abs/1703.04977. It involves repeating the prediction many times with different dropout masks. The prediction is computed as the average over all the predictions. The uncertainty includes both the variation among the predicted values (epistemic uncertainty) and the model’s own estimates for how well it fits the data (aleatoric uncertainty). Not all models support uncertainty prediction.
 Parameters
X (ndarray) – the input data, as a Numpy array.
masks (int) – the number of dropout masks to average over
 Returns
for each output, a tuple (y_pred, y_std) where y_pred is the predicted
value of the output, and each element of y_std estimates the standard
deviation of the corresponding element of y_pred
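The procedure described above (repeat the prediction under different dropout masks, average, then combine the spread of the predictions with the model's own variance estimates) can be sketched with a toy linear model. Everything here is illustrative: the function names are hypothetical, the constant 0.04 stands in for a learned ‘variance’ output, and combining the two terms by summing variances before taking the square root is an assumption consistent with the description.

```python
import math
import random
from typing import List, Tuple


def predict_with_dropout(x: List[float], weights: List[float],
                         drop_prob: float, rng: random.Random) -> Tuple[float, float]:
    """One stochastic forward pass of a toy linear model.

    Returns (prediction, predicted_variance). The constant variance is a
    made-up stand-in for a learned 'variance' output.
    """
    keep = 1.0 - drop_prob
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() < keep:      # keep this input with probability `keep`
            total += xi * wi / keep  # inverted-dropout scaling
    return total, 0.04


def toy_predict_uncertainty(x: List[float], weights: List[float],
                            masks: int = 50, drop_prob: float = 0.5,
                            seed: int = 0) -> Tuple[float, float]:
    """Average over `masks` dropout masks and combine both variance sources."""
    rng = random.Random(seed)
    preds, variances = [], []
    for _ in range(masks):
        p, v = predict_with_dropout(x, weights, drop_prob, rng)
        preds.append(p)
        variances.append(v)
    y_pred = sum(preds) / masks
    epistemic = sum((p - y_pred) ** 2 for p in preds) / masks  # spread across masks
    aleatoric = sum(variances) / masks                         # model's own estimate
    return y_pred, math.sqrt(epistemic + aleatoric)


y_pred, y_std = toy_predict_uncertainty([1.0, 2.0, 3.0], [0.5, -0.25, 1.0])
```

With dropout disabled (drop_prob=0.0) every pass gives the same prediction, the epistemic term vanishes, and y_std reduces to the square root of the predicted variance. This is why dropout must stay enabled during uncertainty prediction.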

predict
(dataset: deepchem.data.datasets.Dataset, transformers: List[transformers.Transformer] = [], output_types: Optional[List[str]] = None) → Union[numpy.ndarray, Sequence[numpy.ndarray]][source]¶ Uses self to make predictions on provided Dataset object.
 Parameters
dataset (dc.data.Dataset) – Dataset to make prediction on
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved from the model. If output_types is specified, outputs must be None.
 Returns
a NumPy array if the model produces a single output, or a list of arrays
if it produces multiple outputs

predict_embedding
(dataset: deepchem.data.datasets.Dataset) → Union[numpy.ndarray, Sequence[numpy.ndarray]][source]¶ Predicts embeddings created by underlying model if any exist. An embedding must be specified to have output_type of ‘embedding’ in the model definition.
 Parameters
dataset (dc.data.Dataset) – Dataset to make prediction on
 Returns
a NumPy array of the embeddings the model produces, or a list
of arrays if it produces multiple embeddings

predict_uncertainty
(dataset: deepchem.data.datasets.Dataset, masks: int = 50) → Union[Tuple[numpy.ndarray, numpy.ndarray], Sequence[Tuple[numpy.ndarray, numpy.ndarray]]][source]¶ Predict the model’s outputs, along with the uncertainty in each one.
The uncertainty is computed as described in https://arxiv.org/abs/1703.04977. It involves repeating the prediction many times with different dropout masks. The prediction is computed as the average over all the predictions. The uncertainty includes both the variation among the predicted values (epistemic uncertainty) and the model’s own estimates for how well it fits the data (aleatoric uncertainty). Not all models support uncertainty prediction.
 Parameters
dataset (dc.data.Dataset) – Dataset to make prediction on
masks (int) – the number of dropout masks to average over
 Returns
for each output, a tuple (y_pred, y_std) where y_pred is the predicted
value of the output, and each element of y_std estimates the standard
deviation of the corresponding element of y_pred

evaluate_generator
(generator: Iterable[Tuple[Any, Any, Any]], metrics: List[deepchem.metrics.metric.Metric], transformers: List[transformers.Transformer] = [], per_task_metrics: bool = False)[source]¶ Evaluate the performance of this model on the data produced by a generator.
 Parameters
generator (generator) – this should generate batches, each represented as a tuple of the form (inputs, labels, weights).
metrics (list of deepchem.metrics.Metric) – Evaluation metrics
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output is passed through these transformers to undo the transformations.
per_task_metrics (bool) – If True, return per-task scores.
 Returns
Maps tasks to scores under metric.
 Return type
dict

compute_saliency
(X: numpy.ndarray) → Union[numpy.ndarray, Sequence[numpy.ndarray]][source]¶ Compute the saliency map for an input sample.
This computes the Jacobian matrix with the derivative of each output element with respect to each input element. More precisely,
If this model has a single output, it returns a matrix of shape (output_shape, input_shape) with the derivatives.
If this model has multiple outputs, it returns a list of matrices, one for each output.
This method cannot be used on models that take multiple inputs.
 Parameters
X (ndarray) – the input data for a single sample
 Returns
the Jacobian matrix, or a list of matrices
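The layout of the returned Jacobian, with one row per output element and one column per input element, can be illustrated with a finite-difference sketch. This is only an illustration of the shape and meaning of the result; `numerical_jacobian` and `toy_model` are hypothetical names, and a real model would normally obtain these derivatives through automatic differentiation rather than finite differences.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Return J with J[i][j] = d f(x)[i] / d x[j], using central differences."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += eps
        x_minus = list(x)
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def toy_model(x):
    # Two outputs computed from three inputs.
    return [x[0] * x[1], x[0] + 2.0 * x[2]]


saliency = numerical_jacobian(toy_model, [1.0, 2.0, 3.0])
for row in saliency:
    print([round(v, 3) for v in row])
```

Large absolute entries in a row mark the input elements that most influence that output at the given sample.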

default_generator
(dataset: deepchem.data.datasets.Dataset, epochs: int = 1, mode: str = 'fit', deterministic: bool = True, pad_batches: bool = True) → Iterable[Tuple[List, List, List]][source]¶ Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are generated from the data.
 Parameters
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called during prediction), and ‘uncertainty’ (called during uncertainty prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
 Returns
a generator that iterates batches, each represented as a tuple of lists
([inputs], [outputs], [weights])
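The shape of the batches and the effect of `epochs`, `deterministic`, and `pad_batches` can be sketched with a small stand-alone generator. This is not DeepChem's implementation: `batch_generator` is a hypothetical name, it works on plain lists, and it assumes that padding samples receive weight 0 so they cannot influence the loss.

```python
import random


def batch_generator(X, y, w, batch_size, epochs=1, deterministic=True,
                    pad_batches=True, seed=0):
    """Yield ([inputs], [outputs], [weights]) tuples, one per batch."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    for _ in range(epochs):
        if not deterministic:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            idx = indices[start:start + batch_size]
            Xb = [X[i] for i in idx]
            yb = [y[i] for i in idx]
            wb = [w[i] for i in idx]
            if pad_batches and len(idx) < batch_size:
                missing = batch_size - len(idx)
                # Assumption of this sketch: filler samples get weight 0.
                Xb += [X[idx[0]]] * missing
                yb += [y[idx[0]]] * missing
                wb += [0.0] * missing
            yield ([Xb], [yb], [wb])


batches = list(batch_generator([[0.1], [0.2], [0.3]], [1.0, 2.0, 3.0],
                               [1.0, 1.0, 1.0], batch_size=2, epochs=2))
print(len(batches))
print(batches[1])
```

A subclass that needs extra inputs per batch would return additional entries in the first list of each tuple.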

save_checkpoint
(max_checkpoints_to_keep: int = 5, model_dir: Optional[str] = None) → None[source]¶ Save a checkpoint to disk.
Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.
 Parameters
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
model_dir (str, default None) – Model directory to save checkpoint to. If None, revert to self.model_dir
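The "keep only the newest checkpoints" behaviour controlled by `max_checkpoints_to_keep` can be sketched as a simple file rotation. This is an illustration of the idea only: the function name, the `checkpoint-<step>.txt` file naming, and the text payload are all assumptions of the sketch, not DeepChem's on-disk format.

```python
import os
import tempfile


def save_rotating_checkpoint(model_dir, step, payload, max_checkpoints_to_keep=5):
    """Write one checkpoint file and delete the oldest ones beyond the limit."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, f"checkpoint-{step:06d}.txt")
    with open(path, "w") as handle:
        handle.write(payload)
    # Zero-padded step numbers make lexicographic order match age.
    existing = sorted(name for name in os.listdir(model_dir)
                      if name.startswith("checkpoint-"))
    n_stale = max(0, len(existing) - max_checkpoints_to_keep)
    for stale in existing[:n_stale]:
        os.remove(os.path.join(model_dir, stale))
    return path


with tempfile.TemporaryDirectory() as model_dir:
    for step in range(1, 8):
        save_rotating_checkpoint(model_dir, step, f"weights at step {step}",
                                 max_checkpoints_to_keep=3)
    kept = sorted(os.listdir(model_dir))
print(kept)
```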

get_checkpoints
(model_dir: Optional[str] = None)[source]¶ Get a list of all available checkpoint files.
 Parameters
model_dir (str, default None) – Directory to get list of checkpoints from. Reverts to self.model_dir if None

restore
(checkpoint: Optional[str] = None, model_dir: Optional[str] = None) → None[source]¶ Reload the values of all variables from a checkpoint file.
 Parameters
checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
model_dir (str, default None) – Directory to restore checkpoint from. If None, use self.model_dir. If checkpoint is not None, this is ignored.

load_from_pretrained
(source_model: deepchem.models.torch_models.torch_model.TorchModel, assignment_map: Optional[Dict[Any, Any]] = None, value_map: Optional[Dict[Any, Any]] = None, checkpoint: Optional[str] = None, model_dir: Optional[str] = None, include_top: bool = True, inputs: Optional[Sequence[Any]] = None, **kwargs) → None[source]¶ Copies parameter values from a pretrained model. source_model can either be a pretrained model or a model with the same architecture. value_map is a parameter-value dictionary. If no value_map is provided, the parameter values are restored to the source_model from a checkpoint and a default value_map is created. assignment_map is a dictionary mapping parameters from the source_model to the current model. If no assignment_map is provided, one is made from scratch, assuming the model is composed of several different layers with the final one being a dense layer. include_top controls whether or not the final dense layer is copied. The default assignment map is useful when the type of task (classification vs regression) and/or the number of tasks differs between the two models.
 Parameters
source_model (dc.TorchModel, required) – source_model can either be the pretrained model or a dc.TorchModel with the same architecture as the pretrained model. It is used to restore from a checkpoint, if value_map is None and to create a default assignment map if assignment_map is None
assignment_map (Dict, default None) – Dictionary mapping the source_model parameters to the current model parameters
value_map (Dict, default None) – Dictionary containing source_model trainable parameters mapped to numpy arrays. If value_map is None, the values are restored and a default parameter map is created using the restored values
checkpoint (str, default None) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints
model_dir (str, default None) – Restore model from custom model directory if needed
include_top (bool, default True) – if True, copies the weights and bias associated with the final dense layer. Used only when assignment map is None
inputs (List, input tensors for model) – if not None, then the weights are built for both the source and self.
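The roles of `assignment_map`, `value_map`, and `include_top` can be sketched with plain dictionaries. This is a simplified illustration, not DeepChem's code: parameters are identified here by hypothetical string names and stored as lists, and the sketch assumes that the last two entries of each model are the weight and bias of the final dense layer.

```python
def copy_pretrained(source_values, target_params, assignment_map=None,
                    include_top=True):
    """Copy parameter values from a source model into a target model in place.

    source_values:  {source parameter name: value}, playing the role of value_map.
    target_params:  {target parameter name: value}, in the same layer order.
    assignment_map: {source parameter name: target parameter name}.
    """
    if assignment_map is None:
        # Default map: pair parameters by position.
        source_names = list(source_values)
        target_names = list(target_params)
        if not include_top:
            # Assumption: the final dense layer owns the last two parameters.
            source_names = source_names[:-2]
            target_names = target_names[:-2]
        assignment_map = dict(zip(source_names, target_names))
    for source_name, target_name in assignment_map.items():
        target_params[target_name] = list(source_values[source_name])
    return target_params


source = {"conv.weight": [1.0, 2.0], "conv.bias": [0.5],
          "out.weight": [9.0, 9.0], "out.bias": [9.0]}
# The target has a differently sized head, e.g. a different number of tasks.
target = {"conv.weight": [0.0, 0.0], "conv.bias": [0.0],
          "head.weight": [0.0, 0.0, 0.0], "head.bias": [0.0, 0.0]}
copy_pretrained(source, target, include_top=False)
print(target)
```

Leaving the head untouched is what makes it possible to reuse a pretrained body for a task with a different output size.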
CGCNNModel¶

class
CGCNNModel
(in_node_dim: int = 92, hidden_node_dim: int = 64, in_edge_dim: int = 41, num_conv: int = 3, predictor_hidden_feats: int = 128, n_tasks: int = 1, mode: str = 'regression', n_classes: int = 2, **kwargs)[source]¶ Crystal Graph Convolutional Neural Network (CGCNN).
Here is a simple example of code that uses the CGCNNModel with materials dataset.
>>> import deepchem as dc
>>> dataset_config = {"reload": False, "featurizer": dc.feat.CGCNNFeaturizer(), "transformers": []}
>>> tasks, datasets, transformers = dc.molnet.load_perovskite(**dataset_config)
>>> train, valid, test = datasets
>>> model = dc.models.CGCNNModel(mode='regression', batch_size=32, learning_rate=0.001)
>>> model.fit(train, nb_epoch=50)
This model takes arbitrary crystal structures as input and predicts material properties using the element information and the connections between atoms in the crystal. It may be useful for properties that are expensive to compute, such as the band gap obtained from DFT. This model is a variant of Graph Convolutional Networks. The main differences from other GCN models are how the graph is constructed and how node representations are updated. This model defines the crystal graph from structures using distances between atoms. The crystal graph is an undirected multigraph defined by nodes representing atom properties and edges representing connections between atoms in a crystal. The model updates the node representations using both neighbor node and edge representations. See [1] for the details of the algorithm.
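Defining a graph from interatomic distances can be sketched as follows. This is a deliberately reduced illustration: it connects atom pairs whose distance is within a fixed cutoff and ignores periodic images, whereas a real crystal graph is a multigraph precisely because the same pair of atoms can be connected through several periodic images. The function name and the cutoff value are assumptions of the sketch.

```python
import math
from itertools import combinations


def build_distance_graph(coords, cutoff):
    """Return undirected edges (i, j, distance) for atom pairs within cutoff."""
    edges = []
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        if d <= cutoff:
            edges.append((i, j, d))
    return edges


# Cartesian coordinates of three atoms (arbitrary toy values).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 4.0, 0.0)]
edges = build_distance_graph(coords, cutoff=2.0)
print(edges)
```

In a featurizer, each distance would typically be expanded into an edge feature vector rather than kept as a scalar.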
References
 1
Xie, Tian, and Jeffrey C. Grossman. “Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties.” Physical review letters 120.14 (2018): 145301.
Notes
This class requires DGL and PyTorch to be installed.

__init__
(in_node_dim: int = 92, hidden_node_dim: int = 64, in_edge_dim: int = 41, num_conv: int = 3, predictor_hidden_feats: int = 128, n_tasks: int = 1, mode: str = 'regression', n_classes: int = 2, **kwargs)[source]¶ This class accepts all the keyword arguments from TorchModel.
 Parameters
in_node_dim (int, default 92) – The length of the initial node feature vectors. The 92 is based on length of vectors in the atom_init.json.
hidden_node_dim (int, default 64) – The length of the hidden node feature vectors.
in_edge_dim (int, default 41) – The length of the initial edge feature vectors. The 41 is based on default setting of CGCNNFeaturizer.
num_conv (int, default 3) – The number of convolutional layers.
predictor_hidden_feats (int, default 128) – The size for hidden representations in the output MLP predictor.
n_tasks (int, default 1) – The number of output tasks.
mode (str, default 'regression') – The model type, ‘classification’ or ‘regression’.
n_classes (int, default 2) – The number of classes to predict (only used in classification mode).
kwargs (Dict) – This class accepts all the keyword arguments from TorchModel.
GATModel¶

class
GATModel
(n_tasks: int, graph_attention_layers: Optional[list] = None, n_attention_heads: int = 8, agg_modes: Optional[list] = None, activation=<function elu>, residual: bool = True, dropout: float = 0.0, alpha: float = 0.2, predictor_hidden_feats: int = 128, predictor_dropout: float = 0.0, mode: str = 'regression', number_atom_features: int = 30, n_classes: int = 2, self_loop: bool = True, **kwargs)[source]¶ Model for Graph Property Prediction Based on Graph Attention Networks (GAT).
This model proceeds as follows:
Update node representations in graphs with a variant of GAT
For each graph, compute its representation by 1) a weighted sum of the node representations in the graph, where the weights are computed by applying a gating function to the node representations 2) a max pooling of the node representations 3) concatenating the output of 1) and 2)
Perform the final prediction using an MLP
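The graph readout in the second step can be sketched in plain Python. This is a minimal illustration, assuming the gating function is a sigmoid applied to a linear projection of each node representation; the function names and the toy weights are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def weighted_sum_and_max(node_feats, gate_weights, gate_bias=0.0):
    """Readout for one graph: [gated weighted sum] concatenated with [max pool]."""
    dim = len(node_feats[0])
    # One scalar gate per node, computed from that node's representation.
    gates = [sigmoid(sum(w * h for w, h in zip(gate_weights, feats)) + gate_bias)
             for feats in node_feats]
    weighted_sum = [sum(g * feats[k] for g, feats in zip(gates, node_feats))
                    for k in range(dim)]
    max_pool = [max(feats[k] for feats in node_feats) for k in range(dim)]
    return weighted_sum + max_pool


# Two nodes with two features each; zero gate weights give every node gate 0.5.
readout = weighted_sum_and_max([[1.0, 2.0], [3.0, 0.0]], gate_weights=[0.0, 0.0])
print(readout)
```

The readout is twice as wide as a node representation, which is the input size the MLP predictor sees.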
Examples
>>> import deepchem as dc
>>> from deepchem.models import GATModel
>>> featurizer = dc.feat.MolGraphConvFeaturizer()
>>> tasks, datasets, transformers = dc.molnet.load_tox21(
...     reload=False, featurizer=featurizer, transformers=[])
>>> train, valid, test = datasets
>>> model = GATModel(mode='classification', n_tasks=len(tasks),
...                  batch_size=32, learning_rate=0.001)
>>> model.fit(train, nb_epoch=50)
References
 1
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. “Graph Attention Networks.” ICLR 2018.
Notes
This class requires DGL (https://github.com/dmlc/dgl) and DGL-LifeSci (https://github.com/awslabs/dgl-lifesci) to be installed.

__init__
(n_tasks: int, graph_attention_layers: Optional[list] = None, n_attention_heads: int = 8, agg_modes: Optional[list] = None, activation=<function elu>, residual: bool = True, dropout: float = 0.0, alpha: float = 0.2, predictor_hidden_feats: int = 128, predictor_dropout: float = 0.0, mode: str = 'regression', number_atom_features: int = 30, n_classes: int = 2, self_loop: bool = True, **kwargs)[source]¶  Parameters
n_tasks (int) – Number of tasks.
graph_attention_layers (list of int) – Width of channels per attention head for GAT layers. graph_attention_layers[i] gives the width of channel for each attention head for the i-th GAT layer. If both graph_attention_layers and agg_modes are specified, they should have equal length. If not specified, the default value will be [8, 8].
n_attention_heads (int) – Number of attention heads in each GAT layer.
agg_modes (list of str) – The way to aggregate multi-head attention results for each GAT layer, which can be either ‘flatten’ for concatenating all-head results or ‘mean’ for averaging all-head results. agg_modes[i] gives the way to aggregate multi-head attention results for the i-th GAT layer. If both graph_attention_layers and agg_modes are specified, they should have equal length. If not specified, the model will flatten multi-head results for intermediate GAT layers and compute the mean of multi-head results for the last GAT layer.
activation (activation function or None) – The activation function to apply to the aggregated multi-head results for each GAT layer. If not specified, the default value will be ELU.
residual (bool) – Whether to add a residual connection within each GAT layer. Default to True.
dropout (float) – The dropout probability within each GAT layer. Default to 0.
alpha (float) – A hyperparameter in LeakyReLU, which is the slope for negative values. Default to 0.2.
predictor_hidden_feats (int) – The size for hidden representations in the output MLP predictor. Default to 128.
predictor_dropout (float) – The dropout probability in the output MLP predictor. Default to 0.
mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.
number_atom_features (int) – The length of the initial atom feature vectors. Default to 30.
n_classes (int) – The number of classes to predict per task (only used when mode is ‘classification’). Default to 2.
self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves. When input graphs have isolated nodes, self loops preserve their original features during message passing. Default to True.
kwargs – This can include any keyword argument of TorchModel.
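The two `agg_modes` options described above differ in the width of the result, which a short sketch makes concrete. The function below is a hypothetical helper operating on plain lists for a single node, not the library's layer code.

```python
def aggregate_heads(head_outputs, agg_mode):
    """Aggregate per-head feature vectors for a single node.

    head_outputs: one feature vector per attention head, all of equal width.
    """
    if agg_mode == "flatten":
        # Concatenate all heads: width becomes n_heads * channel width.
        return [value for head in head_outputs for value in head]
    if agg_mode == "mean":
        # Average over heads: width stays at the channel width.
        n_heads = len(head_outputs)
        return [sum(values) / n_heads for values in zip(*head_outputs)]
    raise ValueError(f"unknown agg_mode: {agg_mode!r}")


heads = [[1.0, 2.0], [3.0, 4.0]]
print(aggregate_heads(heads, "flatten"))
print(aggregate_heads(heads, "mean"))
```

Flattening in intermediate layers keeps the information from every head, while averaging in the last layer keeps the final node representation compact.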
GCNModel¶

class
GCNModel
(n_tasks: int, graph_conv_layers: Optional[list] = None, activation=None, residual: bool = True, batchnorm: bool = False, dropout: float = 0.0, predictor_hidden_feats: int = 128, predictor_dropout: float = 0.0, mode: str = 'regression', number_atom_features=30, n_classes: int = 2, self_loop: bool = True, **kwargs)[source]¶ Model for Graph Property Prediction Based on Graph Convolution Networks (GCN).
This model proceeds as follows:
Update node representations in graphs with a variant of GCN
For each graph, compute its representation by 1) a weighted sum of the node representations in the graph, where the weights are computed by applying a gating function to the node representations 2) a max pooling of the node representations 3) concatenating the output of 1) and 2)
Perform the final prediction using an MLP
Examples
>>> import deepchem as dc
>>> from deepchem.models import GCNModel
>>> featurizer = dc.feat.MolGraphConvFeaturizer()
>>> tasks, datasets, transformers = dc.molnet.load_tox21(
...     reload=False, featurizer=featurizer, transformers=[])
>>> train, valid, test = datasets
>>> model = GCNModel(mode='classification', n_tasks=len(tasks),
...                  batch_size=32, learning_rate=0.001)
>>> model.fit(train, nb_epoch=50)
References
 1
Thomas N. Kipf and Max Welling. “Semi-Supervised Classification with Graph Convolutional Networks.” ICLR 2017.
Notes
This class requires DGL (https://github.com/dmlc/dgl) and DGL-LifeSci (https://github.com/awslabs/dgl-lifesci) to be installed.
This model is different from deepchem.models.GraphConvModel as follows:
For each graph convolution, the learnable weight in this model is shared across all nodes. GraphConvModel employs separate learnable weights for nodes of different degrees. A learnable weight is shared across all nodes of a particular degree.
For GraphConvModel, there is an additional GraphPool operation after each graph convolution. The operation updates the representation of a node by applying an element-wise maximum over the representations of its neighbors and itself.
For computing graph-level representations, this model computes a weighted sum and an element-wise maximum of the representations of all nodes in a graph and concatenates them. The node weights are obtained by using a linear/dense layer followed by a sigmoid function. For GraphConvModel, the sum over node representations is unweighted.
There are various minor differences in using dropout, skip connection and batch normalization.
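The GraphPool operation mentioned in this comparison, an element-wise maximum over a node's neighbors and itself, can be sketched in a few lines. This is an illustration on plain lists with a hypothetical adjacency-list input, not the GraphConvModel layer itself.

```python
def graph_pool(node_feats, neighbors):
    """Element-wise max over each node's own features and its neighbors' features.

    neighbors[i] lists the indices of the nodes adjacent to node i.
    """
    pooled = []
    for i, feats in enumerate(node_feats):
        group = [feats] + [node_feats[j] for j in neighbors[i]]
        pooled.append([max(column) for column in zip(*group)])
    return pooled


# A path graph 0 - 1 - 2 with two features per node.
node_feats = [[1.0, 5.0], [4.0, 2.0], [0.0, 0.0]]
neighbors = [[1], [0, 2], [1]]
print(graph_pool(node_feats, neighbors))
```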

__init__
(n_tasks: int, graph_conv_layers: Optional[list] = None, activation=None, residual: bool = True, batchnorm: bool = False, dropout: float = 0.0, predictor_hidden_feats: int = 128, predictor_dropout: float = 0.0, mode: str = 'regression', number_atom_features=30, n_classes: int = 2, self_loop: bool = True, **kwargs)[source]¶  Parameters
n_tasks (int) – Number of tasks.
graph_conv_layers (list of int) – Width of channels for GCN layers. graph_conv_layers[i] gives the width of channel for the i-th GCN layer. If not specified, the default value will be [64, 64].
activation (callable) – The activation function to apply to the output of each GCN layer. By default, no activation function will be applied.
residual (bool) – Whether to add a residual connection within each GCN layer. Default to True.
batchnorm (bool) – Whether to apply batch normalization to the output of each GCN layer. Default to False.
dropout (float) – The dropout probability for the output of each GCN layer. Default to 0.
predictor_hidden_feats (int) – The size for hidden representations in the output MLP predictor. Default to 128.
predictor_dropout (float) – The dropout probability in the output MLP predictor. Default to 0.
mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.
number_atom_features (int) – The length of the initial atom feature vectors. Default to 30.
n_classes (int) – The number of classes to predict per task (only used when mode is ‘classification’). Default to 2.
self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves. When input graphs have isolated nodes, self loops preserve their original features during message passing. Default to True.
kwargs – This can include any keyword argument of TorchModel.
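What the `self_loop` option does to a graph can be sketched on an edge list. The helper below is hypothetical and works on plain `(source, destination)` pairs; it shows why an isolated node still receives a message, namely its own, once self loops are present.

```python
def add_self_loops(edges, num_nodes):
    """Return the edge list with one (i, i) edge added for every node lacking it."""
    present = set(edges)
    loops = [(i, i) for i in range(num_nodes) if (i, i) not in present]
    return list(edges) + loops


# Nodes 0 and 1 are bonded; node 2 is isolated.
edges = [(0, 1), (1, 0)]
with_loops = add_self_loops(edges, num_nodes=3)
print(with_loops)

# After adding loops, every node has at least one incoming edge.
incoming = {dst for _, dst in with_loops}
print(sorted(incoming))
```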
AttentiveFPModel¶

class
AttentiveFPModel
(n_tasks: int, num_layers: int = 2, num_timesteps: int = 2, graph_feat_size: int = 200, dropout: float = 0.0, mode: str = 'regression', number_atom_features: int = 30, number_bond_features: int = 11, n_classes: int = 2, self_loop: bool = True, **kwargs)[source]¶ Model for Graph Property Prediction.
This model proceeds as follows:
Combine node features and edge features for initializing node representations, which involves a round of message passing
Update node representations with multiple rounds of message passing
For each graph, compute its representation by combining the representations of all nodes in it, which involves a gated recurrent unit (GRU).
Perform the final prediction using a linear layer
Examples
>>> import deepchem as dc
>>> from deepchem.models import AttentiveFPModel
>>> featurizer = dc.feat.MolGraphConvFeaturizer(use_edges=True)
>>> tasks, datasets, transformers = dc.molnet.load_tox21(
...     reload=False, featurizer=featurizer, transformers=[])
>>> train, valid, test = datasets
>>> model = AttentiveFPModel(mode='classification', n_tasks=len(tasks),
...                          batch_size=32, learning_rate=0.001)
>>> model.fit(train, nb_epoch=50)
References
 1
Zhaoping Xiong, Dingyan Wang, Xiaohong Liu, Feisheng Zhong, Xiaozhe Wan, Xutong Li, Zhaojun Li, Xiaomin Luo, Kaixian Chen, Hualiang Jiang, and Mingyue Zheng. “Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism.” Journal of Medicinal Chemistry. 2020, 63, 16, 8749–8760.
Notes
This class requires DGL (https://github.com/dmlc/dgl) and DGL-LifeSci (https://github.com/awslabs/dgl-lifesci) to be installed.

__init__
(n_tasks: int, num_layers: int = 2, num_timesteps: int = 2, graph_feat_size: int = 200, dropout: float = 0.0, mode: str = 'regression', number_atom_features: int = 30, number_bond_features: int = 11, n_classes: int = 2, self_loop: bool = True, **kwargs)[source]¶  Parameters
n_tasks (int) – Number of tasks.
num_layers (int) – Number of graph neural network layers, i.e. number of rounds of message passing. Default to 2.
num_timesteps (int) – Number of time steps for updating graph representations with a GRU. Default to 2.
graph_feat_size (int) – Size for graph representations. Default to 200.
dropout (float) – Dropout probability. Default to 0.
mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.
number_atom_features (int) – The length of the initial atom feature vectors. Default to 30.
number_bond_features (int) – The length of the initial bond feature vectors. Default to 11.
n_classes (int) – The number of classes to predict per task (only used when mode is ‘classification’). Default to 2.
self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves. When input graphs have isolated nodes, self loops preserve their original features during message passing. Default to True.
kwargs – This can include any keyword argument of TorchModel.
PagtnModel¶

class
PagtnModel
(n_tasks: int, number_atom_features: int = 94, number_bond_features: int = 42, mode: str = 'regression', n_classes: int = 2, output_node_features: int = 256, hidden_features: int = 32, num_layers: int = 5, num_heads: int = 1, dropout: float = 0.1, pool_mode: str = 'sum', **kwargs)[source]¶ Model for Graph Property Prediction.
This model proceeds as follows:
Update node representations in graphs with a variant of GAT, where a linear additive form of attention is applied. Attention weights are derived by concatenating the node and edge features for each bond.
Update node representations with multiple rounds of message passing.
Each layer has residual connections with its previous layer.
The final molecular representation is computed by combining the representations of all nodes in the molecule.
Perform the final prediction using a linear layer
Examples
>>> import deepchem as dc
>>> from deepchem.models import PagtnModel
>>> featurizer = dc.feat.PagtnMolGraphFeaturizer(max_length=5)
>>> tasks, datasets, transformers = dc.molnet.load_tox21(
...     reload=False, featurizer=featurizer, transformers=[])
>>> train, valid, test = datasets
>>> model = PagtnModel(mode='classification', n_tasks=len(tasks),
...                    batch_size=16, learning_rate=0.001)
>>> model.fit(train, nb_epoch=50)
References
 1
Benson Chen, Regina Barzilay, Tommi Jaakkola. “Path-Augmented Graph Transformer Network.” arXiv:1905.12712
Notes
This class requires DGL (https://github.com/dmlc/dgl) and DGL-LifeSci (https://github.com/awslabs/dgl-lifesci) to be installed.

__init__
(n_tasks: int, number_atom_features: int = 94, number_bond_features: int = 42, mode: str = 'regression', n_classes: int = 2, output_node_features: int = 256, hidden_features: int = 32, num_layers: int = 5, num_heads: int = 1, dropout: float = 0.1, pool_mode: str = 'sum', **kwargs)[source]¶  Parameters
n_tasks (int) – Number of tasks.
number_atom_features (int) – Size for the input node features. Default to 94.
number_bond_features (int) – Size for the input edge features. Default to 42.
mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.
n_classes (int) – The number of classes to predict per task (only used when mode is ‘classification’). Default to 2.
output_node_features (int) – Size for the output node features in PAGTN layers. Default to 256.
hidden_features (int) – Size for the hidden node features in PAGTN layers. Default to 32.
num_layers (int) – Number of graph neural network layers, i.e. number of rounds of message passing. Default to 5.
num_heads (int) – Number of attention heads. Default to 1.
dropout (float) – Dropout probability. Default to 0.1
pool_mode ('max' or 'mean' or 'sum') – Whether to compute the element-wise maximum, mean or sum of the node representations. Default to 'sum'.
kwargs – This can include any keyword argument of TorchModel.
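The three `pool_mode` choices reduce a set of node representations to one molecule-level vector in different ways. The sketch below shows the arithmetic on plain lists; `pool_nodes` is a hypothetical helper, not the function used inside the model.

```python
def pool_nodes(node_feats, pool_mode="sum"):
    """Reduce per-node feature vectors to a single vector for the whole graph."""
    columns = list(zip(*node_feats))
    if pool_mode == "sum":
        return [sum(column) for column in columns]
    if pool_mode == "mean":
        return [sum(column) / len(column) for column in columns]
    if pool_mode == "max":
        return [max(column) for column in columns]
    raise ValueError(f"unknown pool_mode: {pool_mode!r}")


node_feats = [[1.0, 4.0], [3.0, 2.0]]
for mode in ("sum", "mean", "max"):
    print(mode, pool_nodes(node_feats, mode))
```

Sum pooling grows with molecule size, while mean and max pooling do not, which can matter for properties that scale with the number of atoms.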
MPNNModel¶
Note that this is an alternative implementation for MPNN and currently you can only import it from deepchem.models.torch_models.

class
MPNNModel
(n_tasks: int, node_out_feats: int = 64, edge_hidden_feats: int = 128, num_step_message_passing: int = 3, num_step_set2set: int = 6, num_layer_set2set: int = 3, mode: str = 'regression', number_atom_features: int = 30, number_bond_features: int = 11, n_classes: int = 2, self_loop: bool = False, **kwargs)[source]¶ Model for graph property prediction
This model proceeds as follows:
Combine latest node representations and edge features in updating node representations, which involves multiple rounds of message passing
For each graph, compute its representation by combining the representations of all nodes in it, which involves a Set2Set layer.
Perform the final prediction using an MLP
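The first step, repeated rounds of message passing that mix neighbor representations with edge features, can be sketched with a toy update rule. This is a strong simplification made for illustration: each edge carries a single scalar feature that scales the neighbor's representation, and the update is a plain sum, whereas the actual model uses learned edge networks and a recurrent update. All names are hypothetical.

```python
def message_passing_round(node_feats, edges):
    """One round: every node adds edge-weighted copies of its neighbors' features.

    edges: list of (src, dst, edge_feat); messages flow from src to dst.
    """
    updated = [list(feats) for feats in node_feats]
    for src, dst, edge_feat in edges:
        for k in range(len(updated[dst])):
            # Messages are built from the representations of the previous round.
            updated[dst][k] += edge_feat * node_feats[src][k]
    return updated


# Two bonded atoms; the bond is stored once per direction.
node_feats = [[1.0], [2.0]]
edges = [(0, 1, 0.5), (1, 0, 0.5)]
for _ in range(2):  # plays the role of num_step_message_passing
    node_feats = message_passing_round(node_feats, edges)
print(node_feats)
```

After k rounds a node's representation depends on atoms up to k bonds away, which is what `num_step_message_passing` controls.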
Examples
>>> import deepchem as dc
>>> from deepchem.models.torch_models import MPNNModel
>>> featurizer = dc.feat.MolGraphConvFeaturizer(use_edges=True)
>>> tasks, datasets, transformers = dc.molnet.load_tox21(
...     reload=False, featurizer=featurizer, transformers=[])
>>> train, valid, test = datasets
>>> model = MPNNModel(mode='classification', n_tasks=len(tasks),
...                   batch_size=32, learning_rate=0.001)
>>> model.fit(train, nb_epoch=50)
References
 1
Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, George E. Dahl. “Neural Message Passing for Quantum Chemistry.” ICML 2017.
Notes
This class requires DGL (https://github.com/dmlc/dgl) and DGL-LifeSci (https://github.com/awslabs/dgl-lifesci) to be installed.

__init__
(n_tasks: int, node_out_feats: int = 64, edge_hidden_feats: int = 128, num_step_message_passing: int = 3, num_step_set2set: int = 6, num_layer_set2set: int = 3, mode: str = 'regression', number_atom_features: int = 30, number_bond_features: int = 11, n_classes: int = 2, self_loop: bool = False, **kwargs)[source]¶  Parameters
n_tasks (int) – Number of tasks.
node_out_feats (int) – The length of the final node representation vectors. Default to 64.
edge_hidden_feats (int) – The length of the hidden edge representation vectors. Default to 128.
num_step_message_passing (int) – The number of rounds of message passing. Default to 3.
num_step_set2set (int) – The number of set2set steps. Default to 6.
num_layer_set2set (int) – The number of set2set layers. Default to 3.
mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.
number_atom_features (int) – The length of the initial atom feature vectors. Default to 30.
number_bond_features (int) – The length of the initial bond feature vectors. Default to 11.
n_classes (int) – The number of classes to predict per task (only used when mode is ‘classification’). Default to 2.
self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves. Generally, an MPNNModel does not require self loops. Default to False.
kwargs – This can include any keyword argument of TorchModel.
LCNNModel¶

class
LCNNModel
(n_occupancy: int = 3, n_neighbor_sites_list: int = 19, n_permutation_list: int = 6, n_task: int = 1, dropout_rate: float = 0.4, n_conv: int = 2, n_features: int = 44, sitewise_n_feature: int = 25, **kwargs)[source]¶ Lattice Convolutional Neural Network (LCNN). Here is a simple example of code that uses the LCNNModel with Platinum 2d Adsorption dataset.
This model takes arbitrary configurations of molecules on an adsorbate and predicts their formation energy. These formation energies are normally obtained from DFT calculations, and LCNNModel aims to automate that process. This model defines a crystal graph using the distance between atoms. The crystal graph is an undirected regular graph (every node has the same number of neighbours), and different permutations of the neighbours are precomputed using the LCNNFeaturizer. For each node and each permutation, the features of the neighbour nodes are concatenated and then processed further. This model has only a node representation. See [1] for the details of the algorithm.
Examples
>>> import deepchem as dc
>>> from pymatgen.core import Structure
>>> import numpy as np
>>> from deepchem.feat import LCNNFeaturizer
>>> from deepchem.models import LCNNModel
>>> from deepchem.molnet import load_Platinum_Adsorption
>>> PRIMITIVE_CELL = {
...     "lattice": [[2.818528, 0.0, 0.0],
...                 [1.409264, 2.440917, 0.0],
...                 [0.0, 0.0, 25.508255]],
...     "coords": [[0.66667, 0.33333, 0.090221],
...                [0.33333, 0.66667, 0.18043936],
...                [0.0, 0.0, 0.27065772],
...                [0.66667, 0.33333, 0.36087608],
...                [0.33333, 0.66667, 0.45109444],
...                [0.0, 0.0, 0.49656991]],
...     "species": ['H', 'H', 'H', 'H', 'H', 'He'],
...     "site_properties": {'SiteTypes': ['S1', 'S1', 'S1', 'S1', 'S1', 'A1']}
... }
>>> PRIMITIVE_CELL_INF0 = {
...     "cutoff": np.around(6.00),
...     "structure": Structure(**PRIMITIVE_CELL),
...     "aos": ['1', '0', '2'],
...     "pbc": [True, True, False],
...     "ns": 1,
...     "na": 1
... }
>>> tasks, datasets, transformers = load_Platinum_Adsorption(
...     featurizer=LCNNFeaturizer(**PRIMITIVE_CELL_INF0)
... )
>>> train, val, test = datasets
>>> model = LCNNModel(mode='regression',
...                   batch_size=8,
...                   learning_rate=0.001)
>>> model.fit(train, nb_epoch=10)
References
 1
Jonathan Lym and Geun Ho Gu, J. Phys. Chem. C 2019, 123, 18951−18959.
Notes
This class requires DGL and PyTorch to be installed.

__init__
(n_occupancy: int = 3, n_neighbor_sites_list: int = 19, n_permutation_list: int = 6, n_task: int = 1, dropout_rate: float = 0.4, n_conv: int = 2, n_features: int = 44, sitewise_n_feature: int = 25, **kwargs)[source]¶ This class accepts all the keyword arguments from TorchModel.
 Parameters
n_occupancy (int, default 3) – Number of possible occupancies.
n_neighbor_sites_list (int, default 19) – Number of neighbors of each site.
n_permutation_list (int, default 6) – Number of different permutations taken along different directions.
n_task (int, default 1) – Number of tasks.
dropout_rate (float, default 0.4) – Dropout probability, between 0.0 and 1.0.
n_conv (int, default 2) – Number of convolutions performed.
n_features (int, default 44) – Number of features for each site.
sitewise_n_feature (int, default 25) – Number of features for atoms for site-wise activation.
kwargs (Dict) – This class accepts all the keyword arguments from TorchModel.