DeepChem maintains an extensive collection of models for scientific
applications. DeepChem’s focus is on facilitating scientific applications, so
we support a broad range of different machine learning frameworks (currently
scikit-learn, xgboost, TensorFlow, and PyTorch) since different frameworks are
more or less suited for different scientific applications.
If you’re just getting started with DeepChem, you’re probably interested in the
basics. The place to get started is this “model cheatsheet” that lists various
types of custom DeepChem models. Note that some wrappers like SklearnModel
and GBDTModel which wrap external machine learning libraries are excluded,
but this table should otherwise be complete.
As a note about how to read these tables: Each row describes what’s needed to
invoke a given model. Some models must be applied with given Transformer or
Featurizer objects. Most models can be trained by calling model.fit;
otherwise the name of the fit method is given in the Comment column.
To run the models, make sure that the required backend (Keras with TensorFlow,
PyTorch, or JAX) is installed.
You can thus read off what’s needed to train the model from the table below.
Many models implemented in DeepChem were designed for small to medium-sized organic molecules,
most often drug-like compounds.
If your data is very different (e.g. molecules contain ‘exotic’ elements not present in the original dataset)
or cannot be represented well using SMILES (e.g. metal complexes, crystals), some adaptations to the
featurization and/or model might be needed to get reasonable results.
This is intended only for convenience of subclass implementations
and should not be invoked directly.
Parameters:
model (object) – Wrapper around ScikitLearn/Keras/Tensorflow model object.
model_dir (str, optional (default None)) – Path to directory where model will be stored. If not specified,
model will be stored in a temporary directory.
transformers (List[Transformer]) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
Evaluates the performance of this model on specified dataset.
This function uses Evaluator under the hood to perform model
evaluation. As a result, it inherits the same limitations of
Evaluator. Namely, that only regression and classification
models can be evaluated in this fashion. For generator models, you
will need to overwrite this method to perform a custom evaluation.
Keyword arguments specified here will be passed to
Evaluator.compute_model_performance.
metrics (Metric / List[Metric] / function) – The set of metrics provided. This class attempts to do some
intelligent handling of input. If a single dc.metrics.Metric
object is provided or a list is provided, it will evaluate
self.model on these metrics. If a function is provided, it is
assumed to be a metric function that this method will attempt to
wrap in a dc.metrics.Metric object. A metric function must
accept two arguments, y_true, y_pred both of which are
np.ndarray objects and return a floating point score. The
metric function may also accept a keyword argument
sample_weight to account for per-sample weights.
transformers (List[Transformer]) – List of dc.trans.Transformer objects. These transformations
must have been applied to dataset previously. The dataset will
be untransformed for metric evaluation.
per_task_metrics (bool, optional (default False)) – If true, return computed metric for each task on multitask dataset.
use_sample_weights (bool, optional (default False)) – If set, use per-sample weights w.
n_classes (int, optional (default None)) – If specified, will use n_classes as the number of unique classes
in self.dataset. Note that this argument will be ignored for
regression metrics.
Returns:
multitask_scores (dict) – Dictionary mapping names of metrics to metric scores.
all_task_scores (dict, optional) – If per_task_metrics == True is passed as a keyword argument,
then returns a second dictionary of scores for each task
separately.
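As a sketch of the metric-function contract described above, the function below accepts `y_true` and `y_pred` arrays plus an optional `sample_weight` and returns a float. The name `weighted_mae` is illustrative and not part of DeepChem; it only shows the expected call shape.

```python
import numpy as np


def weighted_mae(y_true, y_pred, sample_weight=None):
    """Mean absolute error with optional per-sample weights (illustrative)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    # np.average falls back to an unweighted mean when weights is None.
    return float(np.average(abs_err, weights=sample_weight))


y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
unweighted = weighted_mae(y_true, y_pred)
weighted = weighted_mae(y_true, y_pred, sample_weight=np.array([1.0, 1.0, 0.0]))
```

A function of this shape is what the `metrics` argument is described as wrapping in a dc.metrics.Metric object.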
Scikit-learn’s models can be wrapped so that they can interact conveniently
with DeepChem. Oftentimes scikit-learn models are more robust and easier to
train and are a nice first model to train.
Wrapper class that wraps scikit-learn models as DeepChem models.
When you’re working with scikit-learn and DeepChem, at times it can
be useful to wrap a scikit-learn model as a DeepChem model. The
reason for this might be that you want to do an apples-to-apples
comparison of a scikit-learn model to another DeepChem model, or
perhaps you want to use the hyperparameter tuning capabilities in
dc.hyper. The SklearnModel class provides a wrapper around scikit-learn
models that allows scikit-learn models to be trained on Dataset objects
and evaluated with the same metrics as other DeepChem models.
Example
>>> import deepchem as dc
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> # Generating a random data and creating a dataset
>>> X, y = np.random.randn(5, 1), np.random.randn(5)
>>> dataset = dc.data.NumpyDataset(X, y)
>>> # Wrapping a Sklearn Linear Regression model using DeepChem models API
>>> sklearn_model = LinearRegression()
>>> dc_model = dc.models.SklearnModel(sklearn_model)
>>> dc_model.fit(dataset)  # fitting dataset
Notes
All SklearnModels perform learning solely in memory. This means that it
may not be possible to train SklearnModel on large `Dataset`s.
The returned value is the output of the scikit-learn model's predict_proba
or predict method. If the scikit-learn model has both methods,
predict_proba is always used.
dataset (Dataset) – Dataset to make prediction on.
transformers (List[Transformer]) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
model (BaseEstimator) – A LightGBM or XGBoost model instance that exposes the scikit-learn wrapper API.
model_dir (str, optional (default None)) – Path to directory where model will be stored.
early_stopping_rounds (int, optional (default 50)) – Activates early stopping. Validation metric needs to improve at least once
in every early_stopping_rounds round(s) to continue training.
eval_metric (Union[str, Callable]) – If string, it should be a built-in evaluation metric to use.
If callable, it should be a custom evaluation metric, see official note for more details.
First, this function splits the data into training and validation sets (8:2)
and finds the best n_estimators. It then retrains on all the data using
best n_estimators * 1.25.
Parameters:
dataset (Dataset) – The Dataset to train this model on.
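The two-stage procedure above can be summarized with a small arithmetic sketch. The helper name `plan_gbdt_refit` and its arguments are hypothetical; only the 8:2 split and the 1.25 factor come from the description.

```python
def plan_gbdt_refit(n_samples, best_n_estimators, valid_fraction=0.2, scale=1.25):
    """Return (n_train, n_valid, final_n_estimators) for the two-stage fit."""
    n_valid = int(round(n_samples * valid_fraction))
    n_train = n_samples - n_valid
    # Retraining on all data uses more rounds than early stopping selected.
    final_n_estimators = int(round(best_n_estimators * scale))
    return n_train, n_valid, final_n_estimators


plan = plan_gbdt_refit(n_samples=1000, best_n_estimators=80)
```

With 1000 samples and 80 rounds selected on the validation split, the final model would be refit on all 1000 samples with 100 rounds.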
DeepChem maintains a lightweight layer of common deep learning model
infrastructure that can be used for models built with different underlying
frameworks. The losses and optimizers can be used for both TensorFlow and
PyTorch models.
Modified version of L1 Loss, also known as Smooth L1 loss.
Less sensitive to small errors, linear for larger errors.
Compared to the L1 loss, Huber loss is generally better for cases where there are both large outliers and small errors.
By default, Delta = 1.0 and reduction = ‘none’.
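For reference, the textbook per-element Huber formula is sketched below in plain Python. It is quadratic inside `delta` and linear outside it; whether DeepChem's implementation matches this term for term is an assumption.

```python
def huber(error, delta=1.0):
    """Textbook Huber loss for a single residual (no reduction applied)."""
    abs_err = abs(error)
    if abs_err <= delta:
        # Quadratic region: behaves like half the squared error.
        return 0.5 * abs_err ** 2
    # Linear region: grows like the L1 loss, shifted to stay continuous.
    return delta * (abs_err - 0.5 * delta)


small = huber(0.5)   # quadratic branch
large = huber(3.0)   # linear branch
```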
The Poisson loss function is defined as the mean of the elements of y_pred - y_true * log(y_pred) for an input of (y_true, y_pred).
Poisson loss is generally used for regression tasks where the data follows the Poisson distribution.
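A minimal plain-Python sketch of the formula as stated, assuming strictly positive predictions so that the logarithm is defined (the function name is illustrative):

```python
import math


def poisson_loss(y_true, y_pred):
    """Mean of y_pred - y_true * log(y_pred) over all elements."""
    terms = [p - t * math.log(p) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


loss = poisson_loss([1.0, 2.0], [1.0, 2.0])
```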
The arguments should each have shape (batch_size) or (batch_size, tasks). The
labels should be probabilities, while the outputs should be logits that are
converted to probabilities using a sigmoid function.
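The sketch below shows one numerically stable way to evaluate this loss directly from a logit, without forming the sigmoid probability first. It is a standard rewriting of the binary cross entropy and is given here as an illustration, not as the library's code.

```python
import math


def sigmoid_cross_entropy(label, logit):
    """Binary cross entropy between a label probability and sigmoid(logit)."""
    # Equivalent to -[z*log(s) + (1-z)*log(1-s)] with s = sigmoid(logit),
    # rearranged so that exp() is only called on non-positive arguments.
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


at_zero = sigmoid_cross_entropy(1.0, 0.0)
confident_right = sigmoid_cross_entropy(1.0, 1000.0)
confident_wrong = sigmoid_cross_entropy(0.0, 1000.0)
```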
The cross entropy between two probability distributions.
The arguments should each have shape (batch_size, classes) or
(batch_size, tasks, classes). The labels should be probabilities, while the
outputs should be logits that are converted to probabilities using a softmax
function.
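For a single sample, the computation can be sketched as a log-softmax followed by a weighted sum. The function name is illustrative; subtracting the maximum logit before exponentiating is a common stabilization and an assumption about how one would implement it.

```python
import math


def softmax_cross_entropy(label_probs, logits):
    """Cross entropy between label probabilities and softmax(logits)."""
    m = max(logits)
    # log(sum(exp(logits))) computed without overflow.
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_norm for x in logits]
    return -sum(p * lp for p, lp in zip(label_probs, log_probs))


uniform = softmax_cross_entropy([1.0, 0.0], [0.0, 0.0])
confident = softmax_cross_entropy([0.0, 1.0, 0.0], [0.0, 1000.0, 0.0])
```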
The cross entropy between two probability distributions.
The labels should have shape (batch_size) or (batch_size, tasks), and be
integer class labels. The outputs should have shape (batch_size, classes) or
(batch_size, tasks, classes) and be logits that are converted to probabilities
using a softmax function.
The Variational AutoEncoder loss: a KL divergence regularizer plus the marginal log-likelihood.
This loss is based on [1]_.
ELBO (evidence lower bound) is another name for the variational lower bound.
BCE denotes the marginal log-likelihood, and KLD denotes the KL divergence from a normal distribution.
The hyperparameter ‘kl_scale’ scales the KLD term.
The logvar and mu should have shape (batch_size, hidden_space).
The x and reconstruction_x should have shape (batch_size, attribute).
The kl_scale should be a float.
Examples
Examples for calculating loss using constant tensor.
batch_size = 2,
hidden_space = 2,
num of original attribute = 3
>>> import numpy as np
>>> import torch
>>> import tensorflow as tf
>>> logvar = np.array([[1.0,1.3],[0.6,1.2]])
>>> mu = np.array([[0.2,0.7],[1.2,0.4]])
>>> x = np.array([[0.9,0.4,0.8],[0.3,0,1]])
>>> reconstruction_x = np.array([[0.8,0.3,0.7],[0.2,0,0.9]])
The KL divergence between the hidden distribution and a normal distribution.
This loss is the KL divergence between normal distributions, computed from the distribution parameters,
based on [1]_.
The logvar should have shape (batch_size, hidden_space) and each term represents
the standard deviation of the hidden distribution. The mean should have shape
(batch_size, hidden_space) and each term represents the mean of the hidden distribution.
Examples
Examples for calculating loss using constant tensor.
batch_size = 2,
hidden_space = 2,
>>> import numpy as np
>>> import torch
>>> import tensorflow as tf
>>> logvar = np.array([[1.0,1.3],[0.6,1.2]])
>>> mu = np.array([[0.2,0.7],[1.2,0.4]])
Case tensorflow
>>> VAE_KLDivergence()._compute_tf_loss(tf.constant(logvar), tf.constant(mu))
<tf.Tensor: shape=(2,), dtype=float64, numpy=array([0.17381787, 0.51425203])>
Case pytorch
>>> (VAE_KLDivergence()._create_pytorch_loss())(torch.tensor(logvar), torch.tensor(mu))
tensor([0.1738, 0.5143], dtype=torch.float64)
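The printed values above are consistent with the closed-form KL divergence between N(mu, sigma^2) and a standard normal, averaged over the hidden dimension, when the `logvar` argument is read as a standard deviation as the text states. The NumPy function below is a reconstruction from those numbers, not the library's code, and its name is illustrative.

```python
import numpy as np


def vae_kl_divergence(logvar, mu):
    """Per-sample KL(N(mu, s^2) || N(0, 1)), averaged over hidden units.

    `logvar` is treated as the standard deviation s (assumption based on
    the description and the printed example values).
    """
    logvar = np.asarray(logvar, dtype=float)
    mu = np.asarray(mu, dtype=float)
    per_unit = mu ** 2 + logvar ** 2 - np.log(logvar ** 2) - 1.0
    return 0.5 * per_unit.mean(axis=-1)


logvar = np.array([[1.0, 1.3], [0.6, 1.2]])
mu = np.array([[0.2, 0.7], [1.2, 0.4]])
kld = vae_kl_divergence(logvar, mu)
```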
Global-global encoding loss (comparing two full graphs).
Compares the encodings of two molecular graphs and returns the loss between them based on the measure specified.
The encodings are generated by two separate encoders in order to maximize the mutual information between the two encodings.
Parameters:
global_enc (torch.Tensor) – Features from a graph convolutional encoder.
global_enc2 (torch.Tensor) – Another set of features from a graph convolutional encoder.
measure (str) – The divergence measure to use for the unsupervised loss. Options are ‘GAN’, ‘JSD’, ‘KL’, ‘RKL’, ‘X2’, ‘DV’, ‘H2’, or ‘W1’.
average_loss (bool) – Whether to average the loss over the batch
Returns:
loss – Measure of mutual information between the encodings of the two graphs.
Local-global encoding loss (comparing a subgraph to the full graph).
Compares the encodings of two molecular graphs and returns the loss between them based on the measure specified.
The encodings are generated by two separate encoders in order to maximize the mutual information between the two encodings.
Parameters:
local_enc (torch.Tensor) – Features from a graph convolutional encoder.
global_enc (torch.Tensor) – Another set of features from a graph convolutional encoder.
batch_graph_index (graph_index: np.ndarray or torch.tensor, dtype int) – This vector, with shape [num_nodes,], indicates which graph each node belongs to. Only present in BatchGraphData, not in GraphData objects.
measure (str) – The divergence measure to use for the unsupervised loss. Options are ‘GAN’, ‘JSD’, ‘KL’, ‘RKL’, ‘X2’, ‘DV’, ‘H2’, or ‘W1’.
average_loss (bool) – Whether to average the loss over the batch
Returns:
loss – Measure of mutual information between the encodings of the two graphs.
The Grover pretraining consists of learning atom embeddings and bond embeddings for
a molecule. To this end, the learning consists of three tasks:
Learning of atom vocabulary from atom embeddings and bond embeddings
Learning of bond vocabulary from atom embeddings and bond embeddings
Learning to predict functional groups from atom embeddings readout and bond embeddings readout
The loss function accepts atom vocabulary labels, bond vocabulary labels and functional group
predictions produced by Grover model during pretraining as a dictionary and applies negative
log-likelihood loss for atom vocabulary and bond vocabulary predictions and Binary Cross Entropy
loss for functional group prediction and sums these to get overall loss.
EdgePredictionLoss is an unsupervised graph edge prediction loss function that calculates the loss based on the similarity between node embeddings for positive and negative edge pairs. This loss function is designed for graph neural networks and is particularly useful for pre-training tasks.
This loss function encourages the model to learn node embeddings that can effectively distinguish between true edges (positive samples) and false edges (negative samples) in the graph.
The loss is computed by comparing the similarity scores (dot product) of node embeddings for positive and negative edge pairs. The goal is to maximize the similarity for positive pairs and minimize it for negative pairs.
To use this loss function, the input must be a BatchGraphData object transformed by the negative_edge_sampler. The loss function takes the node embeddings and the input graph data (with positive and negative edge pairs) as inputs and returns the edge prediction loss.
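One plausible reading of the description is sketched below: dot products between node embeddings are scored with a logistic loss, with target 1 for positive pairs and 0 for negative pairs. The function name, the plain-list embeddings, and the pair representation are assumptions for illustration and do not reflect the BatchGraphData interface.

```python
import math


def edge_prediction_loss(node_emb, positive_pairs, negative_pairs):
    """Logistic loss on dot-product similarity of node-embedding pairs."""

    def dot(i, j):
        return sum(a * b for a, b in zip(node_emb[i], node_emb[j]))

    def softplus(x):
        # log(1 + exp(x)) computed without overflow.
        return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

    # -log(sigmoid(s)) for true edges, -log(1 - sigmoid(s)) for false edges.
    pos = [softplus(-dot(i, j)) for i, j in positive_pairs]
    neg = [softplus(dot(i, j)) for i, j in negative_pairs]
    return sum(pos) / len(pos) + sum(neg) / len(neg)


node_emb = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
loss_good = edge_prediction_loss(node_emb, [(0, 1)], [(0, 2)])
loss_bad = edge_prediction_loss(node_emb, [(0, 2)], [(0, 1)])
```

Embeddings that agree on true edges and disagree on false edges give the lower loss, which is the behavior the text describes.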
GraphNodeMaskingLoss is an unsupervised graph node masking loss function that calculates the loss based on the predicted node labels and true node labels. This loss function is designed for graph neural networks and is particularly useful for pre-training tasks.
This loss function encourages the model to learn node embeddings that can effectively predict the masked node labels in the graph.
The loss is computed using the CrossEntropyLoss between the predicted node labels and the true node labels.
To use this loss function, the input must be a BatchGraphData object transformed by the mask_nodes function. The loss function takes the predicted node labels, predicted edge labels, and the input graph data (with masked node labels) as inputs and returns the node masking loss.
GraphEdgeMaskingLoss is an unsupervised graph edge masking loss function that calculates the loss based on the predicted edge labels and true edge labels. This loss function is designed for graph neural networks and is particularly useful for pre-training tasks.
This loss function encourages the model to learn node embeddings that can effectively predict the masked edge labels in the graph.
The loss is computed using the CrossEntropyLoss between the predicted edge labels and the true edge labels.
To use this loss function, the input must be a BatchGraphData object transformed by the mask_edges function. The loss function takes the predicted edge labels and the true edge labels as inputs and returns the edge masking loss.
Parameters:
pred_edge (torch.Tensor) – Predicted edge labels.
inputs (BatchGraphData) – Input graph data (with masked edge labels).
Loss that maximizes mutual information between local node representations and a pooled global graph representation. This is to encourage nearby nodes to have similar embeddings.
Parameters:
positive_score (torch.Tensor) – Positive score. This score measures the similarity between the local node embeddings (node_emb) and the global graph representation (positive_expanded_summary_emb) derived from the same graph.
The goal is to maximize this score, as it indicates that the local node embeddings and the global graph representation are highly correlated, capturing the mutual information between them.
negative_score (torch.Tensor) – Negative score. This score measures the similarity between the local node embeddings (node_emb) and the global graph representation (negative_expanded_summary_emb) derived from a different graph (shifted by one position in this case).
The goal is to minimize this score, as it indicates that the local node embeddings and the global graph representation from different graphs are not correlated, ensuring that the model learns meaningful representations that are specific to each graph.
GraphContextPredLoss is a loss function designed for graph neural networks that aims to predict the context of a node given its substructure. The context of a node is essentially the ring of nodes around it outside of an inner k1-hop diameter and inside an outer k2-hop diameter.
This loss compares the representation of a node’s neighborhood with the representation of the node’s context. It then uses negative sampling to compare the representation of the node’s neighborhood with the representation of a random node’s context.
Parameters:
mode (str) – The mode of the model. It can be either “cbow” (continuous bag of words) or “skipgram”.
neg_samples (int) – The number of negative samples to use for negative sampling.
Loss for the density profile entry type for Quantum Chemistry calculations.
It integrates the squared difference between ground-truth and calculated
values over all points of the integration grid.
Examples
>>> from deepchem.models.losses import DensityProfileLoss
>>> import torch
>>> volume = torch.Tensor([2.0])
>>> output = torch.Tensor([3.0])
>>> labels = torch.Tensor([4.0])
>>> loss = (DensityProfileLoss()._create_pytorch_loss(volume))(output, labels)
>>> # Generating volume tensor for an entry object:
>>> from deepchem.feat.dft_data import DFTEntry
>>> e_type = 'dens'
>>> true_val = 0
>>> systems = [{'moldesc': 'H 0.86625 0 0; F -0.86625 0 0', 'basis': '6-311++G(3df,3pd)'}]
>>> dens_entry_for_HF = DFTEntry.create(e_type, true_val, systems)
>>> grid = (dens_entry_for_HF).get_integration_grid()
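Read as a numerical quadrature, the description amounts to a volume-weighted sum of squared differences over the grid points. The plain-Python sketch below uses the same numbers as the example; that the library computes exactly this expression is an assumption, and the function name is illustrative.

```python
def density_profile_loss(output, labels, volume):
    """Sum over grid points of volume * (output - label)^2."""
    return sum(v * (o - l) ** 2 for o, l, v in zip(output, labels, volume))


loss = density_profile_loss(output=[3.0], labels=[4.0], volume=[2.0])
```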
This is a modification of the NTXent loss function from Chen [1]_. This loss is designed for contrastive learning of molecular representations, comparing the similarity of a molecule’s latent representation to positive and negative samples.
The modifications proposed in [2]_ enable multiple conformers to be used as positive samples.
This loss function is designed for graph neural networks and is particularly useful for unsupervised pre-training tasks.
Parameters:
norm (bool, optional (default=True)) – Whether to normalize the similarity matrix.
tau (float, optional (default=0.5)) – Temperature parameter for the similarity matrix.
uniformity_reg (float, optional (default=0)) – Regularization weight for the uniformity loss.
variance_reg (float, optional (default=0)) – Regularization weight for the variance loss.
covariance_reg (float, optional (default=0)) – Regularization weight for the covariance loss.
conformer_variance_reg (float, optional (default=0)) – Regularization weight for the conformer variance loss.
Adagrad is an optimizer with parameter-specific learning rates, which are
adapted relative to how frequently a parameter gets updated during training.
The more updates a parameter receives, the smaller the updates. See [1]_ for
a full reference for the algorithm.
Construct an AdaGrad optimizer.
Parameters:
learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization
initial_accumulator_value (float) – a parameter of the AdaGrad algorithm
epsilon (float) – a parameter of the AdaGrad algorithm
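To make the roles of these parameters concrete, here is a scalar sketch of the usual AdaGrad update. The default values shown are placeholders, and the exact placement of `epsilon` varies between frameworks, so treat this as an illustration of the algorithm rather than of any backend's implementation.

```python
import math


def adagrad_step(param, grad, accumulator, learning_rate=0.001, epsilon=1e-7):
    """One AdaGrad update for a single scalar parameter."""
    # The accumulator collects squared gradients over the whole run.
    accumulator = accumulator + grad ** 2
    param = param - learning_rate * grad / (math.sqrt(accumulator) + epsilon)
    return param, accumulator


param, acc = 1.0, 0.1  # 0.1 plays the role of initial_accumulator_value
steps = []
for _ in range(3):
    new_param, acc = adagrad_step(param, grad=1.0, accumulator=acc, learning_rate=0.1)
    steps.append(param - new_param)
    param = new_param
```

With a constant gradient the step sizes in `steps` shrink on every iteration, which is the "more updates, smaller updates" behavior described above.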
Construct an AdamW optimizer.
Parameters:
learning_rate (float or LearningRateSchedule) – the learning rate to use for optimization
weight_decay (float or LearningRateSchedule) – weight decay coefficient for AdamW
beta1 (float) – a parameter of the Adam algorithm
beta2 (float) – a parameter of the Adam algorithm
epsilon (float) – a parameter of the Adam algorithm
amsgrad (bool) – If True, will use the AMSGrad variant of AdamW (from “On the Convergence of Adam and Beyond”), else will use the original algorithm.
The Sparse Adam optimization algorithm, also known as Lazy Adam.
Sparse Adam is suitable for sparse tensors. It handles sparse updates more efficiently.
It only updates moving-average accumulators for sparse variable indices that appear in the current batch, rather than updating the accumulators for all indices.
The learning rate starts as initial_rate. It smoothly decreases to final_rate over decay_steps training steps.
It decays as a function of (1-step/decay_steps)**power. Once the final rate is reached, it remains there for
the rest of optimization.
Parameters:
initial_rate (float) – the initial learning rate
final_rate (float) – the final learning rate
decay_steps (int) – the number of training steps over which the rate decreases from initial_rate to final_rate
power (float) – the exponent controlling the shape of the decay
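The schedule described above can be written out as a small function. This is a sketch of the formula using the parameter names from the list; the function name itself is illustrative.

```python
def polynomial_decay(step, initial_rate, final_rate, decay_steps, power=1.0):
    """Learning rate after `step` training steps under polynomial decay."""
    # Past decay_steps the rate stays at final_rate.
    step = min(step, decay_steps)
    fraction = (1.0 - step / decay_steps) ** power
    return final_rate + (initial_rate - final_rate) * fraction


start = polynomial_decay(0, 0.1, 0.01, decay_steps=100)
halfway = polynomial_decay(50, 0.1, 0.01, decay_steps=100)
after_end = polynomial_decay(500, 0.1, 0.01, decay_steps=100)
```

With `power=1.0` the decay is linear; larger exponents make the rate drop faster at the start and flatten out as it approaches `final_rate`.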
Training loss and validation metrics can be automatically logged to Weights & Biases with the following commands:
# Install wandb in shell
pip install wandb
# Login in shell (required only once)
wandb login
# Login in notebook (required only once)
import wandb
wandb.login()
# Initialize a WandbLogger
logger = WandbLogger(…)
# Set `wandb_logger` when creating `KerasModel`
import deepchem as dc
# Log training loss to wandb
model = dc.models.KerasModel(…, wandb_logger=logger)
model.fit(…)
# Log validation metrics to wandb using ValidationCallback
import deepchem as dc
vc = dc.models.ValidationCallback(…)
model = dc.models.KerasModel(…, wandb_logger=logger)
model.fit(…, callbacks=[vc])
logger.finish()
The loss function for a model can be defined in two different
ways. For models that have only a single output and use a
standard loss function, you can simply provide a
dc.models.losses.Loss object. This defines the loss for each
sample or sample/task pair. The result is automatically
multiplied by the weights and averaged over the batch. Any
additional losses computed by model layers, such as weight
decay penalties, are also added.
For more complicated cases, you can instead provide a function
that directly computes the total loss. It must be of the form
f(outputs, labels, weights), taking the list of outputs from
the model, the expected values, and any weight matrices. It
should return a scalar equal to the value of the loss function
for the batch. No additional processing is done to the
result; it is up to you to do any weighting, averaging, adding
of penalty terms, etc.
You can optionally provide an output_types argument, which
describes how to interpret the model’s outputs. This should
be a list of strings, one for each output. You can use an
arbitrary output_type for an output, but some output_types are
special and will undergo extra processing:
‘prediction’: This is a normal output, and will be returned by predict().
If output types are not specified, all outputs are assumed
to be of this type.
‘loss’: This output will be used in place of the normal
outputs for computing the loss function. For example,
models that output probability distributions usually do it
by computing unbounded numbers (the logits), then passing
them through a softmax function to turn them into
probabilities. When computing the cross entropy, it is more
numerically stable to use the logits directly rather than
the probabilities. You can do this by having the model
produce both probabilities and logits as outputs, then
specifying output_types=[‘prediction’, ‘loss’]. When
predict() is called, only the first output (the
probabilities) will be returned. But during training, it is
the second output (the logits) that will be passed to the
loss function.
‘variance’: This output is used for estimating the
uncertainty in another output. To create a model that can
estimate uncertainty, there must be the same number of
‘prediction’ and ‘variance’ outputs. Each variance output
must have the same shape as the corresponding prediction
output, and each element is an estimate of the variance in
the corresponding prediction. Also be aware that if a model
supports uncertainty, it MUST use dropout on every layer,
and dropout must be enabled during uncertainty prediction.
Otherwise, the uncertainties it computes will be inaccurate.
other: Arbitrary output_types can be used to extract outputs
produced by the model, but will have no additional
processing performed.
model (tf.keras.Model) – the Keras model implementing the calculation
loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each
batch, as described above
output_types (list of strings) – the type of each output from the model, as described above
batch_size (int) – default batch size for training and evaluating
model_dir (str) – the directory on disk where the model will be stored. If this is None,
a temporary directory is created.
learning_rate (float or LearningRateSchedule) – the learning rate to use for fitting. If optimizer is specified, this is
ignored.
optimizer (Optimizer) – the optimizer to use for fitting. If this is specified, learning_rate is
ignored.
tensorboard (bool) – whether to log progress to TensorBoard during training
wandb (bool) – whether to log progress to Weights & Biases during training (deprecated)
log_frequency (int) – The frequency at which to log data. Data is logged using
logging by default. If tensorboard is set, data is also
logged to TensorBoard. If wandb is set, data is also logged
to Weights & Biases. Logging happens at global steps. Roughly,
a global step corresponds to one batch of training. If you’d
like a printout every 10 batch steps, you’d set
log_frequency=10 for example.
wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics
nb_epoch (int) – the number of epochs to train for
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
Set this to 0 to disable automatic checkpointing.
deterministic (bool) – if True, the samples are processed in order. If False, a different random
order is used for each epoch.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training
from there. If False, retrain the model from scratch.
variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in
the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
for each batch. If None (the default), the model’s standard loss function
is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that
you can call fit() repeatedly with the same list and losses will
continue to be appended.
Returns:
The average loss over the most recent checkpoint interval
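The `callbacks` argument expects callables of the form f(model, step). The sketch below wraps an action so that it runs only every N steps; the class name `EveryNSteps` is hypothetical, and the loop stands in for the training loop that would normally invoke the callback.

```python
class EveryNSteps:
    """Call `action(model, step)` once every `interval` training steps."""

    def __init__(self, interval, action):
        self.interval = interval
        self.action = action

    def __call__(self, model, step):
        if step % self.interval == 0:
            self.action(model, step)


seen = []
callback = EveryNSteps(interval=10, action=lambda model, step: seen.append(step))

# Stand-in for the training loop: invoke the callback after every step.
for step in range(1, 31):
    callback(model=None, step=step)
```

In real use, an object like `callback` would be passed in the `callbacks` list, and `action` could run validation or logging.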
generator (generator) – this should generate batches, each represented as a tuple of the form
(inputs, labels, weights).
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
Set this to 0 to disable automatic checkpointing.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training
from there. If False, retrain the model from scratch.
variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in
the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
for each batch. If None (the default), the model’s standard loss function
is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that
you can call fit() repeatedly with the same list and losses will
continue to be appended.
Returns:
The average loss over the most recent checkpoint interval
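A generator in the (inputs, labels, weights) form can be sketched with NumPy slices. Wrapping each array in a one-element list, for a model with a single input and a single output, is an assumption made here; the function name is illustrative.

```python
import numpy as np


def batch_generator(X, y, w, batch_size, epochs=1):
    """Yield (inputs, labels, weights) tuples, one per batch."""
    n_samples = len(X)
    for _ in range(epochs):
        for start in range(0, n_samples, batch_size):
            end = start + batch_size
            # One list entry per model input, label array, and weight array.
            yield ([X[start:end]], [y[start:end]], [w[start:end]])


X = np.arange(10, dtype=float).reshape(5, 2)
y = np.arange(5, dtype=float).reshape(5, 1)
w = np.ones((5, 1))
batches = list(batch_generator(X, y, w, batch_size=2))
```

The last batch is smaller than `batch_size` when the number of samples is not a multiple of it.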
variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in
the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
for each batch. If None (the default), the model’s standard loss function
is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
every step. This can be used to perform validation, logging, etc.
checkpoint (bool) – if true, save a checkpoint after performing the training step
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
generator (generator) – this should generate batches, each represented as a tuple of the form
(inputs, labels, weights).
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
outputs (Tensor or list of Tensors) – The outputs to return. If this is None, the model’s
standard prediction outputs will be returned.
Alternatively one or more Tensors within the model may be
specified, in which case the output of those Tensors will
be returned. If outputs is specified, output_types must be
None.
output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved
from the model. If output_types is specified, outputs must
be None.
Returns:
a NumPy array if the model produces a single output, or a list of arrays
if it produces multiple outputs
Generates predictions for input samples, processing samples in a batch.
Parameters:
X (ndarray) – the input data, as a Numpy array.
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
outputs (Tensor or list of Tensors) – The outputs to return. If this is None, the model’s standard prediction
outputs will be returned. Alternatively one or more Tensors within the
model may be specified, in which case the output of those Tensors will be
returned.
Returns:
a NumPy array if the model produces a single output, or a list of arrays
if it produces multiple outputs
Predict the model’s outputs, along with the uncertainty in each one.
The uncertainty is computed as described in https://arxiv.org/abs/1703.04977.
It involves repeating the prediction many times with different dropout masks.
The prediction is computed as the average over all the predictions. The
uncertainty includes both the variation among the predicted values (epistemic
uncertainty) and the model’s own estimates for how well it fits the data
(aleatoric uncertainty). Not all models support uncertainty prediction.
Parameters:
X (ndarray) – the input data, as a Numpy array.
masks (int) – the number of dropout masks to average over
Returns:
OneOrMany[Tuple[y_pred, y_std]]
y_pred (np.ndarray) – predicted value of the output
y_std (np.ndarray) – standard deviation of the corresponding element of y_pred
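The combination of the two uncertainty sources can be sketched as follows: the spread of the predictions across dropout masks gives the epistemic part, and the mean of the model's own variance outputs gives the aleatoric part. The function name and the array layout (masks along the first axis) are assumptions for illustration.

```python
import numpy as np


def combine_uncertainty(predictions, variances):
    """Combine per-mask predictions and predicted variances.

    Both arguments have shape (n_masks, ...), one entry per dropout mask.
    """
    predictions = np.asarray(predictions, dtype=float)
    variances = np.asarray(variances, dtype=float)
    y_pred = predictions.mean(axis=0)
    epistemic_var = predictions.var(axis=0)   # variation among predictions
    aleatoric_var = variances.mean(axis=0)    # model's own noise estimate
    y_std = np.sqrt(epistemic_var + aleatoric_var)
    return y_pred, y_std


preds = np.array([[1.0], [3.0]])   # two dropout masks, one sample
vars_ = np.array([[3.0], [3.0]])
y_pred, y_std = combine_uncertainty(preds, vars_)
```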
Uses self to make predictions on provided Dataset object.
Parameters:
dataset (dc.data.Dataset) – Dataset to make prediction on
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
outputs (Tensor or list of Tensors) – The outputs to return. If this is None, the model’s standard prediction
outputs will be returned. Alternatively one or more Tensors within the
model may be specified, in which case the output of those Tensors will be
returned.
output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved
from the model. If output_types is specified, outputs must
be None.
Returns:
a NumPy array if the model produces a single output, or a list of arrays
if it produces multiple outputs
Predicts embeddings created by underlying model if any exist.
An embedding must be specified to have output_type of
‘embedding’ in the model definition.
Parameters:
dataset (dc.data.Dataset) – Dataset to make prediction on
Returns:
a NumPy array of the embeddings the model produces, or a list
of arrays if it produces multiple embeddings
Predict the model’s outputs, along with the uncertainty in each one.
The uncertainty is computed as described in https://arxiv.org/abs/1703.04977.
It involves repeating the prediction many times with different dropout masks.
The prediction is computed as the average over all the predictions. The
uncertainty includes both the variation among the predicted values (epistemic
uncertainty) and the model’s own estimates for how well it fits the data
(aleatoric uncertainty). Not all models support uncertainty prediction.
Parameters:
dataset (dc.data.Dataset) – Dataset to make prediction on
masks (int) – the number of dropout masks to average over
Returns:
for each output, a tuple (y_pred, y_std) where y_pred is the predicted
value of the output, and each element of y_std estimates the standard
deviation of the corresponding element of y_pred
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
per_task_metrics (bool) – If True, return per-task scores.
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
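The interplay of epochs, deterministic and pad_batches can be sketched with a plain Python generator. The iter_batches function is hypothetical, and the padding scheme (repeating entries and giving the repeats a weight of zero) is an assumption of this sketch rather than a description of any specific model's generator.

```python
import random


def iter_batches(samples, batch_size, epochs=1, deterministic=True,
                 pad_batches=False, seed=0):
    """Yield (inputs, labels, weights) tuples of lists, one per batch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(len(samples)))
        if not deterministic:
            rng.shuffle(order)  # a fresh random order for every epoch
        for start in range(0, len(order), batch_size):
            batch = [samples[i] for i in order[start:start + batch_size]]
            weights = [1.0] * len(batch)
            if pad_batches:
                # Pad by repeating entries from the batch; padded entries get
                # weight 0.0 so they would not contribute to a weighted loss.
                n_real = len(batch)
                while len(batch) < batch_size:
                    batch.append(batch[len(batch) % n_real])
                    weights.append(0.0)
            inputs = [x for x, _ in batch]
            labels = [y for _, y in batch]
            yield inputs, labels, weights


samples = [([float(i)], i % 2) for i in range(5)]
batches = list(iter_batches(samples, batch_size=2, epochs=2, pad_batches=True))
```

With five samples and a batch size of two, each epoch yields three batches, and the last one is padded up to the full batch size.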
Usually you do not need to call this method, since fit() saves checkpoints
automatically. If you have disabled automatic checkpointing during fitting,
this can be called to manually write checkpoints.
Parameters:
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
model_dir (str, default None) – Model directory to save checkpoint to. If None, revert to self.model_dir
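The effect of max_checkpoints_to_keep can be illustrated with a short sketch. The rotate_checkpoints helper and the checkpoint names are hypothetical; the sketch only tracks names in a list and does not touch the file system.

```python
def rotate_checkpoints(existing, new_checkpoint, max_checkpoints_to_keep):
    """Return (kept, discarded) after adding a checkpoint, oldest first."""
    checkpoints = existing + [new_checkpoint]
    n_discard = max(0, len(checkpoints) - max_checkpoints_to_keep)
    return checkpoints[n_discard:], checkpoints[:n_discard]


kept = []
discarded = []
for step in (100, 200, 300, 400):
    kept, dropped = rotate_checkpoints(kept, f"ckpt-{step}", 2)
    discarded.extend(dropped)

# The most recent checkpoint is the last entry of `kept`.
most_recent = kept[-1]
```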
Reload the values of all variables from a checkpoint file.
Parameters:
checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent
checkpoint will be chosen automatically. Call get_checkpoints() to get a
list of all available checkpoints.
model_dir (str, default None) – Directory to restore checkpoint from. If None, use self.model_dir.
Copies variable values from a pretrained model. source_model can either
be a pretrained model or a model with the same architecture. value_map
is a variable-value dictionary. If no value_map is provided, the variable
values are restored to the source_model from a checkpoint and a default
value_map is created. assignment_map is a dictionary mapping variables
from the source_model to the current model. If no assignment_map is
provided, one is made from scratch and assumes the model is composed of
several different layers, with the final one being a dense layer. include_top
is used to control whether or not the final dense layer is used. The default
assignment map is useful in cases where the type of task
(classification vs regression) and/or the number of tasks differs between
the source model and the current model.
Parameters:
source_model (dc.KerasModel, required) – source_model can either be the pretrained model or a dc.KerasModel with
the same architecture as the pretrained model. It is used to restore from
a checkpoint, if value_map is None and to create a default assignment map
if assignment_map is None
assignment_map (Dict, default None) – Dictionary mapping the source_model variables and current model variables
value_map (Dict, default None) – Dictionary containing source_model trainable variables mapped to numpy
arrays. If value_map is None, the values are restored and a default
variable map is created using the restored values
checkpoint (str, default None) – the path to the checkpoint file to load. If this is None, the most recent
checkpoint will be chosen automatically. Call get_checkpoints() to get a
list of all available checkpoints
model_dir (str, default None) – Restore model from custom model directory if needed
include_top (bool, default True) – if True, copies the weights and bias associated with the final dense
layer. Used only when assignment map is None
inputs (List, input tensors for model) – if not None, then the weights are built for both the source and self.
This option is useful only for models that are built by
subclassing tf.keras.Model, and not using the functional API by tf.keras
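The relationship between assignment_map, value_map and include_top can be sketched with ordinary dictionaries. Everything below is hypothetical (variable names are strings and values are lists); the sketch assumes that the last two variables of a model are the weights and bias of its final dense layer.

```python
def default_assignment_map(source_vars, dest_vars, include_top=True):
    """Pair source and destination variable names in order.

    Assumes the last two variables are the weights and bias of the final
    dense layer, which are skipped when include_top is False.
    """
    if not include_top:
        source_vars = source_vars[:-2]
        dest_vars = dest_vars[:-2]
    return dict(zip(source_vars, dest_vars))


def copy_values(assignment_map, value_map, dest_values):
    """Write each source variable's value into its mapped destination."""
    for source_name, dest_name in assignment_map.items():
        dest_values[dest_name] = list(value_map[source_name])
    return dest_values


source_vars = ["src/dense0/w", "src/dense0/b", "src/out/w", "src/out/b"]
dest_vars = ["dst/dense0/w", "dst/dense0/b", "dst/out/w", "dst/out/b"]
value_map = {"src/dense0/w": [0.5, -0.5], "src/dense0/b": [0.1],
             "src/out/w": [2.0], "src/out/b": [0.0]}
dest_values = {name: None for name in dest_vars}

assignment_map = default_assignment_map(source_vars, dest_vars,
                                        include_top=False)
dest_values = copy_values(assignment_map, value_map, dest_values)
```

With include_top=False the output layer of the destination is left untouched, which is the situation described above where the new task has a different type or number of outputs.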
Implements a neural network for robust multitasking.
The key idea of this model is to have bypass layers that feed
directly from features to task output. This might provide some
flexibility to route around challenges in multitasking with
destructive interference.
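The bypass idea can be sketched as a forward pass in plain Python. The helper names, the single shared layer, and the choice to concatenate shared and bypass activations before each task head are assumptions of this sketch, not a description of the exact layer graph.

```python
def dense(inputs, weights, biases):
    """Affine layer followed by ReLU; each row of weights feeds one unit."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def robust_multitask_forward(features, shared_layer, bypass_layers, heads):
    """Return one output per task.

    The shared layer is computed once; each task also has its own bypass
    layer that reads the raw features directly.
    """
    shared = dense(features, *shared_layer)
    outputs = []
    for bypass_layer, (head_w, head_b) in zip(bypass_layers, heads):
        bypass = dense(features, *bypass_layer)
        combined = shared + bypass  # list concatenation of both feature sets
        outputs.append(sum(w * x for w, x in zip(head_w, combined)) + head_b)
    return outputs


features = [1.0, 2.0]
shared_layer = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])  # two shared units
bypass_layers = [([[1.0, 1.0]], [0.0]),                 # task 0 bypass unit
                 ([[-1.0, 0.0]], [0.0])]                # task 1 bypass unit
heads = [([1.0, 1.0, 1.0], 0.0), ([1.0, 1.0, 1.0], 0.5)]
outputs = robust_multitask_forward(features, shared_layer, bypass_layers, heads)
```

Because each task head sees its own bypass features, a task whose gradients conflict with the others still has a private path from the inputs to its output.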
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length
of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list,
in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal
len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the
same value is used for every layer.
n_classes (int) – the number of classes
bypass_layer_sizes (list) – the size of each dense layer in the bypass network. The length of this list determines the number of bypass layers.
bypass_weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of bypass layers.
same requirements as weight_init_stddevs
bypass_bias_init_consts (list or float) – the value to initialize the biases in bypass layers
same requirements as bias_init_consts
bypass_dropouts (list or float) – the dropout probability to use for bypass layers.
same requirements as dropouts
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
Implements a neural network for robust multitasking.
The key idea of this model is to have bypass layers that feed
directly from features to task output. This might provide some
flexibility to route around challenges in multitasking with
destructive interference.
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length
of this list should equal len(layer_sizes). Alternatively this may be a single value instead of a list,
in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal
len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the
same value is used for every layer.
bypass_layer_sizes (list) – the size of each dense layer in the bypass network. The length of this list determines the number of bypass layers.
bypass_weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of bypass layers.
same requirements as weight_init_stddevs
bypass_bias_init_consts (list or float) – the value to initialize the biases in bypass layers
same requirements as bias_init_consts
bypass_dropouts (list or float) – the dropout probability to use for bypass layers.
same requirements as dropouts
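Several parameters above accept either a list with one entry per layer or a single value that is reused for every layer. A minimal sketch of that convention follows; the per_layer helper is hypothetical, and the length check is an assumption about how a mismatch might be reported.

```python
from collections.abc import Sequence


def per_layer(value, n_layers, name):
    """Expand a scalar to one value per layer, or validate a list's length."""
    if isinstance(value, Sequence) and not isinstance(value, str):
        if len(value) != n_layers:
            raise ValueError(
                f"{name} must have {n_layers} entries, got {len(value)}")
        return list(value)
    return [value] * n_layers


layer_sizes = [1000, 500]
dropouts = per_layer(0.5, len(layer_sizes), "dropouts")
weight_init_stddevs = per_layer([0.02, 0.01], len(layer_sizes),
                                "weight_init_stddevs")
```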
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
Progressive networks allow for multitask learning where each task
gets a new column of weights. As a result, there is no catastrophic
forgetting where previous tasks are ignored.
Only listing parameters specific to progressive networks here.
Parameters:
n_tasks (int) – Number of tasks
n_features (int) – Number of input features
alpha_init_stddevs (list) – List of standard-deviations for alpha in adapter layers.
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length
of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer.
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes)+1.
The final element corresponds to the output layer. Alternatively this may be a single value instead of a list,
in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal
len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the
same value is used for every layer.
Implements a progressive multitask neural network for regression.
Progressive networks allow for multitask learning where each task
gets a new column of weights. As a result, there is no catastrophic
forgetting where previous tasks are ignored.
References
See [1]_ for a full description of the progressive architecture
Only listing parameters specific to progressive networks here.
Parameters:
n_tasks (int) – Number of tasks
n_features (int) – Number of input features
alpha_init_stddevs (list) – List of standard-deviations for alpha in adapter layers.
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length
of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer.
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes)+1.
The final element corresponds to the output layer. Alternatively this may be a single value instead of a list,
in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal
len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the
same value is used for every layer.
nb_epoch (int) – the number of epochs to train for
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
Set this to 0 to disable automatic checkpointing.
deterministic (bool) – if True, the samples are processed in order. If False, a different random
order is used for each epoch.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training
from there. If False, retrain the model from scratch.
variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in
the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
for each batch. If None (the default), the model’s standard loss function
is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that
you can call fit() repeatedly with the same list and losses will
continue to be appended.
Returns:
The average loss over the most recent checkpoint interval
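The checkpoint_interval and callbacks parameters can be pictured with a toy training loop. The fit_sketch function is hypothetical: it consumes precomputed per-step losses instead of training a model, passes None where a real loop would pass the model, and treats any trailing partial interval as the most recent one.

```python
def fit_sketch(step_losses, checkpoint_interval, callbacks=()):
    """Toy loop: returns (average loss of the latest interval, checkpoints)."""
    avg_loss = 0.0
    running, count = 0.0, 0
    checkpoints = []
    for step, loss in enumerate(step_losses, start=1):
        running += loss
        count += 1
        if checkpoint_interval > 0 and step % checkpoint_interval == 0:
            avg_loss = running / count
            running, count = 0.0, 0
            checkpoints.append(step)  # a real model would save weights here
        for callback in callbacks:
            callback(None, step)  # f(model, step); no real model in this sketch
    if count > 0:
        avg_loss = running / count  # trailing partial interval
    return avg_loss, checkpoints


seen_steps = []


def log_every_other_step(model, step):
    # A callback can validate or log; here it only records even steps.
    if step % 2 == 0:
        seen_steps.append(step)


avg_loss, checkpoints = fit_sketch([4.0, 2.0, 1.0, 3.0, 5.0],
                                   checkpoint_interval=2,
                                   callbacks=[log_every_other_step])
no_ckpt_loss, no_ckpts = fit_sketch([4.0, 2.0, 1.0, 3.0, 5.0],
                                    checkpoint_interval=0)
```

Setting the interval to 0 in this sketch disables checkpointing entirely, mirroring the documented meaning of checkpoint_interval=0.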
This model implements the Weave style graph convolutions
from [1]_.
The biggest difference between WeaveModel style convolutions
and GraphConvModel style convolutions is that Weave
convolutions model bond features explicitly. This has the
side effect that it needs to construct a NxN matrix
explicitly to model bond interactions. This may cause
scaling issues, but may possibly allow for better modeling
of subtle bond effects.
Note that [1]_ introduces a whole variety of different architectures for
Weave models. The default settings in this class correspond to the W2N2
variant from [1]_ which is the most commonly used variant.
Examples
Here’s an example of how to fit a WeaveModel on a tiny sample dataset.
In general, the use of batch normalization can cause issues with NaNs. If
you’re having trouble with NaNs while using this model, consider setting
batch_normalize_kwargs={"trainable": False} or turning off batch
normalization entirely with batch_normalize=False.
n_atom_feat (int, optional (default 75)) – Number of features per atom. Note this is 75 by default and should be 78
if chirality is used by WeaveFeaturizer.
n_pair_feat (int, optional (default 14)) – Number of features per pair of atoms.
n_hidden (int, optional (default 50)) – Number of units (convolution depths) in the corresponding hidden layer
n_graph_feat (int, optional (default 128)) – Number of output features for each molecule (graph)
n_weave (int, optional (default 2)) – The number of weave layers in this model.
fully_connected_layer_sizes (list (default [2000, 100])) – The size of each dense layer in the network. The length of
this list determines the number of layers.
conv_weight_init_stddevs (list or float (default 0.03)) – The standard deviation of the distribution to use for weight
initialization of each convolutional layer. The length of this list
should equal n_weave. Alternatively, this may be a single value instead
of a list, in which case the same value is used for each layer.
weight_init_stddevs (list or float (default 0.01)) – The standard deviation of the distribution to use for weight
initialization of each fully connected layer. The length of this list
should equal len(layer_sizes). Alternatively this may be a single value
instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float (default 0.0)) – The value to initialize the biases in each fully connected layer. The
length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in
which case the same value is used for every layer.
weight_decay_penalty (float (default 0.0)) – The magnitude of the weight decay penalty to use
weight_decay_penalty_type (str (default "l2")) – The type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float (default 0.25)) – The dropout probability to use for each fully connected layer. The length of this list
should equal len(layer_sizes). Alternatively this may be a single value
instead of a list, in which case the same value is used for every layer.
final_conv_activation_fn (Optional[ActivationFn] (default tf.nn.tanh)) – The Tensorflow activation function to apply to the final
convolution at the end of the weave convolutions. If None, then no
activation is applied (hence linear).
activation_fns (list or object (default tf.nn.relu)) – The Tensorflow activation function to apply to each fully connected layer. The length
of this list should equal len(layer_sizes). Alternatively this may be a
single value instead of a list, in which case the same value is used for
every layer.
batch_normalize (bool, optional (default True)) – If this is turned on, apply batch normalization before applying
activation functions on convolutional and fully connected layers.
batch_normalize_kwargs (Dict, optional (default {"renorm": True, "fused": False})) – Batch normalization is a complex layer which has many potential
arguments which change behavior. This layer accepts user-defined
parameters which are passed to all BatchNormalization layers in
WeaveModel, WeaveLayer, and WeaveGather.
gaussian_expand (boolean, optional (default True)) – Whether to expand each dimension of atomic features by gaussian
histogram
compress_post_gaussian_expansion (bool, optional (default False)) – If True, compress the results of the Gaussian expansion back to the
original dimensions of the input.
mode (str (default "classification")) – Either “classification” or “regression” for type of model.
n_classes (int (default 2)) – Number of classes to predict (only used in classification mode)
batch_size (int (default 100)) – Batch size used by this model for training.
Compute tensors that will be input into the model from featurized representation.
The featurized input to WeaveModel is instances of WeaveMol created by
WeaveFeaturizer. This method converts input WeaveMol objects into
tensors used by the Keras implementation to compute WeaveModel outputs.
Parameters:
X_b (np.ndarray) – A numpy array with dtype=object where elements are WeaveMol objects.
Returns:
atom_feat (np.ndarray) – Of shape (N_atoms, N_atom_feat).
pair_feat (np.ndarray) – Of shape (N_pairs, N_pair_feat). Note that N_pairs will depend on
the number of pairs being considered. If max_pair_distance is
None, then this will be N_atoms**2. Else it will be the number
of pairs within the specified graph distance.
pair_split (np.ndarray) – Of shape (N_pairs,). The i-th entry in this array will tell you the
originating atom for this pair (the “source”). Note that pairs are
symmetric so for a pair (a, b), both a and b will separately be
sources at different points in this array.
atom_split (np.ndarray) – Of shape (N_atoms,). The i-th entry in this array will be the molecule
that the i-th atom belongs to.
atom_to_pair (np.ndarray) – Of shape (N_pairs, 2). The i-th row in this array will be the array
[a, b] if (a, b) is a pair to be considered. (Note by symmetry, this
implies some other row will contain [b, a].)
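The index arrays described above can be sketched for a batch in which every ordered pair of atoms within a molecule is considered (the max_pair_distance=None case). The weave_batch_indices function is hypothetical and takes only the atom count of each molecule; it uses plain lists instead of NumPy arrays and omits the feature arrays.

```python
def weave_batch_indices(atoms_per_molecule):
    """Build atom_split, pair_split and atom_to_pair for a batch.

    Every ordered pair within a molecule is considered, so a molecule with
    n atoms contributes n * n pairs.
    """
    atom_split, pair_split, atom_to_pair = [], [], []
    start = 0  # batch-wide index of the current molecule's first atom
    for mol_index, n_atoms in enumerate(atoms_per_molecule):
        atom_split.extend([mol_index] * n_atoms)
        for a in range(n_atoms):
            for b in range(n_atoms):
                atom_to_pair.append([start + a, start + b])
                pair_split.append(start + a)  # the "source" atom of the pair
        start += n_atoms
    return atom_split, pair_split, atom_to_pair


# A batch with a two-atom molecule followed by a one-atom molecule.
atom_split, pair_split, atom_to_pair = weave_batch_indices([2, 1])
```

Atom indices are offset per molecule so that all molecules in the batch share one flat index space.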
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
Directed Acyclic Graph models for molecular property prediction.
This model is based on the following paper:
Lusci, Alessandro, Gianluca Pollastri, and Pierre Baldi. “Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules.” Journal of chemical information and modeling 53.7 (2013): 1563-1575.
The basic idea for this paper is that a molecule is usually
viewed as an undirected graph. However, you can convert it to
a series of directed graphs. The idea is that for each atom,
you make a DAG using that atom as the vertex of the DAG and
edges pointing “inwards” to it. This transformation is
implemented in
dc.trans.transformers.DAGTransformer.UG_to_DAG.
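The conversion from an undirected molecular graph to one DAG per atom can be sketched with a breadth-first traversal. The undirected_to_dag function is a simplified, hypothetical version: it keeps only tree edges, so ring-closing bonds are dropped, which may differ from what DAGTransformer.UG_to_DAG produces.

```python
from collections import deque


def undirected_to_dag(n_atoms, bonds, root):
    """Return directed edges (child, parent) that all point toward `root`."""
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    visited = {root}
    queue = deque([root])
    edges = []
    while queue:
        parent = queue.popleft()
        for child in neighbors[parent]:
            if child not in visited:
                visited.add(child)
                edges.append((child, parent))  # edge points inward
                queue.append(child)
    return edges


bonds = [(0, 1), (1, 2)]  # a three-atom chain
dags = {root: undirected_to_dag(3, bonds, root) for root in range(3)}
```

Each atom yields its own DAG, so a molecule with N atoms is represented by N directed graphs.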
This model accepts ConvMols as input, just as GraphConvModel
does, but these ConvMol objects must be transformed by
dc.trans.DAGTransformer.
As a note, performance of this model can be a little
sensitive to initialization. It might be worth training a few
different instantiations to get a stable set of parameters.
max_atoms (int, optional) – Maximum number of atoms in a molecule, should be defined based on dataset.
n_atom_feat (int, optional) – Number of features per atom.
n_graph_feat (int, optional) – Number of features for atom in the graph.
n_outputs (int, optional) – Number of features for each molecule.
layer_sizes (list of int, optional) – List of hidden layer size(s) in the propagation step:
length of this list represents the number of hidden layers,
and each element is the width of corresponding hidden layer.
layer_sizes_gather (list of int, optional) – List of hidden layer size(s) in the gather step.
dropout (None or float, optional) – Dropout probability, applied after each propagation step and gather step.
mode (str, optional) – Either “classification” or “regression” for type of model.
n_classes (int) – the number of classes to predict (only used in classification mode)
uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty
in outputs to be predicted
This class implements the graph convolutional model from [1]_.
These graph convolutions start with a per-atom set of
descriptors for each atom in a molecule, then combine and recombine these
descriptors over convolutional layers.
Note that since the underlying _GraphConvKerasModel class is
specified using imperative subclassing style, this model
cannot make predictions for arbitrary outputs.
Parameters:
n_tasks (int) – Number of tasks
graph_conv_layers (list of int) – Width of channels for the Graph Convolution Layers
dense_layer_size (int) – Width of channels for Atom Level Dense Layer after GraphPool
dropout (list or float) – the dropout probability to use for each layer. The length of this list
should equal len(graph_conv_layers)+1 (one value for each convolution
layer, and one for the dense layer). Alternatively this may be a single
value instead of a list, in which case the same value is used for every
layer.
mode (str) – Either “classification” or “regression”
number_atom_features (int) – 75 is the default number of atom features created, but
this can vary if various options are passed to the
function atom_features in graph_features
n_classes (int) – the number of classes to predict (only used in classification mode)
batch_normalize (bool) – if True, apply batch normalization to model
uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty
in outputs to be predicted
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
Message Passing Neural Networks [1]_ treat graph convolutional
operations as an instantiation of a more general message
passing scheme. Recall that message passing in a graph is when
nodes in a graph send each other “messages” and update their
internal state as a consequence of these messages.
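A single round of message passing can be sketched in a few lines. This is a deliberately generic illustration with scalar node states and a plain sum as both the message and update function; learned message and update functions, as used in actual message passing networks, are not modeled here.

```python
def message_passing_step(states, edges):
    """One round: each node adds the sum of its neighbors' states to its own."""
    messages = {node: 0.0 for node in states}
    for a, b in edges:  # undirected edge: messages flow both ways
        messages[a] += states[b]
        messages[b] += states[a]
    return {node: states[node] + messages[node] for node in states}


states = {0: 1.0, 1: 0.0, 2: 0.0}
edges = [(0, 1), (1, 2)]  # a three-node chain
after_one = message_passing_step(states, edges)
after_two = message_passing_step(after_one, edges)
```

After one round the signal from node 0 has reached its neighbor, and after two rounds it has reached node 2, which shows how repeated rounds widen each node's receptive field.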
Ordering structures in this model are built according to [2]_
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
Model for de-novo generation of small molecules based on work of Nicola De Cao et al. [1]_.
It uses a GAN directly on graph data and a reinforcement learning objective to induce the network to generate molecules with certain chemical properties.
Utilizes WGAN infrastructure; uses adjacency matrix and node features as inputs.
Inputs need to be in one-hot representation.
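To make the one-hot requirement concrete, the sketch below encodes a small integer adjacency matrix and a node-label vector. The molecule, the label values, and the depths (three edge types and three node types) are hypothetical, and plain lists stand in for tensors.

```python
def one_hot(index, depth):
    """Return a list of length `depth` with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


# Hypothetical three-atom graph: integer bond types (0 means no bond)
# and integer node labels.
adjacency = [[0, 1, 0],
             [1, 0, 2],
             [0, 2, 0]]
nodes = [1, 2, 1]
n_edge_types = 3
n_node_types = 3

# Shape (atoms, atoms, edge types) and (atoms, node types) respectively.
adjacency_tensor = [[one_hot(bond, n_edge_types) for bond in row]
                    for row in adjacency]
node_tensor = [one_hot(label, n_node_types) for label in nodes]
```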
Examples
>> import deepchem as dc
>> from deepchem.models import BasicMolGANModel as MolGAN
>> from deepchem.models.optimizers import ExponentialDecay
>> from tensorflow import one_hot
>> smiles = ['CCC', 'C1=CC=CC=C1', 'CNC']
>> # create featurizer
>> feat = dc.feat.MolGanFeaturizer()
>> # featurize molecules
>> features = feat.featurize(smiles)
>> # Remove empty objects
>> features = list(filter(lambda x: x is not None, features))
>> # create model
>> gan = MolGAN(learning_rate=ExponentialDecay(0.001, 0.9, 5000))
>> dataset = dc.data.NumpyDataset([x.adjacency_matrix for x in features], [x.node_features for x in features])
>> def iterbatches(epochs):
>>     for i in range(epochs):
>>         for batch in dataset.iterbatches(batch_size=gan.batch_size, pad_batches=True):
>>             adjacency_tensor = one_hot(batch[0], gan.edges)
>>             node_tensor = one_hot(batch[1], gan.nodes)
>>             yield {gan.data_inputs[0]: adjacency_tensor, gan.data_inputs[1]: node_tensor}
>> gan.fit_gan(iterbatches(8), generator_steps=0.2, checkpoint_interval=5000)
>> generated_data = gan.predict_gan_generator(1000)
>> # convert graphs to RDKit molecules
>> nmols = feat.defeaturize(generated_data)
>> print("{} molecules generated".format(len(nmols)))
>> # remove invalid molecules
>> nmols = list(filter(lambda x: x is not None, nmols))
>> # currently training is unstable so 0 is a common outcome
>> print("{} valid molecules".format(len(nmols)))
Create generator model.
Takes noise data as input and processes it through a number of
dense and dropout layers. The data is then converted into two forms,
one used for training and the other for generation of compounds.
The model has two outputs:
edges
nodes
The format differs depending on intended use (training or sample generation).
For sample generation, set the flag sample_generation=True when calling the generator,
i.e. gan.generators[0](noise_input, training=False, sample_generation=True).
For training the model, set sample_generation=False
batch_size (int) – the number of samples to generate. If either noise_input or
conditional_inputs is specified, this argument is ignored since the batch
size is then determined by the size of that argument.
noise_input (array) – the value to use for the generator’s noise input. If None (the default),
get_noise_batch() is called to generate a random input, so each call will
produce a new set of samples.
conditional_inputs (list of arrays) – NOT USED.
the values to use for all conditional inputs. This must be specified if
the GAN has any conditional inputs.
generator_index (int) – NOT USED.
the index of the generator (between 0 and n_generators-1) to use for
generating the samples.
Returns:
Returns a list of GraphMatrix objects that can be converted into
RDKit molecules using the MolGanFeaturizer defeaturize function.
The SCScore model is a neural network model based on the work of Coley et al. [1]_ that predicts the synthetic complexity score (SCScore) of molecules and correlates it with the expected number of reaction steps required to produce the given target molecule.
It is trained on a dataset of over 12 million reactions from the Reaxys database to impose a pairwise inequality constraint enforcing that on average the products of published chemical reactions should be more synthetically complex than their corresponding reactants.
The learned metric (SCScore) exhibits highly desirable nonlinear behavior, particularly in recognizing increases in synthetic complexity throughout a number of linear synthetic routes.
The SCScore model can accurately predict the synthetic complexity of a variety of molecules, including both drug-like and natural product molecules.
SCScore has the potential to be a valuable tool for chemists who are working on drug discovery and other areas of chemistry.
Our model uses hinge loss instead of the shifted ReLU loss used in the supplementary material [2]_ provided by the authors.
This could cause differentiation issues with compounds that are “close” to each other in “complexity”.
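The pairwise inequality constraint can be pictured as a hinge loss on the difference between a product's score and its reactant's score. The function name and the margin value below are assumptions of this sketch, not the values used by the model.

```python
def pairwise_hinge_loss(reactant_score, product_score, margin=1.0):
    """Zero once the product outscores the reactant by at least `margin`."""
    return max(0.0, margin + reactant_score - product_score)


# Product clearly more complex than reactant: constraint satisfied.
satisfied = pairwise_hinge_loss(reactant_score=2.0, product_score=4.0)
# Product only slightly more complex: small penalty remains.
close_pair = pairwise_hinge_loss(reactant_score=3.0, product_score=3.5)
# Ordering violated: large penalty.
violated = pairwise_hinge_loss(reactant_score=4.0, product_score=2.0)
```

Pairs whose scores already differ by more than the margin contribute no loss, so only violated or closely scored pairs drive the training signal in this sketch.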
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
Implements sequence to sequence translation models.
The model is based on the description in Sutskever et al., “Sequence to
Sequence Learning with Neural Networks” (https://arxiv.org/abs/1409.3215),
although this implementation uses GRUs instead of LSTMs. The goal is to
take sequences of tokens as input, and translate each one into a different
output sequence. The input and output sequences can both be of variable
length, and an output sequence need not have the same length as the input
sequence it was generated from. For example, these models were originally
developed for use in natural language processing. In that context, the
input might be a sequence of English words, and the output might be a
sequence of French words. The goal would be to train the model to translate
sentences from English to French.
The model consists of two parts called the “encoder” and “decoder”. Each one
consists of a stack of recurrent layers. The job of the encoder is to
transform the input sequence into a single, fixed length vector called the
“embedding”. That vector contains all relevant information from the input
sequence. The decoder then transforms the embedding vector into the output
sequence.
These models can be used for various purposes. First and most obviously,
they can be used for sequence to sequence translation. In any case where you
have sequences of tokens, and you want to translate each one into a different
sequence, a SeqToSeq model can be trained to perform the translation.
Another possible use case is transforming variable length sequences into
fixed length vectors. Many types of models require their inputs to have a
fixed shape, which makes it difficult to use them with variable sized inputs
(for example, when the input is a molecule, and different molecules have
different numbers of atoms). In that case, you can train a SeqToSeq model as
an autoencoder, so that it tries to make the output sequence identical to the
input one. That forces the embedding vector to contain all information from
the original sequence. You can then use the encoder for transforming
sequences into fixed length embedding vectors, suitable to use as inputs to
other types of models.
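Preparing data for autoencoder training can be sketched as follows. The helpers are hypothetical and assume that every character of a SMILES string is one token, which splits multi-character atoms such as Cl or Br. The resulting token list could serve as both input_tokens and output_tokens, the longest string gives a lower bound for max_output_length, and the pairs follow the (input_sequence, output_sequence) form expected for the sequences argument.

```python
def build_token_list(strings):
    """Collect every distinct character in the strings, in sorted order."""
    return sorted({ch for s in strings for ch in s})


def autoencoder_pairs(strings):
    """Yield (input_sequence, output_sequence) tuples whose halves are identical."""
    for s in strings:
        yield (s, s)


smiles = ["CCO", "c1ccccc1", "CC(=O)O"]

tokens = build_token_list(smiles)           # candidate input and output tokens
pairs = list(autoencoder_pairs(smiles))     # training samples for an autoencoder
max_length = max(len(s) for s in smiles)    # longest sequence to reproduce
```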
Another use case is to train the decoder for use as a generative model. Here
again you begin by training the SeqToSeq model as an autoencoder. Once
training is complete, you can supply arbitrary embedding vectors, and
transform each one into an output sequence. When used in this way, you
typically train it as a variational autoencoder. This adds random noise to
the encoder, and also adds a constraint term to the loss that forces the
embedding vector to have a unit Gaussian distribution. You can then pick
random vectors from a Gaussian distribution, and the output sequences should
follow the same distribution as the training data.
When training as a variational autoencoder, it is best to use KL cost
annealing, as described in https://arxiv.org/abs/1511.06349. The constraint
term in the loss is initially set to 0, so the optimizer just tries to
minimize the reconstruction loss. Once it has made reasonable progress
toward that, the constraint term can be gradually turned back on. The range
of steps over which this happens is configurable.
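A possible annealing schedule is sketched below. It assumes the weight on the constraint term rises linearly from 0 to 1 between annealing_start_step and annealing_final_step; the linear shape and the function name are assumptions, not a statement of how this class computes the weight.

```python
def kl_annealing_weight(step, annealing_start_step, annealing_final_step):
    """Weight applied to the KL constraint term at a given training step.

    Assumes a linear ramp from 0 to 1 over [start, final].
    """
    if step <= annealing_start_step:
        return 0.0  # constraint term switched off
    if step >= annealing_final_step:
        return 1.0  # constraint term fully on
    span = annealing_final_step - annealing_start_step
    return (step - annealing_start_step) / span


early = kl_annealing_weight(0, 5000, 10000)
halfway = kl_annealing_weight(7500, 5000, 10000)
late = kl_annealing_weight(20000, 5000, 10000)
```

The total loss at each step would then be the reconstruction loss plus this weight times the constraint term.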
In addition to the following arguments, this class also accepts all the keyword arguments
from TensorGraph.
Parameters:
input_tokens (list) – a list of all tokens that may appear in input sequences
output_tokens (list) – a list of all tokens that may appear in output sequences
max_output_length (int) – the maximum length of output sequence that may be generated
encoder_layers (int) – the number of recurrent layers in the encoder
decoder_layers (int) – the number of recurrent layers in the decoder
embedding_dimension (int) – the width of the embedding vector. This also is the width of all
recurrent layers.
dropout (float) – the dropout probability to use during training
reverse_input (bool) – if True, reverse the order of input sequences before sending them into
the encoder. This can improve performance when working with long sequences.
variational (bool) – if True, train the model as a variational autoencoder. This adds random
noise to the encoder, and also constrains the embedding to follow a unit
Gaussian distribution.
annealing_start_step (int) – the step (that is, batch) at which to begin turning on the constraint term
for KL cost annealing
annealing_final_step (int) – the step (that is, batch) at which to finish turning on the constraint term
for KL cost annealing
sequences (iterable) – the training samples to fit to. Each sample should be
represented as a tuple of the form (input_sequence, output_sequence).
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training
from there. If False, retrain the model from scratch.
A Generative Adversarial Network (GAN) is a type of generative model. It
consists of two parts called the “generator” and the “discriminator”. The
generator takes random noise as input and transforms it into an output that
(hopefully) resembles the training data. The discriminator takes a set of
samples as input and tries to distinguish the real training samples from the
ones created by the generator. Both of them are trained together. The
discriminator tries to get better and better at telling real from false data,
while the generator tries to get better and better at fooling the discriminator.
In many cases there also are additional inputs to the generator and
discriminator. In that case it is known as a Conditional GAN (CGAN), since it
learns a distribution that is conditional on the values of those inputs. They
are referred to as “conditional inputs”.
Many variations on this idea have been proposed, and new varieties of GANs are
constantly being proposed. This class tries to make it very easy to implement
straightforward GANs of the most conventional types. At the same time, it
tries to be flexible enough that it can be used to implement many (but
certainly not all) variations on the concept.
To define a GAN, you must create a subclass that provides implementations of
the following methods:
If you want your GAN to have any conditional inputs you must also implement:
get_conditional_input_shapes()
The following methods have default implementations that are suitable for most
conventional GANs. You can override them if you want to customize their
behavior:
This class allows a GAN to have multiple generators and discriminators, a model
known as MIX+GAN. It is described in Arora et al., “Generalization and
Equilibrium in Generative Adversarial Nets (GANs)” (https://arxiv.org/abs/1703.00573).
This can lead to better models, and is especially useful for reducing mode
collapse, since different generators can learn different parts of the
distribution. To use this technique, simply specify the number of generators
and discriminators when calling the constructor. You can then tell
predict_gan_generator() which generator to use for predicting samples.
Get the shape of the generator’s noise input layer.
Subclasses must override this to return a tuple giving the shape of the
noise input. The actual Input layer will be created automatically. The
dimension corresponding to the batch size should be omitted.
Subclasses must override this to return a list of tuples, each giving the
shape of one of the inputs. The actual Input layers will be created
automatically. This list of shapes must also match the shapes of the
generator’s outputs. The dimension corresponding to the batch size should
be omitted.
Subclasses may override this to return a list of tuples, each giving the
shape of one of the conditional inputs. The actual Input layers will be
created automatically. The dimension corresponding to the batch size should
be omitted.
The default implementation returns an empty list, meaning there are no
conditional inputs.
Get a batch of random noise to pass to the generator.
This should return a NumPy array whose shape matches the one returned by
get_noise_input_shape(). The default implementation returns normally
distributed values. Subclasses can override this to implement a different
distribution.
Subclasses must override this to construct the generator. The returned
value should be a tf.keras.Model whose inputs are a batch of noise, followed
by any conditional inputs. The number and shapes of its outputs must match
the return value from get_data_input_shapes(), since generated data must
have the same form as training data.
Subclasses must override this to construct the discriminator. The returned
value should be a tf.keras.Model whose inputs are all data inputs, followed
by any conditional inputs. Its output should be a one dimensional tensor
containing the probability of each sample being a training sample.
The default implementation is appropriate for most cases. Subclasses can
override this if they need to customize it.
Parameters:
discrim_output (Tensor) – the output from the discriminator on a batch of generated data. This is
its estimate of the probability that each sample is training data.
Return type:
A Tensor equal to the loss function to use for optimizing the generator.
The default implementation is appropriate for most cases. Subclasses can
override this if they need to customize it.
Parameters:
discrim_output_train (Tensor) – the output from the discriminator on a batch of training data. This is
its estimate of the probability that each sample is training data.
discrim_output_gen (Tensor) – the output from the discriminator on a batch of generated data. This is
its estimate of the probability that each sample is training data.
Return type:
A Tensor equal to the loss function to use for optimizing the discriminator.
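For orientation, the conventional GAN losses from the literature can be written in a few lines of plain Python. This is only a sketch: the function names, the EPS constant, and the use of Python lists in place of Tensors are assumptions, and the default implementations may differ in detail.

```python
import math

EPS = 1e-10  # guards against log(0); the value is an assumption


def generator_loss(discrim_output_gen):
    """Non-saturating generator loss: -mean(log D(G(z)))."""
    logs = [math.log(p + EPS) for p in discrim_output_gen]
    return -sum(logs) / len(logs)


def discriminator_loss(discrim_output_train, discrim_output_gen):
    """Discriminator loss: -mean(log D(x) + log(1 - D(G(z))))."""
    terms = [
        math.log(p_train + EPS) + math.log(1.0 - p_gen + EPS)
        for p_train, p_gen in zip(discrim_output_train, discrim_output_gen)
    ]
    return -sum(terms) / len(terms)


# An undecided discriminator outputs 0.5 for every sample.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
# A confident, correct discriminator achieves a lower loss.
confident = discriminator_loss([0.9, 0.9], [0.1, 0.1])
```

The generator's loss falls as the discriminator assigns higher probabilities to generated samples, while the discriminator's loss falls as it separates the two batches.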
batches (iterable) – batches of data to train the discriminator on, each represented as a dict
that maps Inputs to values. It should specify values for all members of
data_inputs and conditional_inputs.
generator_steps (float) – the number of training steps to perform for the generator for each batch.
This can be used to adjust the ratio of training steps for the generator
and discriminator. For example, 2.0 will perform two training steps for
every batch, while 0.5 will only perform one training step for every two
batches.
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in batches. Set
this to 0 to disable automatic checkpointing.
restore (bool) – if True, restore the model from the most recent checkpoint before training
it.
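One way to read the generator_steps ratio is as an accumulator that carries fractional steps from one batch to the next. The helper below is hypothetical and only illustrates the resulting schedule; it is not taken from the training loop itself.

```python
def generator_steps_per_batch(n_batches, generator_steps):
    """Return the number of generator steps performed after each batch."""
    schedule = []
    pending = 0.0  # fractional generator steps carried over between batches
    for _ in range(n_batches):
        pending += generator_steps
        steps_now = 0
        while pending >= 1.0:
            steps_now += 1
            pending -= 1.0
        schedule.append(steps_now)
    return schedule


two_per_batch = generator_steps_per_batch(4, 2.0)
one_every_other = generator_steps_per_batch(4, 0.5)
```

With 2.0 the generator is trained twice after every batch, and with 0.5 it is trained once after every second batch, matching the description of the parameter.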
batch_size (int) – the number of samples to generate. If either noise_input or
conditional_inputs is specified, this argument is ignored since the batch
size is then determined by the size of that argument.
noise_input (array) – the value to use for the generator’s noise input. If None (the default),
get_noise_batch() is called to generate a random input, so each call will
produce a new set of samples.
conditional_inputs (list of arrays) – the values to use for all conditional inputs. This must be specified if
the GAN has any conditional inputs.
generator_index (int) – the index of the generator (between 0 and n_generators-1) to use for
generating the samples.
Returns:
An array (if the generator has only one output) or list of arrays (if it has
multiple outputs) containing the generated samples.
This class implements Wasserstein Generative Adversarial Networks (WGANs) as
described in Arjovsky et al., “Wasserstein GAN” (https://arxiv.org/abs/1701.07875).
A WGAN is conceptually rather different from a conventional GAN, but in
practical terms very similar. It reinterprets the discriminator (often called
the “critic” in this context) as learning an approximation to the Earth Mover
distance between the training and generated distributions. The generator is
then trained to minimize that distance. In practice, this just means using
slightly different loss functions for training the generator and discriminator.
WGANs have theoretical advantages over conventional GANs, and they often work
better in practice. In addition, the discriminator’s loss function can be
directly interpreted as a measure of the quality of the model. That is an
advantage over conventional GANs, where the loss does not directly convey
information about the quality of the model.
The theory WGANs are based on requires the discriminator’s gradient to be
bounded. The original paper achieved this by clipping its weights. This
class instead does it by adding a penalty term to the discriminator’s loss, as
described in https://arxiv.org/abs/1704.00028. This is sometimes found to
produce better results.
There are a few other practical differences between GANs and WGANs. In a
conventional GAN, the discriminator’s output must be between 0 and 1 so it can
be interpreted as a probability. In a WGAN, it should produce an unbounded
output that can be interpreted as a distance.
When training a WGAN, you also should usually use a smaller value for
generator_steps. Conventional GANs rely on keeping the generator and
discriminator “in balance” with each other. If the discriminator ever gets
too good, it becomes impossible for the generator to fool it and training
stalls. WGANs do not have this problem, and in fact the better the
discriminator is, the easier it is for the generator to improve. It therefore
usually works best to perform several training steps on the discriminator for
each training step on the generator.
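The change in loss functions can be sketched in plain Python. The sketch assumes the sign convention in which the critic scores training samples higher than generated ones (the opposite convention is equally valid), and it takes the critic's gradient norms as a precomputed list rather than differentiating anything. The penalty follows the form proposed in https://arxiv.org/abs/1704.00028; the function names and the penalty weight of 10.0 are assumptions and may not match this class.

```python
def mean(values):
    return sum(values) / len(values)


def critic_loss(critic_output_train, critic_output_gen, gradient_norms,
                penalty_weight=10.0):
    """Wasserstein critic loss with a gradient penalty.

    gradient_norms holds the norm of the critic's gradient at each
    interpolated sample; computing them is outside this sketch.
    """
    penalty = mean([(g - 1.0) ** 2 for g in gradient_norms])
    distance_term = mean(critic_output_gen) - mean(critic_output_train)
    return distance_term + penalty_weight * penalty


def generator_loss(critic_output_gen):
    """The generator tries to raise the critic's score on generated data."""
    return -mean(critic_output_gen)


no_penalty = critic_loss([2.0, 4.0], [1.0, 1.0], [1.0, 1.0])
penalty_only = critic_loss([0.0], [0.0], [2.0])
gen_loss = generator_loss([1.0, 3.0])
```

Note that the outputs are unbounded scores rather than probabilities, so no logarithm appears in either loss.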
The default implementation is appropriate for most cases. Subclasses can
override this if they need to customize it.
Parameters:
discrim_output (Tensor) – the output from the discriminator on a batch of generated data. This is
its estimate of the probability that each sample is training data.
Return type:
A Tensor equal to the loss function to use for optimizing the generator.
The default implementation is appropriate for most cases. Subclasses can
override this if they need to customize it.
Parameters:
discrim_output_train (Tensor) – the output from the discriminator on a batch of training data. This is
its estimate of the probability that each sample is training data.
discrim_output_gen (Tensor) – the output from the discriminator on a batch of generated data. This is
its estimate of the probability that each sample is training data.
Return type:
A Tensor equal to the loss function to use for optimizing the discriminator.
Reimplementation of the discriminator module in ORGAN [1]_ .
Originated from [2]_.
This model applies multiple 1D convolutional filters to
the padded strings, then max-over-time pooling is applied on
all filters, extracting one feature per filter. All
features are concatenated and transformed through several
hidden layers to form predictions.
This model was initially developed for sentence-level
classification tasks, with words represented as vectors. In
this implementation, SMILES strings are dissected into
characters and transformed to one-hot vectors in a similar
way. The model can be used for general molecular-level
classification or regression tasks. It is also used in the
ORGAN model as the discriminator.
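The pipeline of one-hot encoding, 1D convolution, and max-over-time pooling can be sketched for a single filter in plain Python. Everything here is illustrative: the hand-written char_dict, the zero-vector padding, and the filter weights are assumptions, and the real model learns many filters of several widths.

```python
def one_hot_encode(smiles, char_dict, seq_length):
    """Return seq_length one-hot rows; positions past the string stay all zero."""
    rows = []
    for i in range(seq_length):
        row = [0.0] * len(char_dict)
        if i < len(smiles):
            row[char_dict[smiles[i]]] = 1.0
        rows.append(row)
    return rows


def conv1d_max_over_time(rows, kernel):
    """Slide a (width x n_chars) kernel along the sequence, keep the max response."""
    width = len(kernel)
    if len(rows) < width:
        raise ValueError("sequence is shorter than the filter width")
    responses = []
    for start in range(len(rows) - width + 1):
        total = 0.0
        for k in range(width):
            total += sum(w * x for w, x in zip(kernel[k], rows[start + k]))
        responses.append(total)
    return max(responses)


char_dict = {"C": 0, "O": 1, "=": 2}
encoded = one_hot_encode("CC=O", char_dict, seq_length=6)

# A width-2 filter that responds most strongly to "=" followed by "O".
carbonyl_filter = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
feature = conv1d_max_over_time(encoded, carbonyl_filter)
```

Each filter contributes one such feature, and the concatenated features are what the hidden layers receive.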
Training the model requires only SMILES strings as input, so
any featurized dataset that includes SMILES in its ids
attribute is accepted. PDBbind, QM7 and QM7b are not
supported. Before defining the model, call build_char_dict
to build the character dict of the input dataset. An
example can be found in
examples/delaney/delaney_textcnn.py
Implements the atomic convolutional networks as introduced in
Gomes, Joseph, et al. “Atomic convolutional networks for predicting protein-ligand binding affinity.” arXiv preprint arXiv:1703.10603 (2017).
The atomic convolutional networks function as a variant of
graph convolutions. The difference is that the “graph” here is
the nearest neighbors graph in 3D space. The AtomicConvModel
leverages these connections in 3D space to train models that
learn to predict energetic state starting from the spatial
geometry of the model.
frag1_num_atoms (int) – Number of atoms in first fragment
frag2_num_atoms (int) – Number of atoms in second fragment
max_num_neighbors (int) – Maximum number of neighbors possible for an atom. Recall neighbors
are spatial neighbors.
atom_types (list) – List of atoms recognized by model. Atoms are indicated by their
nuclear numbers.
radial (list) – Radial parameters used in the atomic convolution transformation.
layer_sizes (list) – the size of each dense layer in the network. The length of
this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight
initialization of each layer. The length of this list should
equal len(layer_sizes). Alternatively this may be a single
value instead of a list, in which case the same value is used
for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The
length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in
which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the TensorFlow activation function to apply to each layer. The length of this list should equal
len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the
same value is used for every layer.
residual (bool) – if True, the model will be composed of pre-activation residual blocks instead
of a simple stack of dense layers.
learning_rate (float) – Learning rate for the model.
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
Implements the Smiles2Vec model, that learns neural representations of SMILES
strings which can be used for downstream tasks.
The model is based on the description in Goh et al., “SMILES2vec: An
Interpretable General-Purpose Deep Neural Network for Predicting Chemical
Properties” (https://arxiv.org/pdf/1712.02034.pdf). The goal here is to take
SMILES strings as inputs, turn them into vector representations which can then
be used in predicting molecular properties.
The model consists of an Embedding layer that retrieves embeddings for each
character in the SMILES string. These embeddings are learnt jointly with the
rest of the model. The output from the embedding layer is a tensor of shape
(batch_size, seq_len, embedding_dim). This tensor can optionally be fed
through a 1D convolutional layer, before being passed to a series of RNN cells
(optionally bidirectional). The final output from the RNN cells aims
to have learnt the temporal dependencies in the SMILES string, and in turn
information about the structure of the molecule, which is then used for
molecular property prediction.
In the paper, the authors also train an explanation mask to endow the model
with interpretability and gain insights into its decision making. This segment
is currently not a part of this implementation as this was
developed for the purpose of investigating a transfer learning protocol,
ChemNet (which can be found at https://arxiv.org/abs/1712.02734).
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
Implements the ChemCeption model that leverages the representational capacities
of convolutional neural networks (CNNs) to predict molecular properties.
The model is based on the description in Goh et al., “Chemception: A Deep
Neural Network with Minimal Chemistry Knowledge Matches the Performance of
Expert-developed QSAR/QSPR Models” (https://arxiv.org/pdf/1706.06689.pdf).
The authors use an image based representation of the molecule, where pixels
encode different atomic and bond properties. More details on the image
representations can be found at https://arxiv.org/abs/1710.02238
The model consists of a Stem Layer that reduces the image resolution for the
layers to follow. The output of the Stem Layer is followed by a series of
Inception-Resnet blocks & a Reduction layer. Layers in the Inception-Resnet
blocks process image tensors at multiple resolutions and use a ResNet style
skip-connection, combining features from different resolutions. The Reduction
layers reduce the spatial extent of the image by max-pooling and 2-strided
convolutions. More details on these layers can be found in the ChemCeption
paper referenced above. The output of the final Reduction layer is subject to
a Global Average Pooling, and a fully-connected layer maps the features to
downstream outputs.
In the ChemCeption paper, the authors perform real-time image augmentation by
rotating images between 0 and 180 degrees. This can be done during model
training by setting the augment argument to True.
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
The purpose of a normalizing flow is to map a simple distribution (that is
easy to sample from and evaluate probability densities for) to a more
complex distribution that is learned from data. Normalizing flows combine the
advantages of autoregressive models (which provide likelihood estimation
but do not learn features) and variational autoencoders (which learn feature
representations but do not provide marginal likelihoods). They are effective
for any application requiring a probabilistic model with these capabilities, e.g. generative modeling, unsupervised learning, or probabilistic inference.
A base distribution and normalizing flow for applying transformations.
Normalizing flows are effective for any application requiring
a probabilistic model that can both sample from a distribution and
compute marginal likelihoods, e.g. generative modeling,
unsupervised learning, or probabilistic inference. For a thorough review
of normalizing flows, see [1]_.
A distribution implements two main operations:
Sampling from the transformed distribution
Calculating log probabilities
A normalizing flow implements three main operations:
Forward transformation
Inverse transformation
Calculating the Jacobian
Deep Normalizing Flow models require normalizing flow layers where
input and output dimensions are the same, the transformation is invertible,
and the determinant of the Jacobian is efficient to compute and
differentiable. The determinant of the Jacobian of the transformation
gives the factor that keeps the total probability equal to 1 when transforming
between probability densities of different random variables.
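The role of the Jacobian can be seen in a one-dimensional affine flow x = scale * z + shift applied to a standard normal base distribution. The class and function names below are hypothetical and the sketch uses only the standard library; it illustrates the change-of-variables rule rather than any DeepChem API.

```python
import math


def standard_normal_log_prob(z):
    """Log density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


class AffineFlow:
    """Invertible map x = scale * z + shift with a constant Jacobian."""

    def __init__(self, scale, shift):
        if scale == 0:
            raise ValueError("scale must be non-zero for the map to be invertible")
        self.scale = scale
        self.shift = shift

    def forward(self, z):
        return self.scale * z + self.shift

    def inverse(self, x):
        return (x - self.shift) / self.scale

    def inverse_log_det_jacobian(self):
        # d(inverse)/dx = 1 / scale, so log|det J| = -log|scale|
        return -math.log(abs(self.scale))


def log_prob(flow, x):
    """Change of variables: log p_x(x) = log p_z(f^-1(x)) + log|det J of f^-1|."""
    z = flow.inverse(x)
    return standard_normal_log_prob(z) + flow.inverse_log_det_jacobian()


flow = AffineFlow(scale=2.0, shift=1.0)
value = log_prob(flow, 3.0)
```

The result equals the log density of a normal distribution with mean 1 and standard deviation 2 evaluated at 3, which is what stretching and shifting a standard normal should produce.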
The loss function for a model can be defined in two different
ways. For models that have only a single output and use a
standard loss function, you can simply provide a
dc.models.losses.Loss object. This defines the loss for each
sample or sample/task pair. The result is automatically
multiplied by the weights and averaged over the batch.
For more complicated cases, you can instead provide a function
that directly computes the total loss. It must be of the form
f(outputs, labels, weights), taking the list of outputs from
the model, the expected values, and any weight matrices. It
should return a scalar equal to the value of the loss function
for the batch. No additional processing is done to the
result; it is up to you to do any weighting, averaging, adding
of penalty terms, etc.
You can optionally provide an output_types argument, which
describes how to interpret the model’s outputs. This should
be a list of strings, one for each output. You can use an
arbitrary output_type for an output, but some output_types are
special and will undergo extra processing:
‘prediction’: This is a normal output, and will be returned by predict().
If output types are not specified, all outputs are assumed
to be of this type.
‘loss’: This output will be used in place of the normal
outputs for computing the loss function. For example,
models that output probability distributions usually do it
by computing unbounded numbers (the logits), then passing
them through a softmax function to turn them into
probabilities. When computing the cross entropy, it is more
numerically stable to use the logits directly rather than
the probabilities. You can do this by having the model
produce both probabilities and logits as outputs, then
specifying output_types=[‘prediction’, ‘loss’]. When
predict() is called, only the first output (the
probabilities) will be returned. But during training, it is
the second output (the logits) that will be passed to the
loss function.
‘variance’: This output is used for estimating the
uncertainty in another output. To create a model that can
estimate uncertainty, there must be the same number of
‘prediction’ and ‘variance’ outputs. Each variance output
must have the same shape as the corresponding prediction
output, and each element is an estimate of the variance in
the corresponding prediction. Also be aware that if a model
supports uncertainty, it MUST use dropout on every layer,
and dropout must be enabled during uncertainty prediction.
Otherwise, the uncertainties it computes will be inaccurate.
other: Arbitrary output_types can be used to extract outputs
produced by the model, but will have no additional
processing performed.
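The numerical stability argument behind the ‘loss’ output type can be demonstrated with the standard library alone. The function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def softmax(logits):
    """Turn logits into probabilities (shifted by the max to avoid overflow)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_probabilities(probs, label):
    """Naive form: breaks down when the true class probability underflows to 0."""
    return -math.log(probs[label])


def cross_entropy_from_logits(logits, label):
    """Stable form: log-sum-exp of the logits minus the true class logit."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return log_sum_exp - logits[label]


logits = [0.0, 800.0]
probs = softmax(logits)  # the first probability underflows to exactly 0.0

try:
    cross_entropy_from_probabilities(probs, 0)
    naive_failed = False
except ValueError:  # math.log(0.0) is undefined
    naive_failed = True

stable = cross_entropy_from_logits(logits, 0)
```

Returning the probabilities as the ‘prediction’ output and the logits as the ‘loss’ output lets predict() report probabilities while the loss is computed in the stable form.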
model (torch.nn.Module) – the PyTorch model implementing the calculation
loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each
batch, as described above
output_types (list of strings, optional (default None)) – the type of each output from the model, as described above
batch_size (int, optional (default 100)) – default batch size for training and evaluating
model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None,
a temporary directory is created.
learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is
ignored.
optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is
ignored.
tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training
wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training
log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using
logging by default. If tensorboard is set, data is also
logged to TensorBoard. If wandb is set, data is also logged
to Weights & Biases. Logging happens at global steps. Roughly,
a global step corresponds to one batch of training. If you’d
like a printout every 10 batch steps, you’d set
log_frequency=10 for example.
device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is
chosen automatically.
regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add
to the loss function
wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics
nb_epoch (int) – the number of epochs to train for
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
Set this to 0 to disable automatic checkpointing.
deterministic (bool) – if True, the samples are processed in order. If False, a different random
order is used for each epoch.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training
from there. If False, retrain the model from scratch.
variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in
the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
for each batch. If None (the default), the model’s standard loss function
is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that
you can call fit() repeatedly with the same list and losses will
continue to be appended.
Return type:
The average loss over the most recent checkpoint interval
generator (generator) – this should generate batches, each represented as a tuple of the form
(inputs, labels, weights).
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
Set this to 0 to disable automatic checkpointing.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training
from there. If False, retrain the model from scratch.
variables (list of torch.nn.Parameter or torch.nn.ParameterList) – the variables to train. If None (the default), all trainable variables in
the model are used.
ParameterList can be used like a regular Python list, but Tensors that are
Parameter are properly registered, and will be visible by all Module methods.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
for each batch. If None (the default), the model’s standard loss function
is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that
you can call fit() repeatedly with the same list and losses will
continue to be appended.
Return type:
The average loss over the most recent checkpoint interval
variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in
the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
for each batch. If None (the default), the model’s standard loss function
is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
every step. This can be used to perform validation, logging, etc.
checkpoint (bool) – if true, save a checkpoint after performing the training step
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
generator (generator) – this should generate batches, each represented as a tuple of the form
(inputs, labels, weights).
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved
from the model. If output_types is specified, outputs must
be None.
Returns – a NumPy array if the model produces a single output, or a list of arrays
if it produces multiple outputs
Generates predictions for input samples, processing samples in a batch.
Parameters:
X (ndarray) – the input data, as a NumPy array.
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
Returns:
a NumPy array if the model produces a single output, or a list of arrays
if it produces multiple outputs
Predict the model’s outputs, along with the uncertainty in each one.
The uncertainty is computed as described in https://arxiv.org/abs/1703.04977.
It involves repeating the prediction many times with different dropout masks.
The prediction is computed as the average over all the predictions. The
uncertainty includes both the variation among the predicted values (epistemic
uncertainty) and the model’s own estimates for how well it fits the data
(aleatoric uncertainty). Not all models support uncertainty prediction.
Parameters:
X (ndarray) – the input data, as a NumPy array.
masks (int) – the number of dropout masks to average over
Returns:
for each output, a tuple (y_pred, y_std) where y_pred is the predicted
value of the output, and each element of y_std estimates the standard
deviation of the corresponding element of y_pred
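The averaging over dropout masks described above can be sketched in plain Python. This is an illustration under stated assumptions, not DeepChem's implementation: `predict_with_dropout` and `toy_model` are hypothetical, each stochastic pass is assumed to return a prediction together with the model's own variance estimate, and the two uncertainty sources are combined by adding variances.

```python
import math
import random
from typing import Callable, Tuple


def predict_with_dropout(predict_once: Callable[[random.Random], Tuple[float, float]],
                         masks: int, seed: int = 0) -> Tuple[float, float]:
    """Average `masks` stochastic predictions and estimate their standard deviation."""
    rng = random.Random(seed)
    preds, variances = [], []
    for _ in range(masks):
        y, var = predict_once(rng)  # one forward pass with a fresh dropout mask
        preds.append(y)
        variances.append(var)
    y_pred = sum(preds) / masks
    # Epistemic part: spread of the predictions across dropout masks.
    epistemic = sum((p - y_pred) ** 2 for p in preds) / masks
    # Aleatoric part: the model's own variance estimate, averaged over masks.
    aleatoric = sum(variances) / masks
    return y_pred, math.sqrt(epistemic + aleatoric)


def toy_model(rng: random.Random) -> Tuple[float, float]:
    """Hypothetical one-layer model with dropout probability 0.5 on its inputs."""
    weights = [0.5, 1.0, 1.5]
    keep = [1.0 if rng.random() >= 0.5 else 0.0 for _ in weights]
    y = sum(w * k / 0.5 for w, k in zip(weights, keep))  # inverted-dropout scaling
    return y, 0.04  # constant predicted variance, for illustration only


y_pred, y_std = predict_with_dropout(toy_model, masks=50)
constant = predict_with_dropout(lambda rng: (1.0, 0.0), masks=5)
```

A model that ignores the mask and predicts zero variance, as in `constant`, yields zero uncertainty, which is a quick sanity check on the combination rule.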
Uses self to make predictions on provided Dataset object.
Parameters:
dataset (dc.data.Dataset) – Dataset to make prediction on
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved
from the model. If output_types is specified, outputs must
be None.
Returns:
a NumPy array if the model produces a single output, or a list of arrays
if it produces multiple outputs
Predicts embeddings created by underlying model if any exist.
An embedding must be specified to have output_type of
‘embedding’ in the model definition.
Parameters:
dataset (dc.data.Dataset) – Dataset to make prediction on
Returns:
a NumPy array of the embeddings the model produces, or a list
Predict the model’s outputs, along with the uncertainty in each one.
The uncertainty is computed as described in https://arxiv.org/abs/1703.04977.
It involves repeating the prediction many times with different dropout masks.
The prediction is computed as the average over all the predictions. The
uncertainty includes both the variation among the predicted values (epistemic
uncertainty) and the model’s own estimates for how well it fits the data
(aleatoric uncertainty). Not all models support uncertainty prediction.
Parameters:
dataset (dc.data.Dataset) – Dataset to make prediction on
masks (int) – the number of dropout masks to average over
Returns:
for each output, a tuple (y_pred, y_std) where y_pred is the predicted
value of the output, and each element of y_std estimates the standard
deviation of the corresponding element of y_pred
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
per_task_metrics (bool) – If True, return per-task scores.
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
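As a rough illustration of how epochs, deterministic, and pad_batches interact, the following plain-Python generator yields batches as tuples of lists. `batch_generator` is a hypothetical helper, not the library's code; padding by repeating a real sample with zero weight is an assumption made for this sketch.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def batch_generator(X: Sequence, y: Sequence, w: Sequence, batch_size: int,
                    epochs: int = 1, deterministic: bool = True,
                    pad_batches: bool = False,
                    seed: int = 0) -> Iterator[Tuple[List, List, List]]:
    """Yield (inputs, labels, weights) tuples, each a list with one entry."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(len(X)))
        if not deterministic:
            rng.shuffle(order)  # new random order for every epoch
        for start in range(0, len(order), batch_size):
            chunk = order[start:start + batch_size]
            xb = [X[i] for i in chunk]
            yb = [y[i] for i in chunk]
            wb = [w[i] for i in chunk]
            missing = batch_size - len(chunk)
            if pad_batches and missing > 0:
                # Assumption: pad with a repeated sample whose weight is zero,
                # so the padding cannot contribute to a weighted loss.
                xb += [xb[0]] * missing
                yb += [yb[0]] * missing
                wb += [0.0] * missing
            yield ([xb], [yb], [wb])


X = [10, 11, 12, 13, 14]
y = [0.0, 1.0, 0.0, 1.0, 0.0]
w = [1.0] * 5
batches = list(batch_generator(X, y, w, batch_size=2, epochs=2, pad_batches=True))
shuffled = list(batch_generator(X, y, w, batch_size=2, deterministic=False))
```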
Usually you do not need to call this method, since fit() saves checkpoints
automatically. If you have disabled automatic checkpointing during fitting,
this can be called to manually write checkpoints.
Parameters:
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
If set to zero, the function will simply return as no checkpoint is saved.
model_dir (str, default None) – Model directory to save checkpoint to. If None, revert to self.model_dir
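The effect of max_checkpoints_to_keep can be sketched with ordinary files. The file naming scheme, the text payload, and the helper names below are hypothetical and chosen only for illustration; the actual checkpoint format is not shown here.

```python
import os
import tempfile
from typing import List


def list_checkpoints(model_dir: str) -> List[str]:
    """Return checkpoint paths in model_dir, oldest first."""
    if not os.path.isdir(model_dir):
        return []
    names = sorted(n for n in os.listdir(model_dir) if n.startswith("checkpoint-"))
    return [os.path.join(model_dir, n) for n in names]


def save_checkpoint(model_dir: str, state: str, step: int,
                    max_checkpoints_to_keep: int = 5) -> List[str]:
    """Write one checkpoint, then discard the oldest beyond the limit."""
    if max_checkpoints_to_keep == 0:
        return list_checkpoints(model_dir)  # nothing is saved
    os.makedirs(model_dir, exist_ok=True)
    # Zero-padded step numbers make lexicographic order match training order.
    path = os.path.join(model_dir, f"checkpoint-{step:08d}.txt")
    with open(path, "w") as handle:
        handle.write(state)
    for old in list_checkpoints(model_dir)[:-max_checkpoints_to_keep]:
        os.remove(old)
    return list_checkpoints(model_dir)


with tempfile.TemporaryDirectory() as tmp:
    for step in range(1, 5):
        kept = save_checkpoint(tmp, f"state at step {step}", step,
                               max_checkpoints_to_keep=2)
    kept_names = [os.path.basename(p) for p in kept]
    untouched = save_checkpoint(tmp, "ignored", 5, max_checkpoints_to_keep=0)
    untouched_names = [os.path.basename(p) for p in untouched]
```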
Reload the values of all variables from a checkpoint file.
Parameters:
checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent
checkpoint will be chosen automatically. Call get_checkpoints() to get a
list of all available checkpoints.
model_dir (str, default None) – Directory to restore checkpoint from. If None, use self.model_dir. If
checkpoint is not None, this is ignored.
strict (bool, default True) – Whether or not to strictly enforce that the keys in checkpoint match
the keys returned by this model’s get_variable_scope() method.
Copies parameter values from a pretrained model. source_model can either
be a pretrained model or a model with the same architecture. value_map
is a parameter-value dictionary. If no value_map is provided, the parameter
values are restored to the source_model from a checkpoint and a default
value_map is created. assignment_map is a dictionary mapping parameters
from the source_model to the current model. If no assignment_map is
provided, one is made from scratch and assumes the model is composed of
several different layers, with the final one being a dense layer. include_top
is used to control whether or not the final dense layer is used. The default
assignment map is useful in cases where the type of task is different
(classification vs regression) and/or number of tasks in the setting.
Parameters:
source_model (dc.TorchModel, required) – source_model can either be the pretrained model or a dc.TorchModel with
the same architecture as the pretrained model. It is used to restore from
a checkpoint, if value_map is None and to create a default assignment map
if assignment_map is None
assignment_map (Dict, default None) – Dictionary mapping the source_model parameters and current model parameters
value_map (Dict, default None) – Dictionary containing source_model trainable parameters mapped to numpy
arrays. If value_map is None, the values are restored and a default
parameter map is created using the restored values
checkpoint (str, default None) – the path to the checkpoint file to load. If this is None, the most recent
checkpoint will be chosen automatically. Call get_checkpoints() to get a
list of all available checkpoints
model_dir (str, default None) – Restore source model from custom model directory if needed
include_top (bool, default True) – if True, copies the weights and bias associated with the final dense
layer. Used only when assignment map is None
inputs (List, input tensors for model) – if not None, then the weights are built for both the source and self.
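The roles of assignment_map, value_map, and include_top can be sketched with dictionaries standing in for model parameters. This is a simplified illustration, not the library's code: `default_assignment_map` and `copy_parameters` are hypothetical helpers, parameters are plain lists, and the final dense layer is assumed to contribute exactly the last two entries (weight and bias).

```python
from typing import Dict, List


def default_assignment_map(source_names: List[str], dest_names: List[str],
                           include_top: bool = True) -> Dict[str, str]:
    """Pair source and destination parameters in order.

    When include_top is False, the last two entries (assumed to be the
    weight and bias of the final dense layer) are left out.
    """
    if not include_top:
        source_names = source_names[:-2]
        dest_names = dest_names[:-2]
    return dict(zip(source_names, dest_names))


def copy_parameters(dest_params: Dict[str, List[float]],
                    value_map: Dict[str, List[float]],
                    assignment_map: Dict[str, str]) -> None:
    """Copy each mapped source value into the destination parameter."""
    for source_name, dest_name in assignment_map.items():
        dest_params[dest_name] = list(value_map[source_name])


# Source model: one hidden layer and a single-task output layer.
source_params = {
    "dense1.weight": [0.1, 0.2],
    "dense1.bias": [0.3],
    "out.weight": [0.4],
    "out.bias": [0.5],
}
# Destination model: same hidden layer, but a three-task output layer.
dest_params = {
    "dense1.weight": [0.0, 0.0],
    "dense1.bias": [0.0],
    "out.weight": [0.0, 0.0, 0.0],
    "out.bias": [0.0, 0.0, 0.0],
}
amap = default_assignment_map(list(source_params), list(dest_params),
                              include_top=False)
copy_parameters(dest_params, source_params, amap)
```

Leaving out the top layer is what makes the transfer possible here, since the output layers of the two models have different shapes.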
ModularTorchModel is a subclass of TorchModel that allows for components to be
pretrained and then combined into a final model. It is designed to be subclassed
for specific models and is not intended to be used directly. There are 3 main differences
between ModularTorchModel and TorchModel:
The build_components() method is used to define the components of the model.
The components are combined into a final model with the build_model() method.
The loss function is defined with the loss_func method. This may access the
components to compute the loss using intermediate values from the network, rather
than just the full forward pass output.
Here is an example of how to use ModularTorchModel to pretrain a linear layer, load
it into another network and then finetune that network:
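As a framework-free sketch of the same pattern, the class below mirrors the three pieces named above (build_components, build_model, and loss_func) using plain Python callables. It makes no PyTorch or DeepChem calls; `ModularSketch`, the component names, and the penalty term are hypothetical and exist only to show a loss that reads an intermediate value.

```python
from typing import Callable, Dict, List


class ModularSketch:
    """Hypothetical stand-in showing how components, model, and loss relate."""

    def __init__(self) -> None:
        self.components = self.build_components()
        self.model = self.build_model()

    def build_components(self) -> Dict[str, Callable[[float], float]]:
        # Each component is a separately usable (and separately trainable) piece.
        return {"encoder": lambda x: 2.0 * x, "head": lambda h: h + 1.0}

    def build_model(self) -> Callable[[float], float]:
        # The final model is just the components chained together.
        encoder, head = self.components["encoder"], self.components["head"]
        return lambda x: head(encoder(x))

    def loss_func(self, inputs: List[float], labels: List[float],
                  weights: List[float]) -> float:
        # The loss receives raw inputs, so it can inspect intermediate values.
        hidden = [self.components["encoder"](x) for x in inputs]
        outputs = [self.components["head"](h) for h in hidden]
        task = sum(w * (o - y) ** 2 for o, y, w in zip(outputs, labels, weights))
        penalty = 0.01 * sum(h * h for h in hidden)  # uses the hidden activations
        return task + penalty


sketch = ModularSketch()
prediction = sketch.model(1.0)
loss_value = sketch.loss_func([1.0, 2.0], [3.0, 5.0], [1.0, 1.0])
```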
components (dict) – A dictionary of the components of the model. The keys are the names of the
components and the values are the components themselves.
Defines the loss function for the model which can access the components
using self.components. The loss function should take the inputs, labels, and
weights as arguments and return the loss.
Train this model on data from a generator. This method is similar to
the TorchModel implementation, but it passes the inputs directly to the
loss function, rather than passing them through the model first. This
enables the loss to be calculated from intermediate steps of the model
and not just the final output.
Parameters:
generator (generator) – this should generate batches, each represented as a tuple of the form
(inputs, labels, weights).
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
Set this to 0 to disable automatic checkpointing.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training
from there. If False, retrain the model from scratch.
variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in
the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
for each batch. If None (the default), the model’s standard loss function
is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that
you can call fit() repeatedly with the same list and losses will
continue to be appended.
Return type:
The average loss over the most recent checkpoint interval
Copies parameter values from a pretrained model. The pretrained model can be loaded as a source_model (ModularTorchModel object), checkpoint (pytorch .ckpt file) or a model_dir (directory with .ckpt files).
Specific components can be chosen by passing a list of strings with the desired component names. If both a source_model and a checkpoint/model_dir are loaded, the source_model weights will be loaded.
Parameters:
source_model (dc.ModularTorchModel, required) – source_model can either be the pretrained model or a dc.TorchModel with
the same architecture as the pretrained model. It is used to restore from
a checkpoint, if value_map is None and to create a default assignment map
if assignment_map is None
checkpoint (str, default None) – the path to the checkpoint file to load. If this is None, the most recent
checkpoint will be chosen automatically. Call get_checkpoints() to get a
list of all available checkpoints
model_dir (str, default None) – Restore source model from custom model directory if needed
inputs (List, input tensors for model) – if not None, then the weights are built for both the source and self.
Saves the current state of the model and its components as a checkpoint file in the specified model directory.
It maintains a maximum number of checkpoint files, deleting the oldest one when the limit is reached.
Parameters:
max_checkpoints_to_keep (int, default 5) – Maximum number of checkpoint files to keep.
model_dir (str, default None) – The directory to save the checkpoint file in. If None, the model_dir specified in the constructor is used.
Restores the state of a ModularTorchModel from a checkpoint file.
If no checkpoint file is provided, it will use the latest checkpoint found in the model directory. If a list of component names is provided, only the state of those components will be restored.
Parameters:
components (Optional[List[str]]) – A list of component names to restore. If None, all components will be restored.
checkpoint (Optional[str]) – The path to the checkpoint file. If None, the latest checkpoint in the model directory will
be used.
model_dir (Optional[str]) – The path to the model directory. If None, the model directory used to initialize the model will be used.
A 1, 2, or 3 dimensional convolutional network for either regression or classification.
The network consists of the following sequence of layers:
A configurable number of convolutional layers
A global pooling layer (either max pool or average pool)
A final fully connected layer to compute the output
It optionally can compose the model from pre-activation residual blocks, as
described in https://arxiv.org/abs/1603.05027, rather than a simple stack of
convolution layers. This often leads to easier training, especially when using a
large number of layers. Note that residual blocks can only be used when
successive layers have the same output shape. Wherever the output shape changes, a
simple convolution layer will be used even if residual=True.
dims (int) – the number of dimensions to apply convolutions over (1, 2, or 3)
layer_filters (list) – the number of output filters for each convolutional layer in the network.
The length of this list determines the number of layers.
kernel_size (int, tuple, or list) – a list giving the shape of the convolutional kernel for each layer. Each
element may be either an int (use the same kernel width for every dimension)
or a tuple (the kernel width along each dimension). Alternatively this may
be a single int or tuple instead of a list, in which case the same kernel
shape is used for every layer.
strides (int, tuple, or list) – a list giving the stride between applications of the kernel for each layer.
Each element may be either an int (use the same stride for every dimension)
or a tuple (the stride along each dimension). Alternatively this may be a
single int or tuple instead of a list, in which case the same stride is
used for every layer.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization
of each layer. The length of this list should equal len(layer_filters)+1,
where the final element corresponds to the dense layer. Alternatively this
may be a single value instead of a list, in which case the same value is used
for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this
list should equal len(layer_filters)+1, where the final element corresponds
to the dense layer. Alternatively this may be a single value instead of a
list, in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_filters).
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer
activation_fns (str or list) – the torch activation function to apply to each layer. The length of this list should equal
len(layer_filters). Alternatively this may be a single value instead of a list, in which case the
same value is used for every layer, ‘relu’ by default
pool_type (str) – the type of pooling layer to use, either ‘max’ or ‘average’
mode (str) – Either ‘classification’ or ‘regression’
n_classes (int) – the number of classes to predict (only used in classification mode)
uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty
in outputs to be predicted
residual (bool) – if True, the model will be composed of pre-activation residual blocks instead
of a simple stack of convolutional layers.
padding (str, int or tuple) – the padding to use for convolutional layers, either ‘valid’ or ‘same’
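Several of the arguments above accept either one value shared by all layers or a list with one entry per layer. A minimal sketch of that convention follows; `per_layer` is a hypothetical helper rather than a library function, and it assumes that only a `list` means "one entry per layer", so a tuple such as a kernel shape counts as a single shared value.

```python
from typing import Any, List


def per_layer(value: Any, n_entries: int, name: str) -> List[Any]:
    """Expand a shared value to n_entries copies, or validate a per-layer list."""
    if isinstance(value, list):
        if len(value) != n_entries:
            raise ValueError(
                f"{name} has {len(value)} entries, expected {n_entries}")
        return list(value)
    return [value] * n_entries


layer_filters = [16, 32]
n_layers = len(layer_filters)

kernel_size = per_layer((3, 3), n_layers, "kernel_size")   # one shape for all layers
dropouts = per_layer([0.1, 0.2], n_layers, "dropouts")      # explicit per-layer list
# One extra entry for the final dense layer, i.e. len(layer_filters) + 1.
weight_init_stddevs = per_layer(0.02, n_layers + 1, "weight_init_stddevs")

try:
    per_layer([0.1], n_layers, "dropouts")
    length_error_raised = False
except ValueError:
    length_error_raised = True
```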
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
A fully connected network for multitask regression.
This class provides lots of options for customizing aspects of the model: the
number and widths of layers, the activation functions, regularization methods,
etc.
It optionally can compose the model from pre-activation residual blocks, as
described in https://arxiv.org/abs/1603.05027, rather than a simple stack of
dense layers. This often leads to easier training, especially when using a
large number of layers. Note that residual blocks can only be used when
successive layers have the same width. Wherever the layer width changes, a
simple dense layer will be used even if residual=True.
In addition to the following arguments, this class also accepts all the keyword arguments
from TensorGraph.
Parameters:
n_tasks (int) – number of tasks
n_features (int) – number of features
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length
of this list should equal len(layer_sizes)+1. The final element corresponds to the output layer.
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes)+1.
The final element corresponds to the output layer. Alternatively this may be a single value instead of a list,
in which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the PyTorch activation function to apply to each layer. The length of this list should equal
len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the
same value is used for every layer. Standard activation functions from torch.nn.functional can be specified by name.
uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty
in outputs to be predicted
residual (bool) – if True, the model will be composed of pre-activation residual blocks instead
of a simple stack of dense layers.
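The rule that residual blocks apply only where successive layers have the same width can be sketched as a small planning function. `residual_plan` is hypothetical, and treating the input features as the predecessor of the first layer is an assumption of this sketch, not something the text above states.

```python
from typing import List


def residual_plan(n_features: int, layer_sizes: List[int],
                  residual: bool = True) -> List[bool]:
    """For each dense layer, report whether a residual connection can be used."""
    plan = []
    previous_width = n_features  # assumption: the input counts as a predecessor
    for width in layer_sizes:
        # A skip connection adds input to output, so the widths must match.
        plan.append(residual and previous_width == width)
        previous_width = width
    return plan


plan = residual_plan(64, [64, 64, 32, 32])
plain = residual_plan(64, [64, 64, 32, 32], residual=False)
```

In this plan the third layer falls back to a simple dense layer because the width changes from 64 to 32.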
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
generator (generator) – this should generate batches, each represented as a tuple of the form
(inputs, labels, weights).
transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved
from the model. If output_types is specified, outputs must
be None.
Returns – a NumPy array of the model produces a single output, or a list of arrays
if it produces multiple outputs
A fully connected network for multitask classification.
This class provides lots of options for customizing aspects of the model: the
number and widths of layers, the activation functions, regularization methods,
etc.
It optionally can compose the model from pre-activation residual blocks, as
described in https://arxiv.org/abs/1603.05027, rather than a simple stack of
dense layers. This often leads to easier training, especially when using a
large number of layers. Note that residual blocks can only be used when
successive layers have the same width. Wherever the layer width changes, a
simple dense layer will be used even if residual=True.
In addition to the following arguments, this class also accepts
all the keyword arguments from TensorGraph.
Parameters:
n_tasks (int) – number of tasks
n_features (int) – number of features
layer_sizes (list) – the size of each dense layer in the network. The length of
this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight
initialization of each layer. The length of this list should
equal len(layer_sizes). Alternatively this may be a single
value instead of a list, in which case the same value is used
for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The
length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in
which case the same value is used for every layer.
weight_decay_penalty (float) – the magnitude of the weight decay penalty to use
weight_decay_penalty_type (str) – the type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (list or object) – the PyTorch activation function to apply to each layer. The length of this list should equal
len(layer_sizes). Alternatively this may be a single value instead of a list, in which case the
same value is used for every layer. Standard activation functions from torch.nn.functional can be specified by name.
n_classes (int) – the number of classes
residual (bool) – if True, the model will be composed of pre-activation residual blocks instead
of a simple stack of dense layers.
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
This model takes arbitrary crystal structures as input and predicts material properties
using the element information and the connections between atoms in the crystal. It may be
useful when you need material properties that are computationally expensive to obtain,
such as a band gap computed with DFT. The model is a variant of Graph Convolutional
Networks. It differs from other GCN models in how it constructs graphs and
how it updates node representations. The crystal graph is built from the structure
using distances between atoms. It is an undirected multigraph in which
nodes represent atom properties and edges represent connections between atoms
in the crystal. Node representations are updated using both neighbor node
and edge representations. See [1]_ for details of the algorithm.
References
Notes
This class requires DGL and PyTorch to be installed.
Model for Graph Property Prediction Based on Graph Attention Networks (GAT).
This model proceeds as follows:
Update node representations in graphs with a variant of GAT
For each graph, compute its representation by 1) a weighted sum of the node
representations in the graph, where the weights are computed by applying a
gating function to the node representations 2) a max pooling of the node
representations 3) concatenating the output of 1) and 2)
graph_attention_layers (list of int) – Width of channels per attention head for GAT layers. graph_attention_layers[i]
gives the width of channel for each attention head for the i-th GAT layer. If
both graph_attention_layers and agg_modes are specified, they should have
equal length. If not specified, the default value will be [8, 8].
n_attention_heads (int) – Number of attention heads in each GAT layer.
agg_modes (list of str) – The way to aggregate multi-head attention results for each GAT layer, which can be
either ‘flatten’ for concatenating all-head results or ‘mean’ for averaging all-head
results. agg_modes[i] gives the way to aggregate multi-head attention results for
the i-th GAT layer. If both graph_attention_layers and agg_modes are
specified, they should have equal length. If not specified, the model will flatten
multi-head results for intermediate GAT layers and compute mean of multi-head results
for the last GAT layer.
activation (activation function or None) – The activation function to apply to the aggregated multi-head results for each GAT
layer. If not specified, the default value will be ELU.
residual (bool) – Whether to add a residual connection within each GAT layer. Default to True.
dropout (float) – The dropout probability within each GAT layer. Default to 0.
alpha (float) – A hyperparameter in LeakyReLU, which is the slope for negative values. Default to 0.2.
predictor_hidden_feats (int) – The size for hidden representations in the output MLP predictor. Default to 128.
predictor_dropout (float) – The dropout probability in the output MLP predictor. Default to 0.
mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.
number_atom_features (int) – The length of the initial atom feature vectors. Default to 30.
n_classes (int) – The number of classes to predict per task
(only used when mode is ‘classification’). Default to 2.
self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves.
When input graphs have isolated nodes, self loops allow preserving the original feature
of them in message passing. Default to True.
kwargs – This can include any keyword argument of TorchModel.
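The two agg_modes options, and the default of flattening intermediate layers and averaging the last one, can be illustrated on plain lists. The helper names below are hypothetical and the per-head outputs are made-up numbers; this is a sketch of the aggregation step only, not of a GAT layer.

```python
from typing import List


def aggregate_heads(head_outputs: List[List[float]], mode: str) -> List[float]:
    """Combine per-head feature vectors for one node."""
    if mode == "flatten":
        # Concatenate all heads: width becomes n_heads * channels_per_head.
        return [value for head in head_outputs for value in head]
    if mode == "mean":
        # Average across heads: width stays channels_per_head.
        n_heads = len(head_outputs)
        return [sum(values) / n_heads for values in zip(*head_outputs)]
    raise ValueError(f"unknown aggregation mode: {mode!r}")


def default_agg_modes(n_layers: int) -> List[str]:
    """Flatten for intermediate layers, mean for the last layer."""
    return ["flatten"] * (n_layers - 1) + ["mean"]


heads = [[1.0, 2.0], [3.0, 4.0]]  # two attention heads, two channels each
flattened = aggregate_heads(heads, "flatten")
averaged = aggregate_heads(heads, "mean")
modes = default_agg_modes(2)
```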
Model for Graph Property Prediction Based on Graph Convolution Networks (GCN).
This model proceeds as follows:
Update node representations in graphs with a variant of GCN
For each graph, compute its representation by 1) a weighted sum of the node
representations in the graph, where the weights are computed by applying a
gating function to the node representations 2) a max pooling of the node
representations 3) concatenating the output of 1) and 2)
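The graph-level readout in the second step can be sketched in plain Python. The function below is an illustration, not the model's code: `weighted_sum_and_max` is a hypothetical name, and the gating function is assumed to be a linear map followed by a sigmoid, with made-up gate parameters.

```python
import math
from typing import List


def weighted_sum_and_max(node_feats: List[List[float]],
                         gate_weights: List[float],
                         gate_bias: float = 0.0) -> List[float]:
    """Concatenate a gated sum and an element-wise max over all nodes."""
    # One scalar gate per node: sigmoid(w . h + b).
    gates = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(gate_weights, feats))
                                + gate_bias)))
        for feats in node_feats
    ]
    dim = len(node_feats[0])
    # 1) Weighted sum of node representations.
    weighted_sum = [sum(g * feats[j] for g, feats in zip(gates, node_feats))
                    for j in range(dim)]
    # 2) Max pooling over node representations.
    max_pool = [max(feats[j] for feats in node_feats) for j in range(dim)]
    # 3) Concatenation of 1) and 2).
    return weighted_sum + max_pool


# Two nodes with two features each; zero gate weights give every node a gate of 0.5.
graph_repr = weighted_sum_and_max([[1.0, 0.0], [0.0, 2.0]], gate_weights=[0.0, 0.0])
```

The resulting vector is twice as wide as a node representation, since the sum and the max are concatenated rather than added.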
This model is different from deepchem.models.GraphConvModel as follows:
For each graph convolution, the learnable weight in this model is shared across all nodes.
GraphConvModel employs separate learnable weights for nodes of different degrees. A
learnable weight is shared across all nodes of a particular degree.
For GraphConvModel, there is an additional GraphPool operation after each
graph convolution. The operation updates the representation of a node by applying an
element-wise maximum over the representations of its neighbors and itself.
For computing graph-level representations, this model computes a weighted sum and an
element-wise maximum of the representations of all nodes in a graph and concatenates them.
The node weights are obtained by using a linear/dense layer followed by a sigmoid function.
For GraphConvModel, the sum over node representations is unweighted.
There are various minor differences in using dropout, skip connection and batch
normalization.
graph_conv_layers (list of int) – Width of channels for GCN layers. graph_conv_layers[i] gives the width of channel
for the i-th GCN layer. If not specified, the default value will be [64, 64].
activation (callable) – The activation function to apply to the output of each GCN layer.
By default, no activation function will be applied.
residual (bool) – Whether to add a residual connection within each GCN layer. Default to True.
batchnorm (bool) – Whether to apply batch normalization to the output of each GCN layer.
Default to False.
dropout (float) – The dropout probability for the output of each GCN layer. Default to 0.
predictor_hidden_feats (int) – The size for hidden representations in the output MLP predictor. Default to 128.
predictor_dropout (float) – The dropout probability in the output MLP predictor. Default to 0.
mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.
number_atom_features (int) – The length of the initial atom feature vectors. Default to 30.
n_classes (int) – The number of classes to predict per task
(only used when mode is ‘classification’). Default to 2.
self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves.
When input graphs have isolated nodes, self loops allow preserving the original feature
of them in message passing. Default to True.
kwargs – This can include any keyword argument of TorchModel.
num_layers (int) – Number of graph neural network layers, i.e. number of rounds of message passing.
Default to 2.
num_timesteps (int) – Number of time steps for updating graph representations with a GRU. Default to 2.
graph_feat_size (int) – Size for graph representations. Default to 200.
dropout (float) – Dropout probability. Default to 0.
mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.
number_atom_features (int) – The length of the initial atom feature vectors. Default to 30.
number_bond_features (int) – The length of the initial bond feature vectors. Default to 11.
n_classes (int) – The number of classes to predict per task
(only used when mode is ‘classification’). Default to 2.
self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves.
When input graphs have isolated nodes, self loops allow preserving the original feature
of them in message passing. Default to True.
kwargs – This can include any keyword argument of TorchModel.
An Atomic Convolutional Neural Network (ACNN) for energy score prediction.
The network follows the design of a graph convolutional network but in this case the graph is represented
as a 3D structure of the molecule. The objective is to train models that predict the energetic
state of a system from its spatial geometry [1].
References
Examples
>>> import numpy as np
>>> from deepchem.models.torch_models import AtomConvModel
>>> from deepchem.data import NumpyDataset
>>> frag1_num_atoms = 100  # atoms for ligand
>>> frag2_num_atoms = 1200  # atoms for protein
>>> complex_num_atoms = frag1_num_atoms + frag2_num_atoms
>>> batch_size = 1
>>> # Initialize the model
>>> atomic_convnet = AtomConvModel(n_tasks=1,
...                                batch_size=batch_size,
...                                layer_sizes=[
...                                    10,
...                                ],
...                                frag1_num_atoms=frag1_num_atoms,
...                                frag2_num_atoms=frag2_num_atoms,
...                                complex_num_atoms=complex_num_atoms)
>>> # Creates a set of dummy features that contain the coordinate and
>>> # neighbor-list features required by the AtomicConvModel.
>>> # Preparing the dataset
>>> features = []
>>> frag1_coords = np.random.rand(frag1_num_atoms, 3)
>>> frag1_nbr_list = {i: [] for i in range(frag1_num_atoms)}
>>> frag1_z = np.random.randint(10, size=(frag1_num_atoms))
>>> frag2_coords = np.random.rand(frag2_num_atoms, 3)
>>> frag2_nbr_list = {i: [] for i in range(frag2_num_atoms)}
>>> frag2_z = np.random.randint(10, size=(frag2_num_atoms))
>>> system_coords = np.random.rand(complex_num_atoms, 3)
>>> system_nbr_list = {i: [] for i in range(complex_num_atoms)}
>>> system_z = np.random.randint(10, size=(complex_num_atoms))
>>> features.append((frag1_coords, frag1_nbr_list, frag1_z, frag2_coords,
...                  frag2_nbr_list, frag2_z, system_coords, system_nbr_list,
...                  system_z))
>>> features = np.asarray(features, dtype=object)
>>> labels = np.zeros(batch_size)
>>> train = NumpyDataset(features, labels)
>>> _ = atomic_convnet.fit(train, nb_epoch=1)
>>> preds = atomic_convnet.predict(train)
frag1_num_atoms (int) – Number of atoms in first fragment.
frag2_num_atoms (int) – Number of atoms in second fragment.
complex_num_atoms (int) – Number of atoms in complex.
max_num_neighbors (int) – Maximum number of neighbors possible for an atom. Recall neighbors
are spatial neighbors.
batch_size (int) – Size of the batch.
atom_types (list) – List of atoms recognized by model. Atoms are indicated by their
nuclear numbers.
radial (list) – Radial parameters used in the atomic convolution transformation.
layer_sizes (list) – the size of each dense layer in the network. The length of
this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight
initialization of each layer. The length of this list should
equal len(layer_sizes). Alternatively, this may be a single
value instead of a list, where the same value is used
for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer. The
length of this list should equal len(layer_sizes).
Alternatively, this may be a single value instead of a list, where the same value is used for every layer.
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes).
Alternatively, this may be a single value instead of a list, where the same value is used for every layer.
activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal
len(layer_sizes). Alternatively, this may be a single value instead of a list, where the
same value is used for every layer.
residual (bool) – Whether to use residual connections.
learning_rate (float) – the learning rate to use for fitting.
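Several of the parameters above (weight_init_stddevs, bias_init_consts, dropouts, activation_fns) accept either one value per layer or a single value shared by every layer. The sketch below shows one way such an argument could be normalized. The helper name expand_per_layer is hypothetical and is not part of DeepChem.

```python
from collections.abc import Sequence


def expand_per_layer(value, n_layers):
    """Return one setting per layer from a scalar or a list.

    Hypothetical helper for illustration only; not a DeepChem function.
    """
    if isinstance(value, Sequence) and not isinstance(value, str):
        if len(value) != n_layers:
            raise ValueError(f"expected {n_layers} values, got {len(value)}")
        return list(value)
    # A single value is reused for every layer.
    return [value] * n_layers


layer_sizes = [100, 50, 10]
dropouts = expand_per_layer(0.5, len(layer_sizes))
stddevs = expand_per_layer([0.1, 0.1, 0.5], len(layer_sizes))
```

A list whose length differs from len(layer_sizes) raises an error here, which matches the stated requirement that the lengths be equal.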
node_out_feats (int) – The length of the final node representation vectors. Default to 64.
edge_hidden_feats (int) – The length of the hidden edge representation vectors. Default to 128.
num_step_message_passing (int) – The number of rounds of message passing. Default to 3.
num_step_set2set (int) – The number of set2set steps. Default to 6.
num_layer_set2set (int) – The number of set2set layers. Default to 3.
mode (str) – The model type, ‘classification’ or ‘regression’. Default to ‘regression’.
number_atom_features (int) – The length of the initial atom feature vectors. Default to 30.
number_bond_features (int) – The length of the initial bond feature vectors. Default to 11.
n_classes (int) – The number of classes to predict per task
(only used when mode is ‘classification’). Default to 2.
self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves.
Generally, an MPNNModel does not require self loops. Default to False.
kwargs – This can include any keyword argument of TorchModel.
InfoGraphModel learns graph-level representations via unsupervised learning. To this end,
the model aims to maximize the mutual information between the representations of entire graphs and the representations of substructures of different granularity (e.g. nodes, edges, triangles).
The unsupervised training of InfoGraph involves two encoders: one that encodes the entire graph and another that encodes substructures of different sizes. The mutual information between the two encoder outputs is maximized using a contrastive loss function.
The model randomly samples pairs of graphs and substructures, and then maximizes their mutual information by minimizing their distance in a learned embedding space.
The learned representations can be used for downstream tasks such as graph classification and molecular property prediction. The model is implemented as a ModularTorchModel in order to facilitate transfer learning.
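To make the contrastive objective concrete, here is a minimal NumPy sketch of a Jensen-Shannon style mutual information loss between node (local) and graph (global) embeddings. It assumes a dot-product score and the softplus-based JSD estimator commonly used for this purpose. The function names are illustrative, and the code is not DeepChem's implementation.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def jsd_mi_loss(local_emb, global_emb, graph_index):
    """Contrastive loss; lower values mean higher estimated mutual information.

    local_emb:   (num_nodes, dim) node embeddings
    global_emb:  (num_graphs, dim) graph embeddings
    graph_index: (num_nodes,) index of the graph each node belongs to
    """
    scores = local_emb @ global_emb.T  # (num_nodes, num_graphs)
    pos_mask = np.zeros_like(scores, dtype=bool)
    pos_mask[np.arange(len(graph_index)), graph_index] = True
    pos, neg = scores[pos_mask], scores[~pos_mask]
    # Expectations over positive (same graph) and negative (other graph) pairs.
    e_pos = (np.log(2.0) - softplus(-pos)).mean()
    e_neg = (softplus(-neg) + neg - np.log(2.0)).mean()
    return e_neg - e_pos


local_emb = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
global_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
aligned = jsd_mi_loss(local_emb, global_emb, np.array([0, 0, 1]))
misaligned = jsd_mi_loss(local_emb, global_emb, np.array([1, 1, 0]))
```

The loss is lower when each node scores highly against its own graph than when the assignments are swapped.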
References
Sun, F.-Y., Hoffmann, J., Verma, V. & Tang, J. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. Preprint at http://arxiv.org/abs/1908.01000 (2020).
Parameters:
num_features (int) – Number of node features for each input
edge_features (int) – Number of edge features for each input
embedding_dim (int) – Dimension of the embedding
num_gc_layers (int) – Number of graph convolutional layers
prior (bool) – Whether to use a prior expectation in the loss term
gamma (float) – Weight of the prior expectation in the loss term
measure (str) – The divergence measure to use for the unsupervised loss. Options are ‘GAN’, ‘JSD’,
‘KL’, ‘RKL’, ‘X2’, ‘DV’, ‘H2’, or ‘W1’.
average_loss (bool) – Whether to average the loss over the batch
n_classes (int) – Number of classes
Example
>>> from deepchem.models.torch_models.infograph import InfoGraphModel
>>> from deepchem.feat import MolGraphConvFeaturizer
>>> from deepchem.data import NumpyDataset
>>> import torch
>>> import numpy as np
>>> import tempfile
>>> tempdir = tempfile.TemporaryDirectory()
>>> smiles = ["C1CCC1", "C1=CC=CN=C1"]
>>> featurizer = MolGraphConvFeaturizer(use_edges=True)
>>> X = featurizer.featurize(smiles)
>>> y = torch.randint(0, 2, size=(2, 1)).float()
>>> w = torch.ones(size=(2, 1)).float()
>>> dataset = NumpyDataset(X, y, w)
>>> num_feat, edge_dim = 30, 11  # num feat and edge dim by molgraph conv featurizer
>>> pretrain_model = InfoGraphModel(num_feat, edge_dim, num_gc_layers=1, task='pretraining', model_dir=tempdir.name)
>>> pretraining_loss = pretrain_model.fit(dataset, nb_epoch=1)
>>> pretrain_model.save_checkpoint()
>>> finetune_model = InfoGraphModel(num_feat, edge_dim, num_gc_layers=1, task='regression', n_tasks=1, model_dir=tempdir.name)
>>> finetune_model.restore(components=['encoder'])
>>> finetuning_loss = finetune_model.fit(dataset)
>>>
>>> # classification example
>>> n_classes, n_tasks = 2, 1
>>> classification_model = InfoGraphModel(num_feat, edge_dim, num_gc_layers=1, task='classification', n_tasks=1, n_classes=2)
>>> y = np.random.randint(n_classes, size=(len(smiles), n_tasks)).astype(np.float64)
>>> dataset = NumpyDataset(X, y, w)
>>> loss = classification_model.fit(dataset, nb_epoch=1)
components (dict) – A dictionary of the components of the model. The keys are the names of the
components and the values are the components themselves.
Build the components of the model. InfoGraph is an unsupervised molecular graph representation learning model. It consists of an encoder, a local discriminator, a global discriminator, and a prior discriminator.
The unsupervised loss is calculated by the mutual information in embedding representations at all layers.
local_d: MultilayerPerceptron, local discriminator
global_d: MultilayerPerceptron, global discriminator
prior_d: MultilayerPerceptron, prior discriminator
fc1: MultilayerPerceptron, dense layer used during finetuning
fc2: MultilayerPerceptron, dense layer used during finetuning
Defines the loss function for the model which can access the components
using self.components. The loss function should take the inputs, labels, and
weights as arguments and return the loss.
Restores the state of a ModularTorchModel from a checkpoint file.
If no checkpoint file is provided, it will use the latest checkpoint found in the model directory. If a list of component names is provided, only the state of those components will be restored.
Parameters:
components (Optional[List[str]]) – A list of component names to restore. If None, all components will be restored.
checkpoint (Optional[str]) – The path to the checkpoint file. If None, the latest checkpoint in the model directory will
be used.
model_dir (Optional[str]) – The path to the model directory. If None, the model directory used to initialize the model will be used.
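The selective restore described above can be pictured with plain dictionaries standing in for component state. This is only a conceptual sketch. The function restore_components and the checkpoint layout are hypothetical and do not reflect how DeepChem stores checkpoints on disk.

```python
def restore_components(model_components, checkpoint, components=None):
    """Copy saved state into the selected components.

    Hypothetical sketch: `checkpoint` maps component names to saved state.
    If `components` is None, every component in the checkpoint is restored.
    """
    names = list(checkpoint) if components is None else components
    for name in names:
        model_components[name] = dict(checkpoint[name])
    return model_components


checkpoint = {"encoder": {"w": 1.0}, "local_d": {"w": 2.0}, "fc1": {"w": 3.0}}
model = {"encoder": {"w": 0.0}, "local_d": {"w": 0.0}, "fc1": {"w": 0.0}}

# Restore only the pretrained encoder; the finetuning head keeps its fresh state.
restore_components(model, checkpoint, components=["encoder"])
```

Restoring only the encoder is the pattern used for transfer learning: pretrained weights are reused while task-specific layers start from their new initialization.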
InfoGraphStar is a semi-supervised graph convolutional network for predicting molecular properties.
It aims to maximize the mutual information between the graph-level representation and the
representations of substructures of different scales. It does this by producing graph-level
encodings and substructure encodings, and then using a discriminator to classify if they
are from the same molecule or not.
Supervised training is done by using the graph-level encodings to predict the target property. Semi-supervised training is done by adding a loss term that maximizes the mutual information between the graph-level encodings and the substructure encodings to the supervised loss.
These modes can be chosen by setting the training_mode parameter.
To conduct training in unsupervised mode, use InfoGraphModel.
References
Parameters:
num_features (int) – Number of node features for each input
edge_features (int) – Number of edge features for each input
embedding_dim (int) – Dimension of the embedding
training_mode (str) – The mode to use for training. Options are ‘supervised’, ‘semisupervised’. For unsupervised training, use InfoGraphModel.
measure (str) – The divergence measure to use for the unsupervised loss. Options are ‘GAN’, ‘JSD’,
‘KL’, ‘RKL’, ‘X2’, ‘DV’, ‘H2’, or ‘W1’.
average_loss (bool) – Whether to average the loss over the batch
components (dict) – A dictionary of the components of the model. The keys are the names of the
components and the values are the components themselves.
Builds the components of the InfoGraphStar model. InfoGraphStar works by maximizing the mutual information between the graph-level representation and the representations of substructures of different scales.
It does this by producing graph-level encodings and substructure encodings, and then using a discriminator to classify if they are from the same molecule or not.
The encoder is a graph convolutional network that produces the graph-level encodings and substructure encodings.
In supervised training mode, only one encoder is used and the encodings are not compared, while in semi-supervised training mode two separate encoders are used in order to prevent negative transfer from the pretraining stage.
The local discriminator is a multilayer perceptron that classifies if the substructure encodings are from the same molecule or not while the global discriminator classifies if the graph-level encodings are from the same molecule or not.
Defines the loss function for the model which can access the components
using self.components. The loss function should take the inputs, labels, and
weights as arguments and return the loss.
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
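The roles of epochs, deterministic and pad_batches can be illustrated with a small pure-Python generator. This is a simplified sketch, not DeepChem's default_generator. In particular, padding by repeating samples is only one possible strategy, and the seed argument is added here purely for reproducibility.

```python
import random


def batch_generator(samples, batch_size, epochs=1, deterministic=True,
                    pad_batches=False, seed=0):
    """Yield batches of `samples` for the requested number of epochs."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(len(samples)))
        if not deterministic:
            # Reshuffle the sample order at the start of every epoch.
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = [samples[i] for i in order[start:start + batch_size]]
            if pad_batches and len(batch) < batch_size:
                # Fill the last batch by cycling through its own samples.
                n = len(batch)
                batch = batch + [batch[i % n] for i in range(batch_size - n)]
            yield batch


batches = list(
    batch_generator(list(range(5)), batch_size=2, epochs=2, pad_batches=True))
```

With five samples and a batch size of two, each epoch yields three batches, and the last one is padded up to the full batch size.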
Modular GNN which allows for easy swapping of GNN layers.
Parameters:
gnn_type (str) – The type of GNN layer to use. Must be one of “gin”, “gcn”, “graphsage”, or “gat”.
num_layer (int) – The number of GNN layers to use.
emb_dim (int) – The dimensionality of the node embeddings.
num_tasks (int) – The number of tasks.
graph_pooling (str) – The type of graph pooling to use. Must be one of “sum”, “mean”, “max”, “attention” or “set2set”.
“sum” may cause issues with positive prediction loss.
dropout (float, optional (default 0)) – The dropout probability.
jump_knowledge (str, optional (default "last")) – The type of jump knowledge to use. [1] Must be one of “last”, “sum”, “max”, or “concat”.
“last”: Use the node representation from the last GNN layer.
“concat”: Concatenate the node representations from all GNN layers. This will increase the dimensionality of the node representations by a factor of num_layer.
“max”: Take the element-wise maximum of the node representations from all GNN layers.
“sum”: Take the element-wise sum of the node representations from all GNN layers. This may cause issues with positive prediction loss.
task (str, optional (default "regression")) – The type of task.
Unsupervised tasks:
edge_pred: Edge prediction. Predicts whether an edge exists between two nodes.
mask_nodes: Masking nodes. Predicts the masked node.
mask_edges: Masking edges. Predicts the masked edge.
infomax: Infomax. Maximizes mutual information between local node representations and a pooled global graph representation.
context_pred: Context prediction. Predicts the surrounding context of a node.
Supervised tasks:
“regression” or “classification”.
mask_rate (float, optional (default 0.1)) – The rate at which to mask nodes or edges for mask_nodes and mask_edges tasks.
mask_edge (bool, optional (default True)) – Whether to also mask connecting edges for mask_nodes tasks.
context_size (int, optional (default 1)) – The size of the context to use for context prediction tasks.
neighborhood_size (int, optional (default 3)) – The size of the neighborhood to use for context prediction tasks.
context_mode (str, optional (default "cbow")) – The context mode to use for context prediction tasks. Must be one of “cbow” or “skipgram”.
neg_samples (int, optional (default 1)) – The number of negative samples to use for context prediction.
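The four jump_knowledge options listed above differ only in how the per-layer node representations are combined. A minimal NumPy sketch, assuming each GNN layer outputs an array of shape (num_nodes, emb_dim); the function is illustrative and is not DeepChem's code:

```python
import numpy as np


def jump_knowledge(layer_outputs, mode="last"):
    """Combine a list of (num_nodes, emb_dim) arrays, one per GNN layer."""
    if mode == "last":
        return layer_outputs[-1]
    if mode == "concat":
        # Feature dimension grows by a factor of the number of layers.
        return np.concatenate(layer_outputs, axis=1)
    stacked = np.stack(layer_outputs, axis=0)
    if mode == "max":
        return stacked.max(axis=0)
    if mode == "sum":
        return stacked.sum(axis=0)
    raise ValueError(f"unknown jump knowledge mode: {mode}")


num_layer, num_nodes, emb_dim = 3, 4, 8
outputs = [np.full((num_nodes, emb_dim), float(i)) for i in range(num_layer)]
concat = jump_knowledge(outputs, "concat")
```

Only "concat" changes the embedding width, so any downstream head must expect num_layer * emb_dim features in that mode.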
components (dict) – A dictionary of the components of the model. The keys are the names of the
components and the values are the components themselves.
Builds the components of the GNNModular model. It initializes the encoders, batch normalization layers, pooling layers, and head layers based on the provided configuration. The method returns a dictionary containing the following components:
node_type_embedding: torch.nn.Embedding, an embedding layer for node types.
chirality_embedding: torch.nn.Embedding, an embedding layer for chirality tags.
gconvs: torch_geometric.nn.conv.MessagePassing, a list of graph convolutional layers (encoders) based on the specified GNN type (GIN, GCN, or GAT).
batch_norms: torch.nn.BatchNorm1d, a list of batch normalization layers corresponding to the encoders.
pool: Union[function,torch_geometric.nn.aggr.Aggregation], a pooling layer based on the specified graph pooling type (sum, mean, max, attention, or set2set).
head: nn.Linear, a linear layer for the head of the model.
These components are then used to construct the GNN and GNN_head modules for the GNNModular model.
Build graph neural network encoding layers by specifying the number of GNN layers.
Parameters:
num_layer (int) – The number of GNN layers to be created.
Returns:
A tuple containing two ModuleLists:
1. encoders: A ModuleList of GNN layers (currently only GIN is supported).
2. batch_norms: A ModuleList of batch normalization layers corresponding to each GNN layer.
Return type:
tuple of (torch.nn.ModuleList, torch.nn.ModuleList)
Builds the appropriate model based on the specified task.
For the edge prediction task, the model is simply the GNN module because it is an unsupervised task and does not require a prediction head.
Supervised tasks such as node classification and graph regression require a prediction head, so the model is a sequential module consisting of the GNN module followed by the GNN_head module.
Produces the loss between the predicted node features and the true node features for masked nodes. Set mask_edge to True to also predict the edge types for masked edges.
Loss that maximizes mutual information between local node representations and a pooled global graph representation. The positive and negative scores represent the similarity between local node representations and global graph representations of similar and dissimilar graphs, respectively.
Parameters:
inputs (BatchedGraphData) – BatchedGraphData object containing the node features, edge indices, and graph indices for the batch of graphs.
Loads the context prediction loss for the given input by taking the batched subgraph and context graphs and computing the context prediction loss for each subgraph and context graph pair.
Parameters:
inputs (tuple) – A tuple containing the following elements:
- substruct_batch (BatchedGraphData): Batched subgraph, or neighborhood, graphs.
- s_overlap (List[int]): List of overlapping subgraph node indices between the subgraph and context graphs.
- context_graphs (BatchedGraphData): Batched context graphs.
- c_overlap (List[int]): List of overlapping context node indices between the subgraph and context graphs.
- overlap_size (List[int]): List of the number of overlapping nodes between the subgraph and context graphs.
This default generator is modified from the default generator in dc.models.tensorgraph.tensor_graph.py to support multitask classification. If the task is classification, the labels y_b are converted to a one-hot encoding and reshaped according to the number of tasks and classes.
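The label conversion described above can be sketched in NumPy as follows. The function name one_hot_labels is illustrative, and the output layout (batch, n_tasks, n_classes) is an assumption about how the reshaped labels are arranged.

```python
import numpy as np


def one_hot_labels(y_b, n_tasks, n_classes):
    """Convert integer class labels of shape (batch, n_tasks) to one-hot form."""
    flat = np.asarray(y_b).astype(int).reshape(-1)
    encoded = np.eye(n_classes)[flat]  # (batch * n_tasks, n_classes)
    return encoded.reshape(-1, n_tasks, n_classes)


y_b = np.array([[0, 1], [1, 1], [0, 0]])  # 3 samples, 2 tasks
encoded = one_hot_labels(y_b, n_tasks=2, n_classes=2)
```

Each (sample, task) entry becomes a length-n_classes vector with a single 1 at the position of the class label.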
InfoMax3DModular is a modular torch model that uses a 2D PNA model and a 3D Net3D model to maximize the mutual information between their representations. The 2D model can then be used for downstream tasks without the need for 3D coordinates. This is based off the work in [1].
This class expects data featurized by the RDKitConformerFeaturizer. This featurizer produces features of the type Array[Array[List[GraphData]]].
The outermost array is the dataset array, the second array is the molecule, the list contains the conformers for that molecule and the GraphData object is the featurized graph for that conformer with node_pos_features holding the 3D coordinates.
If you are not using RDKitConformerFeaturizer, your input data features should look like this: Dataset[Molecule[Conformers[GraphData]]].
For pretraining, the original paper used a learning rate of 8e-5 with a batch size of 500.
For finetuning on quantum mechanical datasets, a learning rate of 7e-5 with a batch size of 128
was used. For finetuning on non-quantum mechanical datasets, a learning rate of 1e-3 with a
batch size of 32 was used in the original implementation.
Parameters:
task (Literal['pretrain', 'regression', 'classification']) – The task of the model
hidden_dim (int, optional, default = 64) – The dimension of the hidden layers.
target_dim (int, optional, default = 10) – The dimension of the output layer.
aggregators (List[str]) – A list of aggregator functions for the PNA model. Options are ‘mean’, ‘sum’, ‘min’, ‘max’, ‘std’, ‘var’, ‘moment3’, ‘moment4’, ‘moment5’.
readout_aggregators (List[str]) – A list of aggregator functions for the readout layer. Options are ‘sum’, ‘max’, ‘min’, ‘mean’.
scalers (List[str]) – A list of scaler functions for the PNA model. Options are ‘identity’, ‘amplification’, ‘attenuation’.
residual (bool, optional (default=True)) – Whether to use residual connections in the PNA model.
node_wise_output_layers (int, optional (default=2)) – The number of output layers for each node in the Net3D model.
pairwise_distances (bool, optional (default=False)) – Whether to use pairwise distances in the PNA model.
activation (Union[Callable, str], optional (default="relu")) – The activation function to use in the PNA model.
reduce_func (str, optional (default='sum')) – The reduce function to use for aggregating messages in the Net3D model.
batch_norm (bool, optional (default=True)) – Whether to use batch normalization in the PNA model.
batch_norm_momentum (float, optional (default=0.1)) – The momentum for the batch normalization layers.
propagation_depth (int, optional (default=5)) – The number of propagation layers in the PNA and Net3D models.
dropout (float, optional (default=0.0)) – The dropout rate for the layers in the PNA and Net3D models.
readout_layers (int, optional (default=2)) – The number of readout layers in the PNA and Net3D models.
readout_hidden_dim (int, optional (default=None)) – The dimension of the hidden layers in the readout network.
fourier_encodings (int, optional (default=4)) – The number of Fourier encodings to use in the Net3D model.
update_net_layers (int, optional (default=2)) – The number of update network layers in the Net3D model.
message_net_layers (int, optional (default=2)) – The number of message network layers in the Net3D model.
use_node_features (bool, optional (default=False)) – Whether to use node features as input in the Net3D model.
posttrans_layers (int, optional (default=1)) – The number of post-transformation layers in the PNA model.
pretrans_layers (int, optional (default=1)) – The number of pre-transformation layers in the PNA model.
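The scalers options ('identity', 'amplification', 'attenuation') rescale aggregated messages according to node degree. The sketch below assumes the logarithmic degree scalers described in the PNA paper, where delta is the average of log(degree + 1) over the training set. It is an illustration only and may differ in detail from the implementation used here.

```python
import numpy as np


def apply_scaler(aggregated, degree, delta, scaler="identity"):
    """Scale per-node aggregated messages of shape (num_nodes, dim) by degree.

    Assumes every node has degree >= 1 so that log(degree + 1) > 0.
    """
    log_deg = np.log(degree + 1.0)
    if scaler == "identity":
        return aggregated
    if scaler == "amplification":
        return aggregated * (log_deg / delta)[:, None]
    if scaler == "attenuation":
        return aggregated * (delta / log_deg)[:, None]
    raise ValueError(f"unknown scaler: {scaler}")


aggregated = np.ones((2, 3))
degree = np.array([1.0, 3.0])
delta = np.log(2.0)  # hypothetical training-set average of log(degree + 1)
amplified = apply_scaler(aggregated, degree, delta, "amplification")
attenuated = apply_scaler(aggregated, degree, delta, "attenuation")
```

With this delta, a node of degree 1 is left unchanged, while a node of degree 3 is doubled under amplification and halved under attenuation.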
components (dict) – A dictionary of the components of the model. The keys are the names of the
components and the values are the components themselves.
Lattice Convolutional Neural Network (LCNN).
Here is a simple example of code that uses the LCNNModel with
Platinum 2d Adsorption dataset.
This model takes arbitrary configurations of molecules on an adsorbate and predicts
their formation energy. These formation energies are normally found using DFT calculations, and
LCNNModel is intended to automate that process. The model defines a crystal graph using the
distance between atoms. The crystal graph is an undirected regular graph (equal neighbours),
and different permutations of the neighbours are pre-computed using the LCNNFeaturizer.
On each node, for each permutation, the neighbour nodes are concatenated and then operated on further.
This model has only a node representation. For details of the algorithm, see [1]_.
MatErials Graph Network for Molecules and Crystals
MatErials Graph Network [1]_ are Graph Networks [2]_ which are used for property prediction
in molecules and crystals. The model implements multiple layers of Graph Network as
MEGNetBlocks and then combines the node properties and edge properties of all nodes
and edges via a Set2Set layer. The combined information is used with the global
features of the material/molecule for property prediction tasks.
This class implements the Molecular Attention Transformer [1]_.
The MATFeaturizer (deepchem.feat.MATFeaturizer) is intended to work with this class.
The model takes a batch of MATEncodings (from MATFeaturizer) as input, and returns an array of size Nx1, where N is the number of molecules in the batch.
Each molecule is broken down into its Node Features matrix, adjacency matrix and distance matrix.
A mask tensor is calculated for the batch. All of this goes as input to the MATEmbedding, MATEncoder and MATGenerator layers, which are defined in deepchem.models.torch_models.layers.py
Currently, MATModel is intended to be a regression model for the freesolv dataset.
The wrapper class for the Molecular Attention Transformer.
Since we are using a custom data class as input (MATEncoding), we have overridden the default_generator function from DiskDataset and customized it to work with a batch of MATEncoding classes.
Parameters:
dist_kernel (str) – Kernel activation to be used. Can be either ‘softmax’ for softmax or ‘exp’ for exponential, for the self-attention layer.
n_encoders (int) – Number of encoder layers in the encoder block.
lambda_attention (float) – Constant to be multiplied with the attention matrix in the self-attention layer.
lambda_distance (float) – Constant to be multiplied with the distance matrix in the self-attention layer.
h (int) – Number of attention heads for the self-attention layer.
sa_hsize (int) – Size of dense layer in the self-attention layer.
sa_dropout_p (float) – Dropout probability for the self-attention layer.
output_bias (bool) – If True, dense layers will use bias vectors in the self-attention layer.
d_input (int) – Size of input layer in the feed-forward layer.
d_hidden (int) – Size of hidden layer in the feed-forward layer. Will also be used as d_output for the MATEmbedding layer.
d_output (int) – Size of output layer in the feed-forward layer.
activation (str) – Activation function to be used in the feed-forward layer.
Can choose between ‘relu’ for ReLU, ‘leakyrelu’ for LeakyReLU, ‘prelu’ for PReLU,
‘tanh’ for TanH, ‘selu’ for SELU, ‘elu’ for ELU and ‘linear’ for linear activation.
n_layers (int) – Number of layers in the feed-forward layer.
ff_dropout_p (float) – Dropout probability in the feed-forward layer.
encoder_hsize (int) – Size of Dense layer for the encoder itself.
encoder_dropout_p (float) – Dropout probability for connections in the encoder layer.
embed_input_hsize (int) – Size of input layer for the MATEmbedding layer.
embed_dropout_p (float) – Dropout probability for the MATEmbedding layer.
gen_aggregation_type (str) – Type of aggregation to be used. Can be ‘grover’, ‘mean’ or ‘contextual’.
gen_dropout_p (float) – Dropout probability for the MATGenerator layer.
gen_n_layers (int) – Number of layers in MATGenerator.
gen_attn_hidden (int) – Size of hidden attention layer in the MATGenerator layer.
gen_attn_out (int) – Size of output attention layer in the MATGenerator layer.
gen_d_output (int) – Size of output layer in the MATGenerator layer.
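The interplay of lambda_attention, lambda_distance and dist_kernel can be sketched as a weighted mix of three matrices, following the idea of the Molecular Attention Transformer. This is a single-head NumPy illustration under stated assumptions: the adjacency weight is taken as the remainder 1 - lambda_attention - lambda_distance, and the default values shown are placeholders rather than the library's defaults.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def molecule_attention(q, k, v, adjacency, distance,
                       lambda_attention=0.33, lambda_distance=0.33,
                       dist_kernel="softmax"):
    """Mix self-attention, distance and adjacency information (one head)."""
    # Assumption: the three weights sum to one.
    lambda_adjacency = 1.0 - lambda_attention - lambda_distance
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    if dist_kernel == "softmax":
        dist = softmax(-distance)
    elif dist_kernel == "exp":
        dist = np.exp(-distance)
    else:
        raise ValueError(f"unknown dist_kernel: {dist_kernel}")
    weights = (lambda_attention * attn + lambda_distance * dist +
               lambda_adjacency * adjacency)
    return weights @ v


rng = np.random.default_rng(0)
n_atoms, d = 4, 8
q, k, v = (rng.normal(size=(n_atoms, d)) for _ in range(3))
adjacency = np.array([[0, 1, 0, 0], [1, 0, 1, 0],
                      [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
idx = np.arange(n_atoms)
distance = np.abs(idx[:, None] - idx[None, :]).astype(float)
out = molecule_attention(q, k, v, adjacency, distance)
```

Setting lambda_attention to 1 and lambda_distance to 0 reduces this sketch to ordinary scaled dot-product attention.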
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
Normalizing flows are widely used as generative models.
They offer an advantage over variational autoencoders (VAE): because the model
is built from invertible transformations, sampling is straightforward
(Frey, Gadepally, & Ramsundar, 2022).
Example
>>> import deepchem as dc
>>> from deepchem.models.torch_models.layers import Affine
>>> from deepchem.models.torch_models.normalizing_flows_pytorch import NormalizingFlow
>>> import torch
>>> from torch.distributions import MultivariateNormal
>>> # initialize the transformation layer's parameters
>>> dim = 2
>>> samples = 96
>>> transforms = [Affine(dim)]
>>> distribution = MultivariateNormal(torch.zeros(dim), torch.eye(dim))
>>> # initialize normalizing flow model
>>> model = NormalizingFlow(transforms, distribution, dim)
>>> # evaluate the log_prob when applying the transformation layers
>>> input = distribution.sample(torch.Size((samples, dim)))
>>> len(model.log_prob(input))
96
>>> # evaluates the sampling method and its log_prob
>>> len(model.sample(samples))
2
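The log-probability of a flow follows from the change-of-variables formula: the base density of the inverted sample plus the log-determinant of the inverse transformation. The NumPy sketch below shows this for a single elementwise affine layer with a standard normal base distribution. The AffineFlow class is illustrative only and is unrelated to the Affine layer used above.

```python
import numpy as np


def standard_normal_log_prob(z):
    # Log density of an isotropic standard normal, summed over dimensions.
    return -0.5 * (z ** 2 + np.log(2.0 * np.pi)).sum(axis=1)


class AffineFlow:
    """x = z * exp(log_scale) + shift, applied elementwise."""

    def __init__(self, log_scale, shift):
        self.log_scale = np.asarray(log_scale, dtype=float)
        self.shift = np.asarray(shift, dtype=float)

    def forward(self, z):
        return z * np.exp(self.log_scale) + self.shift

    def inverse(self, x):
        z = (x - self.shift) * np.exp(-self.log_scale)
        # log|det dz/dx| is the same for every sample of an affine map.
        log_det = np.full(len(x), -self.log_scale.sum())
        return z, log_det

    def log_prob(self, x):
        z, log_det = self.inverse(x)
        return standard_normal_log_prob(z) + log_det

    def sample(self, n, rng):
        z = rng.standard_normal((n, self.shift.shape[0]))
        return self.forward(z)


rng = np.random.default_rng(0)
flow = AffineFlow(log_scale=[0.5, -0.2], shift=[1.0, 2.0])
x = flow.sample(96, rng)
log_p = flow.log_prob(x)
```

Sampling needs only the forward map and density evaluation only the inverse, which is why invertibility makes both operations cheap.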
mode (str, default 'regression') – The model type - classification or regression.
n_classes (int, default 3) – The number of classes to predict (used only in classification mode).
n_tasks (int, default 1) – The number of tasks.
batch_size (int, default 1) – The number of datapoints in a batch.
global_features_size (int, default 0) – Size of the global features vector, based on the global featurizers used during featurization.
use_default_fdim (bool) – If True, self.atom_fdim and self.bond_fdim are initialized using values from the GraphConvConstants class.
If False, self.atom_fdim and self.bond_fdim are initialized from the values provided.
atom_fdim (int) – Dimension of atom feature vector.
bond_fdim (int) – Dimension of bond feature vector.
enc_hidden (int) – Size of hidden layer in the encoder layer.
depth (int) – Number of message passing steps.
bias (bool) – If True, dense layers will use bias vectors.
enc_activation (str) – Activation function to be used in the encoder layer.
Can choose between ‘relu’ for ReLU, ‘leakyrelu’ for LeakyReLU, ‘prelu’ for PReLU,
‘tanh’ for TanH, ‘selu’ for SELU, and ‘elu’ for ELU.
enc_dropout_p (float) – Dropout probability for the encoder layer.
aggregation (str) – Aggregation type to be used in the encoder layer.
Can choose between ‘mean’, ‘sum’, and ‘norm’.
aggregation_norm (Union[int, float]) – Value required if aggregation type is ‘norm’.
ffn_hidden (int) – Size of hidden layer in the feed-forward network layer.
ffn_activation (str) – Activation function to be used in feed-forward network layer.
Can choose between ‘relu’ for ReLU, ‘leakyrelu’ for LeakyReLU, ‘prelu’ for PReLU,
‘tanh’ for TanH, ‘selu’ for SELU, and ‘elu’ for ELU.
ffn_layers (int) – Number of layers in the feed-forward network layer.
ffn_dropout_p (float) – Dropout probability for the feed-forward network layer.
ffn_dropout_at_input_no_act (bool) – If true, dropout is applied on the input tensor. For single layer, it is not passed to an activation function.
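The three aggregation options can be sketched as different ways of reducing per-atom hidden states to a single molecule vector. The sketch assumes that 'norm' means a sum divided by aggregation_norm. The function name and the default value shown are illustrative.

```python
import numpy as np


def aggregate_atoms(atom_hidden, aggregation="mean", aggregation_norm=100):
    """Reduce (num_atoms, hidden) atom states to one (hidden,) molecule vector."""
    if aggregation == "mean":
        return atom_hidden.mean(axis=0)
    if aggregation == "sum":
        return atom_hidden.sum(axis=0)
    if aggregation == "norm":
        # Assumption: sum scaled by a fixed constant rather than the atom count.
        return atom_hidden.sum(axis=0) / aggregation_norm
    raise ValueError(f"unknown aggregation: {aggregation}")


atom_hidden = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mol_mean = aggregate_atoms(atom_hidden, "mean")
mol_sum = aggregate_atoms(atom_hidden, "sum")
mol_norm = aggregate_atoms(atom_hidden, "norm", aggregation_norm=10)
```

Unlike 'mean', the 'sum' and 'norm' options keep the result dependent on molecule size, which can matter for size-extensive properties.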
Create a generator that iterates batches for a dataset.
Overrides the existing default_generator method to customize how model inputs are
generated from the data.
Here, the _MapperDMPNN helper class is used, for each molecule in a batch, to get required input parameters:
atom_features
f_ini_atoms_bonds
atom_to_incoming_bonds
mapping
global_features
Then data from each molecule is converted to a _ModData object and stored as list of graphs.
The graphs are modified such that all tensors have same size in 0th dimension. (important requirement for batching)
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
The GROVER model employs a self-supervised message passing transformer architecture
for learning molecular representation. The pretraining task can learn rich structural
and semantic information of molecules from unlabelled molecular data, which can
be leveraged by finetuning for downstream applications. To this end, GROVER integrates message
passing networks into a transformer style architecture.
Parameters:
node_fdim (int) – the dimension of additional feature for node/atom.
edge_fdim (int) – the dimension of additional feature for edge/bond.
components (dict) – A dictionary of the components of the model. The keys are the names of the
components and the values are the components themselves.
For every atom in the list of SMILES strings, the algorithm fetches the atom's
context (vocab label) from the vocabulary provided and returns the vocabulary
labels with a random masking (probability of masking = 0.15).
Random masking of bond labels from bond vocabulary
For every bond in the list of SMILES strings, the algorithm fetches the bond's
context (vocab label) from the vocabulary provided and returns the vocabulary
labels with a random masking (probability of masking = 0.15).
Parameters:
bond_vocab (GroverBondVocabularyBuilder) – bond vocabulary
smiles (List[str]) – a list of SMILES strings
Returns:
vocab_label – bond vocab label with random masking
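The random masking step can be sketched in pure Python. This version selects roughly 15% of the positions as prediction targets, keeps their vocabulary labels, and sets every other position to a padding index of 0. Which positions keep their label and the choice of padding index are assumptions made for illustration, and the function name is hypothetical.

```python
import math
import random


def random_mask_labels(labels, mask_prob=0.15, pad_index=0, seed=None):
    """Keep labels at a random subset of positions and pad the rest."""
    rng = random.Random(seed)
    n_target = math.ceil(len(labels) * mask_prob)
    targets = set(rng.sample(range(len(labels)), n_target))
    return [label if i in targets else pad_index
            for i, label in enumerate(labels)]


# Hypothetical vocabulary indices for the 20 atoms (or bonds) of one molecule.
vocab_labels = list(range(1, 21))
masked = random_mask_labels(vocab_labels, seed=0)
```

The model is then trained to predict the retained labels from the surrounding molecular context.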
Reload the values of all variables from a checkpoint file.
Parameters:
checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent
checkpoint will be chosen automatically. Call get_checkpoints() to get a
list of all available checkpoints.
model_dir (str, default None) – Directory to restore checkpoint from. If None, use self.model_dir. If
checkpoint is not None, this is ignored.
DTNN is based on the many-body Hamiltonian concept, which is a fundamental principle in quantum mechanics.
DTNN receives a molecule’s distance matrix and the membership of its atoms from its Coulomb Matrix representation.
Then, it iteratively refines the representation of each atom by considering its interactions with neighboring atoms.
Finally, it predicts the energy of the molecule by summing up the energies of the individual atoms.
This class implements the Deep Tensor Neural Network (DTNN) [1]_.
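The final step, summing per-atom energies into one energy per molecule, can be illustrated with an atom membership list that maps each atom in a batch to its molecule. This is a simplified sketch with made-up energy values; the function name sum_atomic_energies is an assumption and does not come from DeepChem.

```python
def sum_atomic_energies(atom_energies, atom_membership, n_molecules):
    """Sum per-atom energies into one energy per molecule."""
    molecule_energies = [0.0] * n_molecules
    for energy, molecule_index in zip(atom_energies, atom_membership):
        molecule_energies[molecule_index] += energy
    return molecule_energies


# Five atoms in a batch of two molecules: atoms 0-2 belong to molecule 0,
# atoms 3-4 to molecule 1. The energy values are made up.
atom_energies = [-1.0, -0.5, -0.25, -2.0, -1.0]
atom_membership = [0, 0, 0, 1, 1]
energies = sum_atomic_energies(atom_energies, atom_membership, 2)
```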
Create a generator that iterates batches for a dataset.
It processes inputs through the _compute_features_on_batch function to calculate the required input features.
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
Implements sequence to sequence translation models.
The model is based on the description in Sutskever et al., “Sequence to
Sequence Learning with Neural Networks” (https://arxiv.org/abs/1409.3215),
although this implementation uses GRUs instead of LSTMs. The goal is to
take sequences of tokens as input, and translate each one into a different
output sequence. The input and output sequences can both be of variable
length, and an output sequence need not have the same length as the input
sequence it was generated from. For example, these models were originally
developed for use in natural language processing. In that context, the
input might be a sequence of English words, and the output might be a
sequence of French words. The goal would be to train the model to translate
sentences from English to French.
The model consists of two parts called the “encoder” and “decoder”. Each one
consists of a stack of recurrent layers. The job of the encoder is to
transform the input sequence into a single, fixed length vector called the
“embedding”. That vector contains all relevant information from the input
sequence. The decoder then transforms the embedding vector into the output
sequence.
These models can be used for various purposes. First and most obviously,
they can be used for sequence to sequence translation. In any case where you
have sequences of tokens, and you want to translate each one into a different
sequence, a SeqToSeq model can be trained to perform the translation.
Another possible use case is transforming variable length sequences into
fixed length vectors. Many types of models require their inputs to have a
fixed shape, which makes it difficult to use them with variable sized inputs
(for example, when the input is a molecule, and different molecules have
different numbers of atoms). In that case, you can train a SeqToSeq model as
an autoencoder, so that it tries to make the output sequence identical to the
input one. That forces the embedding vector to contain all information from
the original sequence. You can then use the encoder for transforming
sequences into fixed length embedding vectors, suitable to use as inputs to
other types of models.
Another use case is to train the decoder for use as a generative model. Here
again you begin by training the SeqToSeq model as an autoencoder. Once
training is complete, you can supply arbitrary embedding vectors, and
transform each one into an output sequence. When used in this way, you
typically train it as a variational autoencoder. This adds random noise to
the encoder, and also adds a constraint term to the loss that forces the
embedding vector to have a unit Gaussian distribution. You can then pick
random vectors from a Gaussian distribution, and the output sequences should
follow the same distribution as the training data.
When training as a variational autoencoder, it is best to use KL cost
annealing, as described in https://arxiv.org/abs/1511.06349. The constraint
term in the loss is initially set to 0, so the optimizer just tries to
minimize the reconstruction loss. Once it has made reasonable progress
toward that, the constraint term can be gradually turned back on. The range
of steps over which this happens is configurable.
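One way to picture KL cost annealing is as a weight on the constraint term that stays at 0 until annealing_start_step and reaches 1 at annealing_final_step. The sketch below assumes a linear ramp between the two steps; the function name kl_weight and the linear shape are illustrative assumptions.

```python
def kl_weight(step, annealing_start_step=5000, annealing_final_step=10000):
    """Weight of the constraint (KL) term at a given training step."""
    if step <= annealing_start_step:
        return 0.0
    if step >= annealing_final_step:
        return 1.0
    span = annealing_final_step - annealing_start_step
    # Assumed linear ramp between the start and final steps.
    return (step - annealing_start_step) / span
```

Under this assumption the total loss at a given step would be the reconstruction loss plus kl_weight(step) times the constraint term.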
In this class, we establish a sequential model for Sequence to Sequence (SeqToSeq) learning [1]_.
input_tokens (list) – List of all tokens that may appear in input sequences.
output_tokens (list) – List of all tokens that may appear in output sequences
max_output_length (int) – Maximum length of output sequence that may be generated
encoder_layers (int (default 4)) – Number of recurrent layers in the encoder
decoder_layers (int (default 4)) – Number of recurrent layers in the decoder
embedding_dimension (int (default 512)) – Width of the embedding vector. This also is the width of all
recurrent layers.
dropout (float (default 0.0)) – Dropout probability to use during training.
reverse_input (bool (default True)) – If True, reverse the order of input sequences before sending
them into the encoder. This can improve performance when
working with long sequences.
variational (bool (default False)) – If True, train the model as a variational autoencoder. This
adds random noise to the encoder, and also constrains the
embedding to follow a unit Gaussian distribution.
annealing_start_step (int (default 5000)) – Step (that is, batch) at which to begin turning on the constraint
term for KL cost annealing.
annealing_final_step (int (default 10000)) – Step (that is, batch) at which to finish turning on the constraint
term for KL cost annealing.
sequences (List[str]) – Training samples to fit to. Each sample should be represented
as a tuple of the form (input_sequence, output_sequence).
max_checkpoints_to_keep (int) – Maximum number of checkpoints to keep. Older checkpoints are
discarded.
checkpoint_interval (int) – Frequency at which to write checkpoints, measured in training steps.
restore (bool) – if True, restore the model from the most recent checkpoint and
continue training from there. If False, retrain the model from
scratch.
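For the autoencoder use described for this class, the token lists and training samples might be prepared along the following lines. This is only a sketch on three example SMILES strings treated as character-level sequences; the helper name autoencoder_pairs is hypothetical.

```python
def autoencoder_pairs(sequences):
    """Yield (input_sequence, output_sequence) tuples with identical halves."""
    for sequence in sequences:
        yield (sequence, sequence)


# A few SMILES strings used as character-level token sequences.
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]

# Every character that may appear in an input or output sequence.
tokens = sorted(set(char for s in smiles for char in s))
max_output_length = max(len(s) for s in smiles)
pairs = list(autoencoder_pairs(smiles))
```

Here tokens would serve as both input_tokens and output_tokens, and pairs matches the (input_sequence, output_sequence) form expected for sequences.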
Builder class for Generative Adversarial Networks.
A Generative Adversarial Network (GAN) [gan1] is a type of generative model. It
consists of two parts called the “generator” and the “discriminator”. The
generator takes random noise as input and transforms it into an output that
(hopefully) resembles the training data. The discriminator takes a set of
samples as input and tries to distinguish the real training samples from the
ones created by the generator. Both of them are trained together. The
discriminator tries to get better and better at telling real from false data,
while the generator tries to get better and better at fooling the discriminator.
>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> class Discriminator(nn.Module):
...     def __init__(self, data_input_shape, conditional_input_shape):
...         super(Discriminator, self).__init__()
...         self.data_input_shape = data_input_shape
...         self.conditional_input_shape = conditional_input_shape
...         # Extracting the actual data dimension
...         data_dim = data_input_shape[1:]
...         # Extracting the actual conditional dimension
...         conditional_dim = conditional_input_shape[1:]
...         input_dim = sum(data_dim) + sum(conditional_dim)
...         # Define the dense layers
...         self.dense1 = nn.Linear(input_dim, 10)
...         self.dense2 = nn.Linear(10, 1)
...
...     def forward(self, input):
...         data_input, conditional_input = input
...         # Concatenate data_input and conditional_input along the second dimension
...         discrim_in = torch.cat((data_input, conditional_input), dim=1)
...         # Pass the concatenated input through the dense layers
...         x = F.relu(self.dense1(discrim_in))
...         output = torch.sigmoid(self.dense2(x))
...         return output
In addition to the parameters listed below, this class accepts all the
keyword arguments from KerasModel.
Parameters:
noise_input_shape (tuple) – the shape of the noise input to the generator. The first dimension
(corresponding to the batch size) should be omitted.
data_input_shape (list of tuple) – the shapes of the inputs to the discriminator. The first dimension
(corresponding to the batch size) should be omitted.
conditional_input_shape (list of tuple) – the shapes of the conditional inputs to the generator and discriminator.
The first dimension (corresponding to the batch size) should be omitted.
If there are no conditional inputs, this should be an empty list.
generator_fn (Callable) – a function that returns a generator. It will be called with no arguments.
The returned value should be a nn.Module whose input is a list
containing a batch of noise, followed by any conditional inputs. The
number and shapes of its outputs must match the return value from
get_data_input_shapes(), since generated data must have the same form as
training data.
discriminator_fn (Callable) – a function that returns a discriminator. It will be called with no
arguments. The returned value should be a nn.Module whose input is a
list containing a batch of data, followed by any conditional inputs. Its
output should be a one dimensional tensor containing the probability of
each sample being a training sample.
device (torch.device) – the device to use for training
n_generators (int) – the number of generators to include
n_discriminators (int) – the number of discriminators to include
create_discriminator_loss (Callable) – a function that returns the loss function for the discriminator. It will
be called with two arguments: the output from the discriminator on a
batch of training data, and the output from the discriminator on a batch
of generated data. The default implementation is appropriate for most
cases. Subclasses can override this if they need to customize it.
create_generator_loss (Callable) – a function that returns the loss function for the generator. It will be
called with one argument: the output from the discriminator on a batch of
generated data. The default implementation is appropriate for most
cases. Subclasses can override this if they need to customize it.
_call_discriminator (Callable) – a function that invokes the discriminator on a set of inputs. It will be
called with three arguments: the discriminator to invoke, the list of
data inputs, and the list of conditional inputs. The default
implementation is appropriate for most cases. Subclasses can override
this if they need to customize it.
Get a batch of random noise to pass to the generator.
This should return a NumPy array whose shape matches the one returned by
get_noise_input_shape(). The default implementation returns normally
distributed values. Subclasses can override this to implement a different
distribution.
Parameters:
batch_size (int) – the number of samples to generate
The default implementation is appropriate for most cases. Subclasses can
override this if they need to customize it.
Parameters:
discrim_output (Tensor) – the output from the discriminator on a batch of generated data. This is
its estimate of the probability that each sample is training data.
Returns:
output – A Tensor equal to the loss function to use for optimizing the generator.
The default implementation is appropriate for most cases. Subclasses can
override this if they need to customize it.
Parameters:
discrim_output_train (Tensor) – the output from the discriminator on a batch of training data. This is
its estimate of the probability that each sample is training data.
discrim_output_gen (Tensor) – the output from the discriminator on a batch of generated data. This is
its estimate of the probability that each sample is training data.
Returns:
output – A Tensor equal to the loss function to use for optimizing the discriminator.
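For orientation, the conventional GAN losses on discriminator probabilities can be written out in plain Python. This is a sketch of the standard formulation, operating on lists of probabilities rather than Tensors; the function names, the EPS constant, and the use of lists are assumptions for illustration and may not match the default implementation exactly.

```python
import math

EPS = 1e-10  # keeps log() finite when a probability is exactly 0


def generator_loss(discrim_output_gen):
    """Low when the discriminator rates generated samples as real."""
    total = sum(math.log(p + EPS) for p in discrim_output_gen)
    return -total / len(discrim_output_gen)


def discriminator_loss(discrim_output_train, discrim_output_gen):
    """Low when real samples score near 1 and generated ones near 0."""
    real_term = sum(math.log(p + EPS) for p in discrim_output_train)
    real_term /= len(discrim_output_train)
    fake_term = sum(math.log(1.0 - p + EPS) for p in discrim_output_gen)
    fake_term /= len(discrim_output_gen)
    return -(real_term + fake_term)
```

A discriminator that cannot tell the two apart outputs 0.5 everywhere, which gives a discriminator loss of about 2 ln 2 under this formulation.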
Function to get the discriminator loss from the fit_generator output
Parameters:
outputs (list of Tensor) – the output from the discriminator on a batch of training data. This is
its estimate of the probability that each sample is training data.
labels (Tensor) – the labels for the batch. These are ignored.
weights (Tensor) – the weights for the batch. These are ignored.
Return type:
the value of the discriminator loss from the fit_generator output.
Function to get the Generator loss from the fit_generator output
Parameters:
outputs (Tensor) – the output from the discriminator on a batch of generated data. This is
its estimate of the probability that each sample is training data.
labels (Tensor) – the labels for the batch. These are ignored.
weights (Tensor) – the weights for the batch. These are ignored.
Return type:
the value of the generator loss function for this input.
A Generative Adversarial Network (GAN) is a type of generative model. It
consists of two parts called the “generator” and the “discriminator”. The
generator takes random noise as input and transforms it into an output that
(hopefully) resembles the training data. The discriminator takes a set of
samples as input and tries to distinguish the real training samples from the
ones created by the generator. Both of them are trained together. The
discriminator tries to get better and better at telling real from false data,
while the generator tries to get better and better at fooling the discriminator.
In many cases there also are additional inputs to the generator and
discriminator. In that case it is known as a Conditional GAN (CGAN), since it
learns a distribution that is conditional on the values of those inputs. They
are referred to as “conditional inputs”.
Many variations on this idea have been proposed, and new varieties of GANs are
constantly being proposed. This class tries to make it very easy to implement
straightforward GANs of the most conventional types. At the same time, it
tries to be flexible enough that it can be used to implement many (but
certainly not all) variations on the concept.
To define a GAN, you must create a subclass that provides implementations of
the following methods:
get_noise_input_shape()
get_data_input_shapes()
create_generator()
create_discriminator()
If you want your GAN to have any conditional inputs you must also implement:
get_conditional_input_shapes()
The following methods have default implementations that are suitable for most
conventional GANs. You can override them if you want to customize their
behavior:
create_generator_loss()
create_discriminator_loss()
get_noise_batch()
This class allows a GAN to have multiple generators and discriminators, a model
known as MIX+GAN. It is described in [2].
This can lead to better models, and is especially useful for reducing mode
collapse, since different generators can learn different parts of the
distribution. To use this technique, simply specify the number of generators
and discriminators when calling the constructor. You can then tell
predict_gan_generator() which generator to use for predicting samples.
>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> class Discriminator(nn.Module):
...     def __init__(self, data_input_shape, conditional_input_shape):
...         super(Discriminator, self).__init__()
...         self.data_input_shape = data_input_shape
...         self.conditional_input_shape = conditional_input_shape
...         # Extracting the actual data dimension
...         data_dim = data_input_shape[1:]
...         # Extracting the actual conditional dimension
...         conditional_dim = conditional_input_shape[1:]
...         input_dim = sum(data_dim) + sum(conditional_dim)
...         # Define the dense layers
...         self.dense1 = nn.Linear(input_dim, 10)
...         self.dense2 = nn.Linear(10, 1)
...
...     def forward(self, input):
...         data_input, conditional_input = input
...         # Concatenate data_input and conditional_input along the second dimension
...         discrim_in = torch.cat((data_input, conditional_input), dim=1)
...         # Pass the concatenated input through the dense layers
...         x = F.relu(self.dense1(discrim_in))
...         output = torch.sigmoid(self.dense2(x))
...         return output
n_generators (int) – the number of generators to include
n_discriminators (int) – the number of discriminators to include
create_discriminator_loss (Callable) – a function that returns the loss function for the discriminator. It will
be called with two arguments: the output from the discriminator on a
batch of training data, and the output from the discriminator on a batch
of generated data. The default implementation is appropriate for most
cases. Subclasses can override this if they need to customize it.
create_generator_loss (Callable) – a function that returns the loss function for the generator. It will be
called with one argument: the output from the discriminator on a batch of
generated data. The default implementation is appropriate for most
cases. Subclasses can override this if they need to customize it.
_call_discriminator (Callable) – a function that invokes the discriminator on a set of inputs. It will be
called with three arguments: the discriminator to invoke, the list of
data inputs, and the list of conditional inputs. The default
implementation is appropriate for most cases. Subclasses can override
this if they need to customize it.
Get the shape of the generator’s noise input layer.
Subclasses must override this to return a tuple giving the shape of the
noise input. The actual Input layer will be created automatically. The
dimension corresponding to the batch size should be omitted.
Subclasses must override this to return a list of tuples, each giving the
shape of one of the inputs. The actual Input layers will be created
automatically. This list of shapes must also match the shapes of the
generator’s outputs. The dimension corresponding to the batch size should
be omitted.
Subclasses may override this to return a list of tuples, each giving the
shape of one of the conditional inputs. The actual Input layers will be
created automatically. The dimension corresponding to the batch size should
be omitted.
The default implementation returns an empty list, meaning there are no
conditional inputs.
Subclasses must override this to construct the generator. The returned
value should be a tf.keras.Model whose inputs are a batch of noise, followed
by any conditional inputs. The number and shapes of its outputs must match
the return value from get_data_input_shapes(), since generated data must
have the same form as training data.
Subclasses must override this to construct the discriminator. The returned
value should be a tf.keras.Model whose inputs are all data inputs, followed
by any conditional inputs. Its output should be a one dimensional tensor
containing the probability of each sample being a training sample.
batches (iterable) – batches of data to train the discriminator on, each represented as a dict
that maps Inputs to values. It should specify values for all members of
data_inputs and conditional_inputs.
generator_steps (float) – the number of training steps to perform for the generator for each batch.
This can be used to adjust the ratio of training steps for the generator
and discriminator. For example, 2.0 will perform two training steps for
every batch, while 0.5 will only perform one training step for every two
batches.
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in batches. Set
this to 0 to disable automatic checkpointing.
restore (bool) – if True, restore the model from the most recent checkpoint before training
it.
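The meaning of a fractional generator_steps value can be made concrete with a small accumulator. The sketch below only computes how many generator steps would follow each discriminator batch; the function name generator_steps_per_batch and the accumulator approach are assumptions for illustration.

```python
def generator_steps_per_batch(n_batches, generator_steps):
    """Return how many generator steps run after each discriminator batch."""
    schedule = []
    pending = 0.0  # fractional generator steps carried between batches
    for _ in range(n_batches):
        pending += generator_steps
        steps_now = 0
        while pending >= 1.0:
            steps_now += 1
            pending -= 1.0
        schedule.append(steps_now)
    return schedule
```

With generator_steps=2.0 every batch is followed by two generator steps, while 0.5 produces one generator step after every second batch.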
batch_size (int) – the number of samples to generate. If either noise_input or
conditional_inputs is specified, this argument is ignored since the batch
size is then determined by the size of that argument.
noise_input (array) – the value to use for the generator’s noise input. If None (the default),
get_noise_batch() is called to generate a random input, so each call will
produce a new set of samples.
conditional_inputs (list of arrays) – the values to use for all conditional inputs. This must be specified if
the GAN has any conditional inputs.
generator_index (int) – the index of the generator (between 0 and n_generators-1) to use for
generating the samples.
Returns:
An array (if the generator has only one output) or list of arrays (if it has
multiple outputs) containing the generated samples.
This class implements Wasserstein Generative Adversarial Networks (WGANs) as
described in Arjovsky et al., “Wasserstein GAN” [wgan1].
A WGAN is conceptually rather different from a conventional GAN, but in
practical terms very similar. It reinterprets the discriminator (often called
the “critic” in this context) as learning an approximation to the Earth Mover
distance between the training and generated distributions. The generator is
then trained to minimize that distance. In practice, this just means using
slightly different loss functions for training the generator and discriminator.
WGANs have theoretical advantages over conventional GANs, and they often work
better in practice. In addition, the discriminator’s loss function can be
directly interpreted as a measure of the quality of the model. That is an
advantage over conventional GANs, where the loss does not directly convey
information about the quality of the model.
The theory WGANs are based on requires the discriminator’s gradient to be
bounded. The original paper achieved this by clipping its weights. This
class instead does it by adding a penalty term to the discriminator’s loss, as
described in [wgan2]. This is sometimes found to produce better results.
There are a few other practical differences between GANs and WGANs. In a
conventional GAN, the discriminator’s output must be between 0 and 1 so it can
be interpreted as a probability. In a WGAN, it should produce an unbounded
output that can be interpreted as a distance.
When training a WGAN, you also should usually use a smaller value for
generator_steps. Conventional GANs rely on keeping the generator and
discriminator “in balance” with each other. If the discriminator ever gets
too good, it becomes impossible for the generator to fool it and training
stalls. WGANs do not have this problem, and in fact the better the
discriminator is, the easier it is for the generator to improve. It therefore
usually works best to perform several training steps on the discriminator for
each training step on the generator.
>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> class Discriminator(nn.Module):
...     def __init__(self, data_input_shape, conditional_input_shape):
...         super(Discriminator, self).__init__()
...         self.data_input_shape = data_input_shape
...         self.conditional_input_shape = conditional_input_shape
...         # Extracting the actual data dimension
...         data_dim = data_input_shape[1:]
...         # Extracting the actual conditional dimension
...         conditional_dim = conditional_input_shape[1:]
...         input_dim = sum(data_dim) + sum(conditional_dim)
...         # Define the dense layers
...         self.dense1 = nn.Linear(input_dim, 10)
...         self.dense2 = nn.Linear(10, 1)
...
...     def forward(self, input):
...         data_input, conditional_input = input
...         # Concatenate data_input and conditional_input along the second dimension
...         discrim_in = torch.cat((data_input, conditional_input), dim=1)
...         # Pass the concatenated input through the dense layers
...         x = F.relu(self.dense1(discrim_in))
...         # No sigmoid here: a WGAN discriminator produces an unbounded output
...         output = self.dense2(x)
...         return output
Gulrajani, Ishaan, et al. “Improved training of wasserstein gans.”
Advances in neural information processing systems 30 (2017).
(https://arxiv.org/abs/1704.00028)
discrim_output (torch.Tensor) – the output from the discriminator on a batch of generated data. This is
its estimate of the probability that each sample is training data.
discrim_output_train (List[Tensor]) – the output from the discriminator on a batch of training data. This is
its estimate of the probability that each sample is training data.
discrim_output_gen (Tensor) – the output from the discriminator on a batch of generated data.
Returns:
A Tensor equal to the loss function to use for optimizing the discriminator.
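To make the difference from the conventional GAN losses concrete, the sketch below writes out the critic and generator losses in the form given by Gulrajani et al., using plain lists of critic outputs and precomputed gradient norms. The function names are hypothetical, and the sign conventions and exact form of the penalty term in this class may differ from this textbook formulation.

```python
def mean(values):
    return sum(values) / len(values)


def wgan_generator_loss(critic_output_gen):
    """The generator tries to raise the critic's score on generated data."""
    return -mean(critic_output_gen)


def wgan_critic_loss(critic_output_train, critic_output_gen,
                     gradient_norms, gradient_penalty=10.0):
    """Critic loss with the penalty term from Gulrajani et al.

    gradient_norms holds the norm of the critic's gradient at points
    interpolated between real and generated samples.
    """
    wasserstein_term = mean(critic_output_gen) - mean(critic_output_train)
    penalty = mean([(norm - 1.0) ** 2 for norm in gradient_norms])
    return wasserstein_term + gradient_penalty * penalty
```

In this form the penalty vanishes when the gradient norm is exactly 1, and the remaining term is the negated estimate of the distance between the two distributions.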
Model for de-novo generation of small molecules based on work of Nicola De Cao et al. [molgan1].
It uses a GAN directly on graph data and a reinforcement learning objective to induce the network to generate molecules with certain chemical properties.
It builds on the WGAN infrastructure and takes an adjacency matrix and node features as inputs.
Inputs must be one-hot encoded.
You can change the above parameters to get better results. The above example is just a simple example to show how to use the model.
You can try iterbatches(1000) for better results.
Now, let’s generate some molecules using the trained model
We will generate 10 molecules and then convert them to RDKit molecules.
You can increase the number of generated molecules by changing the parameter passed to the predict_gan_generator function.
Generated molecules are in the form of GraphMatrix objects. You can convert them to RDKit molecules using the defeaturize function of MolGanFeaturizer.
Now, let’s remove invalid molecules from the generated molecules.
We can see that currently training is unstable and 0 is a common outcome. You can try training the model with different parameters to get better results.
Create generator model.
Takes noise data as input and processes it through a number of
dense and dropout layers. The data is then converted into two forms,
one used for training and the other for generating compounds.
The model has two outputs:
edges
nodes
The format differs depending on intended use (training or sample generation).
For sample generation, pass the flag sample_generation=True when calling the generator,
i.e. gan.generators[0](noise_input, training=False, sample_generation=True).
For training the model, set sample_generation=False.
batch_size (int) – the number of samples to generate. If either noise_input or
conditional_inputs is specified, this argument is ignored since the batch
size is then determined by the size of that argument.
noise_input (array) – the value to use for the generator’s noise input. If None (the default),
get_noise_batch() is called to generate a random input, so each call will
produce a new set of samples.
conditional_inputs (list of arrays) – NOT USED.
the values to use for all conditional inputs. This must be specified if
the GAN has any conditional inputs.
generator_index (int) – NOT USED.
the index of the generator (between 0 and n_generators-1) to use for
generating the samples.
Returns:
A list of GraphMatrix objects that can be converted into
RDKit molecules using the MolGanFeaturizer defeaturize function.
n_atom_feat (int, optional (default 75)) – Number of features per atom. Note this is 75 by default and should be 78
if chirality is used by WeaveFeaturizer.
n_pair_feat (int, optional (default 14)) – Number of features per pair of atoms.
n_hidden (int, optional (default 50)) – Number of units (convolution depths) in corresponding hidden layer
n_graph_feat (int, optional (default 128)) – Number of output features for each molecule (graph)
n_weave (int, optional (default 2)) – The number of weave layers in this model.
fully_connected_layer_sizes (list (default [2000, 100])) – The size of each dense layer in the network. The length of
this list determines the number of layers.
conv_weight_init_stddevs (list or float (default 0.03)) – The standard deviation of the distribution to use for weight
initialization of each convolutional layer. The length of this list
should equal n_weave. Alternatively, this may be a single value instead
of a list, in which case the same value is used for each layer.
weight_init_stddevs (list or float (default 0.01)) – The standard deviation of the distribution to use for weight
initialization of each fully connected layer. The length of this list
should equal len(layer_sizes). Alternatively this may be a single value
instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float (default 0.0)) – The value to initialize the biases in each fully connected layer. The
length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in
which case the same value is used for every layer.
dropouts (list or float (default 0.25)) – The dropout probability to use for each fully connected layer. The length of this list
should equal len(layer_sizes). Alternatively this may be a single value
instead of a list, in which case the same value is used for every layer.
final_conv_activation_fn (Optional[ActivationFn] (default F.tanh)) – The activation function to apply to the final
convolution at the end of the weave convolutions. If None, then no
activation is applied (hence linear).
activation_fns (str (default relu)) – The activation function to apply to each fully connected layer. The length
of this list should equal len(layer_sizes). Alternatively this may be a
single value instead of a list, in which case the same value is used for
every layer.
batch_normalize (bool, optional (default True)) – If this is turned on, apply batch normalization before applying
activation functions on convolutional and fully connected layers.
gaussian_expand (boolean, optional (default True)) – Whether to expand each dimension of atomic features by gaussian
histogram
compress_post_gaussian_expansion (bool, optional (default False)) – If True, compress the results of the Gaussian expansion back to the
original dimensions of the input.
mode (str (default "classification")) – Either “classification” or “regression” for type of model.
n_classes (int (default 2)) – Number of classes to predict (only used in classification mode)
batch_size (int (default 100)) – Batch size used by this model for training.
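Several of the parameters above accept either a single value or a list with one entry per layer. The following sketch shows one way such a value could be normalized to a per-layer list; the helper name per_layer is hypothetical and the example values are chosen only for illustration.

```python
from collections.abc import Sequence


def per_layer(value, n_layers, name="value"):
    """Expand a scalar to one entry per layer, or validate a list."""
    if isinstance(value, Sequence) and not isinstance(value, str):
        if len(value) != n_layers:
            raise ValueError(
                f"{name} needs {n_layers} entries, got {len(value)}")
        return list(value)
    return [value] * n_layers


fully_connected_layer_sizes = [2000, 100]
n_layers = len(fully_connected_layer_sizes)
dropouts = per_layer(0.25, n_layers, name="dropouts")
weight_init_stddevs = per_layer([0.01, 0.02], n_layers,
                                name="weight_init_stddevs")
```

A list of the wrong length raises a ValueError in this sketch, which surfaces a mismatch with the number of layers early.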
This model implements the Weave style graph convolutions
from [1]_.
The biggest difference between WeaveModel style convolutions
and GraphConvModel style convolutions is that Weave
convolutions model bond features explicitly. This has the
side effect that it needs to construct a NxN matrix
explicitly to model bond interactions. This may cause
scaling issues, but may possibly allow for better modeling
of subtle bond effects.
Note that [1]_ introduces a whole variety of different architectures for
Weave models. The default settings in this class correspond to the W2N2
variant from [1]_, which is the most commonly used variant.
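The scaling concern can be made tangible with a quick count of pair-feature values per molecule. The sketch assumes that features are stored for every ordered pair of atoms, including each atom paired with itself, and uses the default n_pair_feat of 14; the function name pair_feature_count is hypothetical, and a featurizer that limits which pairs are kept would store fewer values.

```python
def pair_feature_count(n_atoms, n_pair_feat=14):
    """Number of pair-feature values for one molecule with n_atoms atoms."""
    # Assumes a dense N x N pair matrix with n_pair_feat values per entry.
    return n_atoms * n_atoms * n_pair_feat


for n_atoms in (10, 50, 100):
    print(n_atoms, pair_feature_count(n_atoms))
```

Growing a molecule from 10 to 100 atoms multiplies the count by 100 under this assumption, which is the quadratic growth mentioned above.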
Examples
Here’s an example of how to fit a WeaveModel on a tiny sample dataset.
n_atom_feat (int, optional (default 75)) – Number of features per atom. Note this is 75 by default and should be 78
if chirality is used by WeaveFeaturizer.
n_pair_feat (int, optional (default 14)) – Number of features per pair of atoms.
n_hidden (int, optional (default 50)) – Number of units (convolution depths) in corresponding hidden layer
n_graph_feat (int, optional (default 128)) – Number of output features for each molecule (graph)
n_weave (int, optional (default 2)) – The number of weave layers in this model.
fully_connected_layer_sizes (list (default [2000, 100])) – The size of each dense layer in the network. The length of
this list determines the number of layers.
conv_weight_init_stddevs (list or float (default 0.03)) – The standard deviation of the distribution to use for weight
initialization of each convolutional layer. The length of this list
should equal n_weave. Alternatively, this may be a single value instead
of a list, in which case the same value is used for each layer.
weight_init_stddevs (list or float (default 0.01)) – The standard deviation of the distribution to use for weight
initialization of each fully connected layer. The length of this list
should equal len(layer_sizes). Alternatively this may be a single value
instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float (default 0.0)) – The value to initialize the biases in each fully connected layer. The
length of this list should equal len(layer_sizes).
Alternatively this may be a single value instead of a list, in
which case the same value is used for every layer.
weight_decay_penalty (float (default 0.0)) – The magnitude of the weight decay penalty to use
weight_decay_penalty_type (str (default "l2")) – The type of penalty to use for weight decay, either ‘l1’ or ‘l2’
dropouts (list or float (default 0.25)) – The dropout probability to use for each fully connected layer. The length of this list
should equal len(fully_connected_layer_sizes). Alternatively this may be a single value
instead of a list, in which case the same value is used for every layer.
final_conv_activation_fn (Optional[ActivationFn] (default F.tanh)) – The activation function to apply to the final
convolution at the end of the weave convolutions. If None, then no
activation is applied (hence linear).
activation_fns (str (default relu)) – The activation function to apply to each fully connected layer. The length
of this list should equal len(fully_connected_layer_sizes). Alternatively this may be a
single value instead of a list, in which case the same value is used for
every layer.
batch_normalize (bool, optional (default True)) – If this is turned on, apply batch normalization before applying
activation functions on convolutional and fully connected layers.
gaussian_expand (boolean, optional (default True)) – Whether to expand each dimension of atomic features by a Gaussian
histogram.
compress_post_gaussian_expansion (bool, optional (default False)) – If True, compress the results of the Gaussian expansion back to the
original dimensions of the input.
mode (str (default "classification")) – Either “classification” or “regression” for type of model.
n_classes (int (default 2)) – Number of classes to predict (only used in classification mode)
batch_size (int (default 100)) – Batch size used by this model for training.
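Several of the parameters above (conv_weight_init_stddevs, weight_init_stddevs, bias_init_consts, dropouts) accept either a single value or one value per layer. The sketch below shows one way such an argument could be normalized into a per-layer list. The helper expand_per_layer is hypothetical and written only for illustration; it is not part of the DeepChem API.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def expand_per_layer(value: Union[Number, Sequence[Number]],
                     n_layers: int) -> List[Number]:
    """Return one hyperparameter value per layer.

    Hypothetical helper: a scalar is repeated n_layers times, while a
    sequence is accepted only if its length already equals n_layers.
    """
    if isinstance(value, (int, float)):
        return [value] * n_layers
    values = list(value)
    if len(values) != n_layers:
        raise ValueError(f"expected {n_layers} values, got {len(values)}")
    return values


fully_connected_layer_sizes = [2000, 100]
n_layers = len(fully_connected_layer_sizes)

dropouts = expand_per_layer(0.25, n_layers)
stddevs = expand_per_layer([0.01, 0.02], n_layers)
print(dropouts)  # [0.25, 0.25]
print(stddevs)   # [0.01, 0.02]
```

Passing a list whose length differs from the number of layers raises a ValueError in this sketch, which mirrors the documented requirement that list lengths match the number of layers.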
Compute tensors that will be input into the model from featurized representation.
The featurized input to WeaveModel is instances of WeaveMol created by
WeaveFeaturizer. This method converts input WeaveMol objects into
tensors used by the Keras implementation to compute WeaveModel outputs.
Parameters:
X_b (np.ndarray) – A numpy array with dtype=object where elements are WeaveMol objects.
Returns:
atom_feat (np.ndarray) – Of shape (N_atoms, N_atom_feat).
pair_feat (np.ndarray) – Of shape (N_pairs, N_pair_feat). Note that N_pairs will depend on
the number of pairs being considered. If max_pair_distance is
None, then this will be N_atoms**2. Else it will be the number
of pairs within the specified graph distance.
pair_split (np.ndarray) – Of shape (N_pairs,). The i-th entry in this array will tell you the
originating atom for this pair (the “source”). Note that pairs are
symmetric so for a pair (a, b), both a and b will separately be
sources at different points in this array.
atom_split (np.ndarray) – Of shape (N_atoms,). The i-th entry in this array will be the molecule
that the i-th atom belongs to.
atom_to_pair (np.ndarray) – Of shape (N_pairs, 2). The i-th row in this array will be the array
[a, b] if (a, b) is a pair to be considered. (Note by symmetry, this
implies some other row will contain [b, a].)
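To make the layout of these index arrays concrete, the following sketch builds atom_split, pair_split and atom_to_pair for a batch of two small molecules, assuming every ordered pair of atoms within a molecule is considered (the max_pair_distance=None case). It uses plain Python lists rather than NumPy arrays and only illustrates the layout; it is not DeepChem's implementation. Numbering atoms globally across the batch and including self-pairs (a, a) are assumptions made here so that each molecule contributes N_atoms**2 pairs.

```python
from typing import List, Tuple


def build_index_arrays(
        atoms_per_molecule: List[int]
) -> Tuple[List[int], List[int], List[List[int]]]:
    """Build atom_split, pair_split and atom_to_pair for a batch.

    Atom indices are global to the batch: the atoms of the second
    molecule are numbered after those of the first.
    """
    atom_split: List[int] = []
    pair_split: List[int] = []
    atom_to_pair: List[List[int]] = []
    start = 0  # global index of the first atom of the current molecule
    for mol_index, n_atoms in enumerate(atoms_per_molecule):
        # Every atom of this molecule maps back to mol_index.
        atom_split.extend([mol_index] * n_atoms)
        for a in range(start, start + n_atoms):
            for b in range(start, start + n_atoms):
                atom_to_pair.append([a, b])
                pair_split.append(a)  # the "source" atom of the pair
        start += n_atoms
    return atom_split, pair_split, atom_to_pair


# A batch with a 2-atom molecule followed by a 1-atom molecule.
atom_split, pair_split, atom_to_pair = build_index_arrays([2, 1])
print(atom_split)    # [0, 0, 1]
print(pair_split)    # [0, 0, 1, 1, 2]
print(atom_to_pair)  # [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2]]
```

Note how both [0, 1] and [1, 0] appear in atom_to_pair, so each atom of a pair acts as the source once.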
Implements a progressive multitask neural network in PyTorch.
Progressive networks allow for multitask learning where each task
gets a new column of weights and lateral connections to previous tasks
are added to the network. As a result, there is no catastrophic
forgetting where previous tasks are ignored.
nb_epoch (int) – the number of epochs to train for
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
Set this to 0 to disable automatic checkpointing.
deterministic (bool) – if True, the samples are processed in order. If False, a different random
order is used for each epoch.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training
from there. If False, retrain the model from scratch.
variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in
the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
for each batch. If None (the default), the model’s standard loss function
is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that
you can call fit() repeatedly with the same list and losses will
continue to be appended.
Return type:
The average loss over the most recent checkpoint interval
nb_epoch (int) – the number of epochs to train for
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
Set this to 0 to disable automatic checkpointing.
deterministic (bool) – if True, the samples are processed in order. If False, a different random
order is used for each epoch.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training
from there. If False, retrain the model from scratch.
variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in
the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
for each batch. If None (the default), the model’s standard loss function
is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that
you can call fit() repeatedly with the same list and losses will
continue to be appended.
Return type:
The average loss over the most recent checkpoint interval
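The callbacks and all_losses arguments can be illustrated with a toy training loop. The toy_fit function and ToyModel class below are hypothetical stand-ins that only mimic the documented contract: callbacks are called as f(model, step) after every step, and logged losses are appended to the caller's all_losses list. As a simplification, this sketch logs a loss on every step and returns the last loss rather than an average over a checkpoint interval. It is not DeepChem code.

```python
from typing import Callable, List, Optional


class ToyModel:
    """Hypothetical stand-in for a model; it only counts steps."""

    def __init__(self) -> None:
        self.global_step = 0


def toy_fit(model: ToyModel,
            batch_losses: List[float],
            callbacks: Optional[List[Callable[[ToyModel, int], None]]] = None,
            all_losses: Optional[List[float]] = None) -> float:
    """Pretend to train for one step per entry of batch_losses."""
    last_loss = 0.0
    for loss in batch_losses:
        model.global_step += 1
        last_loss = loss
        if all_losses is not None:
            all_losses.append(loss)  # the caller-owned list keeps growing
        for callback in callbacks or []:
            callback(model, model.global_step)  # f(model, step)
    return last_loss


seen_steps: List[int] = []


def record_step(model: ToyModel, step: int) -> None:
    """Example callback: remember every step it was invoked on."""
    seen_steps.append(step)


history: List[float] = []
model = ToyModel()
toy_fit(model, [0.9, 0.7], callbacks=[record_step], all_losses=history)
toy_fit(model, [0.5], callbacks=[record_step], all_losses=history)
print(seen_steps)  # [1, 2, 3]
print(history)     # [0.9, 0.7, 0.5]
```

Reusing the same history list across both calls shows how losses accumulate over repeated fits.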
A 1D convolutional neural network that works on SMILES strings for both
classification and regression tasks.
Reimplementation of the discriminator module in ORGAN [1].
Originated from [2].
The model converts the input SMILES strings to an embedding vector. The vector
is convolved and pooled through a series of convolutional filters whose outputs are concatenated
and then passed through a simple dense layer. The resulting vector goes through a Highway
layer [3] and is finally passed through a dense layer chosen according to the task.
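A Highway layer mixes a transformed version of its input with the input itself through a learned gate. The following plain-Python sketch shows that gating, y = t * H(x) + (1 - t) * x with t = sigmoid(W_t x + b_t). The hand-picked weights and the use of ReLU for H are assumptions for illustration, not the values or layers used by this model.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute W @ x + b for small dense lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def highway(x: Vector, W_h: Matrix, b_h: Vector,
            W_t: Matrix, b_t: Vector) -> Vector:
    """Return t * H(x) + (1 - t) * x, with t = sigmoid(W_t x + b_t)."""
    # Transform branch H(x): an affine map followed by ReLU (assumed).
    h = [max(0.0, v) for v in affine(W_h, b_h, x)]
    # Gate t in (0, 1): decides how much of H(x) replaces x.
    t = [1.0 / (1.0 + math.exp(-v)) for v in affine(W_t, b_t, x)]
    return [t_i * h_i + (1.0 - t_i) * x_i
            for t_i, h_i, x_i in zip(t, h, x)]


x = [1.0, -2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# A strongly negative gate bias closes the gate, so the input is carried
# through almost unchanged.
carried = highway(x, identity, [0.0, 0.0], zeros, [-20.0, -20.0])
# A strongly positive gate bias opens the gate, so the output is close to
# the transformed input relu(x).
transformed = highway(x, identity, [0.0, 0.0], zeros, [20.0, 20.0])
print(carried)
print(transformed)
```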
UNet is a convolutional neural network architecture for fast and precise segmentation of images
based on the works of Ronneberger et al. [1]. The architecture consists of an encoder, a bottleneck,
and a decoder. The encoder downsamples the input image to capture the context of the image. The
bottleneck captures the most important features of the image. The decoder upsamples the image to
generate the segmentation mask. The encoder and decoder are connected by skip connections to preserve
spatial information.
We will create a UNet model with 3 input channels and 1 output channel. We will then fit the model on the dataset for 5 epochs and predict the output images.
1. This implementation of the UNet model makes some changes to the padding of the inputs to the convolutional layers.
The padding is set to ‘same’ to ensure that the output size of the convolutional layers is the same as the input size.
This is done to preserve the spatial information of the input image and to keep the output size of the encoder and decoder the same.
The input image size must be divisible by 2^4 = 16 to ensure that the output size of the encoder and decoder is the same.
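The divisibility requirement above can be checked before training. The two helpers below are hypothetical utilities (not part of DeepChem) that test whether an image size is divisible by 2^4 = 16 and report the next valid size to pad to.

```python
def is_valid_unet_size(height: int, width: int, n_downsamples: int = 4) -> bool:
    """Return True if both sides are divisible by 2 ** n_downsamples."""
    factor = 2 ** n_downsamples
    return height % factor == 0 and width % factor == 0


def next_valid_size(size: int, n_downsamples: int = 4) -> int:
    """Round size up to the nearest multiple of 2 ** n_downsamples."""
    factor = 2 ** n_downsamples
    return ((size + factor - 1) // factor) * factor


print(is_valid_unet_size(256, 256))  # True
print(is_valid_unet_size(250, 256))  # False
print(next_valid_size(250))          # 256
```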
Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you’d need one.
But in the case of GANs or similar you might have multiple. Optimization with multiple optimizers only works in
the manual optimization mode.
Returns:
Any of these 6 options.
Single optimizer.
List or Tuple of optimizers.
Two lists - The first list has multiple optimizers, and the second has multiple LR schedulers
(or multiple lr_scheduler_config).
Dictionary, with an "optimizer" key, and (optionally) a "lr_scheduler"
key whose value is a single LR scheduler or lr_scheduler_config.
None - Fit will run without any optimizer.
The lr_scheduler_config is a dictionary which contains the scheduler and its associated configuration.
The default configuration is shown below.
lr_scheduler_config = {
    # REQUIRED: The scheduler instance
    "scheduler": lr_scheduler,
    # The unit of the scheduler's step size, could also be 'step'.
    # 'epoch' updates the scheduler on epoch end whereas 'step'
    # updates it after an optimizer update.
    "interval": "epoch",
    # How many epochs/steps should pass between calls to
    # `scheduler.step()`. 1 corresponds to updating the learning
    # rate after every epoch/step.
    "frequency": 1,
    # Metric to monitor for schedulers like `ReduceLROnPlateau`
    "monitor": "val_loss",
    # If set to `True`, will enforce that the value specified 'monitor'
    # is available when the scheduler is updated, thus stopping
    # training if not found. If set to `False`, it will only produce a warning
    "strict": True,
    # If using the `LearningRateMonitor` callback to monitor the
    # learning rate progress, this keyword can be used to specify
    # a custom logged name
    "name": None,
}
When there are schedulers in which the .step() method is conditioned on a value, such as the
torch.optim.lr_scheduler.ReduceLROnPlateau scheduler, Lightning requires that the
lr_scheduler_config contains the keyword "monitor" set to the metric name that the scheduler
should be conditioned on.
# The ReduceLROnPlateau scheduler requires a monitor
def configure_optimizers(self):
    optimizer = Adam(...)
    return {
        "optimizer": optimizer,
        "lr_scheduler": {
            "scheduler": ReduceLROnPlateau(optimizer, ...),
            "monitor": "metric_to_track",
            "frequency": "indicates how often the metric is updated",
            # If "monitor" references validation metrics, then "frequency" should be set to a
            # multiple of "trainer.check_val_every_n_epoch".
        },
    }


# In the case of two optimizers, only one using the ReduceLROnPlateau scheduler
def configure_optimizers(self):
    optimizer1 = Adam(...)
    optimizer2 = SGD(...)
    scheduler1 = ReduceLROnPlateau(optimizer1, ...)
    scheduler2 = LambdaLR(optimizer2, ...)
    return (
        {
            "optimizer": optimizer1,
            "lr_scheduler": {
                "scheduler": scheduler1,
                "monitor": "metric_to_track",
            },
        },
        {"optimizer": optimizer2, "lr_scheduler": scheduler2},
    )
Metrics can be made available to monitor by simply logging them using
self.log('metric_to_track', metric_val) in your LightningModule.
Note
Some things to know:
Lightning calls .backward() and .step() automatically in case of automatic optimization.
If a learning rate scheduler is specified in configure_optimizers() with key
"interval" (default “epoch”) in the scheduler configuration, Lightning will call
the scheduler’s .step() method automatically in case of automatic optimization.
If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizer.
If you use torch.optim.LBFGS, Lightning handles the closure function automatically for you.
If you use multiple optimizers, you will have to switch to ‘manual optimization’ mode and step them
yourself.
If you need to control how often the optimizer steps, override the optimizer_step() hook.
This is a DeepChem model implemented by a Jax model.
Here is a simple example that uses JaxModel to train a
Haiku (JAX Neural Network Library) based model on a DeepChem
dataset.
model (hk.State or Function) – Any Jax based model that has an apply method for computing the network. Currently
only Haiku models are supported.
params (hk.Params) – The parameters of the Jax based network
loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each
batch, as described above
output_types (list of strings, optional (default None)) – the type of each output from the model, as described above
batch_size (int, optional (default 100)) – default batch size for training and evaluating
learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is
ignored.
optimizer (optax object) – the optimizer to use for fitting. For the time being, this must be an optax object.
rng (jax.random.PRNGKey, optional (default 1)) – A default global PRNG key to use for drawing random numbers.
log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using
logging by default.
Train this model on a dataset.
Parameters:
dataset (Dataset) – the Dataset to train on
nb_epoch (int) – the number of epochs to train for
deterministic (bool) – if True, the samples are processed in order. If False, a different random
order is used for each epoch.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
for each batch. If None (the default), the model’s standard loss function
is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that
you can call fit() repeatedly with the same list and losses will
continue to be appended.
Returns:
The average loss over the most recent checkpoint interval
Miscellaneous Parameters Yet To Add
----------------------------------
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
Set this to 0 to disable automatic checkpointing.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training
from there. If False, retrain the model from scratch.
variables (list of hk.Variable) – the variables to train. If None (the default), all trainable variables in
the model are used.
Work in Progress
----------------
[1] Integrate the optax losses, optimizers, schedulers with DeepChem
[2] Support for saving & loading the model.
[3] Adding support for output types (choosing only self._loss_outputs)
generator (generator) – this should generate batches, each represented as a tuple of the form
(inputs, labels, weights).
transformers (List[dc.trans.Transformer]) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved
from the model. If output_types is specified, outputs must
be None.
Returns:
a NumPy array if the model produces a single output, or a list of arrays
Generates predictions for input samples, processing samples in a batch.
Parameters:
X (ndarray) – the input data, as a NumPy array.
transformers (List[dc.trans.Transformer]) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
Returns:
a NumPy array if the model produces a single output, or a list of arrays
Uses self to make predictions on provided Dataset object.
Parameters:
dataset (dc.data.Dataset) – Dataset to make prediction on
transformers (List[dc.trans.Transformer]) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
output_types (String or list of Strings) – If specified, all outputs of this type will be retrieved
from the model. If output_types is specified, outputs must
be None.
Returns:
a NumPy array if the model produces a single output, or a list of arrays
Evaluate the performance of this model on the data produced by a generator.
Parameters:
generator (generator) – this should generate batches, each represented as a tuple of the form
(inputs, labels, weights).
transformers (List[dc.trans.Transformer]) – Transformers that the input data has been transformed by. The output
is passed through these transformers to undo the transformations.
per_task_metrics (bool) – If True, return per-task scores.
Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are
generated from the data.
Parameters:
dataset (Dataset) – the data to iterate
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
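The effect of deterministic and pad_batches can be sketched with a small, self-contained generator. The function iterate_batches below is a hypothetical illustration over a plain list; it is not DeepChem's default_generator, and padding a short batch by repeating samples from the start of the epoch is an assumption made for this example.

```python
import random
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def iterate_batches(samples: Sequence[T],
                    batch_size: int,
                    epochs: int = 1,
                    deterministic: bool = True,
                    pad_batches: bool = False,
                    seed: Optional[int] = None) -> Iterator[List[T]]:
    """Yield batches of samples for the requested number of epochs."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(len(samples)))
        if not deterministic:
            rng.shuffle(order)  # a new random order for each epoch
        for start in range(0, len(order), batch_size):
            batch = [samples[i] for i in order[start:start + batch_size]]
            if pad_batches:
                # Assumed padding strategy: cycle through this epoch's
                # samples until the batch reaches batch_size.
                i = 0
                while len(batch) < batch_size:
                    batch.append(samples[order[i % len(order)]])
                    i += 1
            yield batch


data = ["a", "b", "c", "d", "e"]
batches = list(iterate_batches(data, batch_size=2, pad_batches=True))
print(batches)  # [['a', 'b'], ['c', 'd'], ['e', 'a']]

# With deterministic=False every sample still appears once per epoch,
# only the order changes.
shuffled = list(iterate_batches(data, batch_size=2, deterministic=False, seed=0))
```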
This class is derived from the JaxModel class and its methods are very similar to those of JaxModel,
but it can pass multiple arguments (using *args), which suits PINN models.
Example: approximating f(x, y, z, t) satisfying a linear differential equation.
This model is recommended for linear partial differential equations but if you can accurately write
the gradient function in Jax depending on your use case, then it will work as well.
This class requires two functions apart from the usual function definition and weights:
[1] grad_fn: Each PINN has a different strategy for calculating its final loss. This
function tells the PINNModel how to compute the derivatives for backpropagation.
It should follow this format:
    def gradient_fn(forward_fn, loss_outputs, initial_data):
        def model_loss(params, target, weights, rng, ...):
            # write code using the arguments.
            # ... indicates the variable number of positional arguments.
            return
        return model_loss
“…” can be replaced with various arguments like (x, y, z, t) but should match with eval_fn
[2] eval_fn: Function defining what the model computes during inference.
It should follow this format:
    def create_eval_fn(forward_fn, params):
        def eval_model(..., rng=None):
            # write code here using arguments
            return
        return eval_model
“…” can be replaced with various arguments like (x, y, z, t) but should match with grad_fn
[3] boundary_data:
For a detailed example, check out - deepchem/models/jax_models/tests/test_pinn.py where we have
solved f’(x) = -sin(x)
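The closure pattern expected of grad_fn and eval_fn can be illustrated without Jax. The sketch below is a plain-Python analogue for f'(x) = -sin(x): it approximates the derivative with a central finite difference instead of automatic differentiation, uses a one-parameter stand-in network a * cos(x), and stores the boundary condition under the hypothetical keys "x0" and "u0" in initial_data. All of these choices are assumptions for illustration and do not reproduce test_pinn.py.

```python
import math
from typing import Callable, Dict, List


def forward_fn(params: Dict[str, float], rng: object, x: float) -> float:
    """Stand-in network: a * cos(x). The exact solution has a == 1."""
    return params["a"] * math.cos(x)


def gradient_fn(forward_fn: Callable, loss_outputs: object,
                initial_data: Dict[str, float]) -> Callable:
    """Return a loss closure, following the grad_fn format."""
    x0, u0 = initial_data["x0"], initial_data["u0"]  # hypothetical keys

    def model_loss(params, target, weights, rng, x_train: List[float]) -> float:
        eps = 1e-5
        residuals = []
        for x in x_train:
            # Central finite difference in place of automatic differentiation.
            df_dx = (forward_fn(params, rng, x + eps)
                     - forward_fn(params, rng, x - eps)) / (2 * eps)
            # Residual of the equation f'(x) + sin(x) = 0.
            residuals.append((df_dx + math.sin(x)) ** 2)
        # Penalty for violating the boundary condition f(x0) = u0.
        boundary = (forward_fn(params, rng, x0) - u0) ** 2
        return sum(residuals) / len(residuals) + boundary

    return model_loss


def create_eval_fn(forward_fn: Callable, params: Dict[str, float]) -> Callable:
    """Return an inference closure, following the eval_fn format."""

    def eval_model(x: float, rng: object = None) -> float:
        return forward_fn(params, rng, x)

    return eval_model


model_loss = gradient_fn(forward_fn, None, {"x0": 0.0, "u0": 1.0})
x_train = [0.0, 0.5, 1.0, 1.5]
good = model_loss({"a": 1.0}, None, None, None, x_train)
bad = model_loss({"a": 2.0}, None, None, None, x_train)
print(good < 1e-8, bad > 0.5)  # True True

eval_model = create_eval_fn(forward_fn, {"a": 1.0})
print(eval_model(0.0))  # 1.0
```

Here the single extra positional argument x_train plays the role of the “…” in model_loss, and the matching x argument plays that role in eval_model.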
References
Notes
This class requires Jax, Haiku and Optax to be installed.
forward_fn (hk.State or Function) – Any Jax based model that has an apply method for computing the network. Currently
only Haiku models are supported.
params (hk.Params) – The parameters of the Jax based network
initial_data (dict) – This acts as a session variable which will be passed as a dictionary in grad_fn
output_types (list of strings, optional (default None)) – the type of each output from the model, as described above
batch_size (int, optional (default 100)) – default batch size for training and evaluating
learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is
ignored.
optimizer (optax object) – the optimizer to use for fitting. For the time being, this must be an optax object.
grad_fn (Callable (default create_default_gradient_fn)) – It defines how the loss function and gradients need to be calculated for the PINNs model
update_fn (Callable (default create_default_update_fn)) – It defines how the weights need to be updated using backpropagation. The optax library is used
for optimisation operations. It is recommended to leave this as the default.
eval_fn (Callable (default create_default_eval_fn)) – Function defining what the model computes during inference.
rng (jax.random.PRNGKey, optional (default 1)) – A default global PRNG key to use for drawing random numbers.
log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using
logging by default.
epochs (int) – the number of times to iterate over the full dataset
mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
during prediction), and ‘uncertainty’ (called during uncertainty
prediction)
deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
data for each epoch
pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size
Returns:
a generator that iterates batches, each represented as a tuple of lists
Wrapper class that wraps HuggingFace models as DeepChem models.
The class wraps models from the HuggingFace ecosystem so that they can be
trained through DeepChem’s API. One reason to do this is to make an
apples-to-apples comparison between models from the HuggingFace transformers
library and models from the DeepChem library.
The HuggingFaceModel has a has-a relationship with the wrapped model from the
transformers library. Once a model is wrapped, DeepChem’s API is used
for training, prediction, evaluation and other downstream tasks.
A HuggingFaceModel wrapper also has a tokenizer which tokenizes raw
SMILES strings into tokens to be used by downstream models. The SMILES
strings are generally stored in the X attribute of a deepchem.data.Dataset object.
This differs from the DeepChem standard workflow as tokenization is done
on the fly here. The approach allows us to leverage transformers library’s fast
tokenization algorithms and other utilities like data collation, random masking of tokens
for masked language model training etc.
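Random masking for masked language model training can be pictured as follows. This is a simplified plain-Python sketch that replaces a random subset of tokens with a mask token and keeps the original token as the prediction target. The masking rate, the "<mask>" string and the character-level tokens are assumptions for illustration; the transformers library's data collators apply a more elaborate scheme and work on token ids rather than strings.

```python
import random
from typing import List, Optional, Tuple


def mask_tokens(tokens: List[str],
                mask_prob: float = 0.15,
                mask_token: str = "<mask>",
                seed: Optional[int] = None
                ) -> Tuple[List[str], List[Optional[str]]]:
    """Return (masked_tokens, labels) for one tokenized SMILES string."""
    rng = random.Random(seed)
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(token)  # the model must recover this token
        else:
            masked.append(token)
            labels.append(None)   # position ignored by the loss
    return masked, labels


tokens = list("CC(=O)O")  # character-level tokens for acetic acid
masked, labels = mask_tokens(tokens, mask_prob=0.5, seed=0)
print(masked)
print(labels)
```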
Parameters:
model (transformers.modeling_utils.PreTrainedModel) – The HuggingFace model to wrap.
task (str, (optional, default None)) –
The task defines the type of learning task in the model. The supported tasks are
mlm - masked language modeling commonly used in pretraining
mtr - multitask regression - a task used for both pretraining base models and finetuning
regression - use it for regression tasks, like property prediction
classification - use it for classification tasks
When the task is not specified or None, the wrapper returns raw output of the HuggingFaceModel.
In cases where the HuggingFaceModel is a model without a task specific head, this output will be
the last hidden states.
>>> # finetune a classification model
>>> # making dataset suitable for classification
>>> import numpy as np
>>> y = np.random.choice([0, 1], size=dataset.y.shape)
>>> dataset = dc.data.NumpyDataset(X=dataset.X, y=y, w=dataset.w, ids=dataset.ids)
model (torch.nn.Module) – the PyTorch model implementing the calculation
loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each
batch, as described above
output_types (list of strings, optional (default None)) – the type of each output from the model, as described above
batch_size (int, optional (default 100)) – default batch size for training and evaluating
model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None,
a temporary directory is created.
learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is
ignored.
optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is
ignored.
tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training
wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training
log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using
logging by default. If tensorboard is set, data is also
logged to TensorBoard. If wandb is set, data is also logged
to Weights & Biases. Logging happens at global steps. Roughly,
a global step corresponds to one batch of training. If you’d
like a printout every 10 batch steps, you’d set
log_frequency=10 for example.
device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is
chosen automatically.
regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add
to the loss function
wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics
Load HuggingFace model from a pretrained checkpoint.
The utility can be used for loading a model from a checkpoint.
Given model_dir, it checks for existing checkpoint in the directory.
If a checkpoint exists, the model’s state is loaded from the checkpoint.
If the option from_hf_checkpoint is set to True, then it loads a pretrained
model using the HuggingFace model’s from_pretrained method. This option
interprets model_dir as the model id of a pretrained model hosted inside a model repo
on huggingface.co, or as the path to a directory containing model weights saved using the save_pretrained
method of a HuggingFace model.
generator (generator) – this should generate batches, each represented as a tuple of the form
(inputs, labels, weights).
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
Set this to 0 to disable automatic checkpointing.
restore (bool) – if True, restore the model from the most recent checkpoint and continue training
from there. If False, retrain the model from scratch.
variables (list of torch.nn.Parameter) – the variables to train. If None (the default), all trainable variables in
the model are used.
loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
for each batch. If None (the default), the model’s standard loss function
is used.
callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
every step. This can be used to perform validation, logging, etc.
all_losses (Optional[List[float]], optional (default None)) – If specified, all logged losses are appended into this list. Note that
you can call fit() repeatedly with the same list and losses will
continue to be appended.
Return type:
The average loss over the most recent checkpoint interval
Note
A HuggingFace model can return embeddings (last hidden state) and attentions.
Support must be added to return the embeddings to the user, so that they can
be used for other downstream applications.
Chemberta is a transformer style model for learning on SMILES strings.
The model architecture is based on the RoBERTa architecture. The model
can be used for both pretraining an embedding and finetuning for
downstream applications.
The model supports two types of pretraining tasks - pretraining via masked language
modeling and pretraining via multi-task regression. To pretrain via masked language
modeling task, use task = mlm and for pretraining via multitask regression task,
use task = mtr. The model supports the regression, classification and multitask
regression finetuning tasks and they can be specified using regression, classification
and mtr as arguments to the task keyword during model initialisation.
The model uses a tokenizer to create input tokens for the model from the SMILES strings.
The default tokenizer model is a byte-pair encoding tokenizer trained on the PubChem10M dataset
and loaded from the HuggingFace model hub (https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_60k).
Parameters:
task (str) –
The task defines the type of learning task in the model. The supported tasks are
mlm - masked language modeling commonly used in pretraining
mtr - multitask regression - a task used for both pretraining base models and finetuning
regression - use it for regression tasks, like property prediction
classification - use it for classification tasks
tokenizer_path (str) – Path to the pretrained tokenizer used to tokenize SMILES strings for model inputs. The tokenizer path can either be a HuggingFace tokenizer model or a path on the local machine containing the tokenizer.
n_tasks (int, default 1) – Number of prediction targets for a multitask learning model
model (torch.nn.Module) – the PyTorch model implementing the calculation
loss (dc.models.losses.Loss or function) – a Loss or function defining how to compute the training loss for each
batch, as described above
output_types (list of strings, optional (default None)) – the type of each output from the model, as described above
batch_size (int, optional (default 100)) – default batch size for training and evaluating
model_dir (str, optional (default None)) – the directory on disk where the model will be stored. If this is None,
a temporary directory is created.
learning_rate (float or LearningRateSchedule, optional (default 0.001)) – the learning rate to use for fitting. If optimizer is specified, this is
ignored.
optimizer (Optimizer, optional (default None)) – the optimizer to use for fitting. If this is specified, learning_rate is
ignored.
tensorboard (bool, optional (default False)) – whether to log progress to TensorBoard during training
wandb (bool, optional (default False)) – whether to log progress to Weights & Biases during training
log_frequency (int, optional (default 100)) – The frequency at which to log data. Data is logged using
logging by default. If tensorboard is set, data is also
logged to TensorBoard. If wandb is set, data is also logged
to Weights & Biases. Logging happens at global steps. Roughly,
a global step corresponds to one batch of training. If you’d
like a printout every 10 batch steps, you’d set
log_frequency=10 for example.
device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is
chosen automatically.
regularization_loss (Callable, optional) – a function that takes no arguments, and returns an extra contribution to add
to the loss function
wandb_logger (WandbLogger) – the Weights & Biases logger object used to log data and metrics