Implements sequence-to-sequence translation models.
The model is based on the description in Sutskever et al., “Sequence to
Sequence Learning with Neural Networks” (https://arxiv.org/abs/1409.3215),
although this implementation uses GRUs instead of LSTMs. The goal is to
take sequences of tokens as input, and translate each one into a different
output sequence. The input and output sequences can both be of variable
length, and an output sequence need not have the same length as the input
sequence it was generated from. For example, these models were originally
developed for use in natural language processing. In that context, the
input might be a sequence of English words, and the output might be a
sequence of French words. The goal would be to train the model to translate
sentences from English to French.
The model consists of two parts called the “encoder” and “decoder”. Each one
consists of a stack of recurrent layers. The job of the encoder is to
transform the input sequence into a single, fixed-length vector called the
“embedding”. That vector contains all relevant information from the input
sequence. The decoder then transforms the embedding vector into the output
sequence.
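The key structural idea, a variable-length input compressed through a fixed-length bottleneck, can be illustrated with a deliberately simplified stand-in for the recurrent encoder (mean-pooled random per-token embeddings rather than GRU layers; all names here are illustrative, not DeepChem API):

```python
import numpy as np

def toy_encoder(tokens, vocab, embedding_dim=8, seed=0):
    """Map a variable-length token sequence to one fixed-length vector.

    A real SeqToSeq encoder uses stacked recurrent layers; mean-pooling
    random per-token embeddings is only a stand-in for illustration.
    """
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(len(vocab), embedding_dim))  # one row per token
    ids = [vocab.index(t) for t in tokens]
    return table[ids].mean(axis=0)  # fixed shape regardless of input length

vocab = list("abcde")
short = toy_encoder(list("ab"), vocab)
long = toy_encoder(list("abcde" * 4), vocab)
# Both embeddings have the same fixed shape, whatever the sequence length.
print(short.shape, long.shape)
```

However long the input, the encoder output has the same shape; the real model's decoder then has a fixed-size starting point from which to generate the output sequence.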
These models can be used for various purposes. First and most obviously,
they can be used for sequence to sequence translation. In any case where you
have sequences of tokens, and you want to translate each one into a different
sequence, a SeqToSeq model can be trained to perform the translation.
Another possible use case is transforming variable-length sequences into
fixed-length vectors. Many types of models require their inputs to have a
fixed shape, which makes it difficult to use them with variable sized inputs
(for example, when the input is a molecule, and different molecules have
different numbers of atoms). In that case, you can train a SeqToSeq model as
an autoencoder, so that it tries to make the output sequence identical to the
input one. That forces the embedding vector to contain all information from
the original sequence. You can then use the encoder for transforming
sequences into fixed-length embedding vectors, suitable for use as inputs to
other types of models.
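Once every sequence has been mapped to a fixed-length embedding, any fixed-input model can consume the result. A sketch with random stand-in embeddings (in practice these would come from the trained encoder) fed to an ordinary least-squares regressor:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-ins for encoder outputs: 100 sequences -> 100 embeddings of size 16.
# Random data purely for illustration; a trained SeqToSeq encoder would
# produce these from variable-length inputs.
embeddings = rng.normal(size=(100, 16))
labels = embeddings @ rng.normal(size=16) + 0.1 * rng.normal(size=100)

# Any fixed-input model can now consume the embeddings, e.g. linear
# least squares.
coef, *_ = np.linalg.lstsq(embeddings, labels, rcond=None)
predictions = embeddings @ coef
print(predictions.shape)
```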
Another use case is to train the decoder for use as a generative model. Here
again you begin by training the SeqToSeq model as an autoencoder. Once
training is complete, you can supply arbitrary embedding vectors, and
transform each one into an output sequence. When used in this way, you
typically train it as a variational autoencoder. This adds random noise to
the encoder, and also adds a constraint term to the loss that forces the
embedding vector to have a unit Gaussian distribution. You can then pick
random vectors from a Gaussian distribution, and the output sequences should
follow the same distribution as the training data.
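Sampling new sequences from a trained variational autoencoder then amounts to drawing embedding vectors from a unit Gaussian and decoding each one. The decoding step is left as a comment since it needs a trained model; the sampling itself is just:

```python
import numpy as np

rng = np.random.default_rng(42)
embedding_dimension = 512  # must match the trained model's embedding size
n_samples = 10

# Draw random embedding vectors from a unit Gaussian, the distribution the
# variational constraint pushed the embeddings toward during training.
embeddings = rng.standard_normal((n_samples, embedding_dimension))

# With a trained model, each vector would then be decoded into a sequence,
# e.g. sequences = model.predict_from_embeddings(embeddings)
print(embeddings.shape)
```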
When training as a variational autoencoder, it is best to use KL cost
annealing, as described in https://arxiv.org/abs/1511.06349. The constraint
term in the loss is initially set to 0, so the optimizer just tries to
minimize the reconstruction loss. Once it has made reasonable progress
toward that, the constraint term can be gradually turned back on. The range
of steps over which this happens is configurable.
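The annealing schedule described above can be written down directly: the weight on the KL term is 0 before a start step, 1 after a final step, and ramps linearly in between. A sketch of the usual linear schedule (function and parameter names are illustrative, not the model's actual configuration options):

```python
def kl_annealing_weight(step, start_step, final_step):
    """Linear KL cost annealing: 0 before start_step, 1 after final_step."""
    if step < start_step:
        return 0.0
    if step >= final_step:
        return 1.0
    return (step - start_step) / (final_step - start_step)

# Early steps: pure reconstruction loss; later: full variational constraint.
print(kl_annealing_weight(0, 5000, 10000))      # 0.0
print(kl_annealing_weight(7500, 5000, 10000))   # 0.5
print(kl_annealing_weight(20000, 5000, 10000))  # 1.0
```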

compute_saliency(X)
Compute the saliency map for an input sample.
This computes the Jacobian matrix with the derivative of each output element
with respect to each input element. More precisely:

- If this model has a single output, it returns a matrix of shape
  (output_shape, input_shape) with the derivatives.
- If this model has multiple outputs, it returns a list of matrices, one
  for each output.

This method cannot be used on models that take multiple inputs.
Parameters:  X (ndarray) – the input data for a single sample
Returns:  the Jacobian matrix, or a list of matrices
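The (output_shape, input_shape) convention can be checked on a toy differentiable function with finite differences. This is a standalone illustration of what a Jacobian of outputs with respect to inputs looks like, not a call into the model:

```python
import numpy as np

def finite_difference_jacobian(f, x, eps=1e-6):
    """Numerically estimate d f(x)[i] / d x[j] for a vector function f."""
    y = f(x)
    jac = np.zeros((y.size, x.size))  # (output_shape, input_shape)
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        jac[:, j] = (f(xp) - y) / eps
    return jac

f = lambda x: np.array([x[0] * x[1], x[0] + x[1], x[2] ** 2])
x = np.array([1.0, 2.0, 3.0])
J = finite_difference_jacobian(f, x)
print(J.shape)  # 3 outputs by 3 inputs
```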

default_generator(dataset, epochs=1, mode='fit', deterministic=True, pad_batches=True)
Create a generator that iterates batches for a dataset.
Subclasses may override this method to customize how model inputs are
generated from the data.
Parameters:
- dataset (Dataset) – the data to iterate
- epochs (int) – the number of times to iterate over the full dataset
- mode (str) – allowed values are ‘fit’ (called during training), ‘predict’ (called
  during prediction), and ‘uncertainty’ (called during uncertainty prediction)
- deterministic (bool) – whether to iterate over the dataset in order, or randomly shuffle the
  data for each epoch
- pad_batches (bool) – whether to pad each batch up to this model’s preferred batch size

Returns: 
 a generator that iterates batches, each represented as a tuple of lists
 ([inputs], [outputs], [weights])
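The batch tuples follow a simple protocol, ([inputs], [outputs], [weights]), which a custom generator can reproduce directly. A minimal sketch over NumPy arrays (all names illustrative):

```python
import numpy as np

def simple_batch_generator(X, y, w, batch_size=32, epochs=1):
    """Yield ([inputs], [labels], [weights]) tuples, one per batch."""
    for _ in range(epochs):
        for start in range(0, len(X), batch_size):
            end = start + batch_size
            yield ([X[start:end]], [y[start:end]], [w[start:end]])

X = np.arange(100).reshape(100, 1).astype(float)
y = np.ones((100, 1))
w = np.ones((100, 1))
batches = list(simple_batch_generator(X, y, w, batch_size=32))
print(len(batches))  # 4 batches: 32 + 32 + 32 + 4 samples
```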


evaluate(dataset, metrics, transformers=[], per_task_metrics=False)
Evaluates the performance of this model on the specified dataset.
Parameters:
- dataset (dc.data.Dataset) – Dataset object.
- metrics (deepchem.metrics.Metric) – Evaluation metric
- transformers (list) – List of deepchem.transformers.Transformer
- per_task_metrics (bool) – If True, return per-task scores.

Returns:  Maps tasks to scores under metric.

Return type:  dict
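The returned dict maps each task to its score under the metric; per_task_metrics controls whether a per-task breakdown is included. What a per-task score looks like can be mimicked without the library (metric choice and task names here are illustrative):

```python
import numpy as np

def per_task_rmse(y_true, y_pred):
    """Per-task root-mean-square error over a (samples, tasks) array."""
    return np.sqrt(((y_true - y_pred) ** 2).mean(axis=0))

rng = np.random.default_rng(0)
y_true = rng.normal(size=(50, 2))                # two tasks
y_pred = y_true + 0.1 * rng.normal(size=(50, 2)) # small prediction error

scores = per_task_rmse(y_true, y_pred)
result = {f"task{i}": s for i, s in enumerate(scores)}  # task -> score
print(sorted(result))
```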


evaluate_generator(generator, metrics, transformers=[], per_task_metrics=False)
Evaluate the performance of this model on the data produced by a generator.
Parameters:
- generator (generator) – this should generate batches, each represented as a tuple of the form
  (inputs, labels, weights).
- metrics (deepchem.metrics.Metric) – Evaluation metric
- transformers (list of dc.trans.Transformers) – Transformers that the input data has been transformed by. The output
  is passed through these transformers to undo the transformations.
- per_task_metrics (bool) – If True, return per-task scores.

Returns:  Maps tasks to scores under metric.

Return type:  dict


fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, variables=None, loss=None, callbacks=[])
Train this model on a dataset.
Parameters:
- dataset (Dataset) – the Dataset to train on
- nb_epoch (int) – the number of epochs to train for
- max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
- checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
  Set this to 0 to disable automatic checkpointing.
- deterministic (bool) – if True, the samples are processed in order. If False, a different random
  order is used for each epoch.
- restore (bool) – if True, restore the model from the most recent checkpoint and continue training
  from there. If False, retrain the model from scratch.
- variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in
  the model are used.
- loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
  for each batch. If None (the default), the model’s standard loss function
  is used.
- callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
  every step. This can be used to perform validation, logging, etc.
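The callback and checkpointing semantics can be mimicked in a bare training-loop skeleton: callbacks fire after every step, while checkpoints are written only every checkpoint_interval steps. A standalone sketch, not the library's internals:

```python
def run_training_loop(n_steps, checkpoint_interval, callbacks):
    """Skeleton of a step loop: callbacks every step, periodic checkpoints."""
    checkpoints = []
    for step in range(1, n_steps + 1):
        # ... one gradient step would happen here ...
        for callback in callbacks:
            callback(None, step)  # f(model, step), invoked after every step
        if checkpoint_interval > 0 and step % checkpoint_interval == 0:
            # older checkpoints would be pruned to max_checkpoints_to_keep
            checkpoints.append(step)
    return checkpoints

steps_seen = []
checkpoints = run_training_loop(
    n_steps=2500,
    checkpoint_interval=1000,
    callbacks=[lambda model, step: steps_seen.append(step)],
)
print(len(steps_seen), checkpoints)  # 2500 [1000, 2000]
```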


fit_generator(generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, variables=None, loss=None, callbacks=[])
Train this model on data from a generator.
Parameters:
- generator (generator) – this should generate batches, each represented as a tuple of the form
  (inputs, labels, weights).
- max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
- checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
  Set this to 0 to disable automatic checkpointing.
- restore (bool) – if True, restore the model from the most recent checkpoint and continue training
  from there. If False, retrain the model from scratch.
- variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in
  the model are used.
- loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
  for each batch. If None (the default), the model’s standard loss function
  is used.
- callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
  every step. This can be used to perform validation, logging, etc.

Returns:  the average loss over the most recent checkpoint interval


fit_on_batch(X, y, w, variables=None, loss=None, callbacks=[])
Perform a single step of training.
Parameters:
- X (ndarray) – the inputs for the batch
- y (ndarray) – the labels for the batch
- w (ndarray) – the weights for the batch
- variables (list of tf.Variable) – the variables to train. If None (the default), all trainable variables in
  the model are used.
- loss (function) – a function of the form f(outputs, labels, weights) that computes the loss
  for each batch. If None (the default), the model’s standard loss function
  is used.
- callbacks (function or list of functions) – one or more functions of the form f(model, step) that will be invoked after
  every step. This can be used to perform validation, logging, etc.


fit_sequences(sequences, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False)
Train this model on a set of sequences.
Parameters:
- sequences (iterable) – the training samples to fit to. Each sample should be
  represented as a tuple of the form (input_sequence, output_sequence).
- max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
- checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
- restore (bool) – if True, restore the model from the most recent checkpoint and continue training
  from there. If False, retrain the model from scratch.
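Each training sample is an (input_sequence, output_sequence) tuple; for autoencoder training the two members are identical. A plain generator over token lists is enough (the SMILES strings are illustrative stand-ins for real training data):

```python
def autoencoder_sequences(strings):
    """Yield (input_sequence, output_sequence) pairs with identical members,
    the form expected for autoencoder-style training."""
    for s in strings:
        tokens = list(s)        # one character per token
        yield (tokens, tokens)  # output identical to input

smiles = ["CCO", "c1ccccc1", "CC(=O)O"]  # illustrative molecules
pairs = list(autoencoder_sequences(smiles))
print(pairs[0])
# With a constructed model, the iterable would be passed directly, e.g.:
#   model.fit_sequences(autoencoder_sequences(smiles))
```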


get_checkpoints(model_dir=None)
Get a list of all available checkpoint files.
Parameters:  model_dir (str, default None) – Directory to get list of checkpoints from. Reverts to self.model_dir if None 

get_global_step()
Get the number of steps of fitting that have been performed.

static get_model_filename(model_dir)
Given model directory, obtain filename for the model itself.

get_num_tasks()
Get number of tasks.

get_params(deep=True)
Get parameters for this estimator.
Parameters:  deep (bool, default=True) – If True, will return the parameters for this estimator and
contained subobjects that are estimators. 
Returns:  params – Parameter names mapped to their values. 
Return type:  mapping of string to any 

static get_params_filename(model_dir)
Given model directory, obtain filename for the model parameters.

get_task_type()
Currently models can only be classifiers or regressors.

load_from_pretrained(source_model, assignment_map=None, value_map=None, checkpoint=None, model_dir=None, include_top=True, **kwargs)
Copies variable values from a pretrained model. source_model can either
be a pretrained model or a model with the same architecture. value_map
is a variable/value dictionary. If no value_map is provided, the variable
values are restored to the source_model from a checkpoint and a default
value_map is created. assignment_map is a dictionary mapping variables
from the source_model to the current model. If no assignment_map is
provided, one is made from scratch and assumes the model is composed of
several different layers, with the final one being a dense layer. include_top
is used to control whether or not the final dense layer is used. The default
assignment map is useful in cases where the type of task is different
(classification vs. regression) and/or the number of tasks differs.
Parameters:
- source_model (dc.KerasModel, required) – source_model can either be the pretrained model or a dc.KerasModel with
  the same architecture as the pretrained model. It is used to restore from
  a checkpoint, if value_map is None and to create a default assignment map
  if assignment_map is None
- assignment_map (Dict, default None) – Dictionary mapping the source_model variables and current model variables
- value_map (Dict, default None) – Dictionary containing source_model trainable variables mapped to numpy
  arrays. If value_map is None, the values are restored and a default
  variable map is created using the restored values
- checkpoint (str, default None) – the path to t

